Class: PdMetrics

Inherits:
Object
Defined in:
lib/pd_metrics.rb,
lib/pd_metrics/version.rb

Defined Under Namespace

Modules: NumericExtensions
Classes: Counter, Gauge, Histogram, NumericMetric

Constant Summary

VERSION = "1.0.0"

Class Method Details

.gauge(namespace, key, value, tags = {}, additional_data = {}) ⇒ Object

Captures the current value for a metric.

Unlike a counter, a gauge value cannot meaningfully be combined with other values. Only the last value reported within each sampling interval (normally 10 seconds) is recorded in DataDog.

Use this method to capture metrics that change over time, such as the amount of memory in use. These values are usually sampled at a regular frequency by a timer.

PdMetrics.gauge('ruby', 'live_objects', ObjectSpace.live_objects)

The following line will be printed in SumoLogic for each call to gauge.

ruby #live_objects=30873|

Additionally, the following metric will be available in DataDog.

ruby.live_objects
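The timer-driven sampling described above might look like the following minimal sketch. It is not part of the gem. record_gauge is a hypothetical stand-in for PdMetrics.gauge that only collects its calls. The sampled value is GC.stat[:heap_live_slots] (MRI 2.2 and later), chosen here as an assumption because ObjectSpace.live_objects is not available on current MRI.

```ruby
# Hypothetical stand-in for PdMetrics.gauge: records each call instead of
# logging to SumoLogic or sending to DataDog.
GAUGE_CALLS = []
GAUGE_LOCK = Mutex.new

def record_gauge(namespace, key, value, tags = {})
  GAUGE_LOCK.synchronize { GAUGE_CALLS << [namespace, key, value, tags] }
end

# Samples the number of live heap slots every `interval` seconds,
# `samples` times, on a background thread.
def start_gauge_sampler(interval:, samples:)
  Thread.new do
    samples.times do
      record_gauge('ruby', 'live_objects', GC.stat[:heap_live_slots])
      sleep interval
    end
  end
end

sampler = start_gauge_sampler(interval: 0.01, samples: 3)
sampler.join
```

In a long-running process you would normally loop indefinitely, with an interval close to DataDog's sampling window. Sampling much faster than that window adds work without changing what DataDog records, because only the last value in each window is kept.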


# File 'lib/pd_metrics.rb', line 127

def self.gauge(namespace, key, value, tags = {}, additional_data = {})
  # Copy the tags so the caller's hash is not modified.
  gauge_data = (tags || {}).dup
  gauge_data[key] = value.gauge
  send_event(namespace, gauge_data, additional_data)
end

.histogram(namespace, key, value, tags = {}, additional_data = {}) ⇒ Object

Captures statistical metrics for a set of values within a given timeframe. This is very similar to the time method, but it works with arbitrary numeric values rather than only elapsed time.

For example, you could record the size of each JSON payload received by an API. A counter would only give you the total, while a histogram also reveals the average and median payload sizes.

PdMetrics.histogram('api', 'payload_size', payload.size, account: 'Netflix')

The following line will be printed in SumoLogic for every payload.

api #account=Netflix|#payload_size=1234|
api #account=Netflix|#payload_size=0|

Additionally, DataDog will have the following metrics available. Note that these metrics are aggregated every 10 seconds, so each data point likely covers multiple requests.

api.payload_size.count
api.payload_size.avg
api.payload_size.median
api.payload_size.max
api.payload_size.95percentile
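To make these aggregates concrete, here is a sketch of how count, avg, median, max and 95percentile could be derived from the values reported in one 10-second window. summarize_window is a hypothetical helper. It uses the nearest-rank method for the median and percentile, which is an assumption: DataDog's own percentile computation may differ, so treat the results as illustrative.

```ruby
# Summarizes the values reported during one time window, using the
# nearest-rank method for the median and the 95th percentile.
def summarize_window(values)
  raise ArgumentError, 'no values in window' if values.empty?

  sorted = values.sort
  n = sorted.size
  # Nearest rank: the smallest value with at least p * n values at or below it.
  nearest_rank = ->(p) { sorted[[(p * n).ceil, 1].max - 1] }

  {
    'count' => n,
    'avg' => values.sum.to_f / n,
    'median' => nearest_rank.call(0.5),
    'max' => sorted.last,
    '95percentile' => nearest_rank.call(0.95)
  }
end

stats = summarize_window([1234, 0, 512, 2048])
```

For the four payload sizes above, count is 4, avg is 948.5, median is 512 (nearest rank picks the lower middle value for an even count), and both max and 95percentile are 2048.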


# File 'lib/pd_metrics.rb', line 158

def self.histogram(namespace, key, value, tags = {}, additional_data = {})
  # Copy the tags so the caller's hash is not modified.
  histogram_data = (tags || {}).dup
  histogram_data[key] = value.histogram
  send_event(namespace, histogram_data, additional_data)
end

.incr(namespace, key, increment_by = 1, tags = {}, additional_data = {}) ⇒ Object

Captures an increase or decrease in a counter.

You can use this to capture metrics that should be added together when viewed on a graph.

PdMetrics.incr('logins', 'success')
PdMetrics.incr('emails', 'bytes_received', email_bytes.size, account: 'Netflix')

These calls produce the following lines in SumoLogic.

logins #success=1|
emails #account=Netflix|#bytes_received=1234|

Additionally, the following metrics will be defined in DataDog.

logins.success
emails.bytes_received
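Counter data points are added together, so the increments reported in a window can be combined by summing them per metric name, with negative values acting as decrements. The CounterWindow class below is a hypothetical sketch of that aggregation, not a class provided by the gem.

```ruby
# Hypothetical accumulator that sums counter increments per metric name
# ("namespace.key"), the way counter values are combined within a time slice.
class CounterWindow
  def initialize
    @totals = Hash.new(0)
  end

  # Mirrors the argument order of PdMetrics.incr; a negative increment
  # lowers the total.
  def incr(namespace, key, increment_by = 1)
    @totals["#{namespace}.#{key}"] += increment_by
  end

  # Returns the totals for the current window and starts a new, empty one.
  def flush
    snapshot = @totals.dup
    @totals.clear
    snapshot
  end
end

window = CounterWindow.new
window.incr('logins', 'success')
window.incr('logins', 'success')
window.incr('emails', 'bytes_received', 1234)
window.incr('workers', 'busy')
window.incr('workers', 'busy', -1)
totals = window.flush
```

After these calls, totals maps logins.success to 2, emails.bytes_received to 1234 and workers.busy to 0.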


# File 'lib/pd_metrics.rb', line 101

def self.incr(namespace, key, increment_by = 1, tags = {}, additional_data = {})
  # Copy the tags so the caller's hash is not modified.
  incr_data = (tags || {}).dup
  incr_data[key] = increment_by.counter
  send_event(namespace, incr_data, additional_data)
end

.send_event(namespace, metrics_and_tags = {}, additional_data = {}) ⇒ Object

Logs an event to the metrics backends. In general, you can log any key-value pairs.

PdMetrics.send_event('api', account: 'Netflix', wait_delta: 0.01, run_delta: 0.1)

This will result in the following line being logged in SumoLogic. No data will be sent to DataDog.

api #account=Netflix|#run_delta=0.1|#wait_delta=0.01|

In order to support aggregated graphs in DataDog, you’ll need to mark the type of any numerical metrics you want aggregated.

PdMetrics.send_event('api', wait_delta: (0.01).histogram, run_delta: (0.1).histogram)

This extra detail tells DataDog how to aggregate multiple events that fall in the same time slice. The supported types are:

counter - adds multiple data points together. Use this for things like visits, errors, etc.
gauge - keeps the last value. Use this for things like free memory, connections to the database, etc.
histogram - derives count, avg, median, max, min, and 95th percentile from the individual values reported. Use this for things like latency, bytes written, etc.
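The three aggregation rules above can be summarized in a small sketch. The aggregate function and its symbol-based type argument are assumptions made for illustration. The gem expresses types through its Counter, Gauge and Histogram classes, whose internals are not shown here, and only a subset of the histogram statistics is computed.

```ruby
# Combines the data points reported for one metric within a single time slice
# according to its type (illustrative only).
def aggregate(type, values)
  case type
  when :counter
    values.sum # counters add together
  when :gauge
    values.last # gauges keep the last reported value
  when :histogram
    # histograms derive statistics from all reported values
    { count: values.size,
      avg: values.sum.to_f / values.size,
      min: values.min,
      max: values.max }
  else
    raise ArgumentError, "unknown metric type: #{type.inspect}"
  end
end
```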

Note that when DataDog metrics are supplied, any non-metric data is passed to DataDog as tags. DataDog creates a separate time series for each unique combination of tags, so this can be counterproductive when the tag values vary widely. To log additional data only to SumoLogic, pass it in the additional_data parameter, as in the call below.

PdMetrics.send_event('api', {wait_delta: (0.01).histogram, run_delta: (0.1).histogram}, account: 'Netflix')
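The routing rule described above can be modeled in a few lines. Metric and route_event are hypothetical names. Metric stands in for the gem's typed wrappers (what (0.01).histogram returns), and the returned hash only shows which data would go to each backend. It does not reproduce the gem's actual DataDog or SumoLogic formats.

```ruby
# Hypothetical stand-in for a typed metric value such as (0.01).histogram.
Metric = Struct.new(:type, :value)

# Models how send_event routes data: typed metrics go to DataDog together with
# all non-metric pairs as tags; SumoLogic receives everything, including
# additional_data, with metric wrappers unwrapped to plain values.
def route_event(metrics_and_tags, additional_data = {})
  metrics, tags = metrics_and_tags.partition { |_key, value| value.is_a?(Metric) }.map(&:to_h)
  sumologic = metrics_and_tags.transform_values { |value| value.is_a?(Metric) ? value.value : value }

  {
    # Without any typed metrics, nothing is sent to DataDog.
    datadog: metrics.empty? ? nil : { metrics: metrics, tags: tags },
    sumologic: sumologic.merge(additional_data)
  }
end

untyped = route_event({ account: 'Netflix', wait_delta: 0.01 })
typed = route_event(
  { wait_delta: Metric.new(:histogram, 0.01), run_delta: Metric.new(:histogram, 0.1) },
  { account: 'Netflix' }
)
```

Passing account through additional_data, as in the typed call, keeps it out of the DataDog tags while still logging it to SumoLogic.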


# File 'lib/pd_metrics.rb', line 39

def self.send_event(namespace, metrics_and_tags = {}, additional_data = {})
  logger.debug { "send_event #{namespace} #{metrics_and_tags.inspect} #{additional_data.inspect}" }
  metrics_and_tags ||= {}
  additional_data ||= {}

  send_datadog_format(namespace, metrics_and_tags)
  send_sumologic_format(namespace, metrics_and_tags, additional_data)
end

.time(namespace, key, tags = {}, additional_data = {}) ⇒ Object

Captures timing metrics for a block of Ruby code.

PdMetrics.time('api', 'receive_email', account: 'Netflix') do
  # process the email
end

Assuming the request took 2 seconds to process, the following log message will be written in SumoLogic.

api #account=Netflix|#receive_email=2.0|#failed=false|

Additionally, the following histogram metrics will be captured in DataDog.

api.receive_email.count
api.receive_email.avg
api.receive_email.median
api.receive_email.max
api.receive_email.95percentile

In addition to the block's latency, time records whether the block succeeded. The run is marked as failed (failed=true) if the block raises an exception (any StandardError), and the exception is then re-raised to the caller.
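The timing and failure-tracking pattern can be reproduced on its own. In this sketch, timed and RECORDED are hypothetical names standing in for PdMetrics.time and its backends. The sketch measures elapsed time with the monotonic clock, which is a choice made here rather than something the original documentation specifies. It also passes through the block's return value and re-raises any StandardError after recording failed as true.

```ruby
# Hypothetical sink standing in for the metrics backends.
RECORDED = []

# Times a block, recording its duration under `key` plus a 'failed' flag.
# The block's return value is passed through; exceptions are re-raised.
def timed(key, tags = {})
  failed = false
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
rescue StandardError
  failed = true
  raise
ensure
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  RECORDED << tags.merge(key => elapsed, 'failed' => failed)
end

result = timed('receive_email', { account: 'Netflix' }) { :processed }

begin
  timed('receive_email', { account: 'Netflix' }) { raise 'mailbox unavailable' }
rescue RuntimeError => e
  puts "re-raised: #{e.message}"
end
```

Because the recording happens in ensure, a data point is written whether the block returns normally or raises.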



# File 'lib/pd_metrics.rb', line 70

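def self.time(namespace, key, tags = {}, additional_data = {})
  failed = false
  # Use a monotonic clock so wall-clock adjustments cannot skew the duration.
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
rescue
  # Any StandardError from the block marks the run as failed; re-raise it.
  failed = true
  raise
ensure
  # Copy the tags so the caller's hash is not modified.
  timing_data = (tags || {}).dup
  timing_data[key] = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start).histogram
  timing_data['failed'] = failed
  # Forward additional_data so it reaches SumoLogic.
  send_event(namespace, timing_data, additional_data)
end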