Metrics collection

This package exposes functions and utilities to record metrics in CommCare. These metrics are exported / exposed to the configured metrics providers. Supported providers are:

  • Datadog

  • Prometheus

Providers are enabled using the METRICS_PROVIDERS setting. Multiple providers can be enabled concurrently:

METRICS_PROVIDERS = [
    'corehq.util.metrics.prometheus.PrometheusMetrics',
    'corehq.util.metrics.datadog.DatadogMetrics',
]

If no metrics providers are configured, CommCare logs all metrics to the commcare.metrics logger at the DEBUG level.
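To see these fallback log lines during local development, the commcare.metrics logger has to emit DEBUG records. The following is a minimal sketch using only the standard logging module; the handler and format string are illustrative choices, not something this package requires.

```python
import logging

# Logger name taken from the paragraph above.
metrics_logger = logging.getLogger('commcare.metrics')
metrics_logger.setLevel(logging.DEBUG)

# Illustrative handler: write records to stderr with a simple format.
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('%(name)s %(levelname)s %(message)s'))
metrics_logger.addHandler(handler)
```

In a Django deployment the same effect would normally be achieved through the LOGGING setting rather than by configuring the logger in code.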

Metric tagging

Metrics may be tagged by passing a dictionary of tag names and values. Tags should be used to add dimensions to a metric e.g. request type, response status.

Tags should not originate from unbounded sources or sources with high dimensionality such as timestamps, user IDs, request IDs etc. Ideally a tag should not have more than 10 possible values.
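One way to keep a tag bounded is to collapse a high-cardinality value into a small, fixed set of labels before tagging. The sketch below does this for HTTP status codes; status_class is a hypothetical helper written for illustration and is not part of this package.

```python
def status_class(status_code: int) -> str:
    """Collapse an HTTP status code into a bounded tag value such as '2xx'."""
    if 100 <= status_code <= 599:
        return f'{status_code // 100}xx'
    # Anything outside the standard range shares one catch-all value.
    return 'other'


# At most six distinct values ('1xx' to '5xx' plus 'other') can reach the tag.
tags = {'status': status_class(404)}
```

The resulting dictionary can then be passed as the tags argument of a metric call instead of the raw status code.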

Read more about tagging:

Metric Types

Counter metric

A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.

Do not use a counter to expose a value that can decrease. For example, do not use a counter for the number of currently running processes; instead use a gauge.

metrics_counter('commcare.case_import.count', 1, tags={'domain': domain})

Gauge metric

A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.

Gauges are typically used for measured values like temperatures or current memory usage, but also “counts” that can go up and down, like the number of concurrent requests.

metrics_gauge('commcare.case_import.queue_length', queue_length)

For regular reporting of a gauge metric there is the metrics_gauge_task function:

corehq.util.metrics.metrics_gauge_task(name, fn, run_every, multiprocess_mode='all')

Helper for easily registering gauges to run periodically

To update a gauge on a schedule based on the result of a function just add to your app’s tasks.py:

my_calculation = metrics_gauge_task(
    'commcare.my.metric', my_calculation_function, run_every=crontab(minute=0)
)

kwargs:

multiprocess_mode: See PrometheusMetrics._gauge for documentation.

Histogram metric

A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets.

metrics_histogram(
    'commcare.case_import.duration', timer_duration,
    bucket_tag='size', buckets=[10, 50, 200, 1000], bucket_unit='s',
    tags={'domain': domain}
)

Histograms are recorded differently in the different providers.

DatadogMetrics._histogram(name: str, value: float, bucket_tag: str, buckets: List[int], bucket_unit: str = '', tags: Dict[str, str] = None, documentation: str = '')

This implementation of histogram uses tagging to record the buckets. It does not use the Datadog Histogram metric type.

The metric itself will be incremented by 1 on each call. The value passed to metrics_histogram will be used to create the bucket tag.

For example:

h = metrics_histogram(
    'commcare.request.duration', 1.4,
    bucket_tag='duration', buckets=[1, 2, 3], bucket_unit='ms',
    tags=tags
)

# resulting metrics
# commcare.request.duration:1|c|#duration:lt_2ms
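The mapping from a value to its bucket tag can be sketched as follows. This is not the package's implementation: bucket_tag_value is a hypothetical name, and both the strict "less than" comparison and the "over_" label for values beyond the last bucket are assumptions chosen to match the lt_2ms example above. The sketch also assumes a non-empty list of buckets.

```python
from typing import Sequence


def bucket_tag_value(value: float, buckets: Sequence[float], unit: str = '') -> str:
    """Return a tag value naming the first bucket boundary greater than `value`."""
    ordered = sorted(buckets)
    for upper in ordered:
        if value < upper:
            return f'lt_{upper}{unit}'
    # Assumption: values at or above the last boundary get an "over_" label.
    return f'over_{ordered[-1]}{unit}'


tag_value = bucket_tag_value(1.4, [1, 2, 3], 'ms')
```

Because every observation maps to one of a fixed number of labels, the bucket tag stays low-cardinality regardless of how many distinct values are recorded.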

For more explanation about why this implementation was chosen see:

PrometheusMetrics._histogram(name: str, value: float, bucket_tag: str, buckets: List[int], bucket_unit: str = '', tags: Dict[str, str] = None, documentation: str = '')

A cumulative histogram with a base metric name of <name> exposes multiple time series during a scrape:

  • cumulative counters for the observation buckets, exposed as <name>_bucket{le="<upper inclusive bound>"}

  • the total sum of all observed values, exposed as <name>_sum

  • the count of events that have been observed, exposed as <name>_count (identical to <name>_bucket{le="+Inf"} above)

For example:

h = metrics_histogram(
    'commcare.request_duration', 1.4,
    bucket_tag='duration', buckets=[1, 2, 3], bucket_unit='ms',
    tags=tags
)

# resulting metrics
# commcare_request_duration_bucket{...tags..., le="1.0"} 0.0
# commcare_request_duration_bucket{...tags..., le="2.0"} 1.0
# commcare_request_duration_bucket{...tags..., le="3.0"} 1.0
# commcare_request_duration_bucket{...tags..., le="+Inf"} 1.0
# commcare_request_duration_sum{...tags...} 1.4
# commcare_request_duration_count{...tags...} 1.0

See https://prometheus.io/docs/concepts/metric_types/#histogram
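The cumulative bucket counts shown above can be reproduced with a few lines of plain Python. This is only an illustration of the arithmetic, with a hypothetical function name; it does not use or represent the Prometheus client library.

```python
from typing import Dict, Iterable, Sequence, Tuple


def cumulative_histogram(
    observations: Iterable[float], buckets: Sequence[float]
) -> Tuple[Dict[str, int], float, int]:
    """Return (per-bucket cumulative counts, sum of values, number of values)."""
    bounds = sorted(float(b) for b in buckets)
    counts = {str(b): 0 for b in bounds}
    counts['+Inf'] = 0
    total = 0.0
    n = 0
    for value in observations:
        # An observation is counted in every bucket whose inclusive
        # upper bound is at least the value, hence "cumulative".
        for b in bounds:
            if value <= b:
                counts[str(b)] += 1
        counts['+Inf'] += 1
        total += value
        n += 1
    return counts, total, n


counts, total, n = cumulative_histogram([1.4], [1, 2, 3])
```

For the single observation 1.4, the buckets with bounds 2.0, 3.0 and +Inf are each incremented while the 1.0 bucket stays at zero, which matches the listing above.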

Utilities

corehq.util.metrics.create_metrics_event(title: str, text: str, alert_type: str = 'info', tags: Dict[str, str] = None, aggregation_key: str = None)

Send an event record to the monitoring provider.

Currently only implemented by the Datadog provider.

Parameters
  • title – Title of the event

  • text – Event body

  • alert_type – Event type. One of ‘success’, ‘info’, ‘warning’, ‘error’

  • tags – Event tags

  • aggregation_key – Key to use to group multiple events

corehq.util.metrics.metrics_histogram_timer(metric: str, timing_buckets: Iterable[int], tags: Dict[str, str] = None, bucket_tag: str = 'duration', callback: Callable = None)

Create a context manager that times a block of code and reports the duration to the metrics providers as a histogram.

Example Usage:

timer = metrics_histogram_timer('commcare.some.special.metric', tags={
    'type': type,
}, timing_buckets=(.001, .01, .1, 1, 10, 100))
with timer:
    some_special_thing()

This will result in a call to metrics_histogram with the timer value.

Note: Histograms are implemented differently by each provider. See documentation for details.

Parameters
  • metric – Name of the metric (must start with ‘commcare.’)

  • tags – metric tags to include

  • timing_buckets – sequence of numbers representing time thresholds, in seconds

  • bucket_tag – The name of the bucket tag to use (if used by the underlying provider)

  • callback – a callable which will be called when exiting the context manager with a single argument of the timer duration

Returns

A context manager that will perform the specified timing and send the specified metric
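The timing-and-callback behaviour described above can be sketched with the standard library alone. simple_timer is a hypothetical stand-in, not the package's function: it omits the histogram reporting, and calling the callback even when the timed block raises is an assumption of this sketch.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


@contextmanager
def simple_timer(callback: Optional[Callable[[float], None]] = None) -> Iterator[None]:
    """Time the enclosed block and pass the duration in seconds to `callback`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        duration = time.perf_counter() - start
        if callback is not None:
            callback(duration)


durations = []
with simple_timer(callback=durations.append):
    sum(range(1000))  # placeholder for the work being timed
```

After the block exits, durations holds a single float with the elapsed time in seconds.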

class corehq.util.metrics.metrics_track_errors(name)

Record when something succeeds or errors in the configured metrics provider

For example, this code will log to commcare.myfunction.succeeded when it completes successfully, and to commcare.myfunction.failed when an exception is raised:

@metrics_track_errors('myfunction')
def myfunction():
    pass
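The success and failure bookkeeping can be illustrated with a small self-contained decorator. This is a sketch under stated assumptions rather than the package's code: track_errors is a hypothetical name, an in-memory Counter stands in for the metrics provider, and re-raising the original exception after recording the failure is assumed.

```python
import functools
from collections import Counter

# Hypothetical in-memory stand-in for a metrics provider.
recorded = Counter()


def track_errors(name):
    """Count successful and failed calls under commcare.<name>.*"""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                result = fn(*args, **kwargs)
            except Exception:
                recorded[f'commcare.{name}.failed'] += 1
                raise  # assumption: the caller still sees the error
            recorded[f'commcare.{name}.succeeded'] += 1
            return result
        return wrapper
    return decorator


@track_errors('myfunction')
def myfunction(fail=False):
    if fail:
        raise ValueError('simulated failure')
    return 'ok'
```

Each call increments exactly one of the two counters, so the pair can be compared to derive an error rate.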

Other Notes

  • All metrics must use the prefix ‘commcare.’