ESQuery is a library for building elasticsearch queries in a friendly, more readable manner.

Basic usage

There should be a file and subclass of ESQuery for each index we have.

Each method returns a new object, so you can chain calls together like SQLAlchemy. Here’s an example usage:

q = (FormsES()
     .fields(['xmlns', 'domain', 'app_id'])
     .sort('received_on', desc=False)
     .terms_aggregation('babies.count', 'babies_saved'))
result =
total_docs =
hits = result.hits

Generally useful filters and queries should be abstracted away for re-use, but you can always add your own like so:

q.filter({"some_arbitrary_filter": {...}})
q.set_query({"fancy_query": {...}})

For debugging or more helpful error messages, you can use query.dumps() and query.pprint(), both of which use json.dumps() and are suitable for pasting in to ES Head or Marvel or whatever


Filters are implemented as standalone functions, so they can be composed and nested q.OR(web_users(), mobile_users()). Filters can be passed to the query.filter method: q.filter(web_users())

There is some syntactic sugar that lets you skip this boilerplate and just call the filter as if it were a method on the query class: q.web_users() In order to be available for this shorthand, filters are added to the builtin_filters property of the main query class. I know that’s a bit confusing, but it seemed like the best way to make filters available in both contexts.

Generic filters applicable to all indices are available in (But most/all can also be accessed as a query method, if appropriate)

Filtering Specific Indices

There is a file for each elasticsearch index (if not, feel free to add one). This file provides filters specific to that index, as well as an appropriately-directed ESQuery subclass with references to these filters.

These index-specific query classes also have default filters to exclude things like inactive users or deleted docs. These things should nearly always be excluded, but if necessary, you can remove these with remove_default_filters.

Running against production

Since the ESQuery library is read-only, it’s mostly safe to run against production. You can define alternate elasticsearch hosts in your localsettings file in the ELASTICSEARCH_DEBUG_HOSTS dictionary and pass in this host name as the debug_host to the constructor:

>>> CaseES(debug_host='prod').domain('dimagi').count()


  • es_query - the entire query, filters, query, pagination
  • filters - a list of the individual filters
  • query - the query, used for searching, not filtering
  • field - a field on the document. User docs have a ‘domain’ field.
  • lt/gt - less/greater than
  • lte/gte - less/greater than or equal to
class, debug_host=None, es_instance_alias='default')[source]

This query builder only outputs the following query structure:

    "query": {
        "filtered": {
            "filter": {
                "and": [
            "query": <query>
    <size, sort, other params>

Add the passed-in aggregation to the query


A list of callables that return filters. These will all be available as instance methods, so you can do self.term(field, value) instead of self.filter(filters.term(field, value))


Performs a minimal query to get the count of matching documents


Returns the JSON query that will be sent to elasticsearch.


Turn off _source retrieval. Mostly useful if you just want the doc_ids


Restrict the fields returned from elasticsearch

Deprecated. Use source instead.


Add the passed-in filter to the query. All filtering goes through this class.


Return a list of the filters used in this query, suitable if you want to reproduce a query with additional filtering.


Performs a minimal query to get the ids of the matching documents


pretty prints the JSON query that will be sent to elasticsearch.


Remove a specific default filter by passing in its name.


Sensible defaults are provided. Use this if you don’t want ‘em


Actually run the query. Returns an ESQuerySet object.


Run the query against the scroll api. Returns an iterator yielding each document that matches the query.

search_string_query(search_string, default_fields=None)[source]

Accepts a user-defined search string


Set the query. Most stuff we want is better done with filters, but if you actually want Levenshtein distance or prefix querying...


To be used with get_sorting_block, which interprets datatables sorting


Restrict number of results returned. Analagous to SQL limit.

sort(field, desc=False, reset_sort=True)[source]

Order the results by field.

source(include, exclude=None)[source]

Restrict the output of _source in the queryset. This can be used to return an object in a queryset


Pagination. Analagous to SQL offset.


modeled after django’s QuerySet.values

class, query)[source]
The object returned from
  • ESQuerySet.raw is the raw response from elasticsearch
  • ESQuerySet.query is the ESQuery object

Return just the docs ids from the response.


Return the docs from the response.

static normalize_result(query, result)[source]

Return the doc from an item in the query response.


Return the total number of docs matching the query.

class, debug_host=None, es_instance_alias='default')[source]

Query logic specific to CommCareHQ

Available Filters

The following filters are available on any ESQuery instance - you can chain any of these on your query.

Note also that the term filter accepts either a list or a single element. Simple filters which match against a field are based on this filter, so those will also accept lists. That means you can do form_query.xmlns(XMLNS1) or form_query.xmlns([XMLNS1, XMLNS2, ...]).

Contributing: Additions to this file should be added to the builtin_filters method on either ESQuery or HQESQuery, as appropriate (is it an HQ thing?).*filters)[source]

Filter docs to match all of the filters passed in[source]

Exclude docs matching the filter passed in*filters)[source]

Filter docs to match any of the filters passed in, gt=None, gte=None, lt=None, lte=None)[source]

Range filter that accepts datetime objects as arguments[source]

Filter by doc_id. Also accepts a list of doc ids[source]

Filter by doc_type. Also accepts a list[source]

Filter by domain.[source]

Only return docs with a missing or null value for field[source]

Only return docs which have a value for field, exist=True, null=True)[source]

Only return docs missing a value for field, filter_)[source]

Query nested documents which normally can’t be queried directly[source]

Only return docs with a real, non-null value for field, gt=None, gte=None, lt=None, lte=None)[source]

Filter field by a range. Pass in some sensible combination of gt (greater than), gte (greater than or equal to), lt, and lte., value)[source]

Filter docs by a field ‘value’ can be a singleton or a list.

Available Queries

Queries are used for actual searching - things like relevancy scores, Levenstein distance, and partial matches.

View the elasticsearch documentation to see what other options are available, and put ‘em here if you end up using any of ‘em., filter_)[source]

Filtered query for performing both filtering and querying at once[source]

No-op query used because a default must be specified, query, *args, **kwargs)[source]

Creates a nested query for use with nested documents

Keyword arguments such as score_mode and others can be added., filter_, *args, **kwargs)[source]

Creates a nested query for use with nested documents

Keyword arguments such as score_mode and others can be added., default_fields=None)[source]

Allows users to use advanced query syntax, but if search_string does not use the ES query string syntax, default to doing an infix search for each term. (This may later change to some kind of fuzzy matching).

This is also available via the main ESQuery class.

Aggregate Queries

Aggregations are a replacement for Facets

Here is an example used to calculate how many new pregnancy cases each user has opened in a certain date range.

res = (CaseES()
       .date_range('opened_on', gte=startdate, lte=enddate))
       .aggregation(TermsAggregation('by_user', 'opened_by')

buckets = res.aggregations.by_user.buckets

There’s a bit of magic happening here - you can access the raw json data from this aggregation via res.aggregation('by_user') if you’d prefer to skip it.

The res object has a aggregations property, which returns a namedtuple pointing to the wrapped aggregation results. The name provided at instantiation is used here (by_user in this example).

The wrapped aggregation_result object has a result property containing the aggregation data, as well as utilties for parsing that data into something more useful. For example, the TermsAggregation result also has a counts_by_bucket method that returns a {bucket: count} dictionary, which is normally what you want.

As of this writing, there’s not much else developed, but it’s pretty easy to add support for other aggregation types and more results processing


Note that a range includes the “start” value and excludes the “end” value. i.e. start <= X < end

  • start – range start
  • end – range end
  • key – optional key name for the range
class, field)

Alias for field number 1


Alias for field number 0

class, datefield, interval, timezone=None)[source]

Aggregate by date range. This can answer questions like “how many forms were created each day?”.

This class can be instantiated by the ESQuery.date_histogram method.

  • name – what do you want to call this aggregation
  • datefield – the document’s date field to look at
  • interval – the date interval to use: “year”, “quarter”, “month”, “week”, “day”, “hour”, “minute”, “second”
  • timezone – do bucketing using this time zone instead of UTC
class, field, script=None)[source]

Extended stats aggregation that computes an extended stats aggregation by field

class, filter)[source]

Bucket aggregation that creates a single bucket for the specified filter

  • name – aggregation name
  • filter – filter body
class, filters=None)[source]

Bucket aggregation that creates a bucket for each filter specified using the filter name.

Parameters:name – aggregation name
add_filter(name, filter)[source]
  • name – filter name
  • filter – filter body
class, field)[source]

Bucket aggregation that returns the minumum value of a field

  • name – aggregation name
  • field – name of the field to min
class, field)[source]

A field data based single bucket aggregation, that creates a bucket of all documents in the current document set context that are missing a field value (effectively, missing a field or having the configured NULL value set).

  • name – aggregation name
  • field – name of the field to bucket on
class, path)[source]

A special single bucket aggregation that enables aggregating nested documents.

Parameters:path – Path to nested document
class, terms, inner_most_aggregation=None)[source]

Helper to run nested term-based queries (equivalent to SQL group-by clauses). This is not at all related to the ES ‘nested aggregation’. The final aggregation defaults to a count of documents, though can also be used to sum a different field of the document.

Example usage:

# counting all forms submitted in a domain grouped by app id and user id NestedTermAggregationsHelper(

base_query=FormES().domain(domain_name), terms=[

AggregationTerm(‘app_id’, ‘app_id’), AggregationTerm(‘user_id’, ‘form.meta.userID’),



# summing the balances of ledger values, grouped by the entry id NestedTermAggregationsHelper(

base_query=LedgerES().domain(domain).section(section_id), terms=[

AggregationTerm(‘entry_id’, ‘entry_id’),

], inner_most_aggregation=SumAggregation(‘balance’, ‘balance’),


class, field, ranges=None, keyed=True)[source]

Bucket aggregation that creates one bucket for each range :param name: the aggregation name :param field: the field to perform the range aggregations on :param ranges: list of AggregationRange objects :param keyed: set to True to have the results returned by key instead of as a list

(see RangeResult.normalized_buckets)
class, field, script=None)[source]

Stats aggregation that computes a stats aggregation by field

  • name – aggregation name
  • field – name of the field to collect stats on
  • script – an optional field to allow you to script the computed field
class, field)[source]

Bucket aggregation that sums a field

  • name – aggregation name
  • field – name of the field to sum
class, field, size=None)[source]

Bucket aggregation that aggregates by field

  • name – aggregation name
  • field – name of the field to bucket on
  • size
class, field=None, is_ascending=True, size=1, include=None)[source]

A top_hits metric aggregator keeps track of the most relevant document being aggregated This aggregator is intended to be used as a sub aggregator, so that the top matching documents can be aggregated per bucket.

  • name – Aggregation name
  • field – This is the field to sort the top hits by. If None, defaults to sorting by score.
  • is_ascending – Whether to sort the hits in ascending or descending order.
  • size – The number of hits to include. Defaults to 1.
  • include – An array of fields to include in the hit. Defaults to returning the whole document.


class, debug_host=None, es_instance_alias='default')[source]
index = 'apps'[source][source][source][source][source]


Here’s an example adapted from the case list report - it gets a list of the ids of all unknown users, web users, and demo users on a domain.

from import users as user_es

user_filters = [

query = (user_es.UserES()

owner_ids = query.get_ids()
class, debug_host=None, es_instance_alias='default')[source]
default_filters = {'active': {'term': {'is_active': True}}, 'not_deleted': {'term': {'base_doc': 'couchuser'}}}
index = 'users'

Include inactive users, which would normally be filtered out.


Return only AdminUsers. Admin users are mock users created from xform submissions with unknown user ids whose username is “admin”.[source], gte=None, lt=None, lte=None)[source][source]

Matches users whose username is demo_user[source][source], gte=None, lt=None, lte=None)[source][source][source][source][source]

Return only UnknownUsers. Unknown users are mock users created from xform submissions with unknown user ids.[source][source][source]


Here’s an example getting pregnancy cases that are either still open or were closed after May 1st.

from import cases as case_es

q = (case_es.CaseES()
         case_es.closed_range(, 05, 01))))
class, debug_host=None, es_instance_alias='default')[source]
index = 'cases', gte=None, lt=None, lte=None)[source]

Restricts cases returned to those with actions during the range[source][source], gte=None, lt=None, lte=None)[source][source], gte=None, lt=None, lte=None)[source]'open_case', gt=None, gte=None, lt=None, lte=None)[source][source], gte=None, lt=None, lte=None)[source][source][source], gte=None, lt=None, lte=None)[source], gte=None, lt=None, lte=None)[source][source][source]


class, debug_host=None, es_instance_alias='default')[source]
default_filters = {'has_domain': {'not': {'missing': {'field': 'domain'}}}, 'is_xform_instance': {'term': {'doc_type': 'xforminstance'}}, 'has_xmlns': {'not': {'missing': {'field': 'xmlns'}}}, 'has_user': {'not': {'missing': {'field': 'form.meta.userID'}}}}
index = 'forms'

Include only archived forms, which are normally excluded

user_aggregation()[source][source], gte=None, lt=None, lte=None)[source], gte=None, lt=None, lte=None)[source], gte=None, lt=None, lte=None)[source][source][source][source][source]


Here’s an example generating a histogram of domain creations (that’s a type of faceted query), filtered by a provided list of domains and a report date range.

from import DomainES

domains_after_date = (DomainES()
                      .created(gte=datespan.startdate, lte=datespan.enddate)
                      .date_histogram('date', 'date_created', interval)
histo_data =
class, debug_host=None, es_instance_alias='default')[source]
default_filters = {'not_snapshot': {'not': {'term': {'is_snapshot': True}}}}
index = 'domains'

Normally snapshots are excluded, instead, return only snapshots[source][source][source], gte=None, lt=None, lte=None)[source][source][source][source][source][source], gte=None, lt=None, lte=None)[source][source][source]


class, debug_host=None, es_instance_alias='default')[source]
index = 'sms'
user_aggregation()[source][source][source][source][source][source], gte=None, lt=None, lte=None)[source][source][source][source][source][source]