User Configurable Reporting

An overview of the design, API and data structures used here.

The docs on reporting, pillows, and change feeds, are useful background.

Data Flow 

Reporting is handled in multiple stages. Here is the basic workflow.

Raw data (form or case) → [Data source config] → Row in database table → [Report config] → Report in HQ

Both the data source config and report config are JSON documents that live in the database. The data source config determines how raw data (forms and cases) gets mapped to rows in an intermediary table, while the report config(s) determine how that report table gets turned into an interactive report in the UI.

A UCR table is created when a new data source is created. The table’s structure is updated whenever the UCR is “rebuilt”, which happens when the data source config is edited. Rebuilds can also be kicked off manually via either rebuild_indicator_table or the UI. Rebuilding happens asynchronously. Data in the table is refreshed continuously by pillows.

Data Sources 

Each data source configuration maps a filtered set of the raw data to indicators. A data source configuration consists of two primary sections:

A filter that determines whether the data is relevant for the data source
A list of indicators in that data source

In addition to these properties there are a number of relatively self-explanatory fields on a data source such as the table_id and display_name, and a few more nuanced ones. The full list of available fields is summarized in the following table:

Field	Description
filter	Determines whether the data is relevant for the data source
indicators	List of indicators to save
table_id	A unique ID for the table
display_name	A display name for the table that shows up in UIs
base_item_expression	Used for making tables off of repeat or list data
named_expressions	A list of named expressions that can be referenced in other filters and indicators
named_filters	A list of named filters that can be referenced in other filters and indicators

Data Source Filtering 

When setting up a data source configuration, filtering defines what data applies to a given set of indicators. Some example uses of filtering on a data source include:

Restricting the data by document type (e.g. cases or forms). This is a built-in filter.
Limiting the data to a particular case or form type
Excluding demo user data
Excluding closed cases
Only showing data that meets a domain-specific condition (e.g. pregnancy cases opened for women over 30 years of age)

Filter type overview 

There are currently four supported filter types. However, these can be used together to produce arbitrarily complicated expressions.

Filter Type	Description
boolean_expression	A expression / logic statement (more below)
and	An “and” of other filters - true if all are true
or	An “or” of other filters - true if any are true
not	A “not” or inverse expression on a filter

To understand the boolean_expression type, we must first explain expressions.

Expressions 

An expression is a way of representing a set of operations that should return a single value. Expressions can basically be thought of as functions that take in a document and return a value:

Expression: function(document) → value

In normal math/python notation, the following are all valid expressions on a doc (which is presumed to be a dict object:

"hello"
7
doc["name"]
doc["child"]["age"]
doc["age"] < 21
"legal" if doc["age"] > 21 else "underage"

In user configurable reports the following expression types are currently supported (note that this can and likely will be extended in the future):

Expression Type	Description	Example
identity	Just returns whatever is passed in	`doc`
constant	A constant	`"hello"`, `4`, `2014-12-20`
property_name	A reference to the property in a document	`doc["name"]`
property_path	A nested reference to a property in a document	`doc["child"]["age"]`
conditional	An if/else expression	`"legal" if doc["age"] > 21 else "minor"`
switch	A switch statement	`if doc["age"] == 21: "legal"` `elif doc["age"] == 60: "senior"` `else: ...`
array_index	An index into an array	`doc[1]`
split_string	Splitting a string and grabbing a specific element from it by index	`doc["foobar"].split(' ')[0]`
iterator	Combine multiple expressions into a list	`[doc.name, doc.age, doc.gender]`
related_doc	A way to reference something in another document	`form.case.owner_id`
root_doc	A way to reference the root document explicitly (only needed when making a data source from repeat/child data)	`repeat.parent.name`
ancestor_location	A way to retrieve the ancestor of a particular type from a location
nested	A way to chain any two expressions together	`f1(f2(doc))`
dict	A way to emit a dictionary of key/value pairs	`{"name": "test", "value": f(doc)}`
add_days	A way to add days to a date	`my_date + timedelta(days=15)`
add_months	A way to add months to a date	`my_date + relative delta(months=15)`
month_start_date	First day in the month of a date	`2015-01-20` -> `2015-01-01`
month_end_date	Last day in the month of a date	`2015-01-20` -> `2015-01-31`
diff_days	A way to get duration in days between two dates	`(to_date - from-date).days`
evaluator	A way to do arithmetic operations	`a + b*c / d`
base_iteration_number	Used with `base_item_expression` ` <#saving-multiple-ro ws-per-caseform>`__ - a way to get the current iteration number (starting from 0).	`loop.index`

Following expressions act on a list of objects or a list of lists (for e.g. on a repeat list) and return another list or value. These expressions can be combined to do complex aggregations on list data.

Expression Type	Description	Example
filter_items	Filter a list of items to make a new list	`[1, 2, 3, -1, -2, -3]` -> `[1, 2, 3]` (filter numbers greater than zero)
map_items	Map one list to another list	`[{'name': 'a', sex: 'f'}, {'name': 'b', gender: 'm'}]` -> `['a', 'b']` (list of names from list of child data)
sort_items	Sort a list based on an expression	`[{'name': 'a', age: 5}, {'name': 'b', age: 3}]` -> `[{'name': 'b', age: 3}, {'name': 'a', age: 5}]` (sort child data by age)
reduce_items	Aggregate a list of items into one value	sum on `[1, 2, 3]` -> `6`
flatten	Flatten multiple lists of items into one list	`[[1, 2], [4, 5]]` -> `[1, 2, 4, 5]`

JSON snippets for expressions 

Here are JSON snippets for the various expression types. Hopefully they are self-explanatory.

Constant Expression 

class corehq.apps.userreports.expressions.specs.ConstantGetterSpec(_obj=None, **kwargs)[source]

There are two formats for constant expressions. The simplified format is simply the constant itself. For example "hello", or 5.

The complete format is as follows. This expression returns the constant "hello":

{
    "type": "constant",
    "constant": "hello"
}

Property Name Expression 

class corehq.apps.userreports.expressions.specs.PropertyNameGetterSpec(_obj=None, **kwargs)[source]

This expression returns doc["age"]:

{
    "type": "property_name",
    "property_name": "age"
}

An optional "datatype" attribute may be specified, which will attempt to cast the property to the given data type. The options are “date”, “datetime”, “string”, “integer”, and “decimal”. If no datatype is specified, “string” will be used.

Property Path Expression 

class corehq.apps.userreports.expressions.specs.PropertyPathGetterSpec(_obj=None, **kwargs)[source]

This expression returns doc["child"]["age"]:

{
    "type": "property_path",
    "property_path": ["child", "age"]
}

An optional "datatype" attribute may be specified, which will attempt to cast the property to the given data type. The options are “date”, “datetime”, “string”, “integer”, and “decimal”. If no datatype is specified, “string” will be used.

Jsonpath Expression 

class corehq.apps.userreports.expressions.specs.JsonpathExpressionSpec(_obj=None, **kwargs)[source]

This will execute the jsonpath expression against the current doc and emit the result.

{
    "type": "jsonpath",
    "jsonpath": "form..case.name",
}

Given the following doc:

{
    "form": {
        "case": {"name": "a"},
        "nested": {
            "case": {"name": "b"},
        },
        "list": [
            {"case": {"name": "c"}},
            {
                "nested": {
                    "case": {"name": "d"}
                }
            }
        ]
    }
}

This above expression will evaluate to ["a", "b", "c", "d"]. Another example is form.list[0].case.name which will evaluate to "c".

See also the jsonpath evaluator function.

For more information consult the following resources:

Conditional Expression 

class corehq.apps.userreports.expressions.specs.ConditionalExpressionSpec(_obj=None, **kwargs)[source]

This expression returns "legal" if doc["age"] > 21 else "underage":

{
    "type": "conditional",
    "test": {
        "operator": "gt",
        "expression": {
            "type": "property_name",
            "property_name": "age",
            "datatype": "integer"
        },
        "type": "boolean_expression",
        "property_value": 21
    },
    "expression_if_true": {
        "type": "constant",
        "constant": "legal"
    },
    "expression_if_false": {
        "type": "constant",
        "constant": "underage"
    }
}

Note that this expression contains other expressions inside it! This is why expressions are powerful. (It also contains a filter, but we haven’t covered those yet - if you find the "test" section confusing, keep reading…)

Note also that it’s important to make sure that you are comparing values of the same type. In this example, the expression that retrieves the age property from the document also casts the value to an integer. If this datatype is not specified, the expression will compare a string to the 21 value, which will not produce the expected results!

Switch Expression 

class corehq.apps.userreports.expressions.specs.SwitchExpressionSpec(_obj=None, **kwargs)[source]

This expression returns the value of the expression for the case that matches the switch on expression. Note that case values may only be strings at this time.

{
    "type": "switch",
    "switch_on": {
        "type": "property_name",
        "property_name": "district"
    },
    "cases": {
        "north": {
            "type": "constant",
            "constant": 4000
        },
        "south": {
            "type": "constant",
            "constant": 2500
        },
        "east": {
            "type": "constant",
            "constant": 3300
        },
        "west": {
            "type": "constant",
            "constant": 65
        },
    },
    "default": {
        "type": "constant",
        "constant": 0
    }
}

Coalesce Expression 

class corehq.apps.userreports.expressions.specs.CoalesceExpressionSpec(_obj=None, **kwargs)[source]

This expression returns the value of the expression provided, or the value of the default_expression if the expression provided evaluates to a null or blank string.

{
    "type": "coalesce",
    "expression": {
        "type": "property_name",
        "property_name": "district"
    },
    "default_expression": {
        "type": "constant",
        "constant": "default_district"
    }
}

Array Index Expression 

class corehq.apps.userreports.expressions.specs.ArrayIndexExpressionSpec(_obj=None, **kwargs)[source]

This expression returns doc["siblings"][0]:

{
    "type": "array_index",
    "array_expression": {
        "type": "property_name",
        "property_name": "siblings"
    },
    "index_expression": {
        "type": "constant",
        "constant": 0
    }
}

It will return nothing if the siblings property is not a list, the index isn’t a number, or the indexed item doesn’t exist.

Split String Expression 

class corehq.apps.userreports.expressions.specs.SplitStringExpressionSpec(_obj=None, **kwargs)[source]

This expression returns (doc["foo bar"]).split(' ')[0]:

{
    "type": "split_string",
    "string_expression": {
        "type": "property_name",
        "property_name": "multiple_value_string"
    },
    "index_expression": {
        "type": "constant",
        "constant": 0
    },
    "delimiter": ","
}

The delimiter is optional and is defaulted to a space. It will return nothing if the string_expression is not a string, or if the index isn’t a number or the indexed item doesn’t exist. The index_expression is also optional. Without it, the expression will return the list of elements.

Iterator Expression 

class corehq.apps.userreports.expressions.specs.IteratorExpressionSpec(_obj=None, **kwargs)[source]

{
    "type": "iterator",
    "expressions": [
        {
            "type": "property_name",
            "property_name": "p1"
        },
        {
            "type": "property_name",
            "property_name": "p2"
        },
        {
            "type": "property_name",
            "property_name": "p3"
        },
    ],
    "test": {}
}

This will emit [doc.p1, doc.p2, doc.p3]. You can add a test attribute to filter rows from what is emitted - if you don’t specify this then the iterator will include one row per expression it contains regardless of what is passed in. This can be used/combined with the base_item_expression to emit multiple rows per document.

Base iteration number expressions 

class corehq.apps.userreports.expressions.specs.IterationNumberExpressionSpec(_obj=None, **kwargs)[source]

These are very simple expressions with no config. They return the index of the repeat item starting from 0 when used with a base_item_expression.

{
    "type": "base_iteration_number"
}

Ancestor location expression 

class corehq.apps.locations.ucr_expressions.AncestorLocationExpression(_obj=None, **kwargs)[source]

This is used to return a json object representing the ancestor of the given type of the given location. For instance, if we had locations configured with a hierarchy like country -> state -> county -> city, we could pass the location id of Cambridge and a location type of state to this expression to get the Massachusetts location.

{
    "type": "ancestor_location",
    "location_id": {
        "type": "property_name",
        "name": "owner_id"
    },
    "location_type": {
        "type": "constant",
        "constant": "state"
    }
}

If no such location exists, returns null.

Optionally you can specifiy location_property to return a single property of the location.

{
    "type": "ancestor_location",
    "location_id": {
        "type": "property_name",
        "name": "owner_id"
    },
    "location_type": {
        "type": "constant",
        "constant": "state"
    },
    "location_property": "site_code"
}

Nested expressions 

class corehq.apps.userreports.expressions.specs.NestedExpressionSpec(_obj=None, **kwargs)[source]

These can be used to nest expressions. This can be used, e.g. to pull a specific property out of an item in a list of objects.

The following nested expression is the equivalent of a property_path expression to ["outer", "inner"] and demonstrates the functionality. More examples can be found in the practical examples.

{
    "type": "nested",
    "argument_expression": {
        "type": "property_name",
        "property_name": "outer"
    },
    "value_expression": {
        "type": "property_name",
        "property_name": "inner"
    }
}

Dict expressions 

class corehq.apps.userreports.expressions.specs.DictExpressionSpec(_obj=None, **kwargs)[source]

These can be used to create dictionaries of key/value pairs. This is only useful as an intermediate structure in another expression since the result of the expression is a dictionary that cannot be saved to the database.

See the practical examples for a way this can be used in a base_item_expression to emit multiple rows for a single form/case based on different properties.

Here is a simple example that demonstrates the structure. The keys of properties must be text, and the values must be valid expressions (or constants):

{
    "type": "dict",
    "properties": {
        "name": "a constant name",
        "value": {
            "type": "property_name",
            "property_name": "prop"
        },
        "value2": {
            "type": "property_name",
            "property_name": "prop2"
        }
    }
}

“Add Days” expressions 

class corehq.apps.userreports.expressions.date_specs.AddDaysExpressionSpec(_obj=None, **kwargs)[source]

Below is a simple example that demonstrates the structure. The expression below will add 28 days to a property called “dob”. The date_expression and count_expression can be any valid expressions, or simply constants.

{
    "type": "add_days",
    "date_expression": {
        "type": "property_name",
        "property_name": "dob",
    },
    "count_expression": 28
}

“Add Hours” expressions 

class corehq.apps.userreports.expressions.date_specs.AddHoursExpressionSpec(_obj=None, **kwargs)[source]

Below is a simple example that demonstrates the structure. The expression below will add 12 hours to a property called “visit_date”. The date_expression and count_expression can be any valid expressions, or simply constants.

{
    "type": "add_hours",
    "date_expression": {
        "type": "property_name",
        "property_name": "visit_date",
    },
    "count_expression": 12
}

“Add Months” expressions 

class corehq.apps.userreports.expressions.date_specs.AddMonthsExpressionSpec(_obj=None, **kwargs)[source]

add_months offsets given date by given number of calendar months. If offset results in an invalid day (for e.g. Feb 30, April 31), the day of resulting date will be adjusted to last day of the resulting calendar month.

The date_expression and months_expression can be any valid expressions, or simply constants, including negative numbers.

{
    "type": "add_months",
    "date_expression": {
        "type": "property_name",
        "property_name": "dob",
    },
    "months_expression": 28
}

“Diff Days” expressions 

class corehq.apps.userreports.expressions.date_specs.DiffDaysExpressionSpec(_obj=None, **kwargs)[source]

diff_days returns number of days between dates specified by from_date_expression and to_date_expression. The from_date_expression and to_date_expression can be any valid expressions, or simply constants.

{
    "type": "diff_days",
    "from_date_expression": {
        "type": "property_name",
        "property_name": "dob",
    },
    "to_date_expression": "2016-02-01"
}

“Month Start Date” and “Month End Date” expressions 

class corehq.apps.userreports.expressions.date_specs.MonthStartDateExpressionSpec(_obj=None, **kwargs)[source]

month_start_date returns date of first day in the month of given date and month_end_date returns date of last day in the month of given date.

The date_expression can be any valid expression, or simply constant

{
    "type": "month_start_date",
    "date_expression": {
        "type": "property_name",
        "property_name": "dob",
    },
}

“Evaluator” expression 

class corehq.apps.userreports.expressions.specs.EvalExpressionSpec(_obj=None, **kwargs)[source]

evaluator expression can be used to evaluate statements that contain arithmetic (and simple python like statements). It evaluates the statement specified by statement which can contain variables as defined in context_variables.

{
    "type": "evaluator",
    "statement": "a + b - c + 6",
    "context_variables": {
        "a": 1,
        "b": 20,
        "c": {
            "type": "property_name",
            "property_name": "the_number_two"
        }
    }
}

This returns 25 (1 + 20 - 2 + 6).

statement

The expression statement to be evaluated.

context_variables

A dictionary of Expressions where keys are names of variables used in the statement and values are expressions to generate those variables.

Variable types must be one of:

str

int

float

bool

date

datetime

If context_variables is omitted, the current context of the expression will be used.

Expression limitations

Only a single expression is permitted.

Available operators:

math operators (except the power operator)

modulus

negation

comparison operators

logical operators

In addition, expressions can perform the following operations:

index: case[‘name’]

slice: cases[0:2]

if statements: 1 if case.name == ‘bob’ else 0

list comprehension: [i for i in range(3)]

dict, list, set construction: {“a”: 1, “b”: set(cases), “c”: list(range(4))}

Available Functions

Only the following functions are available in the evaluation context:

context()

Get the current evaluation context. See also root_context().

date(value, fmt=None)

Parse a string value as a date or timestamp. If fmt is not supplied the string is assumed to be in ISO 8601 format.

Parameters:

fmt – If supplied, use this format specification to parse the date. See the Python documentation for Format Codes.

float(value)

Convert value to a floating point number.

int(value)

Convert value to an int. Value can be a number or a string representation of a number.
jsonpath(expr, as_list=False, context=None)
Evaluate a jsonpath expression.

See also Jsonpath Expression.
jsonpath("form.case.name")
jsonpath("name", context=jsonpath("form.case"))
jsonpath("form..case", as_list=True)
Parameters:

expr – The jsonpath expression.

as_list – When set to True, always return the full list of matches, even if it is emtpy. If set to False then the return value will be None if no matches are found. If a single match is found the matched value will be returned. If more than one match is found, they will all be returned as a list.

context – Optional context for evaluation. If not supplied the full context of the evaluator will be used.

Returns:

See as_list.
named(name, context=None)
Call a named expression. See also Named Expressions.
named("my-named-expression")
named("my-named-expression", context=form.case)
rand()

Generate a random number between 0 and 1

randint(max)

Generate a random integer between 0 and max

range(start[, stop][, skip])

Produces a sequence of integers from start (inclusive) to stop (exclusive) by step. Note that for performance reasons this is limited to 100 items or less. See range.

root_context()

Get the root context of the evaluation. Similar to the root_doc expression.

See also context().

round(value, ndigits=None)

Round a number to the nearest integer or ndigits after the decimal point. See round.

str(value)

Convert value to a string.
timedelta_to_seconds(delta)
Convert a TimeDelta object into seconds. This is useful for getting the number of seconds between two dates.
timedelta_to_seconds(time_end - time_start)
today()

Return the current UTC date.

‘Get Reporting Groups’ expression 

class corehq.apps.userreports.expressions.specs.ReportingGroupsExpressionSpec(_obj=None, **kwargs)[source]

get_reporting_groups will return an array of the reporting groups that are assigned to a provided user ID. The array will contain one document per reporting group.

{
    "type": "get_reporting_groups",
    "user_id_expression": {
        "type": "property_path",
        "property_path": ["form", "meta", "userID"]
    }
}

Filter, Sort, Map and Reduce Expressions 

We have following expressions that act on a list of objects or list of lists. The list to operate on is specified by items_expression. This can be any valid expression that returns a list. If the items_expression doesn’t return a valid list, these might either fail or return one of empty list or None value.

map_items Expression 

class corehq.apps.userreports.expressions.list_specs.MapItemsExpressionSpec(_obj=None, **kwargs)[source]

map_items performs a calculation specified by map_expression on each item of the list specified by items_expression and returns a list of the calculation results. The map_expression is evaluated relative to each item in the list and not relative to the parent document from which the list is specified. For e.g. if items_expression is a path to repeat-list of children in a form document, map_expression is a path relative to the repeat item.

items_expression can be any valid expression that returns a list. If this doesn’t evaluate to a list an empty list is returned. It may be necessary to specify a datatype of array if the expression could return a single element.

map_expression can be any valid expression relative to the items in above list.

{
    "type": "map_items",
    "items_expression": {
        "datatype": "array",
        "type": "property_path",
        "property_path": ["form", "child_repeat"]
    },
    "map_expression": {
        "type": "property_path",
        "property_path": ["age"]
    }
}

Above returns list of ages. Note that the property_path in map_expression is relative to the repeat item rather than to the form.

filter_items Expression 

class corehq.apps.userreports.expressions.list_specs.FilterItemsExpressionSpec(_obj=None, **kwargs)[source]

filter_items performs filtering on given list and returns a new list. If the boolean expression specified by filter_expression evaluates to a True value, the item is included in the new list and if not, is not included in the new list.

items_expression can be any valid expression that returns a list. If this doesn’t evaluate to a list an empty list is returned. It may be necessary to specify a datatype of array if the expression could return a single element.

filter_expression can be any valid boolean expression relative to the items in above list.

{
    "type": "filter_items",
    "items_expression": {
        "datatype": "array",
        "type": "property_name",
        "property_name": "family_repeat"
    },
    "filter_expression": {
       "type": "boolean_expression",
        "expression": {
            "type": "property_name",
            "property_name": "gender"
        },
        "operator": "eq",
        "property_value": "female"
    }
}

sort_items Expression 

class corehq.apps.userreports.expressions.list_specs.SortItemsExpressionSpec(_obj=None, **kwargs)[source]

sort_items returns a sorted list of items based on sort value of each item.The sort value of an item is specified by sort_expression. By default, list will be in ascending order. Order can be changed by adding optional order expression with one of DESC (for descending) or ASC (for ascending) If a sort-value of an item is None, the item will appear in the start of list. If sort-values of any two items can’t be compared, an empty list is returned.

items_expression can be any valid expression that returns a list. If this doesn’t evaluate to a list an empty list is returned. It may be necessary to specify a datatype of array if the expression could return a single element.

sort_expression can be any valid expression relative to the items in above list, that returns a value to be used as sort value.

{
    "type": "sort_items",
    "items_expression": {
        "datatype": "array",
        "type": "property_path",
        "property_path": ["form", "child_repeat"]
    },
    "sort_expression": {
        "type": "property_path",
        "property_path": ["age"]
    }
}

reduce_items Expression 

class corehq.apps.userreports.expressions.list_specs.ReduceItemsExpressionSpec(_obj=None, **kwargs)[source]

reduce_items returns aggregate value of the list specified by aggregation_fn.

items_expression can be any valid expression that returns a list. If this doesn’t evaluate to a list, aggregation_fn will be applied on an empty list. It may be necessary to specify a datatype of array if the expression could return a single element.

aggregation_fn is one of following supported functions names.

Function Name	Example
`count`	`['a', 'b']` -> 2
`sum`	`[1, 2, 4]` -> 7
`min`	`[2, 5, 1]` -> 1
`max`	`[2, 5, 1]` -> 5
`first_item`	`['a', 'b']` -> ‘a’
`last_item`	`['a', 'b']` -> ‘b’
`join`	`['a', 'b']` -> ‘ab’

{
    "type": "reduce_items",
    "items_expression": {
        "datatype": "array",
        "type": "property_name",
        "property_name": "family_repeat"
    },
    "aggregation_fn": "count"
}

This returns number of family members

flatten expression 

class corehq.apps.userreports.expressions.list_specs.FlattenExpressionSpec(_obj=None, **kwargs)[source]

flatten takes list of list of objects specified by items_expression and returns one list of all objects.

items_expression is any valid expression that returns a list of lists. It this doesn’t evaluate to a list of lists an empty list is returned. It may be necessary to specify a datatype of array if the expression could return a single element.

{
    "type": "flatten",
    "items_expression": {},
}

Named Expressions 

class corehq.apps.userreports.expressions.specs.NamedExpressionSpec(_obj=None, **kwargs)[source]

Last, but certainly not least, are named expressions. These are special expressions that can be defined once in a data source and then used throughout other filters and indicators in that data source. This allows you to write out a very complicated expression a single time, but still use it in multiple places with a simple syntax.

Named expressions are defined in a special section of the data source. To reference a named expression, you just specify the type of "named" and the name as follows:

{
    "type": "named",
    "name": "my_expression"
}

This assumes that your named expression section of your data source includes a snippet like the following:

{
    "my_expression": {
        "type": "property_name",
        "property_name": "test"
    }
}

This is just a simple example - the value that "my_expression" takes on can be as complicated as you want and it can also reference other named expressions as long as it doesn’t reference itself of create a recursive cycle.

See also the named evaluator function.

Boolean Expression Filters 

A boolean_expression filter combines an expression, an operator, and a property value (a constant), to produce a statement that is either True or False. Note: in the future the constant value may be replaced with a second expression to be more general, however currently only constant property values are supported.

Here is a sample JSON format for simple boolean_expression filter:

{
    "type": "boolean_expression",
    "expression": {
        "type": "property_name",
        "property_name": "age",
        "datatype": "integer"
    },
    "operator": "gt",
    "property_value": 21
}

This is equivalent to the python statement: doc["age"] > 21

Operators 

The following operators are currently supported:

Operator	Description	Value type	Example
`eq`	is equal	constant	`doc["age"] == 21`
`not_eq`	is not equal	constant	`doc["age"] != 21`
`in`	single value is in a list	list	`doc["color"] in ["red", "blue"]`
`in_multi`	a value is in a multi select	list	`selected(doc["color"], "red")`
`any_in_multi`	one of a list of values in in a multiselect	list	`selected(doc["color"], ["red", "blue"])`
`lt`	is less than	number	`doc["age"] < 21`
`lte`	is less than or equal	number	`doc["age"] <= 21`
`gt`	is greater than	number	`doc["age"] > 21`
`gte`	is greater than or equal	number	`doc["age"] >= 21`
`regex`	matches regular expression	string	`re.search("^([Rr]ed\|[Bb]lue)$", doc["color"])`

Compound filters 

Compound filters build on top of boolean_expression filters to create boolean logic. These can be combined to support arbitrarily complicated boolean logic on data. There are three types of filters, and, or, and not filters. The JSON representation of these is below. Hopefully these are self explanatory.

“And” Filters 

The following filter represents the statement: doc["age"] < 21 and doc["nationality"] == "american":

{
    "type": "and",
    "filters": [
        {
            "type": "boolean_expression",
            "expression": {
                "type": "property_name",
                "property_name": "age",
                "datatype": "integer"
            },
            "operator": "lt",
            "property_value": 21
        },
        {
            "type": "boolean_expression",
            "expression": {
                "type": "property_name",
                "property_name": "nationality",
            },
            "operator": "eq",
            "property_value": "american"
        }
    ]
}

“Or” Filters 

The following filter represents the statement: doc["age"] > 21 or doc["nationality"] == "european":

{
    "type": "or",
    "filters": [
        {
            "type": "boolean_expression",
            "expression": {
                "type": "property_name",
                "property_name": "age",
                "datatype": "integer",
            },
            "operator": "gt",
            "property_value": 21
        },
        {
            "type": "boolean_expression",
            "expression": {
                "type": "property_name",
                "property_name": "nationality",
            },
            "operator": "eq",
            "property_value": "european"
        }
    ]
}

“Not” Filters 

The following filter represents the statement: !(doc["nationality"] == "european"):

{
    "type": "not",
    "filter": [
        {
            "type": "boolean_expression",
            "expression": {
                "type": "property_name",
                "property_name": "nationality",
            },
            "operator": "eq",
            "property_value": "european"
        }
    ]
}

Note that this could be represented more simply using a single filter with the ``not_eq`` operator, but “not” filters can represent more complex logic than operators generally, since the filter itself can be another compound filter.

Practical Examples 

See practical examples for some practical examples showing various filter types.

Indicators 

Now that we know how to filter the data in our data source, we are still left with a very important problem: how do we know what data to save? This is where indicators come in. Indicators are the data outputs - what gets computed and put in a column in the database.

A typical data source will include many indicators (data that will later be included in the report). This section will focus on defining a single indicator. Single indicators can then be combined in a list to fully define a data source.

The overall set of possible indicators is theoretically any function that can take in a single document (form or case) and output a value. However the set of indicators that are configurable is more limited than that.

Indicator Properties 

All indicator definitions have the following properties:

Property	Description
type	A specified type for the indicator. It must be one of the types listed below.
column_id	The database column where the indicator will be saved.
display_name	A display name for the indicator (not widely used, currently).
comment	A string describing the indicator

Additionally, specific indicator types have other type-specific properties. These are covered below.

Indicator types 

The following primary indicator types are supported:

Indicator Type	Description
boolean	Save `1` if a filter is true, otherwise `0`.
expression	Save the output of an expression.
choice_list	Save multiple columns, one for each of a predefined set of choices
ledger_balances	Save a column for each product specified, containing ledger data

Note/todo: there are also other supported formats, but they are just shortcuts around the functionality of these ones they are left out of the current docs.

Boolean indicators 

Now we see again the power of our filter framework defined above! Boolean indicators take any arbitrarily complicated filter expression and save a 1 to the database if the expression is true, otherwise a 0. Here is an example boolean indicator which will save 1 if a form has a question with ID is_pregnant with a value of "yes":

{
    "type": "boolean",
    "column_id": "col",
    "filter": {
        "type": "boolean_expression",
        "expression": {
            "type": "property_path",
            "property_path": ["form", "is_pregnant"],
        },
        "operator": "eq",
        "property_value": "yes"
    }
}

Expression indicators 

Similar to the boolean indicators - expression indicators leverage the expression structure defined above to create arbitrarily complex indicators. Expressions can store arbitrary values from documents (as opposed to boolean indicators which just store 0‘s and 1‘s). Because of this they require a few additional properties in the definition:

Property	Description
datatype	The datatype of the indicator. Current valid choices are: “date”, “datetime”, “string”, “decimal”, “integer”, and “small_integer”.
is_nullable	Whether the database column should allow null values.
is_primary_key	Whether the database column should be (part of?) the primary key. (TODO: this needs to be confirmed)
create_index	Creates an index on this column. Only applicable if using the SQL backend
expression	Any expression.
transform	(optional) transform to be applied to the result of the expression. (see “Report Columns > Transforms” section below)

Here is a sample expression indicator that just saves the “age” property to an integer column in the database:

{
    "type": "expression",
    "expression": {
        "type": "property_name",
        "property_name": "age"
    },
    "column_id": "age",
    "datatype": "integer",
    "display_name": "age of patient"
}

Choice list indicators 

Choice list indicators take a single choice column (select or multiselect) and expand it into multiple columns where each column represents a different choice. These can support both single-select and multi-select quesitons.

A sample spec is below:

{
    "type": "choice_list",
    "column_id": "col",
    "display_name": "the category",
    "property_name": "category",
    "choices": [
        "bug",
        "feature",
        "app",
        "schedule"
    ],
    "select_style": "single"
}

Ledger Balance Indicators 

Ledger Balance indicators take a list of product codes and a ledger section, and produce a column for each product code, saving the value found in the corresponding ledger.

Property	Description
ledger_section	The ledger section to use for this indicator, for example, “stock”
product_codes	A list of the products to include in the indicator. This will be used in conjunction with the `column_id` to produce each column name.
case_id_expression	An expression used to get the case where each ledger is found. If not specified, it will use the row’s doc id.

{
    "type": "ledger_balances",
    "column_id": "soh",
    "display_name": "Stock On Hand",
    "ledger_section": "stock",
    "product_codes": ["aspirin", "bandaids", "gauze"],
    "case_id_expression": {
        "type": "property_name",
        "property_name": "_id"
    }
}

This spec would produce the following columns in the data source:

soh_aspirin	soh_bandaids	soh_gauze
20	11	5
67	32	9

If the ledger you’re using is a due list and you wish to save the dates instead of integers, you can change the “type” from “ledger_balances” to “due_list_date”.

Practical notes for creating indicators 

These are some practical notes for how to choose what indicators to create.

Fractions 

All indicators output single values. Though fractional indicators are common, these should be modeled as two separate indicators (for numerator and denominator) and the relationship should be handled in the report UI config layer.

Saving Multiple Rows per Case/Form 

You can save multiple rows per case/form by specifying a root level base_item_expression that describes how to get the repeat data from the main document. You can also use the root_doc expression type to reference parent properties and the base_iteration_number expression type to reference the current index of the item. This can be combined with the iterator expression type to do complex data source transforms. This is not described in detail, but the following sample (which creates a table off of a repeat element called “time_logs” can be used as a guide). There are also additional examples in the practical examples:

{
    "domain": "user-reports",
    "doc_type": "DataSourceConfiguration",
    "referenced_doc_type": "XFormInstance",
    "table_id": "sample-repeat",
    "display_name": "Time Logged",
    "base_item_expression": {
        "type": "property_path",
        "property_path": ["form", "time_logs"]
    },
    "configured_filter": {
    },
    "configured_indicators": [
        {
            "type": "expression",
            "expression": {
                "type": "property_name",
                "property_name": "start_time"
            },
            "column_id": "start_time",
            "datatype": "datetime",
            "display_name": "start time"
        },
        {
            "type": "expression",
            "expression": {
                "type": "property_name",
                "property_name": "end_time"
            },
            "column_id": "end_time",
            "datatype": "datetime",
            "display_name": "end time"
        },
        {
            "type": "expression",
            "expression": {
                "type": "property_name",
                "property_name": "person"
            },
            "column_id": "person",
            "datatype": "string",
            "display_name": "person"
        },
        {
            "type": "expression",
            "expression": {
                "type": "root_doc",
                "expression": {
                    "type": "property_name",
                    "property_name": "name"
                }
            },
            "column_id": "name",
            "datatype": "string",
            "display_name": "name of ticket"
        }
    ]
}

Data Cleaning and Validation 

Note this is only available for “static” data sources that are created in the HQ repository.

When creating a data source it can be valuable to have strict validation on the type of data that can be inserted. The attribute validations at the top level of the configuration can use UCR expressions to determine if the data is invalid. If an expression is deemed invalid, then the relevant error is stored in the InvalidUCRData model.

{
    "domain": "user-reports",
    "doc_type": "DataSourceConfiguration",
    "referenced_doc_type": "XFormInstance",
    "table_id": "sample-repeat",
    "base_item_expression": {},
    "validations": [{
         "name": "is_starred_valid",
         "error_message": "is_starred has unexpected value",
         "expression": {
             "type": "boolean_expression",
             "expression": {
                 "type": "property_name",
                 "property_name": "is_starred"
             },
             "operator": "in",
             "property_value": ["yes", "no"]
         }
    }],
    "configured_filter": { },
    "configured_indicators": [ ]
}

Report Configurations 

A report configuration takes data from a data source and renders it in the UI. A report configuration consists of a few different sections:

Report Filters - These map to filters that show up in the UI, and should translate to queries that can be made to limit the returned data.
Aggregation - This defines what each row of the report will be. It is a list of columns forming the primary key of each row.
Report Columns - Columns define the report columns that show up from the data source, as well as any aggregation information needed.
Charts - Definition of charts to display on the report.
Sort Expression - How the rows in the report are ordered.
Distinct On - Pick distinct rows from result based on columns.

Samples 

Here are some sample configurations that can be used as a reference until we have better documentation.

Report Filters 

The documentation for report filters is still in progress. Apologies for brevity below.

A note about report filters versus data source filters

Report filters are completely different from data source filters. Data source filters limit the global set of data that ends up in the table, whereas report filters allow you to select values to limit the data returned by a query.

Numeric Filters 

Numeric filters allow users to filter the rows in the report by comparing a column to some constant that the user specifies when viewing the report. Numeric filters are only intended to be used with numeric (integer or decimal type) columns. Supported operators are =, ≠, <, ≤, >, and ≥.

ex:

{
  "type": "numeric",
  "slug": "number_of_children_slug",
  "field": "number_of_children",
  "display": "Number of Children"
}

Date filters 

Date filters allow you filter on a date. They will show a datepicker in the UI.

{
  "type": "date",
  "slug": "modified_on",
  "field": "modified_on",
  "display": "Modified on",
  "required": false
}

Date filters have an optional compare_as_string option that allows the date filter to be compared against an indicator of data type string. You shouldn’t ever need to use this option (make your column a date or datetime type instead), but it exists because the report builder needs it.

Quarter filters 

Quarter filters are similar to date filters, but a choice is restricted only to the particular quarter of the year. They will show inputs for year and quarter in the UI.

{
  "type": "quarter",
  "slug": "modified_on",
  "field": "modified_on",
  "display": "Modified on",
  "required": false
}

Pre-Filters 

Pre-filters offer the kind of functionality you get from data source filters. This makes it easier to use one data source for many reports, especially if some of those reports just need the data source to be filtered slightly differently. Pre-filters do not need to be configured by app builders in report modules; fields with pre-filters will not be listed in the report module among the other fields that can be filtered.

A pre-filter’s type is set to “pre”:

{
  "type": "pre",
  "field": "at_risk_field",
  "slug": "at_risk_slug",
  "datatype": "string",
  "pre_value": "yes"
}

If pre_value is scalar (i.e. datatype is “string”, “integer”, etc.), the filter will use the “equals” operator. If pre_value is null, the filter will use “is null”. If pre_value is an array, the filter will use the “in” operator. e.g.

{
  "type": "pre",
  "field": "at_risk_field",
  "slug": "at_risk_slug",
  "datatype": "array",
  "pre_value": ["yes", "maybe"]
}

(If pre_value is an array and datatype is not “array”, it is assumed that datatype refers to the data type of the items in the array.)

You can optionally specify the operator that the prevalue filter uses by adding a pre_operator argument. e.g.

{
  "type": "pre",
  "field": "at_risk_field",
  "slug": "at_risk_slug",
  "datatype": "array",
  "pre_value": ["maybe", "yes"],
  "pre_operator": "between"
}

Note that instead of using eq, gt, etc, you will need to use =, >, etc.

Dynamic choice lists 

Dynamic choice lists provide a select widget that will generate a list of options dynamically.

The default behavior is simply to show all possible values for a column, however you can also specify a choice_provider to customize this behavior (see below).

Simple example assuming “village” is a name:

{
  "type": "dynamic_choice_list",
  "slug": "village",
  "field": "village",
  "display": "Village",
  "datatype": "string"
}

Choice providers 

Currently the supported choice_providers are supported:

Field	Description
location	Select a location by name
user	Select a user
owner	Select a possible case owner owner (user, group, or location)

Location choice providers also support three additional configuration options:

“include_descendants” - Include descendants of the selected locations in the results. Defaults to false.
“show_full_path” - Display the full path to the location in the filter. Defaults to false. The default behavior shows all locations as a flat alphabetical list.
“location_type” - Includes locations of this type only. Default is to not filter on location type.

Example assuming “village” is a location ID, which is converted to names using the location choice_provider:

{
  "type": "dynamic_choice_list",
  "slug": "village",
  "field": "location_id",
  "display": "Village",
  "datatype": "string",
  "choice_provider": {
      "type": "location",
      "include_descendants": true,
      "show_full_path": true,
      "location_type": "district"
  }
}

Choice lists 

Choice lists allow manual configuration of a fixed, specified number of choices and let you change what they look like in the UI.

{
  "type": "choice_list",
  "slug": "role",
  "field": "role",
  "choices": [
    {"value": "doctor", "display": "Doctor"},
    {"value": "nurse"}
  ]
}

Drilldown by Location 

This filter allows selection of a location for filtering by drilling down from top level.

{
  "type": "location_drilldown",
  "slug": "by_location",
  "field": "district_id",
  "include_descendants": true,
  "max_drilldown_levels": 3
}

“include_descendants” - Include descendant locations in the results. Defaults to false.
“max_drilldown_levels” - Maximum allowed drilldown levels. Defaults to 99

Internationalization 

Report builders may specify translations for the filter display value. Also see the sections on internationalization in the Report Column and the translations transform.

{
    "type": "choice_list",
    "slug": "state",
    "display": {"en": "State", "fr": "État"},
}

Report Columns 

Reports are made up of columns. The currently supported column types ares:

field which represents a single value
percent which combines two values in to a percent
aggregate_date which aggregates data by month
expanded which expands a select question into multiple columns
expression which can do calculations on data in other columns

Field columns 

Field columns have a type of "field". Here’s an example field column that shows the owner name from an associated owner_id:

{
    "type": "field",
    "field": "owner_id",
    "column_id": "owner_id",
    "display": "Owner Name",
    "format": "default",
    "transform": {
        "type": "custom",
        "custom_type": "owner_display"
    },
    "aggregation": "simple"
}

Percent columns 

Percent columns have a type of "percent". They must specify a numerator and denominator as separate field columns. Here’s an example percent column that shows the percentage of pregnant women who had danger signs.

{
  "type": "percent",
  "column_id": "pct_danger_signs",
  "display": "Percent with Danger Signs",
  "format": "both",
  "denominator": {
    "type": "field",
    "aggregation": "sum",
    "field": "is_pregnant",
    "column_id": "is_pregnant"
  },
  "numerator": {
    "type": "field",
    "aggregation": "sum",
    "field": "has_danger_signs",
    "column_id": "has_danger_signs"
  }
}

Formats 

The following percentage formats are supported.

Format	Description	example
percent	A whole number percentage (the default format)	33%
fraction	A fraction	1/3
both	Percentage and fraction	33% (1/3)
numeric_percent	Percentage as a number	33
decimal	Fraction as a decimal number	.333

AggregateDateColumn 

AggregateDate columns allow for aggregating data by month over a given date field. They have a type of "aggregate_date". Unlike regular fields, you do not specify how aggregation happens, it is automatically grouped by month.

Here’s an example of an aggregate date column that aggregates the received_on property for each month (allowing you to count/sum things that happened in that month).

{
   "column_id": "received_on",
   "field": "received_on",
   "type": "aggregate_date",
   "display": "Month"
 }

AggregateDate supports an optional “format” parameter, which accepts the same format string as Date formatting. If you don’t specify a format, the default will be “%Y-%m”, which will show as, for example, “2008-09”.

Keep in mind that the only variables available for formatting are year and month, but that still gives you a fair range, e.g.

format	Example result
“%Y-%m”	“2008-09”
“%B, %Y”	“September, 2008”
“%b (%y)”	“Sep (08)”

IntegerBucketsColumn 

Bucket columns allow you to define a series of ranges with corresponding names, then group together rows where a specific field’s value falls within those ranges. These ranges are inclusive, since they are implemented using the between operator. It is the user’s responsibility to make sure the ranges do not overlap; if a value falls into multiple ranges, it is undefined behavior which bucket it will be assigned to.

Here’s an example that groups children based on their age at the time of registration:

{
    "display": "age_range",
    "column_id": "age_range",
    "type": "integer_buckets",
    "field": "age_at_registration",
    "ranges": {
         "infant": [0, 11],
         "toddler": [12, 35],
         "preschooler": [36, 60]
    },
    "else_": "older"
}

The "ranges" attribute maps conditional expressions to labels. If the field’s value does not fall into any of these ranges, the row will receive the "else_" value, if provided.

SumWhenColumn and SumWhenTemplateColumn 

Note: SumWhenColumn usage is limited to static reports, and SumWhenTemplateColumn usage is behind a feature flag.

Sum When columns allow you to aggregate data based on arbitrary conditions.

The SumWhenColumn allows any expression.

The SumWhenTemplateColumn is used in conjunction with a subclass of SumWhenTemplateSpec. The template defines an expression and typically accepts binds. An example:

Example using sum_when:

{
    "display": "under_six_month_olds",
    "column_id": "under_six_month_olds",
    "type": "sum_when",
    "field": "age_at_registration",
    "whens": [
         ["age_at_registration < 6", 1],
    ],
    "else_": 0
}

Equivalent example using sum_when_template:

{
    "display": "under_x_month_olds",
    "column_id": "under_x_month_olds",
    "type": "sum_when_template",
    "field": "age_at_registration",
    "whens": [
         {
             "type": "under_x_months",
             "binds": [6],
             "then": 1
         }
    ],
    "else_": 0
}

Expanded Columns 

Expanded columns have a type of "expanded". Expanded columns will be “expanded” into a new column for each distinct value in this column of the data source. For example:

If you have a data source like this:

+---------|----------|-------------+
| Patient | district | test_result |
+---------|----------|-------------+
| Joe     | North    | positive    |
| Bob     | North    | positive    |
| Fred    | South    | negative    |
+---------|----------|-------------+

and a report configuration like this:

aggregation columns:
["district"]

columns:
[
  {
    "type": "field",
    "field": "district",
    "column_id": "district",
    "format": "default",
    "aggregation": "simple"
  },
  {
    "type": "expanded",
    "field": "test_result",
    "column_id": "test_result",
    "format": "default"
  }
]

Then you will get a report like this:

+----------|----------------------|----------------------+
| district | test_result-positive | test_result-negative |
+----------|----------------------|----------------------+
| North    | 2                    | 0                    |
| South    | 0                    | 1                    |
+----------|----------------------|----------------------+

Expanded columns have an optional parameter "max_expansion" (defaults to 10) which limits the number of columns that can be created. WARNING: Only override the default if you are confident that there will be no adverse performance implications for the server.

Expression columns 

Expression columns can be used to do just-in-time calculations on the data coming out of reports. They allow you to use any UCR expression on the data in the report row. These can be referenced according to the column_ids from the other defined column. They can support advanced use cases like doing math on two different report columns, or doing conditional logic based on the contents of another column.

A simple example is below, which assumes another called “number” in the report and shows how you could make a column that is 10 times that column.

{
    "type": "expression",
    "column_id": "by_tens",
    "display": "Counting by tens",
    "expression": {
        "type": "evaluator",
        "statement": "a * b",
        "context_variables": {
            "a": {
                "type": "property_name",
                "property_name": "number"
            },
            "b": 10
        }
    }
}

Expression columns cannot be used in aggregations or filters. If you need to group by a derived value, then you must add that directly to your data source.

The “aggregation” column property 

The aggregation column property defines how the column should be aggregated. If the report is not doing any aggregation, or if the column is one of the aggregation columns this should always be "simple" (see Aggregation below for more information on aggregation).

The following table documents the other aggregation options, which can be used in aggregate reports.

Format	Description
simple	No aggregation
avg	Average (statistical mean) of the values
count_unique	Count the unique values found
count	Count all rows
min	Choose the minimum value
max	Choose the maximum value
sum	Sum the values

Column IDs 

Column IDs in percentage fields must be unique for the whole report. If you use a field in a normal column and in a percent column you must assign unique column_id values to it in order for the report to process both.

Calculating Column Totals 

To sum a column and include the result in a totals row at the bottom of the report, set the calculate_total value in the column configuration to true.

Not supported for the following column types: - expression

Internationalization 

Report columns can be translated into multiple languages. To translate values in a given column check out the translations transform below. To specify translations for a column header, use an object as the display value in the configuration instead of a string. For example:

{
    "type": "field",
    "field": "owner_id",
    "column_id": "owner_id",
    "display": {
        "en": "Owner Name",
        "he": "שם"
    },
    "format": "default",
    "transform": {
        "type": "custom",
        "custom_type": "owner_display"
    },
    "aggregation": "simple"
}

The value displayed to the user is determined as follows: - If a display value is specified for the users language, that value will appear in the report. - If the users language is not present, display the "en" value. - If "en" is not present, show an arbitrary translation from the display object. - If display is a string, and not an object, the report shows the string.

Valid display languages are any of the two or three letter language codes available on the user settings page.

Aggregation 

Aggregation in reports is done using a list of columns to aggregate on. This defines how indicator data will be aggregated into rows in the report. The columns represent what will be grouped in the report, and should be the column_ids of valid report columns. In most simple reports you will only have one level of aggregation. See examples below.

No aggregation 

Note that if you use is_primary_key in any of your columns, you must include all primary key columns here.

["doc_id"]

Aggregate by ‘username’ column 

["username"]

Aggregate by two columns 

["column1", "column2"]

Transforms 

Transforms can be used in two places - either to manipulate the value of a column just before it gets saved to a data source, or to transform the value returned by a column just before it reaches the user in a report. Here’s an example of a transform used in a report config ‘field’ column:

{
    "type": "field",
    "field": "owner_id",
    "column_id": "owner_id",
    "display": "Owner Name",
    "format": "default",
    "transform": {
        "type": "custom",
        "custom_type": "owner_display"
    },
    "aggregation": "simple"
}

The currently supported transform types are shown below:

Translations and arbitrary mappings 

The translations transform can be used to give human readable strings:

{
    "type": "translation",
    "translations": {
        "lmp": "Last Menstrual Period",
        "edd": "Estimated Date of Delivery"
    }
}

And for translations:

{
    "type": "translation",
    "translations": {
        "lmp": {
            "en": "Last Menstrual Period",
            "es": "Fecha Última Menstruación",
        },
        "edd": {
            "en": "Estimated Date of Delivery",
            "es": "Fecha Estimada de Parto",
        }
    }
}

To use this in a mobile ucr, set the 'mobile_or_web' property to 'mobile'

{
    "type": "translation",
    "mobile_or_web": "mobile",
    "translations": {
        "lmp": "Last Menstrual Period",
        "edd": "Estimated Date of Delivery"
    }
}

Displaying Readable User Name (instead of user ID)

This takes a user_id value and changes it to HQ’s best guess at the user’s display name, using their first and last name, if available, then falling back to their username.

{
    "type": "custom",
    "custom_type": "user_display_including_name"
}

Displaying username instead of user ID 

{
    "type": "custom",
    "custom_type": "user_display"
}

Displaying username minus @domain.commcarehq.org instead of user ID 

{
    "type": "custom",
    "custom_type": "user_without_domain_display"
}

Displaying owner name instead of owner ID 

{
    "type": "custom",
    "custom_type": "owner_display"
}

Displaying month name instead of month index 

{
    "type": "custom",
    "custom_type": "month_display"
}

Rounding decimals 

Rounds decimal and floating point numbers to two decimal places.

{
    "type": "custom",
    "custom_type": "short_decimal_display"
}

Generic number formatting 

Rounds numbers using Python’s built in formatting.

See below for a few simple examples. Read the docs for complex ones. The input to the format string will be a number not a string.

If the format string is not valid or the input is not a number then the original input will be returned.

Round to the nearest whole number 

{
    "type": "number_format",
    "format_string": "{0:.0f}"
}

Rich text formatting with Markdown 

This can be used to do some rich text formatting, using [Markdown](https://www.markdownguide.org/).

There is no configuration required, it will assume the input is valid, markdown-ready text.

{
    "type": "markdown"
}

This transform works for report columns only. Using it in a data source will add HTML markup, but it will not be displayed properly in HQ.

Always round to 3 decimal places 

{
    "type": "number_format",
    "format_string": "{0:.3f}"
}

Date formatting 

Formats dates with the given format string. See here for an explanation of format string behavior. If there is an error formatting the date, the transform is not applied to that value.

{
   "type": "date_format",
   "format": "%Y-%m-%d %H:%M"
}

Converting an ethiopian date string to a gregorian date 

Converts a string in the YYYY-MM-DD format to a gregorian date. For example, 2009-09-11 is converted to date(2017, 5, 19). If it is unable to convert the date, it will return an empty string.

{
   "type": "custom",
   "custom_type": "ethiopian_date_to_gregorian_date"
}

Converting a gregorian date string to an ethiopian date 

Converts a string in the YYYY-MM-DD format to an ethiopian date. For example, 2017-05-19 is converted to date(2009, 09, 11). If it is unable to convert the date, it will return an empty string.

{
   "type": "custom",
   "custom_type": "gregorian_date_to_ethiopian_date"
}

Charts 

There are currently three types of charts supported. Pie charts, and two types of bar charts.

Pie charts 

A pie chart takes two inputs and makes a pie chart. Here are the inputs:

Field	Description
aggregation_colu mn	The column you want to group - typically a column from a select question
value_column	The column you want to sum - often just a count

Here’s a sample spec:

{
    "type": "pie",
    "title": "Remote status",
    "aggregation_column": "remote",
    "value_column": "count"
}

Aggregate multibar charts 

An aggregate multibar chart is used to aggregate across two columns (typically both of which are select questions). It takes three inputs:

Field	Description
primary_aggregation	The primary aggregation. These will be the x-axis on the chart.
secondary_aggregati on	The secondary aggregation. These will be the slices of the bar (or individual bars in “grouped” format)
value_column	The column you want to sum - often just a count

Here’s a sample spec:

{
    "type": "multibar-aggregate",
    "title": "Applicants by type and location",
    "primary_aggregation": "remote",
    "secondary_aggregation": "applicant_type",
    "value_column": "count"
}

Multibar charts 

A multibar chart takes a single x-axis column (typically a user, date, or select question) and any number of y-axis columns (typically indicators or counts) and makes a bar chart from them.

Field	Description
x_axis_column	This will be the x-axis on the chart.
y_axis_columns	These are the columns to use for the secondary axis. These will be the slices of the bar (or individual bars in “grouped” format).

Here’s a sample spec:

{
    "type": "multibar",
    "title": "HIV Mismatch by Clinic",
    "x_axis_column": "clinic",
    "y_axis_columns": [
        {
            "column_id": "diagnoses_match_no",
            "display": "No match"
        },
        {
            "column_id": "diagnoses_match_yes",
            "display": "Match"
        }
    ]
}

Sort Expression 

A sort order for the report rows can be specified. Multiple fields, in either ascending or descending order, may be specified. Example:

Field should refer to report column IDs, not database fields.

[
  {
    "field": "district",
    "order": "DESC"
  },
  {
    "field": "date_of_data_collection",
    "order": "ASC"
  }
]

Distinct On 

Can be used to limit the rows in a report based on a single column or set of columns. The top most row is picked in case of duplicates.

This is different from aggregation in sense that this is done after fetching the rows, whereas aggregation is done before selecting the rows.

This is used in combination with a sort expression to have predictable results.

Please note that the columns used in distinct on clause should also be present in the sort expression as the first set of columns in the same order.

Pick distinct by a single column 

Sort expression should have column1 and then other columns if needed

[
  {
    "field": "column1",
    "order": "DESC"
  },
  {
    "field": "column2",
    "order": "ASC"
  }
]

and distinct on would be

["column1"]

Pick distinct result based on two columns 

Sort expression should have column1 and column2 in same order, More columns can be added after these if needed

[
  {
    "field": "column1",
    "order": "DESC"
  },
  {
    "field": "column2",
    "order": "ASC"
  }
]

and distinct on would be

["column1", "column2"]

Mobile UCR 

Mobile UCR is a beta feature that enables you to make application modules and charts linked to UCRs on mobile. It also allows you to send down UCR data from a report as a fixture which can be used in standard case lists and forms throughout the mobile application.

The documentation for Mobile UCR is very sparse right now.

Filters 

On mobile UCR, filters can be automatically applied to the mobile reports based on hardcoded or user-specific data, or can be displayed to the user.

The documentation of mobile UCR filters is incomplete. However some are documented below.

Custom Calendar Month 

When configuring a report within a module, you can filter a date field by the ‘CustomMonthFilter’. The choice includes the following options: - Start of Month (a number between 1 and 28) - Period (a number between 0 and n with 0 representing the current month).

Each custom calendar month will be “Start of the Month” to (“Start of the Month” - 1). For example, if the start of the month is set to 21, then the period will be the 21th of the month -> 20th of the next month.

Examples: Assume it was May 15: Period 0, day 21, you would sync April 21-May 15th Period 1, day 21, you would sync March 21-April 20th Period 2, day 21, you would sync February 21 -March 20th

Assume it was May 20: Period 0, day 21, you would sync April 21-May 20th Period 1, day 21, you would sync March 21-April 20th Period 2, day 21, you would sync February 21-March 20th

Assume it was May 21: Period 0, day 21, you would sync May 21-May 21th Period 1, day 21, you would sync April 21-May 20th Period 2, day 21, you would sync March 21-April 20th

Export 

A UCR data source can be exported, to back an excel dashboard, for instance. The URL for exporting data takes the form https://www.commcarehq.org/a/[domain]/configurable_reports/data_sources/export/[data source id]/

The export supports a “$format” parameter which can be any of the following options: html, csv, xlsx, xls. The default format is csv.

This export can also be filtered to restrict the results returned. The filtering options are all based on the field names:

URL parameter	Value	Description
{field_name}	{exact value}	require an exact match
{field_name}-range	{start}..{end}	return results in range
{field_name}-lastndays	{number}	restrict to the last n days

This is configured in export_data_source and tested in test_export. It should be pretty straightforward to add support for additional filter types.

Export example 

Let’s say you want to restrict the results to only cases owned by a particular user, opened in the last 90 days, and with a child between 12 and 24 months old as an xlsx file. The querystring might look like this:

?$format=xlsx&owner_id=48l069n24myxk08hl563&opened_on-lastndays=90&child_age-range=12..24

Practical Notes 

Some rough notes for working with user configurable reports.

Getting Started 

The easiest way to get started is to start with sample data and reports.

Create a simple app and submit a few forms. You can then use report builder to create a report. Start at a/DOMAIN/reports/builder/select_source/ and create a report based on your form, either a form list or form summary.

When your report is created, clicking “Edit” will bring you to the report builder editor. An individual report can be viewed in the UCR editor by changing the report builder URL, /a/DOMAIN/reports/builder/edit/REPORT_ID/ to the UCR URL, /a/DOMAIN/configurable_reports/reports/edit/REPORT_ID/. In this view, you can examine the columns, filters, and aggregation columns that report builder created.

The UCR config UI also includes pages to add new data sources, imports reports, etc., all based at /a/DOMAIN/configurable_reports/. If you add a new report via the UCR UI and copy in the columns, filters, etc. from a report builder report, that new report will then automatically open in the UCR UI when you edit it. You can also take an existing report builder report and set my_report.report_meta.created_by_builder to false to force it to open in the UCR UI in the future.

Two example UCRs, a case-based UCR for the dimagi domain and a form-based UCR for the gsid domain, are checked into source code. Their data source specs and report specs are in corehq/apps/userreports/examples/.

The tests are also a good source of documentation for the various filter and indicator formats that are supported.

When editing data sources, you can check the progress of rebuilding using my_datasource.meta.build.finished

Static data sources 

As well as being able to define data sources via the UI which are stored in the database you can also define static data sources which live as JSON documents in the source repository.

These are mainly useful for custom reports.

They conform to a slightly different style:

{
    "domains": ["live-domain", "test-domain"],
    "config": {
        ... put the normal data source configuration here
    }
}

Having defined the data source you need to use the static_ucr_data_source_paths extension point to make CommCare aware of your data source. Now when the static data source pillow is run it will pick up the data source and rebuild it.

Alternatively, the legacy method is to add the path to the data source file to the STATIC_DATA_SOURCES setting in settings.py.

Changes to the data source require restarting the pillow which will rebuild the SQL table. Alternately you can use the UI to rebuild the data source (requires Celery to be running).

Static configurable reports 

Configurable reports can also be defined in the source repository. Static configurable reports have the following style:

{
    "domains": ["my-domain"],
    "data_source_table": "my_table",
    "report_id": "my-report",
    "config": {
        ... put the normal report configuration here
    }
}

Having defined the report you need to use the static_ucr_report_paths extension point to make CommCare aware of your report.

Alternatively, the legacy method is to add the path to the data source file to the STATIC_UCR_REPORTS setting in settings.py.

Custom configurable reports 

Sometimes a client’s needs for a rendered report are outside of the scope of the framework. To render the report using a custom Django template or with custom Excel formatting, define a subclass of ConfigurableReportView and override the necessary functions. Then include the python path to the class in the field custom_configurable_report of the static report and don’t forget to include the static report in STATIC_DATA_SOURCES in settings.py.

Extending User Configurable Reports 

When building a custom report for a client, you may find that you want to extend UCR with custom functionality. The UCR framework allows developers to write custom expressions, and register them with the framework. To do so:

Define a function that returns an expression object

def custom_expression(spec, evaluation_context):
    ...

Extend the custom_ucr_expressions extension point:

from corehq.apps.userreports.extension_points import custom_ucr_expressions

@custom_ucr_expressions.extend()
def ucr_expressions():
    return [
        ('expression_name', 'path.to.custom_expression'),
    ]

Scaling UCR 

Profiling data sources 

You can use ./manage.py profile_data_source <domain> <data source id> <doc id> to profile a datasource on a particular doc. It will give you information such as functions that take the longest and number of database queries it initiates.

Faster Reporting 

If reports are slow, then you can add create_index to the data source to any columns that have filters applied to them.

Asynchronous Indicators 

If you have an expensive data source and the changes come in faster than the pillow can process them, you can specify asynchronous: true in the data source. This flag puts the document id in an intermediary table when a change happens which is later processed by a celery queue. If multiple changes are submitted before this can be processed, a new entry is not created, so it will be processed once. This moves the bottle neck from kafka/pillows to celery.

The main benefit of this is that documents will be processed only once even if many changes come in at a time. This makes this approach ideal datasources that don’t require ‘live’ data or where the source documents change very frequently.

It is also possible achieve greater parallelization than is currently available via pillows since multiple Celery workers can process the changes.

A diagram of this workflow can be found here

Inspecting database tables 

The easiest way to inspect the database tables is to use the sql command line utility.

This can be done by runnning ./manage.py dbshell or using psql.

The naming convention for tables is: config_report_[domain name]_[table id]_[hash].

In postgres, you can see all tables by typing \dt and use sql commands to inspect the appropriate tables.