User Configurable Reporting
An overview of the design, API and data structures used here.
The docs on reporting, pillows, and change feeds are useful background.
Data Flow
Reporting is handled in multiple stages. Here is the basic workflow.
Raw data (form or case) → [Data source config] → Row in database table → [Report config] → Report in HQ
Both the data source config and report config are JSON documents that live in the database. The data source config determines how raw data (forms and cases) gets mapped to rows in an intermediary table, while the report config(s) determine how that report table gets turned into an interactive report in the UI.
A UCR table is created when a new data source is created.
The table’s structure is updated whenever the UCR is “rebuilt”, which happens when the data source config is edited.
Rebuilds can also be kicked off manually, either via rebuild_indicator_table or through the UI.
Rebuilding happens asynchronously. Data in the table is refreshed continuously by pillows.
Data Sources
Each data source configuration maps a filtered set of the raw data to indicators. A data source configuration consists of two primary sections:
A filter that determines whether the data is relevant for the data source
A list of indicators in that data source
In addition to these properties, there are a number of relatively self-explanatory fields on a data source, such as table_id and display_name, and a few more nuanced ones. The full list of available fields is summarized in the following table:
| Field | Description |
|---|---|
| filter | Determines whether the data is relevant for the data source |
| indicators | List of indicators to save |
| table_id | A unique ID for the table |
| display_name | A display name for the table that shows up in UIs |
| base_item_expression | Used for making tables off of repeat or list data |
| named_expressions | A list of named expressions that can be referenced in other filters and indicators |
| named_filters | A list of named filters that can be referenced in other filters and indicators |
Data Source Filtering
When setting up a data source configuration, filtering defines what data applies to a given set of indicators. Some example uses of filtering on a data source include:
Restricting the data by document type (e.g. cases or forms). This is a built-in filter.
Limiting the data to a particular case or form type
Excluding demo user data
Excluding closed cases
Only showing data that meets a domain-specific condition (e.g. pregnancy cases opened for women over 30 years of age)
Filter type overview
There are currently four supported filter types. However, these can be used together to produce arbitrarily complicated expressions.
| Filter Type | Description |
|---|---|
| boolean_expression | An expression / logic statement (more below) |
| and | An "and" of other filters - true if all are true |
| or | An "or" of other filters - true if any are true |
| not | A "not" or inverse expression on a filter |
To understand the boolean_expression type, we must first explain expressions.
Expressions
An expression is a way of representing a set of operations that should return a single value. Expressions can basically be thought of as functions that take in a document and return a value:
Expression: function(document) → value
In normal math/python notation, the following are all valid expressions on a doc (which is presumed to be a dict object):
"hello"
7
doc["name"]
doc["child"]["age"]
doc["age"] < 21
"legal" if doc["age"] > 21 else "underage"
In user configurable reports the following expression types are currently supported (note that this can and likely will be extended in the future):
| Expression Type | Description |
|---|---|
| identity | Just returns whatever is passed in |
| constant | A constant |
| property_name | A reference to the property in a document |
| property_path | A nested reference to a property in a document |
| conditional | An if/else expression |
| switch | A switch statement |
| array_index | An index into an array |
| split_string | Splitting a string and grabbing a specific element from it by index |
| iterator | Combine multiple expressions into a list |
| related_doc | A way to reference something in another document |
| root_doc | A way to reference the root document explicitly (only needed when making a data source from repeat/child data) |
| ancestor_location | A way to retrieve the ancestor of a particular type from a location |
| nested | A way to chain any two expressions together |
| dict | A way to emit a dictionary of key/value pairs |
| add_days | A way to add days to a date |
| add_months | A way to add months to a date |
| month_start_date | First day in the month of a date |
| month_end_date | Last day in the month of a date |
| diff_days | A way to get duration in days between two dates |
| evaluator | A way to do arithmetic operations |
| base_iteration_number | Used with base_item_expression |
The following expressions act on a list of objects or a list of lists (e.g. on a repeat list) and return another list or value. These expressions can be combined to do complex aggregations on list data.
| Expression Type | Description |
|---|---|
| filter_items | Filter a list of items to make a new list |
| map_items | Map one list to another list |
| sort_items | Sort a list based on an expression |
| reduce_items | Aggregate a list of items into one value (e.g. a sum) |
| flatten | Flatten multiple lists of items into one list |
JSON snippets for expressions
Here are JSON snippets for the various expression types. Hopefully they are self-explanatory.
Constant Expression
- class corehq.apps.userreports.expressions.specs.ConstantGetterSpec(_obj=None, **kwargs)[source]
There are two formats for constant expressions. The simplified format is simply the constant itself, for example "hello" or 5.

The complete format is as follows. This expression returns the constant "hello":

{
    "type": "constant",
    "constant": "hello"
}
Property Name Expression
- class corehq.apps.userreports.expressions.specs.PropertyNameGetterSpec(_obj=None, **kwargs)[source]
This expression returns doc["age"]:

{
    "type": "property_name",
    "property_name": "age"
}
An optional
"datatype"
attribute may be specified, which will attempt to cast the property to the given data type. The options are “date”, “datetime”, “string”, “integer”, and “decimal”. If no datatype is specified, “string” will be used.
Property Path Expression
- class corehq.apps.userreports.expressions.specs.PropertyPathGetterSpec(_obj=None, **kwargs)[source]
This expression returns doc["child"]["age"]:

{
    "type": "property_path",
    "property_path": ["child", "age"]
}
An optional
"datatype"
attribute may be specified, which will attempt to cast the property to the given data type. The options are “date”, “datetime”, “string”, “integer”, and “decimal”. If no datatype is specified, “string” will be used.
Jsonpath Expression
- class corehq.apps.userreports.expressions.specs.JsonpathExpressionSpec(_obj=None, **kwargs)[source]
This will execute the jsonpath expression against the current doc and emit the result.
{
    "type": "jsonpath",
    "jsonpath": "form..case.name"
}
Given the following doc:
{
    "form": {
        "case": {"name": "a"},
        "nested": {
            "case": {"name": "b"}
        },
        "list": [
            {"case": {"name": "c"}},
            {
                "nested": {
                    "case": {"name": "d"}
                }
            }
        ]
    }
}
The above expression will evaluate to ["a", "b", "c", "d"]. Another example is form.list[0].case.name, which will evaluate to "c".

See also the jsonpath evaluator function.
Conditional Expression
- class corehq.apps.userreports.expressions.specs.ConditionalExpressionSpec(_obj=None, **kwargs)[source]
This expression returns "legal" if doc["age"] > 21 else "underage":

{
    "type": "conditional",
    "test": {
        "type": "boolean_expression",
        "operator": "gt",
        "expression": {
            "type": "property_name",
            "property_name": "age",
            "datatype": "integer"
        },
        "property_value": 21
    },
    "expression_if_true": {
        "type": "constant",
        "constant": "legal"
    },
    "expression_if_false": {
        "type": "constant",
        "constant": "underage"
    }
}
Note that this expression contains other expressions inside it! This is why expressions are powerful. (It also contains a filter, but we haven't covered those yet - if you find the "test" section confusing, keep reading.)

Note also that it's important to make sure that you are comparing values of the same type. In this example, the expression that retrieves the age property from the document also casts the value to an integer. If this datatype is not specified, the expression will compare a string to the value 21, which will not produce the expected results!
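The casting pitfall is easy to demonstrate in plain Python: without an integer datatype, the comparison is lexicographic string comparison rather than numeric comparison.

```python
# Lexicographic string comparison gives surprising results for ages:
assert "9" > "21"      # "9" sorts after "2"
assert "100" < "21"    # "1" sorts before "2"

# Casting first, as the "datatype": "integer" option does, compares numerically:
assert int("9") < 21
assert int("100") > 21
```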
Switch Expression
- class corehq.apps.userreports.expressions.specs.SwitchExpressionSpec(_obj=None, **kwargs)[source]
This expression returns the value of the expression for the case that matches the switch on expression. Note that case values may only be strings at this time.
{
    "type": "switch",
    "switch_on": {
        "type": "property_name",
        "property_name": "district"
    },
    "cases": {
        "north": {"type": "constant", "constant": 4000},
        "south": {"type": "constant", "constant": 2500},
        "east": {"type": "constant", "constant": 3300},
        "west": {"type": "constant", "constant": 65}
    },
    "default": {"type": "constant", "constant": 0}
}
Coalesce Expression
- class corehq.apps.userreports.expressions.specs.CoalesceExpressionSpec(_obj=None, **kwargs)[source]
This expression returns the value of the expression provided, or the value of the default_expression if the expression provided evaluates to a null or blank string.
{
    "type": "coalesce",
    "expression": {
        "type": "property_name",
        "property_name": "district"
    },
    "default_expression": {
        "type": "constant",
        "constant": "default_district"
    }
}
Array Index Expression
- class corehq.apps.userreports.expressions.specs.ArrayIndexExpressionSpec(_obj=None, **kwargs)[source]
This expression returns doc["siblings"][0]:

{
    "type": "array_index",
    "array_expression": {
        "type": "property_name",
        "property_name": "siblings"
    },
    "index_expression": {
        "type": "constant",
        "constant": 0
    }
}
It will return nothing if the siblings property is not a list, the index isn’t a number, or the indexed item doesn’t exist.
Split String Expression
- class corehq.apps.userreports.expressions.specs.SplitStringExpressionSpec(_obj=None, **kwargs)[source]
This expression returns doc["multiple_value_string"].split(",")[0]:

{
    "type": "split_string",
    "string_expression": {
        "type": "property_name",
        "property_name": "multiple_value_string"
    },
    "index_expression": {
        "type": "constant",
        "constant": 0
    },
    "delimiter": ","
}
The delimiter is optional and defaults to a space. The expression returns nothing if the string_expression is not a string, if the index isn't a number, or if the indexed item doesn't exist. The index_expression is also optional; without it, the expression returns the full list of elements.
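In plain Python terms, the split_string behavior described above is roughly:

```python
value = "a,b,c"

# delimiter "," with an index_expression of 0:
assert value.split(",")[0] == "a"

# no delimiter: a space is used
assert "foo bar".split(" ")[0] == "foo"

# no index_expression: the full list of elements is returned
assert value.split(",") == ["a", "b", "c"]
```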
Iterator Expression
- class corehq.apps.userreports.expressions.specs.IteratorExpressionSpec(_obj=None, **kwargs)[source]
{
    "type": "iterator",
    "expressions": [
        {"type": "property_name", "property_name": "p1"},
        {"type": "property_name", "property_name": "p2"},
        {"type": "property_name", "property_name": "p3"}
    ],
    "test": {}
}
This will emit [doc.p1, doc.p2, doc.p3]. You can add a test attribute to filter rows from what is emitted - if you don't specify this, the iterator will include one row per expression it contains, regardless of what is passed in. This can be used/combined with the base_item_expression to emit multiple rows per document.
Base iteration number expressions
Ancestor location expression
- class corehq.apps.locations.ucr_expressions.AncestorLocationExpression(_obj=None, **kwargs)[source]
This is used to return a JSON object representing the ancestor of the given type of the given location. For instance, if we had locations configured with a hierarchy like country -> state -> county -> city, we could pass the location ID of Cambridge and a location type of state to this expression to get the Massachusetts location.

{
    "type": "ancestor_location",
    "location_id": {
        "type": "property_name",
        "name": "owner_id"
    },
    "location_type": {
        "type": "constant",
        "constant": "state"
    }
}
If no such location exists, returns null.
Optionally you can specify location_property to return a single property of the location.

{
    "type": "ancestor_location",
    "location_id": {
        "type": "property_name",
        "name": "owner_id"
    },
    "location_type": {
        "type": "constant",
        "constant": "state"
    },
    "location_property": "site_code"
}
Nested expressions
- class corehq.apps.userreports.expressions.specs.NestedExpressionSpec(_obj=None, **kwargs)[source]
These can be used to nest expressions. This can be used, e.g. to pull a specific property out of an item in a list of objects.
The following nested expression is the equivalent of a property_path expression to ["outer", "inner"] and demonstrates the functionality. More examples can be found in the practical examples.

{
    "type": "nested",
    "argument_expression": {
        "type": "property_name",
        "property_name": "outer"
    },
    "value_expression": {
        "type": "property_name",
        "property_name": "inner"
    }
}
Dict expressions
- class corehq.apps.userreports.expressions.specs.DictExpressionSpec(_obj=None, **kwargs)[source]
These can be used to create dictionaries of key/value pairs. This is only useful as an intermediate structure in another expression since the result of the expression is a dictionary that cannot be saved to the database.
See the practical examples for a way this can be used in a base_item_expression to emit multiple rows for a single form/case based on different properties.

Here is a simple example that demonstrates the structure. The keys of properties must be text, and the values must be valid expressions (or constants):

{
    "type": "dict",
    "properties": {
        "name": "a constant name",
        "value": {
            "type": "property_name",
            "property_name": "prop"
        },
        "value2": {
            "type": "property_name",
            "property_name": "prop2"
        }
    }
}
“Add Days” expressions
- class corehq.apps.userreports.expressions.date_specs.AddDaysExpressionSpec(_obj=None, **kwargs)[source]
Below is a simple example that demonstrates the structure. The expression below will add 28 days to a property called “dob”. The date_expression and count_expression can be any valid expressions, or simply constants.
{
    "type": "add_days",
    "date_expression": {
        "type": "property_name",
        "property_name": "dob"
    },
    "count_expression": 28
}
“Add Hours” expressions
- class corehq.apps.userreports.expressions.date_specs.AddHoursExpressionSpec(_obj=None, **kwargs)[source]
Below is a simple example that demonstrates the structure. The expression below will add 12 hours to a property called “visit_date”. The date_expression and count_expression can be any valid expressions, or simply constants.
{
    "type": "add_hours",
    "date_expression": {
        "type": "property_name",
        "property_name": "visit_date"
    },
    "count_expression": 12
}
“Add Months” expressions
- class corehq.apps.userreports.expressions.date_specs.AddMonthsExpressionSpec(_obj=None, **kwargs)[source]
add_months offsets the given date by the given number of calendar months. If the offset results in an invalid day (e.g. Feb 30, April 31), the day of the resulting date is adjusted to the last day of the resulting calendar month.

The date_expression and months_expression can be any valid expressions, or simply constants, including negative numbers.

{
    "type": "add_months",
    "date_expression": {
        "type": "property_name",
        "property_name": "dob"
    },
    "months_expression": 28
}
“Diff Days” expressions
- class corehq.apps.userreports.expressions.date_specs.DiffDaysExpressionSpec(_obj=None, **kwargs)[source]
diff_days returns the number of days between the dates specified by from_date_expression and to_date_expression. The from_date_expression and to_date_expression can be any valid expressions, or simply constants.

{
    "type": "diff_days",
    "from_date_expression": {
        "type": "property_name",
        "property_name": "dob"
    },
    "to_date_expression": "2016-02-01"
}
“Month Start Date” and “Month End Date” expressions
- class corehq.apps.userreports.expressions.date_specs.MonthStartDateExpressionSpec(_obj=None, **kwargs)[source]
month_start_date returns the date of the first day in the month of the given date, and month_end_date returns the date of the last day in the month of the given date.

The date_expression can be any valid expression, or simply a constant.

{
    "type": "month_start_date",
    "date_expression": {
        "type": "property_name",
        "property_name": "dob"
    }
}
“Evaluator” expression
- class corehq.apps.userreports.expressions.specs.EvalExpressionSpec(_obj=None, **kwargs)[source]
The evaluator expression can be used to evaluate statements that contain arithmetic (and simple Python-like statements). It evaluates the statement specified by statement, which can contain variables as defined in context_variables.

{
    "type": "evaluator",
    "statement": "a + b - c + 6",
    "context_variables": {
        "a": 1,
        "b": 20,
        "c": {
            "type": "property_name",
            "property_name": "the_number_two"
        }
    }
}
This returns 25 (1 + 20 - 2 + 6).
- statement
  The expression statement to be evaluated.

- context_variables
  A dictionary of expressions where keys are names of variables used in the statement and values are expressions to generate those variables. Variable types must be one of: str, int, float, bool, date, datetime.

If context_variables is omitted, the current context of the expression will be used.

Expression limitations
Only a single expression is permitted.
Available operators:
math operators (except the power operator)
In addition, expressions can perform the following operations:
index: case['name']
slice: cases[0:2]
if statements: 1 if case.name == 'bob' else 0
list comprehension: [i for i in range(3)]
dict, list, set construction: {"a": 1, "b": set(cases), "c": list(range(4))}
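The evaluator example above can be sanity-checked in plain Python. The sketch below resolves the context variables against a document where the_number_two property is assumed to hold 2, then evaluates the statement with a restricted eval; the real implementation uses a sandboxed expression parser, not eval, so this is only an illustration.

```python
doc = {"the_number_two": 2}

# Resolve context_variables: constants pass through, property_name
# expressions are looked up on the document.
context = {"a": 1, "b": 20, "c": doc["the_number_two"]}

# Evaluate the statement with no builtins and only the named variables.
result = eval("a + b - c + 6", {"__builtins__": {}}, context)
assert result == 25  # 1 + 20 - 2 + 6
```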
Available Functions
Only the following functions are available in the evaluation context:
- context()
Get the current evaluation context. See also root_context().
- date(value, fmt=None)
Parse a string value as a date or timestamp. If fmt is not supplied, the string is assumed to be in ISO 8601 format.

- Parameters:
  fmt – If supplied, use this format specification to parse the date. See the Python documentation for Format Codes.
- float(value)
Convert value to a floating point number.
- int(value)
Convert value to an int. Value can be a number or a string representation of a number.
- jsonpath(expr, as_list=False, context=None)
Evaluate a jsonpath expression.
See also Jsonpath Expression.
jsonpath("form.case.name") jsonpath("name", context=jsonpath("form.case")) jsonpath("form..case", as_list=True)
- Parameters:
expr – The jsonpath expression.
as_list – When set to True, always return the full list of matches, even if it is empty. If set to False, the return value will be None if no matches are found; a single match is returned directly, and multiple matches are all returned as a list.
context – Optional context for evaluation. If not supplied the full context of the evaluator will be used.
- Returns:
See as_list.
- named(name, context=None)
Call a named expression. See also Named Expressions.
named("my-named-expression") named("my-named-expression", context=form.case)
- rand()
  Generate a random number between 0 and 1.

- randint(max)
  Generate a random integer between 0 and max.
- range(start[, stop][, skip])
Produces a sequence of integers from start (inclusive) to stop (exclusive) by skip. Note that for performance reasons this is limited to 100 items or less. See range.
- root_context()
Get the root context of the evaluation. Similar to the root_doc expression.

See also context().
- round(value, ndigits=None)
Round a number to the nearest integer or to ndigits after the decimal point. See round.

- str(value)
  Convert value to a string.
- timedelta_to_seconds(delta)
Convert a TimeDelta object into seconds. This is useful for getting the number of seconds between two dates.
timedelta_to_seconds(time_end - time_start)
- today()
Return the current UTC date.
See also Evaluator Examples.
‘Get Case Sharing Groups’ expression
- class corehq.apps.userreports.expressions.specs.CaseSharingGroupsExpressionSpec(_obj=None, **kwargs)[source]
get_case_sharing_groups will return an array of the case sharing groups that are assigned to a provided user ID. The array will contain one document per case sharing group.

{
    "type": "get_case_sharing_groups",
    "user_id_expression": {
        "type": "property_path",
        "property_path": ["form", "meta", "userID"]
    }
}
‘Get Reporting Groups’ expression
- class corehq.apps.userreports.expressions.specs.ReportingGroupsExpressionSpec(_obj=None, **kwargs)[source]
get_reporting_groups will return an array of the reporting groups that are assigned to a provided user ID. The array will contain one document per reporting group.

{
    "type": "get_reporting_groups",
    "user_id_expression": {
        "type": "property_path",
        "property_path": ["form", "meta", "userID"]
    }
}
Filter, Sort, Map and Reduce Expressions
The following expressions act on a list of objects or a list of lists. The list to operate on is specified by items_expression. This can be any valid expression that returns a list. If the items_expression doesn't return a valid list, these expressions might either fail or return an empty list or None.
map_items Expression
- class corehq.apps.userreports.expressions.list_specs.MapItemsExpressionSpec(_obj=None, **kwargs)[source]
map_items performs a calculation specified by map_expression on each item of the list specified by items_expression and returns a list of the calculation results. The map_expression is evaluated relative to each item in the list, not relative to the parent document from which the list is specified. For example, if items_expression is a path to a repeat-list of children in a form document, map_expression is a path relative to the repeat item.

items_expression can be any valid expression that returns a list. If this doesn't evaluate to a list, an empty list is returned. It may be necessary to specify a datatype of array if the expression could return a single element.

map_expression can be any valid expression relative to the items in the above list.

{
    "type": "map_items",
    "items_expression": {
        "datatype": "array",
        "type": "property_path",
        "property_path": ["form", "child_repeat"]
    },
    "map_expression": {
        "type": "property_path",
        "property_path": ["age"]
    }
}
The above returns a list of ages. Note that the property_path in map_expression is relative to the repeat item rather than to the form.
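In Python terms, the map_items example above behaves roughly like a list comprehension over the repeat group (sample document invented for illustration):

```python
doc = {
    "form": {
        "child_repeat": [
            {"name": "a", "age": 5},
            {"name": "b", "age": 8},
        ]
    }
}

# map_expression is evaluated relative to each repeat item, so its
# property_path is just ["age"], not ["form", "child_repeat", "age"].
ages = [item["age"] for item in doc["form"]["child_repeat"]]
assert ages == [5, 8]
```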
filter_items Expression
- class corehq.apps.userreports.expressions.list_specs.FilterItemsExpressionSpec(_obj=None, **kwargs)[source]
filter_items performs filtering on a given list and returns a new list. If the boolean expression specified by filter_expression evaluates to a True value, the item is included in the new list; if not, it is excluded.

items_expression can be any valid expression that returns a list. If this doesn't evaluate to a list, an empty list is returned. It may be necessary to specify a datatype of array if the expression could return a single element.

filter_expression can be any valid boolean expression relative to the items in the above list.

{
    "type": "filter_items",
    "items_expression": {
        "datatype": "array",
        "type": "property_name",
        "property_name": "family_repeat"
    },
    "filter_expression": {
        "type": "boolean_expression",
        "expression": {
            "type": "property_name",
            "property_name": "gender"
        },
        "operator": "eq",
        "property_value": "female"
    }
}
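The filter_items example above behaves roughly like this Python comprehension (sample document invented for illustration):

```python
doc = {
    "family_repeat": [
        {"name": "a", "gender": "female"},
        {"name": "b", "gender": "male"},
        {"name": "c", "gender": "female"},
    ]
}

# Keep only items where the filter_expression (gender == "female") is true.
females = [item for item in doc["family_repeat"] if item["gender"] == "female"]
assert [item["name"] for item in females] == ["a", "c"]
```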
sort_items Expression
- class corehq.apps.userreports.expressions.list_specs.SortItemsExpressionSpec(_obj=None, **kwargs)[source]
sort_items returns a sorted list of items based on the sort value of each item. The sort value of an item is specified by sort_expression. By default, the list will be in ascending order. Order can be changed by adding an optional order expression with one of DESC (for descending) or ASC (for ascending). If the sort value of an item is None, the item will appear at the start of the list. If the sort values of any two items can't be compared, an empty list is returned.

items_expression can be any valid expression that returns a list. If this doesn't evaluate to a list, an empty list is returned. It may be necessary to specify a datatype of array if the expression could return a single element.

sort_expression can be any valid expression relative to the items in the above list that returns a value to be used as the sort value.

{
    "type": "sort_items",
    "items_expression": {
        "datatype": "array",
        "type": "property_path",
        "property_path": ["form", "child_repeat"]
    },
    "sort_expression": {
        "type": "property_path",
        "property_path": ["age"]
    }
}
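The sort_items example above is roughly equivalent to Python's sorted() with a key function (sample document invented for illustration):

```python
doc = {
    "form": {
        "child_repeat": [
            {"name": "a", "age": 8},
            {"name": "b", "age": 5},
        ]
    }
}

# sort_expression supplies the per-item sort value; the default order
# is ascending, and DESC would correspond to reverse=True.
sorted_items = sorted(doc["form"]["child_repeat"], key=lambda item: item["age"])
assert [item["name"] for item in sorted_items] == ["b", "a"]
```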
reduce_items Expression
- class corehq.apps.userreports.expressions.list_specs.ReduceItemsExpressionSpec(_obj=None, **kwargs)[source]
reduce_items returns an aggregate value of the list, computed by the function specified in aggregation_fn.

items_expression can be any valid expression that returns a list. If this doesn't evaluate to a list, aggregation_fn will be applied on an empty list. It may be necessary to specify a datatype of array if the expression could return a single element.

aggregation_fn is one of the following supported function names:

| Function Name | Example |
|---|---|
| count | ['a', 'b'] -> 2 |
| sum | [1, 2, 4] -> 7 |
| min | [2, 5, 1] -> 1 |
| max | [2, 5, 1] -> 5 |
| first_item | ['a', 'b'] -> 'a' |
| last_item | ['a', 'b'] -> 'b' |
| join | ['a', 'b'] -> 'ab' |

{
    "type": "reduce_items",
    "items_expression": {
        "datatype": "array",
        "type": "property_name",
        "property_name": "family_repeat"
    },
    "aggregation_fn": "count"
}

This returns the number of family members.
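The supported aggregation functions behave like the following Python equivalents (a sketch of the semantics, not the actual implementation):

```python
aggregations = {
    "count": len,
    "sum": sum,
    "min": min,
    "max": max,
    "first_item": lambda items: items[0],
    "last_item": lambda items: items[-1],
    "join": lambda items: "".join(items),
}

family_repeat = [{"name": "a"}, {"name": "b"}, {"name": "c"}]
assert aggregations["count"](family_repeat) == 3
assert aggregations["sum"]([1, 2, 4]) == 7
assert aggregations["join"](["a", "b"]) == "ab"
```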
flatten expression
- class corehq.apps.userreports.expressions.list_specs.FlattenExpressionSpec(_obj=None, **kwargs)[source]
flatten takes a list of lists of objects specified by items_expression and returns one list of all objects.

items_expression is any valid expression that returns a list of lists. If this doesn't evaluate to a list of lists, an empty list is returned. It may be necessary to specify a datatype of array if the expression could return a single element.

{
    "type": "flatten",
    "items_expression": {}
}
Named Expressions
- class corehq.apps.userreports.expressions.specs.NamedExpressionSpec(_obj=None, **kwargs)[source]
Last, but certainly not least, are named expressions. These are special expressions that can be defined once in a data source and then used throughout other filters and indicators in that data source. This allows you to write out a very complicated expression a single time, but still use it in multiple places with a simple syntax.
Named expressions are defined in a special section of the data source. To reference a named expression, you just specify the type of "named" and the name as follows:

{
    "type": "named",
    "name": "my_expression"
}
This assumes that your named expression section of your data source includes a snippet like the following:
{
    "my_expression": {
        "type": "property_name",
        "property_name": "test"
    }
}
This is just a simple example - the value that "my_expression" takes on can be as complicated as you want, and it can also reference other named expressions, as long as it doesn't reference itself or create a recursive cycle.

See also the named evaluator function.
Boolean Expression Filters
A boolean_expression
filter combines an expression, an operator,
and a property value (a constant), to produce a statement that is
either True
or False
. Note: in the future the constant value
may be replaced with a second expression to be more general, however
currently only constant property values are supported.
Here is a sample JSON format for a simple boolean_expression filter:
{
"type": "boolean_expression",
"expression": {
"type": "property_name",
"property_name": "age",
"datatype": "integer"
},
"operator": "gt",
"property_value": 21
}
This is equivalent to the python statement: doc["age"] > 21
Operators
The following operators are currently supported:
| Operator | Description | Value type |
|---|---|---|
| eq | is equal | constant |
| not_eq | is not equal | constant |
| in | single value is in a list | list |
| in_multi | a value is in a multiselect | list |
| any_in_multi | one of a list of values is in a multiselect | list |
| lt | is less than | number |
| lte | is less than or equal | number |
| gt | is greater than | number |
| gte | is greater than or equal | number |
| regex | matches regular expression | string |
Compound filters
Compound filters build on top of boolean_expression
filters to
create boolean logic. These can be combined to support arbitrarily
complicated boolean logic on data. There are three types of filters,
and, or, and not filters. The JSON representation of these is
below. Hopefully these are self explanatory.
“And” Filters
The following filter represents the statement:
doc["age"] < 21 and doc["nationality"] == "american"
:
{
"type": "and",
"filters": [
{
"type": "boolean_expression",
"expression": {
"type": "property_name",
"property_name": "age",
"datatype": "integer"
},
"operator": "lt",
"property_value": 21
},
{
"type": "boolean_expression",
"expression": {
"type": "property_name",
"property_name": "nationality"
},
"operator": "eq",
"property_value": "american"
}
]
}
“Or” Filters
The following filter represents the statement:
doc["age"] > 21 or doc["nationality"] == "european"
:
{
"type": "or",
"filters": [
{
"type": "boolean_expression",
"expression": {
"type": "property_name",
"property_name": "age",
"datatype": "integer"
},
"operator": "gt",
"property_value": 21
},
{
"type": "boolean_expression",
"expression": {
"type": "property_name",
"property_name": "nationality"
},
"operator": "eq",
"property_value": "european"
}
]
}
“Not” Filters
The following filter represents the statement:
!(doc["nationality"] == "european")
:
{
    "type": "not",
    "filter": {
        "type": "boolean_expression",
        "expression": {
            "type": "property_name",
            "property_name": "nationality"
        },
        "operator": "eq",
        "property_value": "european"
    }
}
Note that this could be represented more simply using a single filter with the ``not_eq`` operator, but “not” filters can represent more complex logic than operators generally, since the filter itself can be another compound filter.
Practical Examples
See practical examples for some practical examples showing various filter types.
Indicators
Now that we know how to filter the data in our data source, we are still left with a very important problem: how do we know what data to save? This is where indicators come in. Indicators are the data outputs - what gets computed and put in a column in the database.
A typical data source will include many indicators (data that will later be included in the report). This section will focus on defining a single indicator. Single indicators can then be combined in a list to fully define a data source.
The overall set of possible indicators is theoretically any function that can take in a single document (form or case) and output a value. However the set of indicators that are configurable is more limited than that.
Indicator Properties
All indicator definitions have the following properties:
| Property | Description |
|---|---|
| type | A specified type for the indicator. It must be one of the types listed below. |
| column_id | The database column where the indicator will be saved. |
| display_name | A display name for the indicator (not widely used, currently). |
| comment | A string describing the indicator |
Additionally, specific indicator types have other type-specific properties. These are covered below.
Indicator types
The following primary indicator types are supported:
| Indicator Type | Description |
|---|---|
| boolean | Save 1 if a filter matches the document, otherwise 0 |
| expression | Save the output of an expression. |
| choice_list | Save multiple columns, one for each of a predefined set of choices |
| ledger_balances | Save a column for each product specified, containing ledger data |
Note/todo: there are also other supported formats, but because they are just shortcuts around the functionality of these ones, they are left out of the current docs.
Boolean indicators
Now we see again the power of our filter framework defined above! Boolean indicators take any arbitrarily complicated filter expression and save a 1 to the database if the expression is true, otherwise a 0. Here is an example boolean indicator which will save 1 if a form has a question with ID is_pregnant and a value of "yes":
{
"type": "boolean",
"column_id": "col",
"filter": {
"type": "boolean_expression",
"expression": {
"type": "property_path",
"property_path": ["form", "is_pregnant"],
},
"operator": "eq",
"property_value": "yes"
}
}
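The semantics of a boolean indicator can be sketched in a few lines of Python. This is an illustrative sketch only, not HQ's implementation; `property_path` and `boolean_indicator` are hypothetical helper names.

```python
def property_path(doc, path):
    """Walk a nested document along a property path, returning None if any step is missing."""
    for key in path:
        if not isinstance(doc, dict):
            return None
        doc = doc.get(key)
    return doc

def boolean_indicator(doc, path, expected):
    """Return 1 if the value at `path` equals `expected`, otherwise 0."""
    return 1 if property_path(doc, path) == expected else 0

doc = {"form": {"is_pregnant": "yes"}}
boolean_indicator(doc, ["form", "is_pregnant"], "yes")  # 1
```

A missing question simply fails the comparison, so the indicator saves 0 rather than raising an error.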
Expression indicators
Similar to the boolean indicators, expression indicators leverage the expression structure defined above to create arbitrarily complex indicators. Expressions can store arbitrary values from documents (as opposed to boolean indicators, which just store 0's and 1's). Because of this they require a few additional properties in the definition:
Property | Description
---|---
datatype | The datatype of the indicator. Current valid choices are: "date", "datetime", "string", "decimal", "integer", and "small_integer".
is_nullable | Whether the database column should allow null values.
is_primary_key | Whether the database column should be (part of?) the primary key. (TODO: this needs to be confirmed)
create_index | Creates an index on this column. Only applicable if using the SQL backend.
expression | Any expression.
transform | (optional) A transform to be applied to the result of the expression. (See the "Report Columns > Transforms" section below.)
Here is a sample expression indicator that just saves the “age” property to an integer column in the database:
{
"type": "expression",
"expression": {
"type": "property_name",
"property_name": "age"
},
"column_id": "age",
"datatype": "integer",
"display_name": "age of patient"
}
Choice list indicators
Choice list indicators take a single choice column (select or multiselect) and expand it into multiple columns, where each column represents a different choice. These can support both single-select and multi-select questions.
A sample spec is below:
{
"type": "choice_list",
"column_id": "col",
"display_name": "the category",
"property_name": "category",
"choices": [
"bug",
"feature",
"app",
"schedule"
],
"select_style": "single"
}
Ledger Balance Indicators
Ledger Balance indicators take a list of product codes and a ledger section, and produce a column for each product code, saving the value found in the corresponding ledger.
Property | Description
---|---
ledger_section | The ledger section to use for this indicator, for example, "stock".
product_codes | A list of the products to include in the indicator. These are combined with the column_id to generate one column per product (see the example below).
case_id_expression | An expression used to get the case where each ledger is found. If not specified, it will use the row's doc id.
{
"type": "ledger_balances",
"column_id": "soh",
"display_name": "Stock On Hand",
"ledger_section": "stock",
"product_codes": ["aspirin", "bandaids", "gauze"],
"case_id_expression": {
"type": "property_name",
"property_name": "_id"
}
}
This spec would produce the following columns in the data source:
soh_aspirin | soh_bandaids | soh_gauze
---|---|---
20 | 11 | 5
67 | 32 | 9
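The column naming shown above (column_id joined to each product code) can be sketched as follows. This is an illustrative sketch; the default of 0 for a missing product is an assumption.

```python
def ledger_balance_columns(column_id, product_codes, ledger):
    """One column per product, named "<column_id>_<product_code>",
    holding that product's balance from the ledger section."""
    return {"%s_%s" % (column_id, code): ledger.get(code, 0)
            for code in product_codes}

# Mirrors the "soh" example spec above.
ledger_balance_columns("soh", ["aspirin", "bandaids", "gauze"],
                       {"aspirin": 20, "bandaids": 11, "gauze": 5})
```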
If the ledger you’re using is a due list and you wish to save the dates instead of integers, you can change the “type” from “ledger_balances” to “due_list_date”.
Practical notes for creating indicators
These are some practical notes for how to choose what indicators to create.
Fractions
All indicators output single values. Though fractional indicators are common, these should be modeled as two separate indicators (for numerator and denominator) and the relationship should be handled in the report UI config layer.
Saving Multiple Rows per Case/Form
You can save multiple rows per case/form by specifying a root level base_item_expression that describes how to get the repeat data from the main document. You can also use the root_doc expression type to reference parent properties, and the base_iteration_number expression type to reference the current index of the item. This can be combined with the iterator expression type to do complex data source transforms. This is not described in detail, but the following sample (which creates a table off of a repeat element called "time_logs") can be used as a guide. There are also additional examples in the practical examples:
{
"domain": "user-reports",
"doc_type": "DataSourceConfiguration",
"referenced_doc_type": "XFormInstance",
"table_id": "sample-repeat",
"display_name": "Time Logged",
"base_item_expression": {
"type": "property_path",
"property_path": ["form", "time_logs"]
},
"configured_filter": {
},
"configured_indicators": [
{
"type": "expression",
"expression": {
"type": "property_name",
"property_name": "start_time"
},
"column_id": "start_time",
"datatype": "datetime",
"display_name": "start time"
},
{
"type": "expression",
"expression": {
"type": "property_name",
"property_name": "end_time"
},
"column_id": "end_time",
"datatype": "datetime",
"display_name": "end time"
},
{
"type": "expression",
"expression": {
"type": "property_name",
"property_name": "person"
},
"column_id": "person",
"datatype": "string",
"display_name": "person"
},
{
"type": "expression",
"expression": {
"type": "root_doc",
"expression": {
"type": "property_name",
"property_name": "name"
}
},
"column_id": "name",
"datatype": "string",
"display_name": "name of ticket"
}
]
}
Data Cleaning and Validation
Note this is only available for “static” data sources that are created in the HQ repository.
When creating a data source it can be valuable to have strict validation on the type of data that can be inserted.
The validations attribute at the top level of the configuration can use UCR expressions to determine whether the data is valid. If a document fails a validation expression, the relevant error is stored in the InvalidUCRData model.
{
"domain": "user-reports",
"doc_type": "DataSourceConfiguration",
"referenced_doc_type": "XFormInstance",
"table_id": "sample-repeat",
"base_item_expression": {},
"validations": [{
"name": "is_starred_valid",
"error_message": "is_starred has unexpected value",
"expression": {
"type": "boolean_expression",
"expression": {
"type": "property_name",
"property_name": "is_starred"
},
"operator": "in",
"property_value": ["yes", "no"]
}
}],
"configured_filter": { },
"configured_indicators": [ ]
}
Report Configurations
A report configuration takes data from a data source and renders it in the UI. A report configuration consists of a few different sections:
Report Filters - These map to filters that show up in the UI, and should translate to queries that can be made to limit the returned data.
Aggregation - This defines what each row of the report will be. It is a list of columns forming the primary key of each row.
Report Columns - Columns define the report columns that show up from the data source, as well as any aggregation information needed.
Charts - Definition of charts to display on the report.
Sort Expression - How the rows in the report are ordered.
Distinct On - Pick distinct rows from result based on columns.
Samples
Here are some sample configurations that can be used as a reference until we have better documentation.
Report Filters
The documentation for report filters is still in progress. Apologies for brevity below.
A note about report filters versus data source filters
Report filters are completely different from data source filters. Data source filters limit the global set of data that ends up in the table, whereas report filters allow you to select values to limit the data returned by a query.
Numeric Filters
Numeric filters allow users to filter the rows in the report by comparing a column to some constant that the user specifies when viewing the report. Numeric filters are only intended to be used with numeric (integer or decimal type) columns. Supported operators are =, ≠, <, ≤, >, and ≥.
ex:
{
"type": "numeric",
"slug": "number_of_children_slug",
"field": "number_of_children",
"display": "Number of Children"
}
Date filters
Date filters allow you to filter on a date. They will show a datepicker in the UI.
{
"type": "date",
"slug": "modified_on",
"field": "modified_on",
"display": "Modified on",
"required": false
}
Date filters have an optional compare_as_string option that allows the date filter to be compared against an indicator of data type string. You shouldn't ever need to use this option (make your column a date or datetime type instead), but it exists because the report builder needs it.
Quarter filters
Quarter filters are similar to date filters, but a choice is restricted only to the particular quarter of the year. They will show inputs for year and quarter in the UI.
{
"type": "quarter",
"slug": "modified_on",
"field": "modified_on",
"display": "Modified on",
"required": false
}
Pre-Filters
Pre-filters offer the kind of functionality you get from data source filters. This makes it easier to use one data source for many reports, especially if some of those reports just need the data source to be filtered slightly differently. Pre-filters do not need to be configured by app builders in report modules; fields with pre-filters will not be listed in the report module among the other fields that can be filtered.
A pre-filter's type is set to "pre":
{
"type": "pre",
"field": "at_risk_field",
"slug": "at_risk_slug",
"datatype": "string",
"pre_value": "yes"
}
If pre_value is scalar (i.e. datatype is "string", "integer", etc.), the filter will use the "equals" operator. If pre_value is null, the filter will use "is null". If pre_value is an array, the filter will use the "in" operator. e.g.
{
"type": "pre",
"field": "at_risk_field",
"slug": "at_risk_slug",
"datatype": "array",
"pre_value": ["yes", "maybe"]
}
(If pre_value is an array and datatype is not "array", it is assumed that datatype refers to the data type of the items in the array.)
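The operator-selection rules above can be sketched in Python. This is an illustrative sketch of the documented defaults, rendering a SQL-like clause for readability; it is not HQ's filter implementation.

```python
def pre_filter_clause(field, pre_value, pre_operator=None):
    """Pick the default operator per the docs: scalar -> '=',
    null -> 'IS NULL', array -> 'IN'. An explicit pre_operator overrides."""
    if pre_operator is not None:
        op = pre_operator
    elif pre_value is None:
        return f"{field} IS NULL"
    elif isinstance(pre_value, list):
        op = "IN"
    else:
        op = "="
    return f"{field} {op} {pre_value!r}"
```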
You can optionally specify the operator that the pre-filter uses by adding a pre_operator argument. e.g.
{
"type": "pre",
"field": "at_risk_field",
"slug": "at_risk_slug",
"datatype": "array",
"pre_value": ["maybe", "yes"],
"pre_operator": "between"
}
Note that instead of using eq, gt, etc., you will need to use =, >, etc.
Dynamic choice lists
Dynamic choice lists provide a select widget that will generate a list of options dynamically.
The default behavior is simply to show all possible values for a column,
however you can also specify a choice_provider
to customize this
behavior (see below).
Simple example assuming “village” is a name:
{
"type": "dynamic_choice_list",
"slug": "village",
"field": "village",
"display": "Village",
"datatype": "string"
}
Choice providers
The currently supported choice_providers are:

Field | Description
---|---
location | Select a location by name
user | Select a user
owner | Select a possible case owner (user, group, or location)
Location choice providers also support three additional configuration options:
- “include_descendants” - Include descendants of the selected locations in the results. Defaults to false.
- “show_full_path” - Display the full path to the location in the filter. Defaults to false. The default behavior shows all locations as a flat alphabetical list.
- “location_type” - Include locations of this type only. Default is to not filter on location type.
Example assuming “village” is a location ID, which is converted to names
using the location choice_provider
:
{
"type": "dynamic_choice_list",
"slug": "village",
"field": "location_id",
"display": "Village",
"datatype": "string",
"choice_provider": {
"type": "location",
"include_descendants": true,
"show_full_path": true,
"location_type": "district"
}
}
Choice lists
Choice lists allow manual configuration of a fixed, specified number of choices and let you change what they look like in the UI.
{
"type": "choice_list",
"slug": "role",
"field": "role",
"choices": [
{"value": "doctor", "display": "Doctor"},
{"value": "nurse"}
]
}
Drilldown by Location
This filter allows selection of a location for filtering by drilling down from the top level.
{
"type": "location_drilldown",
"slug": "by_location",
"field": "district_id",
"include_descendants": true,
"max_drilldown_levels": 3
}
- “include_descendants” - Include descendant locations in the results. Defaults to false.
- “max_drilldown_levels” - Maximum allowed drilldown levels. Defaults to 99.
Internationalization
Report builders may specify translations for the filter display value. Also see the sections on internationalization in the Report Column and the translations transform.
{
"type": "choice_list",
"slug": "state",
"display": {"en": "State", "fr": "État"},
}
Report Columns
Reports are made up of columns. The currently supported column types are:
- field, which represents a single value
- percent, which combines two values into a percent
- aggregate_date, which aggregates data by month
- expanded, which expands a select question into multiple columns
- expression, which can do calculations on data in other columns
Field columns
Field columns have a type of "field". Here's an example field column that shows the owner name from an associated owner_id:
{
"type": "field",
"field": "owner_id",
"column_id": "owner_id",
"display": "Owner Name",
"format": "default",
"transform": {
"type": "custom",
"custom_type": "owner_display"
},
"aggregation": "simple"
}
Percent columns
Percent columns have a type of "percent". They must specify a numerator and denominator as separate field columns. Here's an example percent column that shows the percentage of pregnant women who had danger signs:
{
"type": "percent",
"column_id": "pct_danger_signs",
"display": "Percent with Danger Signs",
"format": "both",
"denominator": {
"type": "field",
"aggregation": "sum",
"field": "is_pregnant",
"column_id": "is_pregnant"
},
"numerator": {
"type": "field",
"aggregation": "sum",
"field": "has_danger_signs",
"column_id": "has_danger_signs"
}
}
Formats
The following percentage formats are supported.

Format | Description | Example
---|---|---
percent | A whole number percentage (the default format) | 33%
fraction | A fraction | 1/3
both | Percentage and fraction | 33% (1/3)
numeric_percent | Percentage as a number | 33
decimal | Fraction as a decimal number | .333
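As a sketch of how these formats render, the following Python is illustrative only; the rounding behavior and handling of a zero denominator are assumptions, not guarantees about HQ's renderer.

```python
def percent_column(numerator, denominator, fmt="percent"):
    """Render a numerator/denominator pair in the documented percent formats."""
    pct = round(100 * numerator / denominator)  # assumes denominator != 0
    if fmt == "percent":
        return f"{pct}%"
    if fmt == "fraction":
        return f"{numerator}/{denominator}"
    if fmt == "both":
        return f"{pct}% ({numerator}/{denominator})"
    if fmt == "numeric_percent":
        return pct
    if fmt == "decimal":
        return round(numerator / denominator, 3)
    raise ValueError(f"unknown format: {fmt}")
```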
AggregateDateColumn
AggregateDate columns allow for aggregating data by month over a given date field. They have a type of "aggregate_date". Unlike regular fields, you do not specify how aggregation happens; it is automatically grouped by month.
Here's an example of an aggregate date column that aggregates the received_on property for each month (allowing you to count/sum things that happened in that month):
{
"column_id": "received_on",
"field": "received_on",
"type": "aggregate_date",
"display": "Month"
}
AggregateDate supports an optional “format” parameter, which accepts the same format string as Date formatting. If you don’t specify a format, the default will be “%Y-%m”, which will show as, for example, “2008-09”.
Keep in mind that the only variables available for formatting are year and month, but that still gives you a fair range, e.g.

Format | Example result
---|---
“%Y-%m” | “2008-09”
“%B, %Y” | “September, 2008”
“%b (%y)” | “Sep (08)”
IntegerBucketsColumn
Bucket columns allow you to define a series of ranges with corresponding names,
then group together rows where a specific field’s value falls within those ranges.
These ranges are inclusive, since they are implemented using the between
operator.
It is the user’s responsibility to make sure the ranges do not overlap; if a value
falls into multiple ranges, it is undefined behavior which bucket it will be assigned to.
Here’s an example that groups children based on their age at the time of registration:
{
"display": "age_range",
"column_id": "age_range",
"type": "integer_buckets",
"field": "age_at_registration",
"ranges": {
"infant": [0, 11],
"toddler": [12, 35],
"preschooler": [36, 60]
},
"else_": "older"
}
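The bucketing logic above can be sketched in Python. This is an illustrative sketch of the inclusive between semantics, not HQ's SQL implementation.

```python
def integer_bucket(value, ranges, else_=None):
    """Assign a value to the first label whose inclusive [low, high] range
    contains it; fall back to else_ when no range matches."""
    for label, (low, high) in ranges.items():
        if low <= value <= high:
            return label
    return else_

# Mirrors the age_range example spec above.
ranges = {"infant": [0, 11], "toddler": [12, 35], "preschooler": [36, 60]}
```

Because the ranges are inclusive, 11 falls in "infant" and 12 in "toddler"; overlapping ranges would make the result order-dependent, which is why the docs leave that case undefined.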
The "ranges" attribute maps labels to inclusive ranges; each range is implemented as a between condition. If the field's value does not fall into any of these ranges, the row will receive the "else_" value, if provided.
SumWhenColumn and SumWhenTemplateColumn
Note: SumWhenColumn usage is limited to static reports, and SumWhenTemplateColumn usage is behind a feature flag.
Sum When columns allow you to aggregate data based on arbitrary conditions. The SumWhenColumn allows any expression. The SumWhenTemplateColumn is used in conjunction with a subclass of SumWhenTemplateSpec; the template defines an expression and typically accepts binds.
Example using sum_when:
{
"display": "under_six_month_olds",
"column_id": "under_six_month_olds",
"type": "sum_when",
"field": "age_at_registration",
"whens": [
["age_at_registration < 6", 1],
],
"else_": 0
}
Equivalent example using sum_when_template:
{
"display": "under_x_month_olds",
"column_id": "under_x_month_olds",
"type": "sum_when_template",
"field": "age_at_registration",
"whens": [
{
"type": "under_x_months",
"binds": [6],
"then": 1
}
],
"else_": 0
}
Expanded Columns
Expanded columns have a type of "expanded". Expanded columns will be "expanded" into a new column for each distinct value in this column of the data source. For example:
If you have a data source like this:
+---------|----------|-------------+
| Patient | district | test_result |
+---------|----------|-------------+
| Joe | North | positive |
| Bob | North | positive |
| Fred | South | negative |
+---------|----------|-------------+
and a report configuration like this:
aggregation columns:
["district"]
columns:
[
{
"type": "field",
"field": "district",
"column_id": "district",
"format": "default",
"aggregation": "simple"
},
{
"type": "expanded",
"field": "test_result",
"column_id": "test_result",
"format": "default"
}
]
Then you will get a report like this:
+----------|----------------------|----------------------+
| district | test_result-positive | test_result-negative |
+----------|----------------------|----------------------+
| North | 2 | 0 |
| South | 0 | 1 |
+----------|----------------------|----------------------+
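The expansion above can be reproduced with a short Python sketch. This is illustrative only (HQ does this in SQL); the `test_result-positive` naming follows the report output shown above.

```python
from collections import Counter

def expand_column(rows, group_by, field):
    """One count column per distinct value of `field`, grouped by `group_by`."""
    values = sorted({row[field] for row in rows})
    out = {}
    for row in rows:
        counts = out.setdefault(row[group_by], Counter({v: 0 for v in values}))
        counts[row[field]] += 1
    return {g: {f"{field}-{v}": c[v] for v in values} for g, c in out.items()}

rows = [
    {"district": "North", "test_result": "positive"},
    {"district": "North", "test_result": "positive"},
    {"district": "South", "test_result": "negative"},
]
```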
Expanded columns have an optional parameter "max_expansion" (defaults to 10) which limits the number of columns that can be created. WARNING: Only override the default if you are confident that there will be no adverse performance implications for the server.
Expression columns
Expression columns can be used to do just-in-time calculations on the data coming out of reports. They allow you to use any UCR expression on the data in the report row, where other defined columns can be referenced by their column_ids. Expression columns support advanced use cases like doing math on two different report columns, or doing conditional logic based on the contents of another column.
A simple example is below. It assumes another column called "number" in the report and shows how you could make a column that is 10 times that column.
{
"type": "expression",
"column_id": "by_tens",
"display": "Counting by tens",
"expression": {
"type": "evaluator",
"statement": "a * b",
"context_variables": {
"a": {
"type": "property_name",
"property_name": "number"
},
"b": 10
}
}
}
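The evaluator's behavior on this spec can be sketched in Python. This sketch is illustrative only: the real evaluator parses a restricted arithmetic grammar, while `eval()` here is a stand-in and is unsafe for untrusted input.

```python
def evaluate(statement, context):
    """Evaluate an arithmetic statement against named context variables."""
    # Empty __builtins__ limits what the statement can reach; still not a
    # security boundary -- purely for illustration.
    return eval(statement, {"__builtins__": {}}, context)

row = {"number": 4}
evaluate("a * b", {"a": row["number"], "b": 10})  # 40
```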
Expression columns cannot be used in aggregations or filters. If you need to group by a derived value, then you must add that directly to your data source.
The “aggregation” column property
The aggregation column property defines how the column should be
aggregated. If the report is not doing any aggregation, or if the column
is one of the aggregation columns this should always be "simple"
(see Aggregation below for more information on
aggregation).
The following table documents the other aggregation options, which can be used in aggregate reports.
Format | Description
---|---
simple | No aggregation
avg | Average (statistical mean) of the values
count_unique | Count the unique values found
count | Count all rows
min | Choose the minimum value
max | Choose the maximum value
sum | Sum the values
Column IDs
Column IDs in percentage fields must be unique for the whole report. If you use a field in a normal column and in a percent column, you must assign unique column_id values to it in order for the report to process both.
Calculating Column Totals
To sum a column and include the result in a totals row at the bottom of the report, set the calculate_total value in the column configuration to true.
This is not supported for the following column types:
- expression
Internationalization
Report columns can be translated into multiple languages. To translate values in a given column, check out the translations transform below. To specify translations for a column header, use an object as the display value in the configuration instead of a string. For example:
{
"type": "field",
"field": "owner_id",
"column_id": "owner_id",
"display": {
"en": "Owner Name",
"he": "שם"
},
"format": "default",
"transform": {
"type": "custom",
"custom_type": "owner_display"
},
"aggregation": "simple"
}
The value displayed to the user is determined as follows:
- If a display value is specified for the user's language, that value will appear in the report.
- If the user's language is not present, display the "en" value.
- If "en" is not present, show an arbitrary translation from the display object.
- If display is a string, and not an object, the report shows the string.
Valid display languages are any of the two or three letter language codes available on the user settings page.
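The fallback rules above can be sketched as a small Python function. This is illustrative only; `resolve_display` is a hypothetical name, not HQ's API.

```python
def resolve_display(display, user_lang):
    """Pick a column header per the documented fallback order:
    user language, then "en", then any translation; strings pass through."""
    if isinstance(display, str):
        return display
    if user_lang in display:
        return display[user_lang]
    if "en" in display:
        return display["en"]
    return next(iter(display.values()))
```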
Aggregation
Aggregation in reports is done using a list of columns to aggregate on. This defines how indicator data will be aggregated into rows in the report. The columns represent what will be grouped in the report, and should be the column_ids of valid report columns. In most simple reports you will only have one level of aggregation. See examples below.
No aggregation
Note that if you use is_primary_key in any of your columns, you must include all primary key columns here.
["doc_id"]
Aggregate by ‘username’ column
["username"]
Aggregate by two columns
["column1", "column2"]
Transforms
Transforms can be used in two places - either to manipulate the value of a column just before it gets saved to a data source, or to transform the value returned by a column just before it reaches the user in a report. Here’s an example of a transform used in a report config ‘field’ column:
{
"type": "field",
"field": "owner_id",
"column_id": "owner_id",
"display": "Owner Name",
"format": "default",
"transform": {
"type": "custom",
"custom_type": "owner_display"
},
"aggregation": "simple"
}
The currently supported transform types are shown below:
Translations and arbitrary mappings
The translations transform can be used to give human readable strings:
{
"type": "translation",
"translations": {
"lmp": "Last Menstrual Period",
"edd": "Estimated Date of Delivery"
}
}
And to provide per-language translations:
{
"type": "translation",
"translations": {
"lmp": {
"en": "Last Menstrual Period",
"es": "Fecha Última Menstruación",
},
"edd": {
"en": "Estimated Date of Delivery",
"es": "Fecha Estimada de Parto",
}
}
}
To use this in a mobile UCR, set the 'mobile_or_web' property to 'mobile':
{
"type": "translation",
"mobile_or_web": "mobile",
"translations": {
"lmp": "Last Menstrual Period",
"edd": "Estimated Date of Delivery"
}
}
Displaying Readable User Name (instead of user ID)
This takes a user_id value and changes it to HQ’s best guess at the user’s display name, using their first and last name, if available, then falling back to their username.
{
"type": "custom",
"custom_type": "user_display_including_name"
}
Displaying username instead of user ID
{
"type": "custom",
"custom_type": "user_display"
}
Displaying username minus @domain.commcarehq.org instead of user ID
{
"type": "custom",
"custom_type": "user_without_domain_display"
}
Displaying owner name instead of owner ID
{
"type": "custom",
"custom_type": "owner_display"
}
Displaying month name instead of month index
{
"type": "custom",
"custom_type": "month_display"
}
Rounding decimals
Rounds decimal and floating point numbers to two decimal places.
{
"type": "custom",
"custom_type": "short_decimal_display"
}
Generic number formatting
Rounds numbers using Python’s built in formatting.
See below for a few simple examples, and read the Python format-string docs for complex ones. The input to the format string will be a number, not a string.
If the format string is not valid or the input is not a number, then the original input will be returned.
Round to the nearest whole number
{
"type": "number_format",
"format_string": "{0:.0f}"
}
Rich text formatting with Markdown
This can be used to do some rich text formatting, using [Markdown](https://www.markdownguide.org/).
There is no configuration required, it will assume the input is valid, markdown-ready text.
{
"type": "markdown"
}
This transform works for report columns only. Using it in a data source will add HTML markup, but it will not be displayed properly in HQ.
Always round to 3 decimal places
{
"type": "number_format",
"format_string": "{0:.3f}"
}
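The documented fallback ("invalid format string or non-numeric input returns the original input") can be sketched in Python. This is an illustrative sketch, not HQ's transform code; the exact set of exceptions caught is an assumption.

```python
def number_format(value, format_string):
    """Apply the format string to a numeric value; return the original
    input untouched when the value is not a number or the format fails."""
    try:
        return format_string.format(float(value))
    except (ValueError, TypeError, IndexError, KeyError):
        return value

number_format(3.7, "{0:.0f}")  # "4"
```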
Date formatting
Formats dates with the given format string. See here for an explanation of format string behavior. If there is an error formatting the date, the transform is not applied to that value.
{
"type": "date_format",
"format": "%Y-%m-%d %H:%M"
}
Converting an ethiopian date string to a gregorian date
Converts a string in the YYYY-MM-DD format to a gregorian date. For example, 2009-09-11 is converted to date(2017, 5, 19). If it is unable to convert the date, it will return an empty string.
{
"type": "custom",
"custom_type": "ethiopian_date_to_gregorian_date"
}
Converting a gregorian date string to an ethiopian date
Converts a string in the YYYY-MM-DD format to an ethiopian date. For example, 2017-05-19 is converted to date(2009, 09, 11). If it is unable to convert the date, it will return an empty string.
{
"type": "custom",
"custom_type": "gregorian_date_to_ethiopian_date"
}
Charts
There are currently three types of charts supported: pie charts and two types of bar charts.
Pie charts
A pie chart takes two inputs:

Field | Description
---|---
aggregation_column | The column you want to group - typically a column from a select question
value_column | The column you want to sum - often just a count
Here’s a sample spec:
{
"type": "pie",
"title": "Remote status",
"aggregation_column": "remote",
"value_column": "count"
}
Aggregate multibar charts
An aggregate multibar chart is used to aggregate across two columns (typically both of which are select questions). It takes three inputs:
Field | Description
---|---
primary_aggregation | The primary aggregation. This will be the x-axis on the chart.
secondary_aggregation | The secondary aggregation. These will be the slices of the bar (or individual bars in “grouped” format).
value_column | The column you want to sum - often just a count
Here’s a sample spec:
{
"type": "multibar-aggregate",
"title": "Applicants by type and location",
"primary_aggregation": "remote",
"secondary_aggregation": "applicant_type",
"value_column": "count"
}
Multibar charts
A multibar chart takes a single x-axis column (typically a user, date, or select question) and any number of y-axis columns (typically indicators or counts) and makes a bar chart from them.
Field | Description
---|---
x_axis_column | This will be the x-axis on the chart.
y_axis_columns | These are the columns to use for the secondary axis. These will be the slices of the bar (or individual bars in “grouped” format).
Here’s a sample spec:
{
"type": "multibar",
"title": "HIV Mismatch by Clinic",
"x_axis_column": "clinic",
"y_axis_columns": [
{
"column_id": "diagnoses_match_no",
"display": "No match"
},
{
"column_id": "diagnoses_match_yes",
"display": "Match"
}
]
}
Sort Expression
A sort order for the report rows can be specified. Multiple fields, in either ascending or descending order, may be specified. The field should refer to report column IDs, not database fields. Example:
[
{
"field": "district",
"order": "DESC"
},
{
"field": "date_of_data_collection",
"order": "ASC"
}
]
Distinct On
Can be used to limit the rows in a report based on a single column or set of columns. The topmost row is picked in case of duplicates.
This is different from aggregation in the sense that it is applied after fetching the rows, whereas aggregation is applied before selecting them.
This is used in combination with a sort expression to produce predictable results.
Please note that the columns used in the distinct on clause should also be present in the sort expression, as the first set of columns, in the same order.
Pick distinct by a single column
The sort expression should have column1 first, followed by other columns if needed:
[
{
"field": "column1",
"order": "DESC"
},
{
"field": "column2",
"order": "ASC"
}
]
and distinct on would be
["column1"]
Pick distinct result based on two columns
The sort expression should have column1 and column2 first, in the same order. More columns can be added after these if needed:
[
{
"field": "column1",
"order": "DESC"
},
{
"field": "column2",
"order": "ASC"
}
]
and distinct on would be
["column1", "column2"]
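The interaction between the sort expression and the distinct on clause can be sketched in Python, mirroring SQL's DISTINCT ON: sort first, then keep the topmost row per distinct key. Illustrative only, not HQ's query code.

```python
def distinct_on(rows, sort_spec, distinct_columns):
    """Sort by the sort expression, then keep the first row per distinct key."""
    # Python sorts are stable, so composing from the least significant key
    # upward reproduces a multi-key sort.
    for spec in reversed(sort_spec):
        rows = sorted(rows, key=lambda r: r[spec["field"]],
                      reverse=spec["order"] == "DESC")
    seen, out = set(), []
    for row in rows:
        key = tuple(row[c] for c in distinct_columns)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out
```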
Mobile UCR
Mobile UCR is a beta feature that enables you to make application modules and charts linked to UCRs on mobile. It also allows you to send down UCR data from a report as a fixture which can be used in standard case lists and forms throughout the mobile application.
The documentation for Mobile UCR is very sparse right now.
Filters
On mobile UCR, filters can be automatically applied to the mobile reports based on hardcoded or user-specific data, or can be displayed to the user.
The documentation of mobile UCR filters is incomplete. However some are documented below.
Custom Calendar Month
When configuring a report within a module, you can filter a date field by the 'CustomMonthFilter'. The choice includes the following options:
- Start of Month (a number between 1 and 28)
- Period (a number between 0 and n, with 0 representing the current month)
Each custom calendar month runs from "Start of the Month" to ("Start of the Month" - 1). For example, if the start of the month is set to 21, then the period will be the 21st of the month through the 20th of the next month.
Examples. Assume it is May 15:
- Period 0, day 21: you would sync April 21-May 15
- Period 1, day 21: you would sync March 21-April 20
- Period 2, day 21: you would sync February 21-March 20
Assume it is May 20:
- Period 0, day 21: you would sync April 21-May 20
- Period 1, day 21: you would sync March 21-April 20
- Period 2, day 21: you would sync February 21-March 20
Assume it is May 21:
- Period 0, day 21: you would sync May 21-May 21
- Period 1, day 21: you would sync April 21-May 20
- Period 2, day 21: you would sync March 21-April 20
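The window arithmetic implied by those examples can be sketched in Python. This is an illustrative reconstruction from the examples above, not the filter's actual implementation.

```python
from datetime import date, timedelta

def custom_month_window(today, start_day, period):
    """Return the (start, end) dates of the custom calendar month `period`
    steps back from today, where each month begins on `start_day`."""
    # Anchor on the most recent month whose start_day has already passed.
    year, month = today.year, today.month
    if today.day < start_day:
        month -= 1
        if month == 0:
            month, year = 12, year - 1
    # Step back `period` months from the anchor.
    month -= period
    while month <= 0:
        month += 12
        year -= 1
    start = date(year, month, start_day)
    if period == 0:
        end = today  # the current window runs up to today
    else:
        next_month, next_year = (1, year + 1) if month == 12 else (month + 1, year)
        end = date(next_year, next_month, start_day) - timedelta(days=1)
    return start, end
```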
Export
A UCR data source can be exported, to back an excel dashboard, for instance. The URL for exporting data takes the form https://www.commcarehq.org/a/[domain]/configurable_reports/data_sources/export/[data source id]/
The export supports a “$format” parameter which can be any of the following options: html, csv, xlsx, xls. The default format is csv.
This export can also be filtered to restrict the results returned. The filtering options are all based on the field names:
URL parameter | Value | Description
---|---|---
{field_name} | {exact value} | require an exact match
{field_name}-range | {start}..{end} | return results in range
{field_name}-lastndays | {number} | restrict to the last n days
This is configured in export_data_source and tested in test_export. It should be pretty straightforward to add support for additional filter types.
Export example
Let’s say you want to restrict the results to only cases owned by a particular user, opened in the last 90 days, and with a child between 12 and 24 months old as an xlsx file. The querystring might look like this:
?$format=xlsx&owner_id=48l069n24myxk08hl563&opened_on-lastndays=90&child_age-range=12..24
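Decomposing that querystring with Python's stdlib shows how the filter conventions map to parameters:

```python
from urllib.parse import parse_qs

qs = "$format=xlsx&owner_id=48l069n24myxk08hl563&opened_on-lastndays=90&child_age-range=12..24"
# parse_qs returns lists of values; each filter here has exactly one value.
params = {key: values[0] for key, values in parse_qs(qs).items()}
```

Here "$format" selects the output format, "owner_id" is an exact-match filter, and the "-lastndays" and "-range" suffixes select the relative-date and range filter behaviors from the table above.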
Practical Notes
Some rough notes for working with user configurable reports.
Getting Started
The easiest way to get started is to start with sample data and reports.
Create a simple app and submit a few forms. You can then use report builder to create a report.
Start at a/DOMAIN/reports/builder/select_source/
and create a report based on your form, either a form list or
form summary.
When your report is created, clicking “Edit” will bring you to the report builder editor.
An individual report can be viewed in the UCR editor by changing the report builder URL,
/a/DOMAIN/reports/builder/edit/REPORT_ID/
to the UCR URL, /a/DOMAIN/configurable_reports/reports/edit/REPORT_ID/
.
In this view, you can examine the columns, filters, and aggregation columns that report builder created.
The UCR config UI also includes pages to add new data sources, import reports, etc.,
all based at /a/DOMAIN/configurable_reports/
. If you add a new report via the UCR UI and copy in the
columns, filters, etc. from a report builder report, that new report will then automatically open in the UCR UI when you edit it.
You can also take an existing report builder report and set my_report.report_meta.created_by_builder
to false
to force it to open in the UCR UI in the future.
Two example UCRs, a case-based UCR for the dimagi
domain and a form-based UCR for the gsid
domain,
are checked into source code. Their data source specs and report specs are in corehq/apps/userreports/examples/
.
The tests are also a good source of documentation for the various filter and indicator formats that are supported.
When editing data sources, you can check the progress of rebuilding using my_datasource.meta.build.finished.
Static data sources
As well as defining data sources via the UI, which are stored in the database, you can also define static data sources that live as JSON documents in the source repository.
These are mainly useful for custom reports.
They conform to a slightly different style:
{
"domains": ["live-domain", "test-domain"],
"config": {
... put the normal data source configuration here
}
}
Having defined the data source you need to use the static_ucr_data_source_paths
extension point to make CommCare aware of your data source.
Now when the static data source pillow is run it will pick up the data
source and rebuild it.
Alternatively, the legacy method is to add the path to the data
source file to the STATIC_DATA_SOURCES
setting in settings.py
.
Changes to the data source require restarting the pillow, which will rebuild the SQL table. Alternatively, you can use the UI to rebuild the data source (requires Celery to be running).
Static configurable reports
Configurable reports can also be defined in the source repository. Static configurable reports have the following style:
{
"domains": ["my-domain"],
"data_source_table": "my_table",
"report_id": "my-report",
"config": {
... put the normal report configuration here
}
}
Having defined the report you need to use the static_ucr_report_paths
extension point to make CommCare aware of your report.
Alternatively, the legacy method is to add the path to the data
source file to the STATIC_UCR_REPORTS
setting in settings.py
.
Custom configurable reports
Sometimes a client’s needs for a rendered report are outside of the
scope of the framework. To render the report using a custom Django
template or with custom Excel formatting, define a subclass of
ConfigurableReportView
and override the necessary functions. Then
include the python path to the class in the field
custom_configurable_report
of the static report and don’t forget to
include the static report in STATIC_UCR_REPORTS
in settings.py
.
Extending User Configurable Reports
When building a custom report for a client, you may find that you want to extend UCR with custom functionality. The UCR framework allows developers to write custom expressions, and register them with the framework. To do so:
Define a function that returns an expression object
def custom_expression(spec, evaluation_context):
...
Extend the
custom_ucr_expressions
extension point:
from corehq.apps.userreports.extension_points import custom_ucr_expressions
@custom_ucr_expressions.extend()
def ucr_expressions():
return [
('expression_name', 'path.to.custom_expression'),
]
See also:
CommCare Extension documentation for more details on using extensions.
custom_ucr_expressions
docstring for full extension point details.
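A corehq-free sketch of the expression pattern: the factory function receives the JSON spec and returns a callable expression object. The expression name and spec fields below are invented for illustration:

```python
class UppercaseExpression:
    """Illustrative expression: uppercase a named property of the doc."""

    def __init__(self, property_name):
        self.property_name = property_name

    def __call__(self, item, evaluation_context=None):
        value = item.get(self.property_name)
        return value.upper() if isinstance(value, str) else None

def uppercase_expression(spec, evaluation_context):
    # The factory function is what you register via the extension point.
    return UppercaseExpression(spec["property_name"])

expr = uppercase_expression({"type": "uppercase", "property_name": "name"}, None)
```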
Several custom expressions are already defined this way:

- location_type_name: A way to get the location type from a location document id.
- location_parent_id: A shortcut to get a location’s parent ID from a location id.
- get_case_forms: A way to get a list of forms submitted for a case.
- get_subcases: A way to get a list of subcases (child cases) for a case.
- indexed_case: A way to get an indexed case from another case.
You can find examples of these in practical examples.
Scaling UCR
Profiling data sources
You can use
./manage.py profile_data_source <domain> <data source id> <doc id>
to profile a datasource on a particular doc. It will give you
information such as functions that take the longest and number of
database queries it initiates.
Faster Reporting
If reports are slow, you can add create_index
to any columns in the data source that have filters applied to them.
Asynchronous Indicators
If you have an expensive data source and the changes come in faster than
the pillow can process them, you can specify asynchronous: true
in
the data source. This flag puts the document id in an intermediary table
when a change happens which is later processed by a celery queue. If
multiple changes are submitted before this can be processed, a new entry
is not created, so it will be processed only once. This moves the bottleneck
from kafka/pillows to celery.
The main benefit of this is that documents will be processed only once even if many changes come in at a time. This makes this approach ideal for data sources that don’t require ‘live’ data or where the source documents change very frequently.
It is also possible to achieve greater parallelization than is currently available via pillows, since multiple Celery workers can process the changes.
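The deduplicating-queue behaviour can be sketched with a stdlib-only toy; the real implementation uses a database table and Celery, so this is illustrative only:

```python
pending = {}  # doc_id -> queued-change metadata

def record_change(doc_id):
    # Re-inserting the same doc id does not create a second entry,
    # so a burst of changes collapses into one pending row.
    pending.setdefault(doc_id, {"queued": True})

def drain_queue():
    # A worker would process each pending doc exactly once per drain.
    processed = list(pending)
    pending.clear()
    return processed

for doc_id in ["case-1", "case-2", "case-1", "case-1"]:
    record_change(doc_id)
```

Here case-1 changed three times but appears only once in the drained batch.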
A diagram of this workflow can be found here
Inspecting database tables
The easiest way to inspect the database tables is to use the sql command line utility.
This can be done by running ./manage.py dbshell
or using psql
.
The naming convention for tables is:
config_report_[domain name]_[table id]_[hash]
.
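A sketch that assembles a name in this convention; the hash suffix here is a stand-in (HQ computes its own hash internally), so treat this as illustrative only:

```python
import hashlib

def ucr_table_name(domain, table_id):
    # sha1 is only a stand-in to show the shape of the name;
    # the real hash is derived internally by HQ.
    digest = hashlib.sha1(f"{domain}_{table_id}".encode()).hexdigest()[:8]
    return f"config_report_{domain}_{table_id}_{digest}"
```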
In postgres, you can see all tables by typing \dt
and use SQL
commands to inspect the appropriate tables.