Configuring SQL Databases in CommCare

CommCare makes use of a number of logically different SQL databases. These databases can be all be a single physical database or configured as individual databases.

By default CommCare will use the default Django database for all SQL data.

_images/django_db_monolith.png

Auditcare Data

Auditcare data may be stored in a separate database by specifying a mapping in the LOCAL_CUSTOM_DB_ROUTING setting. A database with the specified alias must also exist in the Django DATABASES setting. Example configuration:

LOCAL_CUSTOM_DB_ROUTING = {"auditcare": "auditcare"}

It is recommended to use a separate database in high-traffic environments when auditcare is enabled. Auditcare is enabled for a subset of URLs by default.

Synclog Data

Synclog data may be stored in a separate database specified by the SYNCLOGS_SQL_DB_ALIAS setting. The value of this setting must be a DB alias in the Django DATABASES setting.

UCR Data

Data created by the UCR framework can be stored in multiple separate databases. Each UCR defines an engine_id parameter which tells it which configured database engine to use. These engines are defined in the REPORTING_DATABASES Django setting which maps the engine_id to a Django database alias defined in the DATABASES setting.

REPORTING_DATABASES = {
    'default': 'default',
    'ucr': 'ucr'
}

Sharded Form and Case data

It is recommended to have a separate set of databases to store data for Forms and Cases (as well as a few other models).

CommCare uses a combination of plproxy custom Python code to split the Form and Case data into multiple databases.

_images/django_db_sharded.png

The general rule is that if a query needs to be run on all (or most) shard databases it should go through plproxy since plproxy is more efficient at running multiple queries and compiling the results.

The configuration for these databases must be added to the DATABASES setting as follows:

USE_PARTITIONED_DATABASE = True

DATABASES = {
    'proxy': {
        ...
        'PLPROXY': {
            'PROXY': True
        }
    },
    'p1': {
        ...
        'PLPROXY': {
            'SHARDS': [0, 511]
        }
    },
    'p2': {
        ...
        'PLPROXY': {
            'SHARDS': [512, 1023]
        }
    }
}

Rules for shards

  • There can only DB with PROXY=True

  • The total number of shards must be a power of 2 i.e. 2, 4, 8, 16, 32 etc

  • The number of shards cannot be changed once you have data in them so it is wise to start with a large enough number e.g. 1024

  • The shard ranges must start at 0

  • The shard ranges are inclusive

    • [0, 3] -> [0, 1, 2, 3]

  • The shard ranges must be continuous (no gaps)

Sending read queries to standby databases

By including details for standby databases in the Django DATABASES setting we can configure CommCare to route certain READ queries to them.

Standby databases are configured in the same way as normal databases but may have an additional property group, STANDBY. This property group has the following sup-properties:

MASTER

The DB alias of the master database for this standby. This must refer to a database in the DATABASES setting.

ACCEPTABLE_REPLICATION_DELAY

The value of this must be an integer and configures the acceptable replication delay in seconds between the standby and the master. If the replication delay goes above this value then queries will not be routed to this database.

The default value for ACCEPTABLE_REPLICATION_DELAY is 3 seconds.

DATABASES = {
    'default': {...}
    'standby1': {
        ...
        'STANDBY': {
            'MASTER': 'default',
            'ACCEPTABLE_REPLICATION_DELAY': 30,
        }
    }
}

Once the standby databases are configured in the DATABASES settings there are two additional settings that control which queries get routed to them.

REPORTING_DATABASES

The REPORTING_DATABASES setting can be updated as follows:

REPORTING_DATABASES = {
    'default': 'default',
    'ucr': {
        'WRITE': 'ucr',
        'READ': [
            ('ucr', 1),
            ('ucr_standby1', 2),
            ('ucr_standby2', 2),
        ]
    }
}

The tuples listed under the ‘READ’ key specify a database alias (must be in DATABASES) and weighting. In the configuration above 20% of reads will be sent to ucr and 40% each to ucr_standby1 and ucr_standby2 (assuming both of them are available and have replication delay within range).

LOAD_BALANCED_APPS

This setting is used to route read queries from Django models.

LOAD_BALANCED_APPS = {
    'users': {
        'WRITE': 'default',
        'READ': [
            ('default', 1),
            ('standby1', 4),
        ]
    }
}

In the configuration above all write queries from models in the users app will go to the default database as well as 20% or read queries. The remaining 80% of read queries will be sent to the standby1 database.

For both the settings above, the following rules apply to the databases listed under READ:

  • There can only be one master database (not a standby database)

  • All standby databases must point to the same master database

  • If a master database is in this list, all standbys must point to this master