Migrating models from couch to postgres

This is a step by step guide to migrating a single model from couch to postgres.

Conceptual Steps

This is a multi-deploy process that keeps two copies of the data - one in couch, one in sql - in sync until the final piece of code is deployed and the entire migration is complete. It has three phases:

  1. Add SQL models and sync code

    • Define the new SQL models, based on the existing couch classes and using the SyncSQLToCouchMixin to keep sql changes in sync with couch.

    • Add the SyncCouchToSQLMixin to the couch class so that changes to couch documents get reflected in sql.

    • Write a management command that subclasses PopulateSQLCommand, which will create/update a corresponding SQL object for every couch document. This command will later be run by a django migration to migrate the data. For large servers, this command will also need to be run manually, outside of a deploy, to do the bulk of the migration.

  2. Switch app code to read/write in SQL

    • Update all code references to the couch classes to instead refer to the SQL classes.

    • Write a django migration that integrates with PopulateSQLCommand to ensure that all couch and sql data is synced.

  3. Remove couch

    • Delete the couch classes, and remove the SyncSQLToCouchMixin from the SQL classes.
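The three phases above hinge on the sync mixins keeping both copies of the data consistent until couch is removed. As a rough, framework-free illustration of that dual-write idea (the stores and classes here are hypothetical stand-ins, not the real mixin API):

```python
# Toy stand-ins for the two backends: doc_id -> dict of fields.
couch_store = {}
sql_store = {}

class CouchDoc:
    """Stands in for a couch class using SyncCouchToSQLMixin."""

    def __init__(self, doc_id, name):
        self.doc_id = doc_id
        self.name = name

    def save(self):
        # Write to couch, then mirror the change into sql.
        couch_store[self.doc_id] = {"name": self.name}
        self._migration_sync_to_sql()

    def _migration_sync_to_sql(self):
        sql_store[self.doc_id] = {"name": self.name}

class SqlModel:
    """Stands in for a SQL class using SyncSQLToCouchMixin."""

    def __init__(self, doc_id, name):
        self.doc_id = doc_id
        self.name = name

    def save(self):
        # Write to sql, then mirror the change into couch.
        sql_store[self.doc_id] = {"name": self.name}
        self._migration_sync_to_couch()

    def _migration_sync_to_couch(self):
        couch_store[self.doc_id] = {"name": self.name}
```

Whichever side a caller writes to, both stores end up with the same data; that is the invariant the real mixins maintain throughout phases 1 and 2.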

Practical Steps

Even a simple model takes several pull requests to migrate, to avoid data loss while deploys and migrations are in progress. Best practice is a minimum of three pull requests, described below, each deployed to all large environments before merging the next one.

Some notes on source control:

  • It’s best to create all pull requests at once so that reviewers have full context on the migration.

  • It can be easier to do the work in a single branch and then make the branches for individual PRs later on.

  • If you don’t typically run a linter before PRing, let the linter run on each PR and fix errors before opening the next one.

  • Avoid having more than one migration happening in the same django app at the same time, to avoid migration conflicts.

PR 1: Add SQL model and migration management command, write to SQL

This should contain:

  • A new model and a management command that fetches all couch docs and creates or updates the corresponding SQL model(s).

    • Start by running the management command evaluate_couch_model_for_sql django_app_name MyDocType on a production environment. This will produce code to add to your models file and also a new management command.

      • The reason to run on production is that it will examine existing documents to help determine things like max_length. This also means it can take a while. If you have reasonable data locally, running it locally is fine - but since the sql class will often have stricter data validation than couch, it’s good to run it on prod at some point.

      • If the script encounters any list or dict properties, it’ll ask you if they’re submodels. If you say no, it’ll create them as json columns. If you say yes, it’ll skip them, because it doesn’t currently handle submodels. For the same reason, it’ll skip SchemaProperty and SchemaListProperty attributes. More on this subject below.

      • Properties found on documents in Couch that are not members of the Couch model class will be added to the SQL model. In most cases they can be dropped (and not migrated to SQL).

      • Properties that are present in the Couch model but are always null, or never found in Couch, will be added to the SQL model as unknown_type(null=True). These fields may be able to be dropped (and not migrated to SQL).

    • Add the generated models code to your models file. Edit as needed. Note the TODOs marked in the code:

      • The new class’s name will start with “SQL”, but its db_table will not include “sql.” This is so that the class can later be renamed back to the original couch class’s name just by removing the db_table. This avoids renaming the table in a django migration, which can be a headache when submodels are involved.

      • The new class will include a column for couch document id.

      • The generated code uses SyncCouchToSQLMixin and SyncSQLToCouchMixin. If your model uses submodels, you will need to add overrides for _migration_sync_to_sql and _migration_sync_to_couch. If you add overrides, definitely add tests for them. Sync bugs are one of the easiest ways for this to go terribly wrong.

        • For an example of overriding the sync code for submodels, see the CommtrackConfig migration, or the CustomDataFields migration which is simpler but includes a P1-level bug fixed here.

        • Beware that the sync mixins catch exceptions thrown while syncing and call notify_exception instead of raising. If you’re overriding the sync code, this makes bugs easy to miss. The branch jls/sync-mixins-hard-fail is included on staging to instead make syncing fail hard; you might consider doing the same while testing locally.

      • Consider if your new model could use any additional db_index flags or a unique_together.

      • Some docs have attributes that are couch ids of other docs. These are weak spots easy to forget when the referenced doc type is migrated. Add a comment so these show up in a grep for the referenced doc type.

    • Run makemigrations

    • Add the generated migration command. Notes on this code:

      • The generated migration does not handle submodels. Edit update_or_create_sql_object to add support.

      • This command’s update_or_create_sql_object populates the sql models based on json alone, not the wrapped document (to avoid introducing another dependency on the couch model). You may need to convert data types that the default wrap implementation would handle. The generated migration will use force_to_datetime to cast datetimes but will not perform any other wrapping. Similarly, if the couch class has a wrap method, the migration needs to manage that logic. As an example, CommtrackActionConfig.wrap was defined here and handled in this migration.

      • The command will include a commit_adding_migration method to let third parties know which commit to deploy if they need to run the migration manually. This needs to be updated after this PR is merged, to add the hash of the commit that merged this PR into master.

  • Most models belong to a domain. For these, the populate command can be run per-domain via --domains, which is useful for verifying a few domains before migrating everything (see the deploy notes below).
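Because update_or_create_sql_object works from the raw doc JSON rather than the wrapped document, the type casting that wrap() would normally do has to be reproduced by hand. A rough, framework-free sketch of that idea (the field names are hypothetical, sql_rows is a plain dict standing in for the SQL table, and this force_to_datetime is a simplified stand-in for corehq's helper that only handles ISO-8601 strings):

```python
from datetime import datetime

def force_to_datetime(value):
    # Simplified stand-in for corehq's force_to_datetime: accept a
    # datetime or an ISO-8601 string (with trailing "Z") and return
    # a datetime.
    if isinstance(value, datetime):
        return value
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def update_or_create_sql_object(sql_rows, doc):
    """Hypothetical analogue of the generated command's method: build
    the SQL row from the raw couch JSON, casting the types that the
    couch wrap() implementation would otherwise handle."""
    row = sql_rows.setdefault(doc["_id"], {})
    row["name"] = doc.get("name") or ""
    # Datetimes arrive as strings in the raw JSON and must be cast.
    row["created_on"] = force_to_datetime(doc["created_on"])
    # The generated code does not handle submodels; map them here,
    # keeping only the fields the SQL submodel defines.
    row["items"] = [{"label": i["label"]} for i in doc.get("items", [])]
    return row
```

Any wrap()-time data migration the couch class performs (as with CommtrackActionConfig.wrap) needs an equivalent branch in this method.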

To test this step locally:

  • With master checked out, make sure you have at least one couch document that will get migrated.

  • Check out your branch and run the populate command. Verify it creates as many objects as expected.

  • Test editing the pre-existing object. In a shell, verify your changes appear in both couch and sql.

  • Test creating a new object. In a shell, verify your changes appear in both couch and sql.

Automated tests are also a good idea. Automated tests are definitely necessary if you overrode any parts of the sync mixins. Example of tests for sync and migration code.

The migration command has a --verify option that will find any differences in the couch data vs the sql data.

Once this PR is deployed - later, after the whole shebang has been QAed - you’ll run the migration command in any environments where it’s likely to take more than a trivial amount of time. If the model is tied to domains you should initially migrate a few selected domains using --domains X Y Z and manually verify that the migration worked as expected before running it for all the data.

PR 2: Verify migration and read from SQL

This should contain:

  • A django migration that integrates with PopulateSQLCommand to ensure that all couch and sql data is synced.

  • Updates to all code that references the couch class so it reads from (and writes to) the SQL class instead.

For models with many references, it may make sense to do this work incrementally, with a first PR that includes the verification migration and then subsequent PRs that each update a subset of reads. Throughout this phase, all data should continue to be saved to both couch and sql.
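The --verify pass amounts to comparing each couch document with its SQL counterpart field by field and reporting any mismatches. A rough, framework-free sketch of that diff logic (the function name and dict-based representation are hypothetical, not PopulateSQLCommand's actual API):

```python
def diff_doc_and_row(doc, row, fields):
    """Return human-readable differences between a couch doc (a dict)
    and its SQL row (also a dict here), checking only `fields`."""
    diffs = []
    for field in fields:
        couch_val = doc.get(field)
        sql_val = row.get(field)
        if couch_val != sql_val:
            diffs.append(f"{field}: couch={couch_val!r} sql={sql_val!r}")
    return diffs
```

An empty result for every document is the signal that reads can safely be switched to SQL; any non-empty result points at a sync bug or a gap in the populate command.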

After testing locally, this PR is a good time to ask the QA team to test on staging. Template for QA request notes:

This is a couch to sql migration, with the usual approach:
- Set up <workflow to create items in couch>.
- Ping me on the ticket and I'll deploy the code to staging and run the migration.
- Test that you can <workflows to edit the items created earlier> and also <workflow to create new items>.

PR 3: Cleanup

This is the cleanup PR. Wait a few weeks after the previous PR to merge this one; there’s no rush. Clean up:

  • If your sql model uses a couch_id, remove it. Sample commit for HqDeploy

  • Remove the old couch model, which at this point should have no references. This includes removing any syncing code.

  • Now that the couch model is gone, rename the sql model from SQLMyModel to MyModel. Assuming you set up db_table in the initial PR, this is just removing that and running makemigrations.

  • Add the couch class to DELETABLE_COUCH_DOC_TYPES. Blame deletable_doc_types.py for examples.

  • Remove any couch views that are no longer used. Remember this may require a reindex; see the main db migration docs.

Current State of Migration

The current state of the migration is available internally here, which outlines approximate LOE, risk level, and notes on the remaining models.

For a definitive account of remaining couch-based models, you can identify all classes that descend from Document:

from dimagi.ext.couchdbkit import Document

def all_subclasses(cls):
    # recursively collect every subclass of cls
    return set(cls.__subclasses__()).union(
        s for c in cls.__subclasses__() for s in all_subclasses(c)
    )

sorted(str(s) for s in all_subclasses(Document))

To find how many documents of a given type exist in a given environment:

from corehq.dbaccessors.couchapps.all_docs import get_doc_ids_by_class, get_deleted_doc_ids_by_class

len(list(get_doc_ids_by_class(MyDocumentClass)) + list(get_deleted_doc_ids_by_class(MyDocumentClass)))

There’s a little extra value in migrating models that have dedicated views:

grep -r MyDocumentClass . | grep '_design.*map.js'

There’s a lot of extra value in migrating areas where you’re familiar with the code context.

Ultimately, all progress is good.