I am trying to find some clarity in a problem. I am writing a document of best practices for database schema migrations with Postgres so that I can share with other engineers. A best practice I recommend is using a schema migration tool that allows a developer to create changes incrementally – like alembic. I've used tools like alembic to enable schema management like code management. It enables repeatable builds, tests, and deployment practices.
Assuming engineers are in favor of using schema migrations tools, I define the order of operations for schema migrations and code deployment for an environment is:
- Run database schema migrations
- Deploy application code
Application code is coupled with database schema, so I want schema changes to occur first so that the application code can properly reference any new changes to a schema once it lands onto application servers.
Now that deployment of schema migrations and application code is defined, the next step is defining rollbacks. And this is where I lack clarity.
The order of operations for a rollback is the reversal of steps above. I rollback code first, and then I rollback the schema migrations. But, rolling back a database is not the same as rolling back code.
There is one specific type of schema change that complicates database rollbacks – removing data. For example, if the schema migration dropped a table, then rolling back will not restore the data in the dropped table. This requires restoring the missing table from backup. But, restoring the missing data doesn't mean that the database is in a consistent state (table references) – what if the backup used was one that happened a day ago. So now the database is missing a large chunk of data.
So what am I to do? With this complication, my current recommendation is to never rollback a database schema migration. But, this doesn't feel quite clean enough.
Alas, I've finished my coffee. To be continued next time.