Database/LivePatching

Summary

This wiki page documents the schema changes that can be deployed on Launchpad's replicated PostgreSQL database.

Hot schema changes deploy changes without causing downtime.

Cold schema changes deploy changes with minimal downtime (a few seconds per change).

In both cases, we deploy these changes without changing the running appserver code, so the changes need to be compatible with the existing live code. This means that, before landing the database patch, code may need to be landed to ensure everything runs both with and without the database patch applied.

Cold schema changes are landed on db-devel, and hot ones on devel. All schema changes must land without appserver code changes (but test changes may be included, if trigger/db function changes are being landed).

/!\ These instructions only apply to tables under native PostgreSQL streaming replication. lp_* tables are replicated to SSO via Slony, so require extra special care.

Hot patches

Some database patches we can apply live. They are currently applied by crafting an appropriate wrapper script and then running it directly on the master database node, from where it will replicate to the slaves.

The (major, minor, patch) revision tuple needs to be manually inserted into the LaunchpadDatabaseRevision table on the master. Failing to do this will abort the next full upgrade when the system attempts to apply a patch that has already been applied (this situation is normally caught by the full staging update).
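For example, recording a hypothetical hot patch numbered (2209, 1, 1) might look like this (the patch number is illustrative, and this assumes the table's columns are named major, minor and patch):

{{{
-- Run on the master only; the row replicates to the slaves.
-- The patch number here is illustrative.
INSERT INTO LaunchpadDatabaseRevision (major, minor, patch)
VALUES (2209, 1, 1);
}}}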

We will be automating this in future, to handle at least the happy path and to avoid accidentally running a hot patch during a fastdowntime.

Index Creation

Index creation for all but the smallest tables should be done in a hot patch.

The patch needs to use CREATE INDEX CONCURRENTLY, which performs 3 separate fast transactions to create, populate, and activate the index. It adds some contention, but the impact is minimal.
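A minimal sketch of such a patch (the index, table and column names are illustrative):

{{{
-- Must run outside a transaction block; the table stays writable
-- while the index is created, populated and activated.
CREATE INDEX CONCURRENTLY bugtask__assignee__idx
    ON BugTask (assignee);
}}}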

Don't bother starting index creation while long-running transactions are in progress: the index will not complete until those transactions have finished, because the index cannot be activated until the existing read transactions on the table have completed.

CREATE INDEX CONCURRENTLY cannot run inside a transaction, so if the index cannot be built for some reason, such as trying to build a UNIQUE index on non-unique data, manual repair will be needed to back out the changes. /!\ Note that deleting an index requires a fastdowntime deploy.

Once built on the master, the index will automatically replicate to the slaves.

Index creation should be done from a screen(1) session or similar to avoid network dropouts - for CREATE INDEX CONCURRENTLY, early disconnection will result in an invalid index rather than rolling back.

After creating the indexes, confirm that there are no invalid indexes on the affected tables.
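One way to do this is to query the pg_index catalog; any rows returned are indexes left invalid by a failed or interrupted concurrent build:

{{{
-- Lists all invalid indexes in the database. Should return no rows.
SELECT c.relname AS invalid_index
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE NOT i.indisvalid;
}}}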

Views

Run the CREATE OR REPLACE VIEW statements on the master.
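For example (the view, table and column names are illustrative):

{{{
-- Run on the master only; the change replicates to the slaves.
CREATE OR REPLACE VIEW active_accounts AS
    SELECT id, displayname
    FROM Person
    WHERE account_status = 20;
}}}

Note that CREATE OR REPLACE VIEW can only add new columns at the end of the column list; removing or retyping an existing column requires DROP VIEW, which is not a live change.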

Stored Procedures

Stored procedures can be changed (as long as the signature is still consistent with any triggers).

Table trigger definitions must be done as cold patches.
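A sketch of a live function replacement (the function name and body are illustrative):

{{{
-- Safe while the argument and return types are unchanged; callers
-- and triggers see the new body immediately.
CREATE OR REPLACE FUNCTION is_patch_applied(integer, integer, integer)
    RETURNS boolean LANGUAGE sql STABLE AS $$
    SELECT EXISTS (
        SELECT 1 FROM LaunchpadDatabaseRevision
        WHERE major = $1 AND minor = $2 AND patch = $3);
$$;
}}}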

Cold Patches

Cold patches need to be applied with no other system activity to prevent locks holding the patch up. We have automation around this to do it as rapidly as possible.

Patch Process Overview

  1. Update the Launchpad source tree on wildcherry to the revision to run. Make sure that the revision will include only one new patch (we apply one patch at a time to reduce the debugging burden when something goes wrong post-apply), then run 'make clean build'.
  2. /!\ Run full-update.py --dry-run and check that only the expected db patch will be applied.

  3. Run full-update.py

Table Creation

This can sometimes be done hot, but not reliably due to FK relations needing exclusive table locks to activate. Do it as a cold patch.

/!\ Don't attempt to do this without the DBA available, as there still seem to be edge cases where we cannot do this live (such as a table with foreign key constraints to a busy table, requiring locks on the busy table that will not be granted quickly enough).
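A sketch of the kind of patch involved (table and column names are illustrative); the REFERENCES clauses are what require locks on the referenced tables:

{{{
-- Applied as a cold patch. The foreign keys need locks on Person
-- and Bug, which may not be granted quickly on a busy system.
CREATE TABLE BugMute (
    id serial PRIMARY KEY,
    person integer NOT NULL REFERENCES Person(id),
    bug integer NOT NULL REFERENCES Bug(id)
);
}}}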

Table Renaming

Because code is being deployed separately from database updates, we need to maintain a fully updateable working alias for the old table name to the new table name. See Database/TableRenamePatch for an example database patch.
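On pre-9.3 PostgreSQL a view is not automatically updatable, so one way to sketch such an alias is a view plus DML rewrite rules (all names here are illustrative; Database/TableRenamePatch has the real pattern):

{{{
-- Rename the table, then keep a fully updatable alias under the
-- old name so already-deployed code keeps working.
ALTER TABLE OldThing RENAME TO NewThing;

CREATE VIEW OldThing AS SELECT * FROM NewThing;

CREATE RULE oldthing_insert AS ON INSERT TO OldThing
    DO INSTEAD INSERT INTO NewThing VALUES (NEW.*);
CREATE RULE oldthing_update AS ON UPDATE TO OldThing
    DO INSTEAD UPDATE NewThing
        SET name = NEW.name, owner = NEW.owner
        WHERE id = OLD.id;
CREATE RULE oldthing_delete AS ON DELETE TO OldThing
    DO INSTEAD DELETE FROM NewThing WHERE id = OLD.id;
}}}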

Adding columns

If the table is large enough that applying the patch raises performance concerns, split the work up into the following steps. All steps, aside from the first, are optional depending on the desired result.

  1. Create the column without indexes, and without setting not-null or a default value.
  2. Set the default.
  3. Add the index via a hot patch.
  4. Run a garbo job (or similar) to fill in the values.
  5. Set the column not-null.
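The steps above can be sketched as follows (table, column and index names are illustrative, and a plain UPDATE stands in for the batched garbo job):

{{{
-- 1. Add the column with no default and no NOT NULL: metadata-only,
--    so it is cheap even on a large table.
ALTER TABLE BugTask ADD COLUMN heat integer;

-- 2. Set the default; this affects only future rows at this point.
ALTER TABLE BugTask ALTER COLUMN heat SET DEFAULT 0;

-- 3. Build any index concurrently, as a hot patch.
CREATE INDEX CONCURRENTLY bugtask__heat__idx ON BugTask (heat);

-- 4. Backfill existing rows (a garbo job would do this in batches
--    rather than one large UPDATE).
UPDATE BugTask SET heat = 0 WHERE heat IS NULL;

-- 5. Once every row is populated, add the constraint.
ALTER TABLE BugTask ALTER COLUMN heat SET NOT NULL;
}}}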

TODO

  • Work out how we test that things work both with and without the database patch applied.

Database/LivePatching (last edited 2013-02-04 00:46:17 by wgrant)