Diff for "PolicyAndProcess/DatabaseSchemaChangesProcess"

Not logged in - Log In / Register

Differences between revisions 8 and 21 (spanning 13 versions)
Revision 8 as of 2009-12-15 05:45:17
Size: 10434
Editor: brianfromme
Comment:
Revision 21 as of 2010-10-18 15:39:52
Size: 10569
Editor: jtv
Comment:
Deletions are marked like this. Additions are marked like this.
Line 3: Line 3:
'''NOTE:''' We change the schema through database patches. These live in branches that go through review and landing, just like code changes.
Line 5: Line 5:
 * Instructions on how to do schema changes are mirrored in rocketfuel in `database/schema/README`. If you edit this wiki page, make sure to also commit your changes in that file so that the instructions are available offline. A schema change needs a review of type "db" from the DBA (`stub`) or, in his absence, the technical architect (`lifeless`). But ''request db reviews from both'' these people, so that they can both track changes. A database patch also needs an identifying "patch number"; the db reviewer will assign one.
Line 7: Line 7:
== Overview == Changes to the permissions in `database/schema/security.py` or the comments in `database/schema/comments.sql` do not require db review. If you get both code and db reviews, the code reviewer should look at them. (But if nothing else in your branch requires code review, then they're for the db reviewer to inspect).
Line 9: Line 9:
First, you will need to get approval for your schema change from both the product strategist (jml) and the DBA (stub).
== Procedure ==

 1. Discuss non-trivial or controversial changes.

 1. Prepare a branch containing your database patch for review. Feel free to request assistance from or outsource work to the DBA (stub).

 1. Submit a merge proposal for your branch into `lp:launchpad` (`lp:~launchpad-pqm/launchpad/db-devel`), requesting two '''db''' reviews from respectively the technical architect (lifeless) and the DBA (stub).

 1. If your branch also contains code changes, then request a separate code review as well.

 1. Follow your reviewers' instructions. The DBA or Technical Architect will allocate a database patch number as required. The schema change is approved for landing when you have an 'Approved' vote from the DBA - no need to block waiting on the Technical Architect review.

 1. Ensure the sampledata is up to date on your branch.

 1. Ensure the test suite still passes.

 1. Land the branch on `lp:launchpad` (`lp:~launchpad-pqm/launchpad/db-devel`) unless someone has stated it is being landed it on your behalf.


== Making a database patch ==

You need to run these steps whenever you make a schema change, regardless of
whether you intend to add new sample data or not. For example, if you are
adding a new column to the `Person` table, these steps ensure that the new
sample data will include this new column.

 1. '''Run `make schema` to get a pristine database of sample data.'''

 1. Create a SQL file in `database/schema/pending/` containing the changes you want.
    1. The first line of your file should be `SET client_min_messages=ERROR;`.
    1. Don't use the TRUNCATE or DROP TABLE statements as they don't work with Slony-I replication.
       1. To drop a table, rename it into the `todrop` namespace using a statement like `ALTER TABLE FooBar RENAME TO todrop.FooBar`.
    1. If you are doing data initialization or data migration, don't use indeterminate values that might give different results on different databases, such as random() or CURRENT_TIMESTAMP. This would create skew in our data between the master and slave databases.
         1. For a new column with an indeterminate DEFAULT, you may need to manually set the initial values to something determinate before adding the DEFAULT to the column. eg. initialize the new date_eviscerated column to a fixed, known timestamp before setting its default to CURRENT_TIMESTAMP AT TIME ZONE 'UTC'.

 1. Run your new SQL patch on the development database to ensure that it works. Do this by running `psql launchpad_dev -f your-patch.sql`

 1. ''(Optional):'' Interact with `launchpad.dev` to add any additional sample data you want to demonstrate in the web u/i.

 1. Run `psql launchpad_ftest_playground -f your-patch.sql` as the ftest playground db is used to generate newsampledata.sql in the following step.

 1. Run `make newsampledata`.

 1. Review the sample data changes that occured using `diff current.sql newsampledata.sql`. This diff can be hard to review as-is. You might want to use a graphical diff viewer like `kompare` or `meld` which will make it easier. Make sure that you understand all the changes you see.

 1. In `database/sampledata/`, move `newsampledata.sql` to `current.sql`, replacing the latter.

 1. In `database/sampledata/`, move `newsampledata-dev.sql` to `current-dev.sql`, replacing the latter.

 1. Move your pending SQL file into `database/schema/` with a name like `patch-xx-99-0.sql` (where ''xx'' matches the existing patches), and ending with the line `INSERT INTO LaunchpadDatabaseRevision VALUES (xx, 99, 0);`. When your patch is reviewed and approved, you will be assigned an official patch number, which you will use instead of `99` in both the name of the file and this last line.

 1. '''Run `make schema` again to ensure that it works, and that you now have a pristine database with the new sample data.'''

 1. Comments on new tables and columns need to be added to `database/schema/comments.sql`.

 1. Make any necessary changes to `database/schema/fti.py`, `database/schema/security.cfg`, and at least the minimum code changes required to keep the test suite passing.

 1. Make any necessary changes to the SQL patch to reflect new default values.

 1. '''Run the full test suite to ensure that your new sample data doesn't break any existing tests by side effect.'''

 1. Go have lunch.

Note that if you make subsequent additional changes, you may be able to skip straight to step 5.


== Sample data ==
Line 17: Line 84:
We use sample data to provide well-known baseline data for the test suite, and
to populate a developer's Launchpad instance so that `launchpad.dev` can
display interesting stuff. There are some guidelines and recommendations you
should be aware of before you make changes to the sample data, or you may
break the tests for yourself or others.
We use sample data to provide well-known baseline data for the test suite, and to populate a developer's Launchpad instance so that `launchpad.dev` can display interesting stuff. There are some guidelines and recommendations you should be aware of before you make changes to the test suite sample data, or you may break the tests for yourself or others.
Line 43: Line 106:
 * follow the `make newsampledata` steps below  * follow the `make newsampledata` steps above.
Line 51: Line 114:
that PQM will reject your changes due to test suite failures when you go to
land your branch.
that you will break the trunk.
Line 54: Line 116:
== Making schema changes ==
Line 56: Line 117:
You need to run these steps whenever you make a schema change, regardless of
whether you intend to add new sample data or not. For example, if you are
adding a new column to the `Person` table, these steps ensure that the new
sample data will include this new column.

 1. '''Run `make schema` to get a pristine database of sample data.'''
 1. Create a SQL file in `database/schema/pending/` containing the changes you want, excluding any changes to default values. The first line of your file should be `SET client_min_messages=ERROR;`. Don't add `COMMENT` statements in this file, those should be added to `database/schema/comments.sql`. Don't `bzr add` this file unless you make sure to `bzr rm` it before your branch lands.
 1. Run your new SQL patch on the development database to ensure that it works. Do this by running `psql launchpad_dev -f your-patch.sql`
 1. ''(Optional):'' Interact with `launchpad.dev` to add any additional sample data you want to demonstrate in the web u/i.
 1. In `database/schema/` run `make newsampledata`.
 1. Review the sample data changes that occured using `diff current.sql newsampledata.sql`. This diff can be hard to review as-is. You might want to use a graphical diff viewer like `kompare` or `meld` which will make it easier. Make sure that you understand all the changes you see.
 1. In `database/sampledata/`, move `newsampledata.sql` to `current.sql`, replacing the latter.
 1. Move your pending SQL file into `database/schema/` with a name like `patch-xx-99-0.sql` (where ''xx'' matches the existing patches), and ending with the line `INSERT INTO LaunchpadDatabaseRevision VALUES (xx, 99, 0);`. When your patch is reviewed and approved, you will be assigned an official patch number, which you will use instead of `99` in both the name of the file and this last line.
 1. '''Run `make schema` again to ensure that it works, and that you now have a pristine database with the new sample data.'''
 1. Make any necessary changes to `database/schema/fti.py`, `database/schema/security.cfg`, and to the relevant `lib/canonical/launchpad/database/` classes.
 1. Make any necessary changes to the SQL patch to reflect new default values.
 1. '''Run the full test suite to ensure that your new sample data doesn't break any existing tests by side effect.''' To do this, run `./test.py -vv`.
 1. Go have lunch.

Note that if you make subsequent additional changes, you may be able to skip straight to step 5.

== Proposing database schema changes ==

For any tables and fields that you change with an SQL script via Stuart (stub on IRC), please make sure you include comments.

The process now looks like this:

 1. If you think the proposed changes may be controversial, or you are just unsure, it is worth discussing the changes on the launchpad mailing list first to avoid wasting your time.
 1. Obtain approval of your database (schema) patch before working on matching code.
 1. Work on the patch in a branch as documented above.
 1. Work on it in revision control till your review is complete. Stuart should give your an official patch number.
 1. Rename your patch to match the official patch number.
 1. Once code is also ready and reviewed, commit as normal.

== Resolving schema conflicts ==
=== Resolving schema conflicts ===
Line 109: Line 136:
Changes to security.cfg can cause OOPS on edge.launchpad.net, if edge requires a permission that has not been granted to the production DB. So if you are landing a security.cfg change, you need to email stub, jml, CC: launchpad@ asking them to apply the manual change to jubany as well. Reviewers should remind about this when seeing a security.cfg change.


=== Rationale ===

 * PQM runs the launchpad test suite
 * The Launchpad test suite runs against sample data
 * However, there are problems that can occur with schema changes or data update scripts, running against production data, which do not occur running them against staging data. Examples are:
  * The update does not run correctly because of integrity errors.
  * The update takes a very long time to run.
 * So, we test new database schemas (patch levels above production) out on the staging server, which uses a daily-updated copy of the production database.
  1. We take a copy of the production database, then run whatever schema changes and updates are needed to bring it up to the patch level currently on mainline.
  1. Then we run mainline code against that new staging database.
 * The fact that staging is actually running, and that it got updated in a reasonable amount of time, shows that an update from production to the code in mainline is actually possible. The output from the staging update process shows the time it took to process each database patch.

 * Running on staging still doesn't catch all the errors we may find when actually doing a production database schema update; although the data will be the same, the organisation within the database may be different. For example, tables may need vacuuming, or be stored in a way that causes scripts to take a long time to run when they did not take so long when run on the same data on staging. This is because when we move the data from production to staging, we're moving just the data, not the exact internal arrangement of data in the database.
It is possible to land changes to security.cfg on lp:~launchpad-pqm/launchpad/devel rather than lp:~launchpad-pqm/launchpad/db-devel. These changes can cause OOPSes on edge.launchpad.net if edge requires a permission that has not been granted to the production DB. So if you are landing a security.cfg change that needs to land on lp:~launchpad-pqm/launchpad/devel, you need to email stub & the losas, CC: launchpad@ asking them to apply the manual change to the production databases as well. Reviewers should remind about this when seeing a security.cfg change.
Line 144: Line 156:
 2. DBA (in our case, Stuart)  2. DBA (in our case, StuartBishop)
 3. Technical Architect (in our case, RobertCollins)
Line 154: Line 167:
 1. When it has passed review, Stuart gives them a database patch number, which becomes the filename of the SQL.
 1. The acting DBA (again, Stuart) will also issue further instuctions on how to proceed.
 1. A merge proposal is submitted, with reviews requested from the Launchpad Product Strategist and the DBA.
 1. When it has passed review, the DBA gives them a database patch number, which becomes the filename of the SQL.
 1. The DBA will also issue further instuctions on how to proceed.
Line 163: Line 177:
 * If code changes need to land in tandem with a database patch, these needs to be discussed on a case by case basis with StuartBishop.

We change the schema through database patches. These live in branches that go through review and landing, just like code changes.

A schema change needs a review of type "db" from the DBA (stub) or, in his absence, the technical architect (lifeless). But request db reviews from both these people, so that they can both track changes. A database patch also needs an identifying "patch number"; the db reviewer will assign one.

Changes to the permissions in database/schema/security.py or the comments in database/schema/comments.sql do not require db review. If you get both code and db reviews, the code reviewer should look at them. (But if nothing else in your branch requires code review, then they're for the db reviewer to inspect).

Procedure

  1. Discuss non-trivial or controversial changes.
  2. Prepare a branch containing your database patch for review. Feel free to request assistance from or outsource work to the DBA (stub).
  3. Submit a merge proposal for your branch into lp:launchpad (lp:~launchpad-pqm/launchpad/db-devel), requesting two db reviews from respectively the technical architect (lifeless) and the DBA (stub).

  4. If your branch also contains code changes, then request a separate code review as well.
  5. Follow your reviewers' instructions. The DBA or Technical Architect will allocate a database patch number as required. The schema change is approved for landing when you have an 'Approved' vote from the DBA - no need to block waiting on the Technical Architect review.
  6. Ensure the sampledata is up to date on your branch.
  7. Ensure the test suite still passes.
  8. Land the branch on lp:launchpad (lp:~launchpad-pqm/launchpad/db-devel) unless someone has stated it is being landed it on your behalf.

Making a database patch

You need to run these steps whenever you make a schema change, regardless of whether you intend to add new sample data or not. For example, if you are adding a new column to the Person table, these steps ensure that the new sample data will include this new column.

  1. Run make schema to get a pristine database of sample data.

  2. Create a SQL file in database/schema/pending/ containing the changes you want.

    1. The first line of your file should be SET client_min_messages=ERROR;.

    2. Don't use the TRUNCATE or DROP TABLE statements as they don't work with Slony-I replication.
      1. To drop a table, rename it into the todrop namespace using a statement like ALTER TABLE FooBar RENAME TO todrop.FooBar.

    3. If you are doing data initialization or data migration, don't use indeterminate values that might give different results on different databases, such as random() or CURRENT_TIMESTAMP. This would create skew in our data between the master and slave databases.
      1. For a new column with an indeterminate DEFAULT, you may need to manually set the initial values to something determinate before adding the DEFAULT to the column. eg. initialize the new date_eviscerated column to a fixed, known timestamp before setting its default to CURRENT_TIMESTAMP AT TIME ZONE 'UTC'.
  3. Run your new SQL patch on the development database to ensure that it works. Do this by running psql launchpad_dev -f your-patch.sql

  4. (Optional): Interact with launchpad.dev to add any additional sample data you want to demonstrate in the web u/i.

  5. Run psql launchpad_ftest_playground -f your-patch.sql as the ftest playground db is used to generate newsampledata.sql in the following step.

  6. Run make newsampledata.

  7. Review the sample data changes that occured using diff current.sql newsampledata.sql. This diff can be hard to review as-is. You might want to use a graphical diff viewer like kompare or meld which will make it easier. Make sure that you understand all the changes you see.

  8. In database/sampledata/, move newsampledata.sql to current.sql, replacing the latter.

  9. In database/sampledata/, move newsampledata-dev.sql to current-dev.sql, replacing the latter.

  10. Move your pending SQL file into database/schema/ with a name like patch-xx-99-0.sql (where xx matches the existing patches), and ending with the line INSERT INTO LaunchpadDatabaseRevision VALUES (xx, 99, 0);. When your patch is reviewed and approved, you will be assigned an official patch number, which you will use instead of 99 in both the name of the file and this last line.

  11. Run make schema again to ensure that it works, and that you now have a pristine database with the new sample data.

  12. Comments on new tables and columns need to be added to database/schema/comments.sql.

  13. Make any necessary changes to database/schema/fti.py, database/schema/security.cfg, and at least the minimum code changes required to keep the test suite passing.

  14. Make any necessary changes to the SQL patch to reflect new default values.
  15. Run the full test suite to ensure that your new sample data doesn't break any existing tests by side effect.

  16. Go have lunch.

Note that if you make subsequent additional changes, you may be able to skip straight to step 5.

Sample data

Let's say your branch needs to make changes to the database schema. You need to follow the steps on this page to ensure that the sample data is updated to match your schema changes.

In fact, there are now two sets of sampledata that need to be updated.

We use sample data to provide well-known baseline data for the test suite, and to populate a developer's Launchpad instance so that launchpad.dev can display interesting stuff. There are some guidelines and recommendations you should be aware of before you make changes to the test suite sample data, or you may break the tests for yourself or others.

Please note that sample data is for developer's instances only. It would make no sense to use the sample data on production systems!

If your tests require new data, you should strongly consider creating the data in your test's harness instead of adding new sample data. This will often make the tests themselves more readable because you're not relying on magical values in the sample database. Doing it this way also reduces the chance that your changes will break other tests by side-effect. Add the new data in your test's setUp() or in the narrative of your doctest. Because the test suite uses the launchpad_ftest database, there is no chance that running the test suite will accidentally add new sample data.

However, if you interact with the web U/I for launchpad.dev your changes will end up in the launchpad_dev database. This database is used to create the new sample data, so it is imperative that you run make schema to start with a pristine database before generating new sample data. If in fact you do want the effects of your u/i interactions to land in the new sample data, then the general process is to

  • run make schema

  • interact with launchpad.dev

  • follow the make newsampledata steps above.

Be aware though that your generation of new sample data will probably have an effect on tests not related to your changes! For example, if you generate new karma events, you will probably break the karma_sample_data tests because they expect all karma events to be dated prior to the year 2002. If you make changes to the sample data, you must run the full test suite and ensure that you get no failures, otherwise there is a very high likelihood that you will break the trunk.

Resolving schema conflicts

Resolving conflicts in current.sql manually is usually more trouble than it's worth. Instead, first resolve any conflicts in comments.sql, then:

cd database/schema/
mv {patch-in-question}-0.sql comments.sql pending/
cp {parent branch, e.g. rocketfuel}/database/schema/comments.sql ./
cp ../sampledata/current.sql.OTHER ../sampledata/current.sql
make
psql launchpad_dev -f pending/patch-xx-99-0.sql
make newsampledata
mv ../sampledata/newsampledata.sql ../sampledata/current.sql
mv pending/{patch-in-question}-0.sql pending/comments.sql ./
make   # Just to make sure everything works
cd ../..
bzr resolve database/sampledata/current.sql

Notes on Changing security.cfg

It is possible to land changes to security.cfg on lp:~launchpad-pqm/launchpad/devel rather than lp:~launchpad-pqm/launchpad/db-devel. These changes can cause OOPSes on edge.launchpad.net if edge requires a permission that has not been granted to the production DB. So if you are landing a security.cfg change that needs to land on lp:~launchpad-pqm/launchpad/devel, you need to email stub & the losas, CC: launchpad@ asking them to apply the manual change to the production databases as well. Reviewers should remind about this when seeing a security.cfg change.

Triggers

  1. Developer needs a database schema change implemented as part of his/her development activities
  2. Developer needs a database schema change implemented as part of his/her bug fixing activities

Inputs

  1. Developer originated schema change concept

Outputs

  1. SQL Patch File

Participants

  1. Developer
  2. DBA (in our case, StuartBishop)

  3. Technical Architect (in our case, RobertCollins)

Subprocesses

  • N/A.

Standard Path Events/Activities

  1. When a developer has a change they want to make to the database schema, they write a database patch (see database/schema in the source tree).
  2. They can give themselves a provisional patch number on their development tree.
  3. A merge proposal is submitted, with reviews requested from the Launchpad Product Strategist and the DBA.
  4. When it has passed review, the DBA gives them a database patch number, which becomes the filename of the SQL.
  5. The DBA will also issue further instuctions on how to proceed.

Notes:

  • Because each database patch number is unique, various database schema changes can be worked on by different members of the team in parallel, and successfully merged together in different ways.
  • This is most often used just for schema changes.
  • Sometimes schema changes have accompanying data updates.

Alternative Path Events/Activities

Comments

PolicyAndProcess/DatabaseSchemaChangesProcess (last edited 2022-09-29 10:26:50 by cjwatson)