Diff for "PolicyAndProcess/DatabaseSchemaChangesProcess"

Differences between revisions 3 and 4
Revision 3 as of 2009-12-11 21:55:57
Size: 10333
Editor: brianfromme
Comment:
Revision 4 as of 2009-12-11 21:56:27
Size: 10329
Editor: brianfromme
Comment:
  • Process Name: Database Schema Changes

NOTE:

  • Instructions on how to do schema changes are mirrored in rocketfuel in database/schema/README. If you edit this wiki page, make sure to also commit your changes in that file so that the instructions are available offline.

  • It may not be possible to make changes to files in that directory due to database freezes. Process is then uncertain... (UTWL rules should apply: Use the Wiki, Luke.)

Overview

Let's say your branch needs to make changes to the database schema. You need to follow the steps on this page to ensure that the sample data is updated to match your schema changes.

We use sample data to provide well-known baseline data for the test suite, and to populate a developer's Launchpad instance so that launchpad.dev can display interesting stuff. There are some guidelines and recommendations you should be aware of before you make changes to the sample data, or you may break the tests for yourself or others.

Please note that sample data is for developers' instances only. It makes no sense to use the sample data on production systems!

If your tests require new data, you should strongly consider creating the data in your test's harness instead of adding new sample data. This will often make the tests themselves more readable because you're not relying on magical values in the sample database. Doing it this way also reduces the chance that your changes will break other tests by side-effect. Add the new data in your test's setUp() or in the narrative of your doctest. Because the test suite uses the launchpad_ftest database, there is no chance that running the test suite will accidentally add new sample data.
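As a rough illustration of creating data in the test's own setUp(), here is a minimal sketch. It is not Launchpad code: an in-memory SQLite database stands in for launchpad_ftest, and the table names, column names, and the `PersonKarmaTest` class are hypothetical. The point is only that the rows the assertion depends on are created where the reader can see them.

```python
import sqlite3
import unittest


class PersonKarmaTest(unittest.TestCase):
    """Hypothetical test that builds its own data instead of using sample data."""

    def setUp(self):
        # Assumption: an in-memory SQLite database stands in for launchpad_ftest.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
        self.db.execute("CREATE TABLE karma (person INTEGER, points INTEGER)")
        # The rows this test relies on are created here, so the expected
        # value in the assertion below is not a magical sample-data value.
        self.db.execute("INSERT INTO person (id, name) VALUES (1, 'alice')")
        self.db.executemany(
            "INSERT INTO karma (person, points) VALUES (?, ?)",
            [(1, 10), (1, 5)],
        )

    def tearDown(self):
        self.db.close()

    def test_total_karma(self):
        (total,) = self.db.execute(
            "SELECT SUM(points) FROM karma WHERE person = 1"
        ).fetchone()
        self.assertEqual(total, 15)
```

Because nothing here touches the shared sample data, other tests cannot be broken by it as a side effect.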

However, if you interact with the web UI for launchpad.dev, your changes will end up in the launchpad_dev database. This database is used to create the new sample data, so it is imperative that you run make schema to start with a pristine database before generating new sample data. If you do want the effects of your UI interactions to land in the new sample data, the general process is to:

  • run make schema

  • interact with launchpad.dev

  • follow the make newsampledata steps below

Be aware though that your generation of new sample data will probably have an effect on tests not related to your changes! For example, if you generate new karma events, you will probably break the karma_sample_data tests because they expect all karma events to be dated prior to the year 2002. If you make changes to the sample data, you must run the full test suite and ensure that you get no failures, otherwise there is a very high likelihood that PQM will reject your changes due to test suite failures when you go to land your branch.

Making schema changes

You need to run these steps whenever you make a schema change, regardless of whether you intend to add new sample data or not. For example, if you are adding a new column to the Person table, these steps ensure that the new sample data will include this new column.

  1. Run make schema to get a pristine database of sample data.

  2. Create a SQL file in database/schema/pending/ containing the changes you want, excluding any changes to default values. The first line of your file should be SET client_min_messages=ERROR;. Don't add COMMENT statements in this file; those should be added to database/schema/comments.sql. Don't bzr add this file unless you make sure to bzr rm it before your branch lands.

  3. Run your new SQL patch on the development database to ensure that it works. Do this by running psql launchpad_dev -f your-patch.sql

  4. (Optional): Interact with launchpad.dev to add any additional sample data you want to demonstrate in the web UI.

  5. In database/schema/ run make newsampledata.

  6. Review the sample data changes that occurred using diff current.sql newsampledata.sql. This diff can be hard to review as-is. You might want to use a graphical diff viewer like kompare or meld, which will make it easier. Make sure that you understand all the changes you see.

  7. In database/sampledata/, move newsampledata.sql to current.sql, replacing the latter.

  8. Move your pending SQL file into database/schema/ with a name like patch-xx-99-0.sql (where xx matches the existing patches), and ending with the line INSERT INTO LaunchpadDatabaseRevision VALUES (xx, 99, 0);. When your patch is reviewed and approved, you will be assigned an official patch number, which you will use instead of 99 in both the name of the file and this last line.

  9. Run make schema again to ensure that it works, and that you now have a pristine database with the new sample data.

  10. Make any necessary changes to database/schema/fti.py, database/schema/security.cfg, and to the relevant lib/canonical/launchpad/database/ classes.

  11. Make any necessary changes to the SQL patch to reflect new default values.

  12. Run the full test suite to ensure that your new sample data doesn't break any existing tests by side effect. To do this, run ./test.py -vv.

  13. Go have lunch.

Note that if you make subsequent additional changes, you may be able to skip straight to step 5.
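The file conventions in steps 2 and 8 can be checked mechanically before asking for review. The following is a sketch only, not a tool that exists in the tree: the function names are hypothetical, and it assumes that the xx in patch-xx-99-0.sql is a decimal number and that the revision INSERT is the last non-blank line of the file.

```python
import re

FIRST_LINE = "SET client_min_messages=ERROR;"
# Assumption: all three parts of the patch name are decimal numbers.
PATCH_NAME = re.compile(r"^patch-(\d+)-(\d+)-(\d+)\.sql$")
REVISION_INSERT = re.compile(
    r"^INSERT INTO LaunchpadDatabaseRevision VALUES "
    r"\((\d+),\s*(\d+),\s*(\d+)\)\s*;$"
)


def check_pending_patch(sql):
    """Return a list of problems with a pending patch (step 2)."""
    problems = []
    lines = sql.splitlines()
    if not lines or lines[0].strip() != FIRST_LINE:
        problems.append("first line must be " + FIRST_LINE)
    # COMMENT statements belong in comments.sql, not in the patch itself.
    if any(re.match(r"\s*COMMENT\s+ON\b", line, re.IGNORECASE) for line in lines):
        problems.append("COMMENT statements belong in database/schema/comments.sql")
    return problems


def check_numbered_patch(filename, sql):
    """Return a list of problems with a renamed patch (step 8)."""
    name = PATCH_NAME.match(filename)
    if name is None:
        return ["file name should look like patch-xx-99-0.sql"]
    nonblank = [line.strip() for line in sql.splitlines() if line.strip()]
    last = REVISION_INSERT.match(nonblank[-1]) if nonblank else None
    if last is None:
        return ["last line must insert into LaunchpadDatabaseRevision"]
    # The numbers in the file name and in the final INSERT must agree.
    if [int(n) for n in name.groups()] != [int(n) for n in last.groups()]:
        return ["revision numbers in the file name and the last line differ"]
    return []
```

Both functions return an empty list when nothing looks wrong, so they can be used in a pre-commit check or run by hand on a patch file's contents.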

Proposing database schema changes

For any tables and fields that you change with an SQL script via Stuart (stub on IRC), please make sure you include comments.

The process now looks like this:

  1. If you think the proposed changes may be controversial, or you are just unsure, it is worth discussing the changes on the launchpad mailing list first to avoid wasting your time.
  2. Work on the patch in a branch as documented above.
  3. Work on it in revision control till your review is complete.
  4. Rename your patch to match the official patch number.
  5. Once code is also ready and reviewed, commit as normal.

Resolving schema conflicts

Resolving conflicts in current.sql manually is usually more trouble than it's worth. Instead, first resolve any conflicts in comments.sql, then:

cd database/schema/
mv {patch-in-question}-0.sql comments.sql pending/
cp {parent branch, e.g. rocketfuel}/database/schema/comments.sql ./
cp ../sampledata/current.sql.OTHER ../sampledata/current.sql
make
psql launchpad_dev -f pending/{patch-in-question}-0.sql
make newsampledata
mv ../sampledata/newsampledata.sql ../sampledata/current.sql
mv pending/{patch-in-question}-0.sql pending/comments.sql ./
make   # Just to make sure everything works
cd ../..
bzr resolve database/sampledata/current.sql

Notes on Changing security.cfg

Changes to security.cfg can cause OOPSes on edge.launchpad.net if edge requires a permission that has not been granted in the production DB. So if you are landing a security.cfg change, you need to email stub and jml, CC'ing launchpad@, asking them to apply the manual change to jubany as well. Reviewers should remind the author of this whenever they see a security.cfg change.

Rationale

  • PQM runs the Launchpad test suite
  • The Launchpad test suite runs against sample data
  • However, there are problems that can occur when schema changes or data update scripts run against production data, which do not occur when they run against sample data. Examples are:
    • The update does not run correctly because of integrity errors.
    • The update takes a very long time to run.
  • So, we test new database schemas (patch levels above production) out on the staging server, which uses a daily-updated copy of the production database.
    1. We take a copy of the production database, then run whatever schema changes and updates are needed to bring it up to the patch level currently on mainline.
    2. Then we run mainline code against that new staging database.
  • The fact that staging is actually running, and that it got updated in a reasonable amount of time, shows that an update from production to the code in mainline is actually possible. The output from the staging update process shows the time it took to process each database patch.
  • Running on staging still doesn't catch all the errors we may find when actually doing a production database schema update; although the data will be the same, the organisation within the database may be different. For example, tables may need vacuuming, or be stored in a way that causes scripts to take a long time to run when they did not take so long when run on the same data on staging. This is because when we move the data from production to staging, we're moving just the data, not the exact internal arrangement of data in the database.
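To illustrate the idea of recording how long each database patch takes, here is a small sketch. It is not the staging update script, whose implementation is not described on this page: the function name is hypothetical, and an in-memory SQLite database stands in for the copy of production.

```python
import sqlite3
import time


def apply_patches_with_timing(connection, patches):
    """Apply (name, sql) patches in order and return (name, seconds) pairs."""
    timings = []
    for name, sql in patches:
        start = time.perf_counter()
        # executescript runs every statement contained in the patch text.
        connection.executescript(sql)
        timings.append((name, time.perf_counter() - start))
    return timings
```

A patch that is fast on a small database but slow here would show up as an outlier in the returned timings, which is the kind of signal the staging run is meant to provide.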

Triggers

  1. Developer needs a database schema change implemented as part of his/her development activities
  2. Developer needs a database schema change implemented as part of his/her bug fixing activities

Inputs

  1. Developer originated schema change concept

Outputs

  1. SQL Patch File

Participants

  1. Developer
  2. DBA (in our case, Stuart)

Subprocesses

  • N/A.

Standard Path Events/Activities

  1. When a developer has a change they want to make to the database schema, they write a database patch (see database/schema in the source tree).
  2. They can give themselves a provisional patch number on their development tree.
  3. When it has passed review, Stuart gives them a database patch number, which becomes the filename of the SQL.
  4. The acting DBA (again, Stuart) will also issue further instructions on how to proceed.

Notes:

  • Because each database patch number is unique, various database schema changes can be worked on by different members of the team in parallel, and successfully merged together in different ways.
  • This is most often used just for schema changes.
  • Sometimes schema changes have accompanying data updates.
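The uniqueness requirement also explains why provisional numbers must be replaced before landing: two branches that both use the provisional 99 would collide. A minimal sketch of such a check follows, assuming patch files are named patch-xx-yy-z.sql with decimal parts; the function name and the branch names in the usage example are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: all three parts of the patch name are decimal numbers.
PATCH_NAME = re.compile(r"^patch-(\d+)-(\d+)-(\d+)\.sql$")


def find_patch_number_clashes(new_patches_by_branch):
    """Map each patch number claimed by more than one branch to those branches.

    new_patches_by_branch maps a branch name to the patch file names that
    the branch adds (not the patches it inherited from its parent).
    """
    owners = defaultdict(set)
    for branch, filenames in new_patches_by_branch.items():
        for filename in filenames:
            match = PATCH_NAME.match(filename)
            if match:  # Ignore files that are not patches.
                number = tuple(int(part) for part in match.groups())
                owners[number].add(branch)
    return {
        number: sorted(branches)
        for number, branches in owners.items()
        if len(branches) > 1
    }
```

For example, `find_patch_number_clashes({"branch-a": ["patch-14-99-0.sql"], "branch-b": ["patch-14-99-0.sql"]})` reports that both branches claim (14, 99, 0), while branches holding distinct official numbers produce an empty result.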

Alternative Path Events/Activities

  • If code changes need to land in tandem with a database patch, this needs to be discussed on a case-by-case basis with StuartBishop.

Comments

PolicyAndProcess/DatabaseSchemaChangesProcess (last edited 2022-09-29 10:26:50 by cjwatson)