Diff for "LEP/FastDowntime"


Differences between revisions 1 and 2
Revision 1 as of 2011-07-12 03:23:53
Size: 2594
Editor: lifeless
Comment: booyah
Revision 2 as of 2011-07-12 10:43:14
Size: 2594
Editor: allenap
Comment: Typo

Fast downtime

Rather than extended periods of downtime (typically 60-90 minutes), have short downtime windows multiple times a week.

Contact: RobertCollins
On Launchpad: https://bugs.launchpad.net/launchpad-project/+bugs?field.tag=fastdowntime

Rationale

Our development cycle times correlate very highly with schema changes. Technical limitations in our environment mean that applying schema changes requires disconnecting all clients for a period of time. By keeping this period short and designing our schema changes carefully we can dramatically simplify the way that we do downtime (most of the time), resulting in less overall downtime and faster delivery of features (with less churn on developer focus).
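
As a rough illustration of the "less overall downtime" claim, the sketch below compares the two approaches over one cycle. The durations (60-90 minutes, 3-5 minutes, 30-60 seconds) come from this page; the four-week cycle with a single extended downtime and the figure of three fast windows per week are assumptions made only for the arithmetic.

```python
def total_downtime_minutes(windows, minutes_per_window):
    """Total downtime for a number of windows of a given length."""
    return windows * minutes_per_window


WEEKS_PER_CYCLE = 4        # assumption: one extended downtime per 4-week cycle
FAST_WINDOWS_PER_WEEK = 3  # assumption: "multiple times a week" taken as 3

# One extended window of 60-90 minutes per cycle.
extended = (total_downtime_minutes(1, 60), total_downtime_minutes(1, 90))

# Short windows several times a week over the same cycle.
fast_windows = WEEKS_PER_CYCLE * FAST_WINDOWS_PER_WEEK
fast_initial = (total_downtime_minutes(fast_windows, 3),
                total_downtime_minutes(fast_windows, 5))
fast_medium = (total_downtime_minutes(fast_windows, 0.5),
               total_downtime_minutes(fast_windows, 1))

print("extended:", extended)
print("fast, initial target:", fast_initial)
print("fast, medium-term target:", fast_medium)
```

Under these assumptions the initial target gives 36-60 minutes of downtime per cycle against 60-90 minutes, and the medium-term target gives 6-12 minutes. A higher window frequency would narrow the initial gap, so the saving depends on how often windows are actually used.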

The basis for this change has been raised and hammered out on the stakeholders list; coding can start while further fine tuning is done on the -users list.

Stakeholders

All the LP stakeholders, particularly OEM, who depend on LP to do daily releases.

User stories

developer-make-change

As a developer
I want to change Launchpad's schema without waiting 4 weeks
so that I can fix a bug / improve functionality for users.

Constraints and Requirements

Must

  • Reliably remove all contention on the DB for schema changes
  • Reliably restore connections to the DB without requiring appserver / librarian / builddmaster / codebrowse-mapper instance restarts
  • Be reliably fast: 3-5 minutes initially, but aim for 30-60 seconds medium term.
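
The second requirement (restoring connections without restarting services) implies that each client retries its database connection after the downtime window. A minimal sketch of that idea follows. It is not Launchpad's actual mechanism: `connect_with_retry` and the `connect` callable are hypothetical names, and the 300-second default simply mirrors the upper end of the initial 3-5 minute target.

```python
import time


def connect_with_retry(connect, max_wait_seconds=300.0, initial_delay=1.0,
                       max_delay=30.0, sleep=time.sleep,
                       clock=time.monotonic):
    """Call connect() until it succeeds or max_wait_seconds elapse.

    connect is a hypothetical zero-argument callable that returns a
    connection, or raises ConnectionError while the database is
    unavailable (for example during a schema change).
    """
    deadline = clock() + max_wait_seconds
    delay = initial_delay
    while True:
        try:
            return connect()
        except ConnectionError:
            # Give up if the next wait would run past the deadline.
            if clock() + delay > deadline:
                raise
            sleep(delay)
            # Exponential backoff, capped so recovery is noticed promptly.
            delay = min(delay * 2, max_delay)
```

A long-running service would call this wherever it currently opens a connection, so that a short outage is ridden out in-process instead of requiring a restart. Capping the backoff matters for the "reliably fast" requirement: an uncapped delay could leave a client waiting well after the database is back.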

Nice to have

There are many bells and whistles we could add, but they will be the focus of future, completely distinct work: we want to deliver the core functionality as rapidly and reliably as possible.

Must not

  • Require manual steps during the schema deploy: it must be fully automated

Out of scope

  • A 'fail whale' page during the downtime
  • Schema patches that cannot be done incrementally (or which we decide are simply too hard).

Subfeatures

Success

How will we know when we are done?

We can reliably deploy schema changes 24 hours after they land in devel, with < 5 minutes downtime.

How will we measure how well we have done?

The project lead has cycle-time graphs that show long cycle times for DB-related projects. Their cycle time should come way down: the further it comes down, the better this project has succeeded.

Thoughts?

Put everything else here. Better out than in.

LEP/FastDowntime (last edited 2011-12-22 18:22:22 by gary)