Diff for "PolicyAndProcess/Downtime"

Not logged in - Log In / Register

Differences between revisions 15 and 16
Revision 15 as of 2009-11-04 03:27:58
Size: 6522
Comment:
Revision 16 as of 2009-11-06 14:52:39
Size: 6771
Comment:
Deletions are marked like this. Additions are marked like this.
Line 16: Line 16:

One option that has worked very well is to share the release-manager role across timezones, handing over the current status and tasks to the backup in the next timezone at the end of your day. It's a great opportunity to work together as a team.

  • Process Name: Release Manager Rotation Process

  • Process Owner: Francis Lacoste

  • Parent Process/Activity: None

  • Supported Policy: None

Process Overview

Each cycle a different engineer takes the role of release manager. The release manager coordinates with the release team and all team leads to ensure that the tree is ready for the roll-out and that all critical bugs are in or worked-around.

Back-up release managers are the two RMs from the previous two cycles.

One option that has worked very well is to share the release-manager role across timezones, handing over the current status and tasks to the backup in the next timezone at the end of your day. It's a great opportunity to work together as a team.

Release Manager inputs

  • Email and IRC messages from engineers and team leads.
  • OOPS reports
  • Merge proposals

Activities

Before the roll-out

  • During week 3 ensure that staging is up-to-date so that it can be used for non-edge QA.
  • During week 3 ensure that a downtime announcement email is ready for lp-announce.
  • You might want to request that pqm-blockers are processed (such as cherry-picks) on the day that PQM is closing.
  • At the beginning of week 4. Make sure that release-critical was turned on in PQM. (Monday 00:00 UTC)
  • Determine the schedule and deadlines. Send an email to launchpad-dev with all of the deadlines, similar to this example email. Place the deadlines on the team calendar.

  • Update the #launchpad-dev topic to state we are in 'Release Critical' and to list the release manager.

  • Maintain the list of the Current roll-out blockers

    • The release manager should poll the team leads and QA engineers continuously to ensure that the list of release blockers is up-to-date. (We need to explore a work-around to retire this wiki page and do the management in Launchpad.) All bugs that are likely to cause lots of OOPSes, time-outs or prevent several users from working are good CRB candidates. It's a good idea to subscribe yourself to the page. (Currently broken.)
  • Make sure that developers are assigned to all problems we want to fix.
  • Review release-critical merge proposals. The policy should be:
    • All RC candidates go through the normal review process.
    • After code and UI review the MP is left in 'Needs Review' state.
    • A new review of type 'release-critical' is added to the MP and assigned to the release manager.
    • If the MP is approved for 'release-critical', the review is marked 'Approve' and the state of the MP is set to 'Approved'.

On the day before the roll-out

  • Request that landing to the devel branch be closed, 24 hours before the scheduled release. All changes should on the last day be merged through db-devel.

On the day of the roll-out

  • Chase up Current Rollout Blockers and any other pending release-critical fixes.

  • With PQM remaining open, have the LOSAs stop buildbot and set it do manual runs.
  • Remind people that all changes need to be in buildbot for 9 hours before the roll-out time. The LOSAs require two hours of pre-release preparation and we need to allow for two complete buildbot cycles. (9 = 2 + 2 * 3.5)

  • In the case of failures, it's best to roll-out the last-known-good-build rather than delaying the release. The cut-off point to decide which revision

    to roll out is 2 hours before the scheduled release.

  • Ensure that any embargoed external resources (e.g. blog entries) are live and accessible through the links provided. Ensure that a blog editor (Matthew Revell or delegate) is available at the time of the roll-out.
  • Immediately after the roll-out, examine the site for problems. For example, ensure CSS loads properly, all external links on the front page are reachable, etc.

After the roll-out

  • With the QA engineers, review the OOPS reports.
    • All common OOPSes are candidates for more release-critical fixes and scheduling another roll-out.
  • Prepare and schedule any necessary re-roll.
  • When a re-roll is needed, same activities than in the pre-roll out case.
  • Open the tree, when the released version is fine for the next cycle.
  • The release-manager needs to select the next release manager.

Release critical policy

  • To apply for a release-critical approval, you must have a reviewed merge proposal on Launchpad. The engineer adds a review of type

    release-critical to the merge proposal and ensures it is in the 'Needs Review' state.

  • Good candidates for release-critical approval are issues found during QA that are bound to create OOPSes and time outs or otherwise significantly inconvenience our end-users.
  • Apart from special exceptions discussed with the project lead, only bug fixes should be granted release-critical approval.
  • If there is no way for the developer to QA his change on staging through the normal update procedure before the roll-out, it's recommended to request a cowboy of the branch on staging to QA it before approval.
  • For the second roll-out (a.k.a. the re-roll), any change requiring database changes should go through the project lead, since DB updates seriously increase the length of the upgrade window.

Scheduling

  • Engineers apply in advance for one cycle.
  • They are selected by the previous release manager. Once selected, their name

    is put on the Launchpad Production Status page.

  • The actual roll-out time is determined by the release-manager's location:
    • Location

      Roll out time

      Americas

      00:00UTC

      Europe

      09:00UTC

      Asia/Pacific

      00:00UTC

  • No engineer should apply for the role more than twice a year.

References

PolicyAndProcess/Downtime (last edited 2011-06-06 22:02:02 by flacoste)