Diff for "PolicyAndProcess/Downtime"


Differences between revisions 1 and 11 (spanning 10 versions)
Revision 1 as of 2009-07-21 23:00:34
Size: 4124
Editor: flacoste
Comment:
Revision 11 as of 2009-10-27 15:55:08
Size: 6214
Editor: jml
Comment:
Line 15: Line 15:
Back-up release managers are the two RMs from the previous two cycles.
Line 18: Line 20:
  * OOPS report
  * Merge proposal
  * OOPS reports
  * Merge proposals
Line 26: Line 28:
  in PQM.   in PQM. (Monday 00:00 UTC)
Line 28: Line 30:
  * Update the `#launchpad-dev` topic to list him as release-manager.   * Determine the schedule and deadlines. Send an email to launchpad-dev with all of the deadlines, similar to this [[ExampleReleaseScheduleEmail|example email]]. Place the deadlines on the team calendar.

  * Update the `#launchpad-dev` topic to state we are in 'Release Critical' and to list the release manager.
Line 33: Line 37:
    continuously to ensure that the list of release blockers is up to date.     continuously to ensure that the list of release blockers is up-to-date.  (We need to explore a
    work-around to retire this wiki page and do the management in Launchpad.)
Line 35: Line 40:
    All bugs that are likely to cause lots of OOPSes, time outs or prevent     All bugs that are likely to cause lots of OOPSes, time-outs or prevent
Line 37: Line 42:

    It's a good idea to subscribe yourself to the page. (Currently broken.)
Line 40: Line 47:
  * Review release-critical merge.   * Review release-critical merge proposals. The policy should be:
     * All RC candidates go through the normal review process.
     * After code and UI review the MP is left in 'Needs Review' state.
     * A new review of type 'release-critical' is added to the MP and assigned to the release manager.
     * If the MP is approved for 'release-critical', the review is marked 'Approve' and the state of the MP is set to 'Approved'.
Line 45: Line 56:
  * Request that landing to the `devel` branch be closed. (All changes    should on the last day be merged through `db-devel`.)   * Request that landing to the `devel` branch be closed, 24 hours before the scheduled release.  All changes should on the last day be merged through `db-devel`.
Line 54: Line 64:
  * Remind people that all changes need to be in buildbot for '''6 hours'''
  before the roll-out time.
  * With PQM remaining open, have the LOSAs stop buildbot and set it do manual runs.
  * Remind people that all changes need to be in buildbot for '''9 hours'''
  before the roll-out time. The LOSAs require two hours of pre-release preparation and we need
  to allow for two complete buildbot cycles. (9 = 2 + 2 * 3.5)
Line 61: Line 73:
  * Ensure that any embargoed external resources (e.g. blog entries) are live and accessible through the links provided. Ensure that a blog editor (Matthew Revell or delegate) is available at the time of the roll-out.

 * Immediately after the roll-out, examine the site for problems. For example, ensure CSS loads properly, all external links on the front page are reachable, etc.
Line 66: Line 81:
    All common OOPSes are canditates for more release-critical fixes and     All common OOPSes are candidates for more release-critical fixes and
Line 69: Line 84:
  
Line 75: Line 91:
  * The release-manager need to select the next release manager.   * The release-manager needs to select the next release manager.
Line 80: Line 96:
  merge proposal on Launchpad. The release manager simply add a review of type
  `release-critical` to the merge proposal.
  merge proposal on Launchpad. The engineer adds a review of type
  `release-critical` to the merge proposal and ensures it is in the 'Needs Review' state.
Line 83: Line 99:
  * Any issues found during QA that is bound to create OOPSes, time outs or be
  very inconveniencing to users are good candidate for release-critical
  approval.
  * Good candidates for release-critical approval are issues found during QA that are
  bound to create OOPSes and time outs or otherwise significantly inconvenience our end-users.
Line 87: Line 102:
  * Apart special exceptions discussed with the project lead, only bug fixes   * Apart from special exceptions discussed with the project lead, only bug fixes
Line 90: Line 105:
  * If there is no way that the developer can QA his change on staging through
  the normal update procedure before the roll-out, for complex changes, it's
  recommended to ask a cow-boy of the branch on staging to QA it before
  approval.
  * If there is no way for the developer to QA his change on staging through
  the normal update procedure before the roll-out, it's recommended to request
  a cowboy of the branch on staging to QA it before approval.
Line 95: Line 109:
  * For the second roll-out, any change requiring database changes should go
  through the project lead, since a re-roll with a DB updates creates
  significant down-time for our users.
  * For the second roll-out (a.k.a. the re-roll), any change requiring database changes should go
  through the project lead, since DB updates seriously increase the length of the upgrade window.
Line 102: Line 115:
  * Engineer apply in advance for one cycle.   * Engineers apply in advance for one cycle.
Line 104: Line 117:
  * They are selected by the previous release manager. Once selected, their   * They are selected by the previous release manager. Once selected, their name
Line 107: Line 120:
  * The actual roll-out time is determined based on the release-manager
  location:
  * The actual roll-out time is determined by the release-manager's location:
Line 115: Line 127:
  * No engineer can apply for the role more than twice a year.   * No engineer should apply for the role more than twice a year.

== References ==

 * [[https://wiki.canonical.com/InformationInfrastructure/OSA/LaunchpadRollout | OSA Launchpad Rollout Procedures]]
 * SpuriousFailures -- useful for diagnosing last-minute build failures
 * CurrentRolloutBlockers -- things currently blocking rollout
 * QATeam/TestPlans -- proof from all the teams that their code works
 * [[https://wiki.canonical.com/InformationInfrastructure/OSA/LaunchpadProductionStatus|Launchpad production status]]

  • Process Name: Release Manager Rotation Process

  • Process Owner: Francis Lacoste

  • Parent Process/Activity: None

  • Supported Policy: None

Process Overview

Each cycle, a different engineer takes the role of release manager. The release manager coordinates with the release team and all team leads to ensure that the tree is ready for the roll-out and that all critical bugs are either fixed or worked around.

Back-up release managers are the two RMs from the previous two cycles.

Release Manager inputs

  • Email and IRC messages from engineers and team leads.
  • OOPS reports
  • Merge proposals

Activities

Before the roll-out

  • At the beginning of week 4 (Monday 00:00 UTC), make sure that release-critical has been turned on in PQM.
  • Determine the schedule and deadlines. Send an email to launchpad-dev with all of the deadlines, similar to this example email. Place the deadlines on the team calendar.

  • Update the #launchpad-dev topic to state we are in 'Release Critical' and to list the release manager.

  • Maintain the list of Current Rollout Blockers (CRBs).

    • The release manager should poll the team leads and QA engineers continuously to ensure that the list of release blockers is up-to-date. (We need to explore a work-around to retire this wiki page and do the management in Launchpad.) Any bug that is likely to cause many OOPSes or time-outs, or to prevent several users from working, is a good CRB candidate. It's a good idea to subscribe yourself to the page. (Currently broken.)
  • Make sure that developers are assigned to all problems we want to fix.
  • Review release-critical merge proposals. The policy should be:
    • All RC candidates go through the normal review process.
    • After code and UI review the MP is left in 'Needs Review' state.
    • A new review of type 'release-critical' is added to the MP and assigned to the release manager.
    • If the MP is approved for 'release-critical', the review is marked 'Approve' and the state of the MP is set to 'Approved'.
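
The review policy above can be read as a small state machine. The following is a minimal sketch in Python under stated assumptions: the `MergeProposal` class, its fields, and both helper functions are hypothetical illustrations and do not correspond to the Launchpad API. Only the state names and the 'release-critical' review type come from the policy itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MergeProposal:
    """Hypothetical stand-in for a merge proposal; not the Launchpad API."""
    state: str = "Needs Review"
    # Review type ("code", "ui", "release-critical") -> vote; None means pending.
    reviews: Dict[str, Optional[str]] = field(default_factory=dict)
    rc_reviewer: Optional[str] = None


def request_release_critical(mp: MergeProposal, release_manager: str) -> None:
    """Add a pending 'release-critical' review assigned to the release manager."""
    if mp.reviews.get("code") != "Approve":
        raise ValueError("RC candidates go through the normal review process first")
    if mp.state != "Needs Review":
        raise ValueError("the MP stays in 'Needs Review' until RC approval")
    mp.reviews["release-critical"] = None
    mp.rc_reviewer = release_manager


def approve_release_critical(mp: MergeProposal) -> None:
    """Mark the RC review 'Approve' and move the MP to 'Approved'."""
    if "release-critical" not in mp.reviews:
        raise ValueError("no release-critical review was requested")
    mp.reviews["release-critical"] = "Approve"
    mp.state = "Approved"
```

The point of the sketch is the ordering: the MP only reaches 'Approved' through the release manager's review, never directly from the normal code or UI review.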

On the day before the roll-out

  • Request that landing to the devel branch be closed 24 hours before the scheduled release. On the last day, all changes should be merged through db-devel.

On the day of the roll-out

  • Chase up Current Rollout Blockers and any other pending release-critical fixes.

  • With PQM remaining open, have the LOSAs stop buildbot and set it to do manual runs.
  • Remind people that all changes need to be in buildbot for 9 hours before the roll-out time. The LOSAs require two hours of pre-release preparation and we need to allow for two complete buildbot cycles of 3.5 hours each. (9 = 2 + 2 * 3.5)

  • In the case of failures, it's best to roll out the last known good build rather than delay the release. The cut-off point for deciding which revision to roll out is 2 hours before the scheduled release.

  • Ensure that any embargoed external resources (e.g. blog entries) are live and accessible through the links provided. Ensure that a blog editor (Matthew Revell or delegate) is available at the time of the roll-out.
  • Immediately after the roll-out, examine the site for problems. For example, ensure CSS loads properly, all external links on the front page are reachable, etc.
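
The deadlines mentioned in this process can all be derived from the scheduled roll-out time. The sketch below shows that arithmetic in Python. The function name and the dictionary keys are made up for illustration. The 3.5-hour buildbot cycle is read off the formula 9 = 2 + 2 * 3.5, and the 24-hour and 2-hour lead times come from the steps above.

```python
from datetime import datetime, timedelta, timezone

LOSA_PREP = timedelta(hours=2)             # pre-release preparation by the LOSAs
BUILDBOT_CYCLE = timedelta(hours=3.5)      # one complete buildbot cycle (from the formula)
BUILDBOT_CYCLES = 2                        # cycles to allow for
DEVEL_CLOSE_LEAD = timedelta(hours=24)     # devel closes this long before release
REVISION_CUTOFF_LEAD = timedelta(hours=2)  # decide which revision to roll out


def rollout_deadlines(rollout: datetime) -> dict:
    """Return the deadlines implied by a scheduled roll-out time (hypothetical helper)."""
    buildbot_lead = LOSA_PREP + BUILDBOT_CYCLES * BUILDBOT_CYCLE  # 2 + 2 * 3.5 = 9 hours
    return {
        "devel_closed": rollout - DEVEL_CLOSE_LEAD,
        "in_buildbot": rollout - buildbot_lead,
        "revision_cutoff": rollout - REVISION_CUTOFF_LEAD,
    }


# Example: a roll-out scheduled for 09:00 UTC.
deadlines = rollout_deadlines(datetime(2009, 10, 28, 9, 0, tzinfo=timezone.utc))
for name, when in deadlines.items():
    print(f"{name}: {when:%Y-%m-%d %H:%M} UTC")
```

Keeping the lead times as named constants makes it easy to adjust the schedule if the buildbot cycle time or the LOSA preparation time changes.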

After the roll-out

  • With the QA engineers, review the OOPS reports.
    • All common OOPSes are candidates for more release-critical fixes and scheduling another roll-out.
  • Prepare and schedule any necessary re-roll.
  • When a re-roll is needed, carry out the same activities as before the first roll-out.
  • Open the tree for the next cycle once the released version is fine.
  • The release-manager needs to select the next release manager.

Release critical policy

  • To apply for a release-critical approval, you must have a reviewed merge proposal on Launchpad. The engineer adds a review of type release-critical to the merge proposal and ensures it is in the 'Needs Review' state.

  • Good candidates for release-critical approval are issues found during QA that are bound to create OOPSes or time-outs, or that otherwise significantly inconvenience our end-users.
  • Apart from special exceptions discussed with the project lead, only bug fixes should be granted release-critical approval.
  • If there is no way for the developer to QA his change on staging through the normal update procedure before the roll-out, it's recommended to request a cowboy of the branch on staging to QA it before approval.
  • For the second roll-out (a.k.a. the re-roll), any change requiring database changes should go through the project lead, since DB updates seriously increase the length of the upgrade window.

Scheduling

  • Engineers apply in advance for one cycle.
  • They are selected by the previous release manager. Once selected, their name is put on the Launchpad Production Status page.

  • The actual roll-out time is determined by the release-manager's location:

    Location        Roll-out time
    Americas        00:00 UTC
    Europe          09:00 UTC
    Asia/Pacific    00:00 UTC

  • No engineer should apply for the role more than twice a year.
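
For tooling that needs the scheduled roll-out time, the location rule above reduces to a lookup. This is a minimal sketch in Python. The mapping name and the `rollout_datetime` helper are hypothetical, and the times are copied as listed on this page.

```python
from datetime import date, datetime, time, timezone

# Roll-out time (UTC) by release-manager location, as listed on this page.
ROLLOUT_TIME_UTC = {
    "Americas": time(0, 0),
    "Europe": time(9, 0),
    "Asia/Pacific": time(0, 0),
}


def rollout_datetime(location: str, day: date) -> datetime:
    """Return the roll-out time on `day` for the release manager's location."""
    try:
        rollout_time = ROLLOUT_TIME_UTC[location]
    except KeyError:
        raise ValueError(f"unknown location: {location!r}") from None
    return datetime.combine(day, rollout_time, tzinfo=timezone.utc)
```

An unknown location raises an error rather than falling back to a default, so a misspelled region cannot silently schedule a roll-out at the wrong time.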

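References

  • OSA Launchpad Rollout Procedures (https://wiki.canonical.com/InformationInfrastructure/OSA/LaunchpadRollout)
  • SpuriousFailures -- useful for diagnosing last-minute build failures
  • CurrentRolloutBlockers -- things currently blocking rollout
  • QATeam/TestPlans -- proof from all the teams that their code works
  • Launchpad production status (https://wiki.canonical.com/InformationInfrastructure/OSA/LaunchpadProductionStatus)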

PolicyAndProcess/Downtime (last edited 2011-06-06 22:02:02 by flacoste)