Process Name: Release Manager Rotation Process
Process Owner: Francis Lacoste
Parent Process/Activity: None
Supported Policy: None
Process Overview
Each cycle, a different engineer takes the role of release manager. The release manager coordinates with the release team and all team leads to ensure that the tree is ready for the roll-out and that all critical bugs are fixed or worked around.
Back-up release managers are the two RMs from the previous two cycles.
One option that has worked very well is to share the release-manager role across timezones, handing over the current status and tasks to the backup in the next timezone at the end of your day. It's a great opportunity to work together as a team.
Release Manager inputs
- Email and IRC messages from engineers and team leads.
- OOPS reports
- Merge proposals
Activities
Before the roll-out
- During week 3 ensure that staging is up-to-date so that it can be used for non-edge QA.
- During week 3 ensure that Matt Revell has a downtime announcement email ready for lp-announce.
- During week 3 send out an email to Stuart Metcalf from Canonical ISD about the upcoming roll-out to ask about any changes that need to be rolled out for canonical-identity-provider or shipit. (Launchpad roll-outs imply a roll-out of the Canonical Identity Provider and ShipIt code.)
- You might want to request that pqm-blockers (such as cherry-picks) are not processed on the day that PQM is closing.
- At the beginning of week 4 (Monday 00:00 UTC), make sure that release-critical has been turned on in PQM.
- At the beginning of week 4, schedule a call with the Foundations team lead (and other leads if known to be pertinent) to determine what system changes might need comprehensive QA. If any exist, consider the following:
- Any related problem encountered should be treated as a red flag, forcing more thorough QA.
- The Foundations lead should report on reviewing logs on edge, such as cronscript output.
Determine the schedule and deadlines. Send an email to launchpad-dev with all of the deadlines, similar to this example email. Place the deadlines on the team calendar.
Update the #launchpad-dev topic to state we are in 'Release Critical' and to list the release manager.
Maintain the list of the Current roll-out blockers
- The release manager should poll the team leads and QA engineers continuously to ensure that the list of release blockers is up-to-date. (We need to explore a work-around to retire this wiki page and do the management in Launchpad.) All bugs that are likely to cause lots of OOPSes, time-outs or prevent several users from working are good CRB candidates. It's a good idea to subscribe yourself to the page. (Currently broken.)
- Make sure that developers are assigned to all problems we want to fix.
- Review release-critical merge proposals. The policy should be:
- All RC candidates go through the normal review process.
- After code and UI review the MP is left in 'Needs Review' state.
- A new review of type 'release-critical' is added to the MP and assigned to the release manager.
- If the MP is approved for 'release-critical', the review is marked 'Approve' and the state of the MP is set to 'Approved'.
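The release-critical review steps above can be read as a small state machine. The following is a minimal sketch of that reading, not the Launchpad API: the `MergeProposal` class, its fields, and both function names are hypothetical and exist only to illustrate the order of the steps.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MergeProposal:
    """Hypothetical stand-in for a merge proposal (not the Launchpad API)."""
    status: str = "Needs Review"
    # Maps a review type (e.g. "code", "release-critical") to its vote.
    reviews: dict = field(default_factory=dict)
    # Who the release-critical review is assigned to, if requested.
    rc_reviewer: Optional[str] = None


def request_release_critical(mp: MergeProposal, release_manager: str) -> None:
    """Add a 'release-critical' review assigned to the release manager."""
    # RC candidates go through the normal review process first.
    if mp.reviews.get("code") != "Approve":
        raise ValueError("normal code review must be approved first")
    # After code and UI review the MP stays in 'Needs Review'.
    mp.status = "Needs Review"
    mp.reviews["release-critical"] = "Pending"
    mp.rc_reviewer = release_manager


def approve_release_critical(mp: MergeProposal) -> None:
    """Release manager approves: mark the review and the MP itself."""
    if "release-critical" not in mp.reviews:
        raise ValueError("no release-critical review was requested")
    mp.reviews["release-critical"] = "Approve"
    mp.status = "Approved"
```

In this sketch an MP only reaches 'Approved' through the release manager's review, which mirrors the policy that the MP is left in 'Needs Review' until release-critical approval is given.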
On the day before the roll-out
- Check that the LOSA do a staged deployment of the code. We are looking for any hidden build problems and to determine the amount of time this step will take.
Request that landing to the devel branch be closed 24 hours before the scheduled release. On the last day, all changes should be merged through db-devel.
On the day of the roll-out
Chase up Current Rollout Blockers and any other pending release-critical fixes.
- With PQM remaining open, have the LOSAs stop buildbot and set it to do manual runs.
Remind people that all changes need to be in buildbot for 9 hours before the roll-out time. The LOSAs require two hours of pre-release preparation and we need to allow for two complete buildbot cycles. (9 = 2 + 2 * 3.5)
- In the case of failures, it's best to roll out the last known good build rather than delay the release. The cut-off point to decide which revision to roll out is 2 hours before the scheduled release.
- Ensure that any embargoed external resources (e.g. blog entries) are live and accessible through the links provided. Ensure that a blog editor (Matthew Revell or delegate) is available at the time of the roll-out.
- Immediately after the roll-out, examine the site for problems. For example, ensure CSS loads properly, all external links on the front page are reachable, etc.
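The lead times mentioned in this process (devel closed 24 hours before, changes in buildbot 9 hours before, revision cut-off 2 hours before) can be turned into concrete timestamps once the roll-out time is known. The sketch below assumes the figures stated on this page (2 hours of LOSA preparation, two buildbot cycles of 3.5 hours each); the function name, the dictionary keys, and the example date are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

# Figures taken from this page.
LOSA_PREP = timedelta(hours=2)             # pre-release preparation
BUILDBOT_CYCLE = timedelta(hours=3.5)      # one complete buildbot cycle
BUILDBOT_CYCLES = 2                        # cycles to allow for
DEVEL_CLOSE_LEAD = timedelta(hours=24)     # devel landings close
REVISION_CUTOFF_LEAD = timedelta(hours=2)  # decide which revision to roll out


def rollout_deadlines(rollout: datetime) -> dict:
    """Return the key deadlines for a roll-out scheduled at `rollout`."""
    # 2 + 2 * 3.5 = 9 hours in buildbot before the roll-out.
    landing_lead = LOSA_PREP + BUILDBOT_CYCLES * BUILDBOT_CYCLE
    return {
        "close_devel": rollout - DEVEL_CLOSE_LEAD,
        "last_landing": rollout - landing_lead,
        "revision_cutoff": rollout - REVISION_CUTOFF_LEAD,
    }


if __name__ == "__main__":
    # Hypothetical roll-out date, using the 09:00 UTC slot.
    rollout = datetime(2010, 1, 6, 9, 0, tzinfo=timezone.utc)
    for name, when in rollout_deadlines(rollout).items():
        print(f"{name}: {when:%Y-%m-%d %H:%M} UTC")
```

Keeping the lead times as named constants makes it easy to adjust the calculation if buildbot cycle times change.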
After the roll-out
- With the QA engineers, review the OOPS reports.
- All common OOPSes are candidates for more release-critical fixes and scheduling another roll-out.
- Prepare and schedule any necessary re-roll.
- When a re-roll is needed, the same activities apply as for the original roll-out.
- Open the tree for the next cycle once the released version is fine.
- The release-manager needs to select the next release manager.
Release critical policy
- To apply for a release-critical approval, you must have a reviewed merge proposal on Launchpad. The engineer adds a review of type release-critical to the merge proposal and ensures it is in the 'Needs Review' state.
- Good candidates for release-critical approval are issues found during QA that are bound to create OOPSes and time outs or otherwise significantly inconvenience our end-users.
- Apart from special exceptions discussed with the project lead, only bug fixes should be granted release-critical approval.
- If there is no way for the developer to QA their change on staging through the normal update procedure before the roll-out, it's recommended to request a cowboy of the branch on staging to QA it before approval.
- For the second roll-out (a.k.a. the re-roll), any change requiring database changes should go through the project lead, since DB updates seriously increase the length of the upgrade window.
Scheduling
- Engineers apply in advance for one cycle.
- They are selected by the previous release manager. Once selected, their name is put on the Launchpad Production Status page.
- The actual roll-out time is determined by the release-manager's location:
Location | Roll-out time
Americas | 00:00 UTC
Europe | 09:00 UTC
Asia/Pacific | 00:00 UTC
- No engineer should apply for the role more than twice a year.
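The scheduling rules can be expressed as a few small lookups. This is a minimal sketch under stated assumptions: the roll-out times are copied from the table above, the twice-a-year limit and the rule that back-ups are the release managers from the previous two cycles come from this page, and all function and variable names are hypothetical.

```python
from datetime import time

# Roll-out time (UTC) by the release manager's location, from the table above.
ROLLOUT_TIME_UTC = {
    "Americas": time(0, 0),
    "Europe": time(9, 0),
    "Asia/Pacific": time(0, 0),
}

# No engineer should apply for the role more than twice a year.
MAX_TURNS_PER_YEAR = 2


def rollout_time(location: str) -> time:
    """Look up the roll-out time for the release manager's location."""
    return ROLLOUT_TIME_UTC[location]


def may_apply(engineer: str, managers_this_year: list) -> bool:
    """True if the engineer has held the role fewer than twice this year."""
    return managers_this_year.count(engineer) < MAX_TURNS_PER_YEAR


def backup_managers(previous_managers: list) -> list:
    """Return the RMs of the previous two cycles, most recent first.

    `previous_managers` is ordered oldest to newest.
    """
    return previous_managers[-2:][::-1]
```

For example, with a rotation history of `["alice", "bob", "carol"]` (made-up names, oldest first), the back-ups for the next cycle would be carol and bob.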
References
SpuriousFailures -- useful for diagnosing last-minute build failures
CurrentRolloutBlockers -- things currently blocking rollout
QATeam/TestPlans -- proof from all the teams that their code works