Process Name: Release Manager Rotation Process
Process Owner: Francis Lacoste
Parent Process/Activity: None
Supported Policy: None
Process Overview
Each cycle, a different engineer takes the role of release manager. The release manager coordinates with the release team and all team leads to ensure that the tree is ready for the roll-out and that all critical bugs are fixed or worked around.
The backup release managers are the release managers (RMs) of the previous two cycles.
One option that has worked very well is to share the release-manager role across timezones, handing over the current status and tasks to the backup in the next timezone at the end of your day. It's a great opportunity to work together as a team.
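The backup rule can be expressed as a small function. This is a minimal sketch that assumes the rotation history is kept as a plain Python list of names indexed by cycle. `backup_release_managers` and that list format are hypothetical illustrations and are not part of any Launchpad tooling.

```python
def backup_release_managers(rotation: list[str], cycle: int) -> list[str]:
    """Return the release managers of the two cycles before `cycle`.

    `rotation[i]` is a hypothetical record of who was RM in cycle i.
    The most recent backup comes first.
    """
    if not 0 <= cycle < len(rotation):
        raise IndexError("cycle is outside the recorded rotation")
    # Step back over at most two earlier cycles; early cycles have fewer backups.
    return [rotation[i] for i in range(cycle - 1, max(cycle - 3, -1), -1)]


rotation = ["alice", "bob", "carol", "dave"]  # hypothetical names
print(backup_release_managers(rotation, 3))  # ['carol', 'bob']
```

For the first two cycles of a recorded history, the function returns fewer than two names.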
Release Manager inputs
- Email and IRC messages from engineers and team leads.
- OOPS reports
- Merge proposals
Activities
Before the roll-out
- During week 3, ensure that staging is up to date so that it can be used for non-edge QA.
- During week 3, ensure that a downtime announcement email is ready for lp-announce.
- You might want to request that pqm-blockers (such as cherry-picks) are not processed on the day that PQM is closing.
- At the beginning of week 4 (Monday 00:00 UTC), make sure that release-critical mode has been turned on in PQM.
- At the beginning of week 4, schedule a call with the Foundations team lead (and other leads, if known to be pertinent) to determine which system changes might need comprehensive QA. If any exist, keep the following in mind:
  - Any related problem encountered should be treated as a red flag, forcing more thorough QA.
  - The Foundations lead should report on a review of the logs on edge, such as cronscript output.
Determine the schedule and deadlines. Send an email to launchpad-dev with all of the deadlines, similar to this example email. Place the deadlines on the team calendar.
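The deadlines in that email follow from offsets given elsewhere on this page. Devel landings close 24 hours before the roll-out, changes must be in buildbot 9 hours before it, and the revision to roll out is chosen 2 hours before it. The sketch below assumes a timezone-aware roll-out time and derives those deadlines in UTC. `deadlines` and `format_announcement` are hypothetical helpers, and their output is only a starting point for the email, not its actual template.

```python
from datetime import datetime, timedelta, timezone

# Offsets before the scheduled roll-out, as stated in the process text.
DEADLINE_OFFSETS = [
    ("devel landings closed; last-day changes go through db-devel", timedelta(hours=24)),
    ("last changes in buildbot", timedelta(hours=9)),
    ("cut-off for choosing the revision to roll out", timedelta(hours=2)),
]


def deadlines(rollout: datetime) -> list[tuple[str, datetime]]:
    """Return (description, UTC time) pairs, earliest first, ending with the roll-out."""
    if rollout.tzinfo is None:
        raise ValueError("rollout must be timezone-aware")
    rollout_utc = rollout.astimezone(timezone.utc)
    items = [(name, rollout_utc - offset) for name, offset in DEADLINE_OFFSETS]
    items.append(("roll-out", rollout_utc))
    return items


def format_announcement(rollout: datetime) -> str:
    """Render the deadlines as plain-text lines for a launchpad-dev email."""
    return "\n".join(
        f"{when:%Y-%m-%d %H:%M} UTC  {name}" for name, when in deadlines(rollout)
    )


# Hypothetical roll-out at the Europe slot (09:00 UTC).
print(format_announcement(datetime(2010, 3, 10, 9, 0, tzinfo=timezone.utc)))
```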
Update the #launchpad-dev topic to state we are in 'Release Critical' and to list the release manager.
Maintain the list of the Current roll-out blockers
- The release manager should poll the team leads and QA engineers regularly to keep the list of release blockers up to date. (We need to explore a way to retire this wiki page and manage the list in Launchpad instead.) Any bug that is likely to cause many OOPSes or time-outs, or to prevent several users from working, is a good Current Rollout Blocker (CRB) candidate. It's a good idea to subscribe yourself to the page (subscriptions are currently broken).
- Make sure that developers are assigned to all problems we want to fix.
- Review release-critical merge proposals (MPs). The policy should be:
  - All release-critical (RC) candidates go through the normal review process.
  - After code and UI review, the MP is left in the 'Needs Review' state.
  - A new review of type 'release-critical' is added to the MP and assigned to the release manager.
  - If the MP is approved for 'release-critical', the review is marked 'Approve' and the state of the MP is set to 'Approved'.
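The merge-proposal review flow in this policy works as a small state machine. The following toy model assumes only the states and review types named in the policy. `MergeProposal` here is a hypothetical class, not launchpadlib, and it does not communicate with Launchpad.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MergeProposal:
    """Toy model of an MP's state under the release-critical policy."""
    status: str = "Needs Review"
    reviews: dict = field(default_factory=dict)  # review type -> vote
    rc_reviewer: Optional[str] = None

    def request_release_critical(self, release_manager: str) -> None:
        # The normal review must already have approved the change (only the
        # code review is checked here; UI review is not modelled), and the
        # MP must have been left in 'Needs Review'.
        if self.reviews.get("code") != "Approve":
            raise ValueError("normal code review has not approved the MP")
        if self.status != "Needs Review":
            raise ValueError("MP must stay in 'Needs Review' until RC approval")
        self.reviews["release-critical"] = "Pending"
        self.rc_reviewer = release_manager

    def approve_release_critical(self, reviewer: str) -> None:
        # Only the release manager assigned to the RC review may approve it.
        if reviewer != self.rc_reviewer:
            raise PermissionError("the RC review is assigned to someone else")
        self.reviews["release-critical"] = "Approve"
        self.status = "Approved"


mp = MergeProposal(reviews={"code": "Approve"})
mp.request_release_critical("release-manager")
mp.approve_release_critical("release-manager")
print(mp.status)  # Approved
```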
On the day before the roll-out
Request that landings to the devel branch be closed 24 hours before the scheduled release. On the last day, all changes should be merged through db-devel.
On the day of the roll-out
Chase up Current Rollout Blockers and any other pending release-critical fixes.
- With PQM remaining open, have the LOSAs stop buildbot and set it to do manual runs.
Remind people that all changes need to be in buildbot at least 9 hours before the roll-out time. The LOSAs require two hours of pre-release preparation, and we need to allow for two complete buildbot cycles of about 3.5 hours each (9 = 2 + 2 * 3.5).
- In the case of failures, it's best to roll out the last known good build rather than delay the release. The cut-off point for deciding which revision to roll out is 2 hours before the scheduled release.
- Ensure that any embargoed external resources (e.g. blog entries) are live and accessible through the links provided. Ensure that a blog editor (Matthew Revell or delegate) is available at the time of the roll-out.
- Immediately after the roll-out, examine the site for problems. For example, ensure CSS loads properly, all external links on the front page are reachable, etc.
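The 9-hour rule and the "last known good build" fallback can be checked mechanically. This is a minimal sketch that uses the figures from the text (2 hours of LOSA preparation, two buildbot cycles of about 3.5 hours, a 2-hour revision cut-off). The `(revision, finished_at, passed)` tuple format and both function names are hypothetical stand-ins for real buildbot results.

```python
from datetime import datetime, timedelta

# Figures from the text: two hours of LOSA preparation plus two complete
# buildbot cycles of about 3.5 hours each.
LOSA_PREP = timedelta(hours=2)
BUILDBOT_CYCLE = timedelta(hours=3.5)
BUFFER = LOSA_PREP + 2 * BUILDBOT_CYCLE  # 2 + 2 * 3.5 = 9 hours
REVISION_CUTOFF = timedelta(hours=2)


def landed_in_time(landed_at: datetime, rollout: datetime) -> bool:
    """True if a change reached buildbot at least BUFFER before the roll-out."""
    return rollout - landed_at >= BUFFER


def revision_to_roll_out(builds, rollout: datetime):
    """Return the newest passing revision finished by the 2-hour cut-off.

    `builds` is a list of (revision, finished_at, passed) tuples (a
    hypothetical format). Returns None when no build qualifies.
    """
    cutoff = rollout - REVISION_CUTOFF
    good = [b for b in builds if b[2] and b[1] <= cutoff]
    return max(good, key=lambda b: b[1])[0] if good else None


rollout = datetime(2010, 3, 10, 9, 0)  # hypothetical roll-out time
builds = [
    (100, datetime(2010, 3, 10, 1, 0), True),
    (101, datetime(2010, 3, 10, 4, 30), False),
    (102, datetime(2010, 3, 10, 8, 0), True),  # finished after the cut-off
]
print(revision_to_roll_out(builds, rollout))  # 100
```

In this example, revision 101 failed and revision 102 finished after the 07:00 cut-off, so the sketch falls back to revision 100.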
After the roll-out
- With the QA engineers, review the OOPS reports.
  - Any common OOPS is a candidate for a further release-critical fix and another roll-out.
- Prepare and schedule any necessary re-roll.
  - When a re-roll is needed, carry out the same activities as before the original roll-out.
- When the released version is confirmed to be good, reopen the tree for the next cycle.
- The release manager needs to select the next release manager.
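Finding the "common" OOPSes comes down to counting them by signature. This sketch uses `collections.Counter` and assumes that each OOPS report has already been reduced to a signature string. The signature format, the default threshold of 10, and `common_oopses` itself are hypothetical choices, not a description of the actual OOPS tools.

```python
from collections import Counter


def common_oopses(signatures, min_count=10, top=5):
    """Return up to `top` (signature, count) pairs seen at least `min_count` times.

    Each signature is assumed to be a string such as "<error type> on <page>";
    both that format and the default threshold are hypothetical.
    """
    counts = Counter(signatures)
    return [(sig, n) for sig, n in counts.most_common(top) if n >= min_count]


# Hypothetical post-roll-out sample.
sample = ["Timeout on +bugs"] * 12 + ["KeyError on +branch"] * 3
print(common_oopses(sample))  # [('Timeout on +bugs', 12)]
```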
Release critical policy
- To apply for release-critical approval, you must have a reviewed merge proposal on Launchpad. The engineer adds a review of type 'release-critical' to the merge proposal and ensures that it is in the 'Needs Review' state.
- Good candidates for release-critical approval are issues found during QA that are bound to cause OOPSes and time-outs or otherwise significantly inconvenience our end users.
- Apart from special exceptions discussed with the project lead, only bug fixes should be granted release-critical approval.
- If the developer has no way to QA their change on staging through the normal update procedure before the roll-out, it's recommended that they request a cowboy of the branch on staging to QA it before approval.
- For the second roll-out (a.k.a. the re-roll), any change requiring database changes should go through the project lead, since DB updates seriously increase the length of the upgrade window.
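The policy above can be read as a checklist. This is a minimal sketch under stated assumptions: `RCRequest` and `check_rc_request` are hypothetical, and a single `project_lead_agreed` flag stands in for both kinds of project-lead sign-off that the policy mentions. The function separates hard requirements from the staging-QA recommendation.

```python
from dataclasses import dataclass


@dataclass
class RCRequest:
    """Hypothetical summary of a request for release-critical approval."""
    has_reviewed_mp: bool
    is_bug_fix: bool
    is_reroll: bool = False
    needs_db_change: bool = False
    project_lead_agreed: bool = False  # sign-off discussed with the project lead
    qa_on_staging: bool = False        # QA'd on staging, via update or cowboy


def check_rc_request(req: RCRequest) -> tuple[list[str], list[str]]:
    """Return (unmet requirements, unmet recommendations) under the policy."""
    required, recommended = [], []
    if not req.has_reviewed_mp:
        required.append("a reviewed merge proposal is needed")
    if not req.is_bug_fix and not req.project_lead_agreed:
        required.append("non-bug-fix changes need a project-lead exception")
    if req.is_reroll and req.needs_db_change and not req.project_lead_agreed:
        required.append("re-roll DB changes must go through the project lead")
    if not req.qa_on_staging:
        recommended.append("QA the change on staging, cowboying it if necessary")
    return required, recommended


print(check_rc_request(RCRequest(has_reviewed_mp=True, is_bug_fix=True, qa_on_staging=True)))  # ([], [])
```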
Scheduling
- Engineers apply in advance for one cycle.
- They are selected by the previous release manager. Once selected, their name is put on the Launchpad Production Status page.
- The actual roll-out time is determined by the release manager's location:
  Location        Roll-out time
  Americas        00:00 UTC
  Europe          09:00 UTC
  Asia/Pacific    00:00 UTC
- No engineer should apply for the role more than twice a year.
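The scheduling rules can be encoded directly. This sketch takes the roll-out times from the location table in this section. Interpreting "twice a year" as a rolling 365-day window is an assumption. `rollout_datetime` and `may_apply` are hypothetical helpers.

```python
from datetime import date, datetime, time, timedelta, timezone

# Roll-out times in UTC by the release manager's location, per the table above.
ROLLOUT_TIME_UTC = {
    "Americas": time(0, 0),
    "Europe": time(9, 0),
    "Asia/Pacific": time(0, 0),
}


def rollout_datetime(day: date, location: str) -> datetime:
    """Combine a roll-out date with the UTC time for the RM's location."""
    return datetime.combine(day, ROLLOUT_TIME_UTC[location], tzinfo=timezone.utc)


def may_apply(previous_applications: list[date], today: date) -> bool:
    """Apply the 'no more than twice a year' rule over a rolling 365 days.

    Reading "a year" as a rolling window is an assumption; a calendar-year
    reading would count applications differently.
    """
    window_start = today - timedelta(days=365)
    recent = [d for d in previous_applications if window_start < d <= today]
    return len(recent) < 2


print(rollout_datetime(date(2010, 3, 10), "Europe"))  # 2010-03-10 09:00:00+00:00
```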
References
SpuriousFailures -- useful for diagnosing last-minute build failures
CurrentRolloutBlockers -- things currently blocking rollout
QATeam/TestPlans -- proof from all the teams that their code works