QA Shepherd project developer notes
Design notes for the Shepherd project.
Algorithm details can be found on the QAProcessContinuousRollouts page.
User stories
- As a LOSA, I want to read an HTML report that tells me what the latest fully QA'd revision of stable is, so that I can deploy that revision to production.
- As an operator, I want to run a command-line script that will tell me what landings the shepherd knows about, and what the shepherd thinks the most recent deployable revision is, so that I can deploy that revision manually.
- As a user, I want to read an HTML report that tells me whether my branch has been promoted from QA to production, so I do not have to rely on yet more email to tell me what is going on. (Ursinha, gary)
We probably want to publish this on https://qa.launchpad.net
As a quicker first version, we can rsync the report to some static location.
- As an operator and developer, I want to read in the log files the detailed reasons that the shepherd took a particular action, so that debugging the tool is easier. (mars)
The log should record both the current state of the change sources and the state transitions that the shepherd executes based on that state.
- As an operator, I want to run the parts of the rollout process individually and on-demand, so that resolving problems is easier. (mars)
This implies a small, sharp script for each part: finding branches, doing promotion, and reporting the system state.
- As an operator, I want to run any script without worrying if another copy of the script is already running and stomping on the data, so that on-demand runs and maintenance are safer and easier.
This implies a single-instance lock, like a PID file.
- As an operator, I want a loud warning if a PID file is more than X minutes old, so I have some foreknowledge that a hang or crash is blocking further updates from happening. (mars)
- As a user, I want to be warned once a day by email that my branch is ready for QA, so that I remember to do it. (mars, Ursinha)
- As a user, I want a message sent to a mailing list when a branch has been sitting in QA for more than X days, so that someone can roll back the branch and unblock updates. (mars, Ursinha)
This implies a QA policy with a grace period.
- As an operator, I want to toggle a switch that keeps the scripts from running, so that I can do maintenance and updates without hunting down and disabling a bunch of cron scripts. (mars)
Probably dropping a maintenance.txt file on disk would do this.
- As a user, I want the log file and HTML reports to tell me when updates were aborted because a maintenance.txt is in place, so that I know why updates aren't happening. (mars)
- As an operator, I want the log file and HTML reports to tell me how long ago the maintenance.txt file was created, so I know if someone forgot to remove the file by accident. (mars)
- As an operator, I would like the script to be deployed as the lpqateam user, so that I can find all of the parts of the QA process in the same place. (Ursinha, matsubara, gary)
- As an operator, I want a single push-button command to set maintenance mode, update the code, then remove maintenance mode again, so that I am less likely to make mistakes when deploying a new version. (mars)
This needs both maintenance.txt and some way to wait for running scripts to finish their work before proceeding.
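The maintenance switch and the single-instance stories above could be guarded by something like the following. This is only a sketch under assumptions: the names Blocked, check_maintenance, acquire_pid_lock, release_pid_lock and state_dir are hypothetical, and the 30-minute default stands in for the unspecified "X minutes".

```python
import os
import time
from pathlib import Path


class Blocked(Exception):
    """Raised when a script must not run right now (hypothetical name)."""


def file_age_minutes(path, now=None):
    """Return how long ago the file was last modified, in minutes."""
    now = time.time() if now is None else now
    return (now - path.stat().st_mtime) / 60.0


def check_maintenance(state_dir, now=None):
    """Abort if maintenance.txt exists, reporting how old the file is."""
    flag = Path(state_dir) / "maintenance.txt"
    if flag.exists():
        age = file_age_minutes(flag, now)
        raise Blocked(f"maintenance.txt has been in place for {age:.0f} minutes")


def acquire_pid_lock(state_dir, name, stale_minutes=30, now=None):
    """Create <name>.pid atomically, or raise Blocked if it already exists.

    stale_minutes is a placeholder for the "X minutes" in the user story.
    """
    pid_path = Path(state_dir) / f"{name}.pid"
    try:
        # O_EXCL makes creation fail if another copy already holds the lock.
        fd = os.open(pid_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        age = file_age_minutes(pid_path, now)
        loud = "WARNING: possible hang or crash; " if age > stale_minutes else ""
        raise Blocked(f"{loud}{pid_path.name} is {age:.0f} minutes old") from None
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))
    return pid_path


def release_pid_lock(pid_path):
    """Remove the PID file so the next run can proceed."""
    pid_path.unlink()
```

Each small script would call check_maintenance, then acquire_pid_lock, do its work, and call release_pid_lock in a finally block. A push-button deploy command could create maintenance.txt and then wait until no .pid files remain before updating the code.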
Other notes
- The QA tagger can pass the revno, branch name, and linked bugs through to the shepherd using its database. This saves the shepherd from doing the work.
- As a developer, I want the QA tagger to run in the same interpreter as the shepherd, so that I do not have to write tricky inter-process communication code.
- If the tagger passes only the in-QA revisions through to the shepherd, then the shepherd does not have to store any persistent state. The shepherd can simply (re)sort the entire list of in-QA revisions and write its report.
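The stateless approach in the last note might look roughly like this. It is a sketch only: the Landing record, its qa_ok field, and the rule that the deployable revision is the highest revno with no un-QA'd revision at or below it are all assumptions made for illustration. The real algorithm is the one described on the QAProcessContinuousRollouts page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Landing:
    """One revision handed over by the QA tagger (hypothetical shape)."""
    revno: int
    branch: str
    bugs: tuple
    qa_ok: bool  # assumed flag: True once every linked bug has passed QA


def latest_deployable(landings):
    """Re-sort all landings and return the newest deployable revno, or None.

    Assumed rule: walk the revisions in revno order and stop at the first
    one that has not passed QA; the revision before it is deployable.
    """
    deployable = None
    for landing in sorted(landings, key=lambda item: item.revno):
        if not landing.qa_ok:
            break
        deployable = landing.revno
    return deployable


landings = [
    Landing(103, "fix-timeout", ("bug-3",), True),
    Landing(101, "new-widget", ("bug-1",), True),
    Landing(104, "db-patch", ("bug-4",), False),
    Landing(102, "typo-fix", (), True),
    Landing(105, "css-tweak", ("bug-5",), True),
]
print(latest_deployable(landings))  # prints 103
```

Because the whole list is re-sorted on every run, nothing has to be remembered between runs; revision 105 is held back here only because 104 has not yet passed QA.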