Launchpad Production oops-tools setup
The Launchpad instance of OOPS-Tools is run on devpad under the lpqateam role account.
OOPSes are transported via AMQP, and via rsync for a few legacy systems whose deployments have not yet been updated.
The web UI reports its own crashes via AMQP (back to itself, which is somewhat meta). See production.cfg for the credentials etc. (and never run 'bzr revert' there!).
Pruning is done on devpad via the datedir-repo bin/prune tool; DB record pruning is implemented in python-oops-tools but is not yet automated on devpad (there are ~27M references to clean up; once that's done it can be put in cron). See the lpqateam crontab to find the pruner. Its deployment is not automated yet either. Pruning is done against launchpad-project - things not in that project group will not have their OOPSes preserved.
oops-tools deployment
TL;DR: cd /srv/lp-oops.canonical.com/cgi-bin/lpoops && ./deploy.sh
To deploy a new version of LP's oops-tools instance:
ssh devpad
sudo su - lpqateam
cd /srv/lp-oops.canonical.com/cgi-bin/lpoops
pkill amqp2disk
bzr up
bzr up download-cache
# then - see below - bin/buildout
Do not run make or bin/buildout without arguments: bin/buildout -c production.cfg is what you want. Otherwise you will clobber the DB config in src/oopstools/settings.py, and if that happens you may have to delete that file before it will be regenerated.
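In shell terms, the safe invocation and the recovery step (which follows directly from the note above) are:

# The invocation you want:
bin/buildout -c production.cfg

# Recovery if a bare run already clobbered the generated settings:
rm src/oopstools/settings.py
bin/buildout -c production.cfg   # should regenerate it with the production DB config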
If there are changes to the models and/or you need to do a data migration (i.e. a new migration has been created), then run:
bin/django migrate
Finally activate the new code:
- Ask a LOSA to graceful apache
Run ./run-amqp2disk.sh to start the OOPS AMQP consumers again. (That script runs three consumers, one per LP instance, each writing to /srv/oops-amqp/$instance - e.g. /srv/oops-amqp/production.) It is not part of the source tree, and can be edited to run more consumers or to consume from new AMQP sources.
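The real script lives only on devpad, but a minimal sketch of what it might look like - the host, credentials, and queue/exchange names below are placeholders, not the production values - is:

#!/bin/sh
# Hypothetical sketch of run-amqp2disk.sh; the real script is on devpad.
cd /srv/lp-oops.canonical.com/cgi-bin/lpoops
for instance in production staging qastaging; do
    nohup bin/amqp2disk \
        --host RABBIT_HOST --username USER --password PASS \
        --output /srv/oops-amqp/$instance \
        --queue oops-tools-$instance \
        --bind-to oopses-$instance \
        >> /srv/oops-amqp/$instance.log 2>&1 &
done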
cronjobs
There are some cronjobs set up in the lpqateam crontab:
- update_db: loads OOPS reports from the file system into the oops-tools database.
- dir_finder: searches the file system for new OOPS directories and builds the cache that the update_db script uses to look up OOPSes.
- report: saves a daily aggregation of the OOPSes that occurred the previous day, and emails the report to the launchpad mailing list.
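The authoritative schedules live in the lpqateam crontab itself; purely as an illustrative sketch (the times and the bin/ script paths here are assumptions):

# Illustrative only - check `crontab -l` as lpqateam for the real entries.
*/5 * * * *  cd /srv/lp-oops.canonical.com/cgi-bin/lpoops && bin/update_db
15  * * * *  cd /srv/lp-oops.canonical.com/cgi-bin/lpoops && bin/dir_finder
0   6 * * *  cd /srv/lp-oops.canonical.com/cgi-bin/lpoops && bin/report
# (The datedir-repo pruner mentioned above also runs from this crontab.)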
Loading new prefixes
New prefixes are accepted automatically. Once an OOPS with a new prefix has been seen, the prefix needs to be assigned to a report within oops-tools.
Deploying locally (e.g. devpad)
Follow src/oopstools/README.txt to get an instance up and running.
Use bin/amqp2disk --host XXXX --username guest --password guest --output . --queue oops-tools --bind-to oopses to set up and bind to a new exchange on your local rabbit for experimenting with Launchpad. See my notes for more info.
Admin
Creating new reports
To create a new report that will be sent to the LP mailing list daily, go to https://lp-oops.canonical.com/admin/oops/report. If you don't have access, run bin/django createsuperuser.
There you can add a new report or change the existing ones.
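createsuperuser is the stock Django management command; a typical invocation (the username and email below are placeholders) looks like:

bin/django createsuperuser --username yourname --email yourname@example.com
# Prompts for a password; the account can then sign in at /admin/.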
Each report is composed of: a name; a title, which is used in the email sent to the list; a summary type, which knows how to group the OOPSes and render the HTML and text output; and prefixes, which are the OOPS prefixes included in the given report.
Diagnostic hints
There are queue consumers (amqp2disk processes) for production, staging, and qastaging.
The consumers should generally consume very little CPU time, and should normally be sleeping ("S" in the process state column of ps ax | grep amqp2disk) and rarely running ("R").
To verify that amqp2disk is processing events, stop it and run it in the foreground with the -v switch.
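A hedged sketch of that procedure - substitute the real host, credentials, and queue/exchange values from run-amqp2disk.sh for the placeholders:

# Stop the background consumers.
pkill amqp2disk
# Run one consumer in the foreground, verbosely, and watch for OOPS ids.
bin/amqp2disk -v \
    --host RABBIT_HOST --username USER --password PASS \
    --output /srv/oops-amqp/production \
    --queue QUEUE --bind-to EXCHANGE
# Ctrl-C when done, then restart the normal consumers with ./run-amqp2disk.sh.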
You can use lp:python-oops-amqp's oops-amqp-trace command to confirm that OOPSes are flowing: bin/oops-amqp-trace --host localhost