Launchpad Production oops-tools setup

The Launchpad instance of OOPS-Tools is run on devpad under the lpqateam role account.

OOPSes are transported via AMQP and, for some legacy systems whose deployments have not yet been updated, via rsync.

The web UI reports its own crashes via AMQP (back to itself, which is kind of meta). See production.cfg for the credentials and related settings, and never run 'bzr revert' in that tree: it discards uncommitted local changes.

Pruning is done on devpad via the datedir-repo bin/prune tool; see the crontab of lpqateam to find the pruner. Pruning of DB records is implemented in python-oops-tools but is not automated on devpad yet: there are 27M references to clean up, and once that's done it can be put in cron. There is no automation of its deployments as yet. Pruning is done against launchpad-project: things not in that project group will not have their OOPSes preserved.

oops-tools deployment

TL;DR: cd /srv/lp-oops.canonical.com/cgi-bin/lpoops && ./deploy.sh

To deploy a new version of LP's oops-tools instance:

ssh devpad
sudo su - lpqateam
cd /srv/lp-oops.canonical.com/cgi-bin/lpoops
pkill amqp2disk
bzr up
bzr up download-cache
# then - see below - bin/buildout

Do not run make or bin/buildout without arguments. bin/buildout -c production.cfg is what you want, otherwise you'll clobber the DB config in src/oopstools/settings.py. If you do, you might have to delete that file before it will regenerate.
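
One way to make the warning above harder to ignore is a small shell guard. The following is only a sketch: the function name safe_buildout and the BUILDOUT override variable are hypothetical and not part of oops-tools. It refuses to run buildout unless "-c production.cfg" appears among the arguments.

```shell
# Hypothetical guard (not part of oops-tools): only run buildout when
# "-c production.cfg" is present in the arguments.
# BUILDOUT is an assumed override for the command to run; it defaults to
# bin/buildout relative to the current directory.
safe_buildout() {
    _sb_prev=""
    for _sb_arg in "$@"; do
        if [ "$_sb_prev" = "-c" ] && [ "$_sb_arg" = "production.cfg" ]; then
            # Required config found: pass all arguments through unchanged.
            "${BUILDOUT:-bin/buildout}" "$@"
            return $?
        fi
        _sb_prev="$_sb_arg"
    done
    echo "refusing to run buildout without -c production.cfg" >&2
    return 1
}
```

With this defined in the lpqateam shell, "safe_buildout -c production.cfg" behaves like the plain command, while "safe_buildout" on its own exits with an error instead of clobbering the settings.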

If there are changes to the models and/or you need to do data migration (a new migration has been created) then run:

bin/django migrate

Finally activate the new code:

cronjobs

There are some cronjobs set up in the lpqateam crontab.

Loading new prefixes

New prefixes are automatically accepted. Once an OOPS from a new prefix has been seen, the prefix needs to be assigned to a report within oops-tools.

Deploying locally (e.g. devpad)

Follow src/oopstools/README.txt to get an instance up and running.

Use bin/amqp2disk --host XXXX --username guest --password guest --output . --queue oops-tools --bind-to oopses to set up and bind to a new exchange on your local rabbit for experimenting with Launchpad. See my notes for more info.

Admin

Creating new reports

To create a new report, which will be sent to the LP mailing list daily, go to https://lp-oops.canonical.com/admin/oops/report. If you don't have access, run bin/django createsuperuser.

There you can add a new report or change the existing ones.

Each report consists of a name; a title, which is used in the email sent to the list; a summary type, which knows how to group the OOPSes and render the HTML and text output; and prefixes, which are the OOPS prefixes included in the report.

Diagnostic hints

There are queue consumers (amqp2disk processes) for production, staging, and qastaging.

The consumers should generally consume very little CPU time, and should normally be sleeping ("S" in the process state column of ps ax | grep amqp2disk) and rarely running ("R").
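
As a rough aid for the check above, the sketch below tallies consumer processes by the first letter of their state. The function name consumer_states is hypothetical; it assumes input lines of the form "PID STAT COMMAND...", such as procps ps prints for "-o pid,stat,args".

```shell
# Hypothetical helper: read "PID STAT COMMAND..." lines on stdin and print
# one "<state letter> <count>" line per process state, sorted by letter.
# Only the first character of STAT is used, so "Ssl" counts as "S".
consumer_states() {
    awk '{ count[substr($2, 1, 1)]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Assumed usage on devpad (the [a] bracket keeps grep from matching itself):
#   ps -eo pid,stat,args | grep '[a]mqp2disk' | consumer_states
```

A healthy set of consumers should show mostly "S" entries; a consumer that stays in "R" across repeated runs is worth a closer look.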

To verify that amqp2disk is processing events, stop it and then run it in the foreground with the -v switch.

You can use lp:python-oops-amqp's oops-amqp-trace command to see that OOPSes are flowing: bin/oops-amqp-trace --host localhost

QA/OopsToolsSetup (last edited 2012-09-21 00:14:34 by lifeless)