= Launchpad Production oops-tools setup =
||<tablestyle="float:right; font-size: 0.9em; width:30%; background:#F1F1ED; margin: 0 0 1em 1em;" style="padding:0.5em;"><<TableOfContents>>||

The Launchpad instance of OOPS-Tools is run on devpad under the lpqateam role account.

OOPSes are transported via AMQP, and via rsync for some legacy systems whose deployments haven't been updated yet.

The web UI reports its own crashes via AMQP (back to itself - kind of meta). See production.cfg for the credentials etc. (and never run 'bzr revert' on it!).

Pruning is done on devpad via the datedir-repo bin/prune tool. DB record pruning is implemented in python-oops-tools but is not yet automated on devpad (there are 27M references to clean up; once that's done it can be put in cron). See the lpqateam crontab to find the pruner; there is no automation of its deployment as yet. Pruning is done against launchpad-project - things not in that project group will not have their oopses preserved.
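
To find the pruning job, inspect the role account's crontab (a minimal sketch; the exact entry on devpad may differ):

{{{
ssh devpad
sudo su - lpqateam
crontab -l | grep -i prune
}}}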

= oops-tools deployment =

TL;DR: {{{cd /srv/lp-oops.canonical.com/cgi-bin/lpoops && ./deploy.sh}}}

To deploy a new version of LP's oops-tools instance:

{{{
ssh devpad
sudo su - lpqateam
cd /srv/lp-oops.canonical.com/cgi-bin/lpoops
pkill amqp2disk
bzr up
bzr up download-cache
# then - see below - bin/buildout
}}}

'''Do not''' run `make` or `bin/buildout` without arguments. `bin/buildout -c production.cfg` is what you want; otherwise you'll clobber the DB config in `src/oopstools/settings.py`. If you do, you might have to delete that file before it will regenerate.
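
That is, after updating the tree, the buildout step referred to above is:

{{{
cd /srv/lp-oops.canonical.com/cgi-bin/lpoops
bin/buildout -c production.cfg
}}}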

If there are changes to the models and/or you need to do a data migration (i.e. a new migration has been created), run:

{{{
bin/django migrate
}}}

Finally, activate the new code:

 * Ask a LOSA to graceful apache.
 * Run {{{./run-amqp2disk.sh}}} to start the OOPS AMQP consumers again. (That script runs three consumers, one per LP instance, each writing to /srv/oops-amqp/$instance - e.g. /srv/oops-amqp/production.) It is not part of the source tree, and can be edited to run more consumers, or to consume from new AMQP sources; a sketch of what it looks like follows this list.
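
run-amqp2disk.sh is not in the source tree, but from the description above it is roughly equivalent to this sketch (host, credentials, queue and exchange names are placeholders, not the real configuration - see production.cfg for that):

{{{
#!/bin/sh
# Hypothetical sketch of run-amqp2disk.sh: one consumer per LP instance,
# each writing OOPSes to /srv/oops-amqp/$instance.
for instance in production staging qastaging; do
    bin/amqp2disk --host AMQP_HOST --username USER --password PASS \
        --output /srv/oops-amqp/$instance \
        --queue oops-tools-$instance --bind-to oopses-$instance &
done
}}}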

= cronjobs =

There are some cronjobs set up in the lpqateam crontab (a sketch of the entries follows the list):

 * update_db: loads oops reports from the file system into the oops-tools database.
 * dir_finder: searches the file system for new oops directories and builds the cache used by the update_db script to look up oopses.
 * report: saves a daily aggregation of the OOPSes that happened the previous day, and emails the report to the launchpad mailing list.
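
For example (schedules and script paths here are illustrative assumptions, not the live configuration):

{{{
# Hypothetical lpqateam crontab entries:
*/15 * * * * cd /srv/lp-oops.canonical.com/cgi-bin/lpoops && bin/dir_finder
*/5 * * * * cd /srv/lp-oops.canonical.com/cgi-bin/lpoops && bin/update_db
10 0 * * * cd /srv/lp-oops.canonical.com/cgi-bin/lpoops && bin/report
}}}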

= Loading new prefixes =

New prefixes are automatically accepted. Once an oops from a new prefix has been seen, the prefix needs to be assigned to reports within oops-tools.

= Deploying locally (e.g. devpad) =

Follow src/oopstools/README.txt to get an instance up and running.

Use {{{bin/amqp2disk --host XXXX --username guest --password guest --output . --queue oops-tools --bind-to oopses}}} to set up and bind to a new exchange on your local rabbit for experimenting with Launchpad. See [[https://lists.launchpad.net/launchpad-dev/msg08183.html|my notes]] for more info.

= Admin =

== Creating new reports ==

To create a new report that will be sent to the LP mailing list daily, go to https://lp-oops.canonical.com/admin/oops/report. If you don't have access, run {{{bin/django createsuperuser}}}.

There you can add a new report or change the existing ones.

Each report is composed of a name; a title, which is used in the email sent to the list; a summary type, which knows how to group the OOPSes and render the html and txt output; and prefixes, which are the OOPS prefixes included in the given report.

== Diagnostic hints ==

There are queue consumers (amqp2disk processes) for production, staging, and qastaging.

The consumers should generally consume very little CPU time, and should normally be sleeping ("S" in the process state column of {{{ps ax | grep amqp2disk}}}) and rarely running ("R").

To verify that amqp2disk is processing events, it can be stopped and then run in the foreground with the -v switch.
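
For example (host, credentials and queue are placeholders - see production.cfg for the real values):

{{{
# Stop the daemonised consumer, then run one in the foreground verbosely.
pkill amqp2disk
bin/amqp2disk --host AMQP_HOST --username USER --password PASS \
    --output /srv/oops-amqp/production --queue QUEUE -v
}}}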

You can use lp:python-oops-amqp's oops-amqp-trace command to see that oopses are flowing: {{{bin/oops-amqp-trace --host localhost}}}
