Revision 23 as of 2009-11-18 13:13:17

Logging Oopses

WARNING: Work in progress. Needs improvement. If you think you know better, you probably do, so hit Edit!

This page describes how to add oops logging to your script. All fatal errors that need operator or developer attention should be logged as oopses.

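Assumptions for examples:

The examples use the placeholder values below. Where you see them, replace them with your own:

Bug number: 999999 (your working branch is bug-999999)
Script name: frobnicate
Oops code: FROB
Your Launchpad user name: me
Your reviewer: myreviewer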

Before you start, run this to hide the differences between local setups:

. ~/.rocketfuel-env.sh

Choosing an oops code

TODO: How exactly are oops codes chosen?

First, pick a code for your oopses. The code is included in the file names of oops reports, and it will be the first indication that an oops was generated by your script.

Oops codes should be:

Code

The oops logging machinery comes from Zope, and is lightly extended in Lazr. It was designed for web applications, so expect some twists and turns as you try to squeeze a script error into the mould of a web request.

TODO: Verify that bit of potential nonsense.

In a nutshell:

import sys

# Import oops logging support.
from canonical.launchpad.webapp import errorlog

try:
    frobnicate()  # Placeholder for your script's actual work.
except Exception:
    # Get traceback information. sys.exc_info() only describes the
    # failure while the exception is being handled.
    exception_info = sys.exc_info()

    # Describe the failure as a list of key/value pairs
    # (replace the placeholder keys and values with your own).
    description = [
        ('key1', value1),
        ('key2', value2),
        ]

    # Create a request to hold your failure description, as if
    # the failure happened while servicing a web request.
    request = errorlog.ScriptRequest(description)

    # Report the oops.
    errorlog.globalErrorUtility.raising(exception_info, request)

Generating oopses is hard to test, especially when it comes to proper configuration. If your script is breaking while trying to report an oops, you may not get any notice about it. So keep this part of your script simple!
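One way to keep the reporting path simple while still noticing when it breaks is to wrap it in a small guard that falls back to ordinary logging. This is only a sketch: `report_oops` is a hypothetical helper, and its `raising` and `make_request` parameters stand in for `errorlog.globalErrorUtility.raising` and `errorlog.ScriptRequest`, which you would pass in from a real script.

```python
import logging
import sys

log = logging.getLogger("frobnicate")


def report_oops(raising, make_request, description):
    """Report the exception currently being handled as an oops.

    Hypothetical helper: `raising` and `make_request` stand in for
    errorlog.globalErrorUtility.raising and errorlog.ScriptRequest.
    Call it from inside an `except` block so that sys.exc_info()
    describes the failure being reported.
    Returns True if the report went through, False otherwise.
    """
    exception_info = sys.exc_info()
    try:
        raising(exception_info, make_request(description))
        return True
    except Exception:
        # Reporting itself failed: log both the reporting failure and
        # the original error so neither goes unnoticed.
        log.exception("Failed to report oops for %r", description)
        log.error("Original error", exc_info=exception_info)
        return False


# Usage with a stand-in reporter that just records what it receives.
reported = []


def fake_raising(info, request):
    reported.append((info[0], request))


try:
    1 / 0
except ZeroDivisionError:
    ok = report_oops(fake_raising, dict, [('step', 'divide')])
```

Passing the real reporter in as an argument also makes the reporting path testable without the full oops configuration.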

Configuration

The new type of oops needs to be configured with its own oops prefix code, as well as a storage location for its oops reports. We have separate configurations for test runs, local runs, staging, and production. It's important to get all of these right: the test suite can't catch configuration mistakes except in its own config.

In your branch

Still in the bug-999999 branch, you'll be adding items to two configuration files in configs/.

TODO: Refer to lazr configs documentation.

Your script must have a configuration section in configs/schema-lazr.conf. It should probably look a lot like this:

[frobnicate]
dbuser: frobnicate
storm_cache: generational
storm_cache_size: 500

If the section does not exist yet, this snippet should be a good start for creating it.

To this section, add blank default config items for your oopses:

oops_prefix: none
error_dir: none
copy_to_zlog: false

TODO: Explain what these do.

TODO: Multiple scripts' oopses can map to one error_dir.

The actual values go into the configs for each setup. If any of the configuration files below does not have a frobnicate section yet, create one by adding the header:

[frobnicate]

In configs/development/launchpad-lazr.conf, add these to the frobnicate section:

oops_prefix: FROB
error_dir: /var/tmp/frobnicate

Similarly, in configs/testrunner/launchpad-lazr.conf, under the frobnicate header, add:

oops_prefix: TFROB
error_dir: /var/tmp/frobnicate.test

Note the "T" prefix to the oops code, and the ".test" suffix to the oops directory.

The testrunner configuration is derived from the development one, so any settings made there but not here are inherited. Keeping separate error_dir settings for the two configurations keeps the oops reports from test runs separate from those of manual local runs.
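The layering can be pictured with Python's configparser, where each file read later overrides keys set earlier and leaves the rest alone. This is a rough stand-in, not the lazr config machinery itself (which has its own schema and inheritance handling); the layer contents come from the snippets on this page, and error_dir is left out for brevity since it behaves the same way.

```python
import configparser

# Each layer uses the same "key: value" syntax as the snippets above.
SCHEMA = """
[frobnicate]
dbuser: frobnicate
oops_prefix: none
copy_to_zlog: false
"""

DEVELOPMENT = """
[frobnicate]
oops_prefix: FROB
"""

TESTRUNNER = """
[frobnicate]
oops_prefix: TFROB
"""


def load(*layers):
    """Read the layers in order; a later layer overrides earlier ones."""
    parser = configparser.ConfigParser()
    for layer in layers:
        parser.read_string(layer)
    return parser["frobnicate"]


development = load(SCHEMA, DEVELOPMENT)
testrunner = load(SCHEMA, DEVELOPMENT, TESTRUNNER)

print(development["oops_prefix"])  # FROB
print(testrunner["oops_prefix"])   # TFROB (overridden)
print(testrunner["dbuser"])        # frobnicate (inherited)
```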

Make sure it works

Once you've configured and programmed your oopses, you can run your tests to trigger them and you will see the oops reports appearing in /var/tmp/frobnicate.test/, neatly categorized by date. Since they're in /var/tmp/, they'll be cleaned up on reboot.
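To check for reports without browsing by hand, a few lines of pathlib will do. This sketch assumes reports sit one level down in per-date subdirectories and that the oops code appears somewhere in each file name; the exact file-name format is not spelled out here, so it only does a substring match. Note that searching for "FROB" would also match "TFROB" and "SFROB" files.

```python
from pathlib import Path


def find_oops_reports(error_dir, oops_prefix):
    """Return report files under error_dir whose names contain oops_prefix.

    Assumes one subdirectory per date directly under error_dir.
    """
    root = Path(error_dir)
    return sorted(
        path for path in root.glob("*/*")
        if path.is_file() and oops_prefix in path.name
    )
```

For example, `find_oops_reports("/var/tmp/frobnicate.test", "TFROB")` lists the reports produced by your test runs.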

In the production world

You need to add configuration items to the production configs as well. Make these changes in a branch of lp-production-configs, separate from the bug-999999 branch.

This is not a regular Launchpad branch but one from a separate project, so you'll want to create your local branch outside the directory where you keep your regular Launchpad branches ($LP_SHARED_REPO).

I'd suggest setting up a new repository next to the one that holds your Launchpad branches:

cd "$LP_PROJECT_ROOT"

# Set up a place for lp-production-configs branches.
bzr init-repo lp-production-configs
cd lp-production-configs
bzr branch lp:~launchpad-pqm/lp-production-configs/trunk
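If you script your setup, you can guard against accidentally placing the new repository inside $LP_SHARED_REPO. This is a minimal sketch assuming both LP_PROJECT_ROOT and LP_SHARED_REPO are set in your environment (for instance after sourcing ~/.rocketfuel-env.sh); `production_configs_repo` is a hypothetical name.

```python
import os
from pathlib import Path


def production_configs_repo(environ=os.environ):
    """Return the suggested lp-production-configs repository location.

    Raises ValueError if that location would fall inside LP_SHARED_REPO,
    where the regular Launchpad branches live.
    """
    repo = Path(environ["LP_PROJECT_ROOT"]) / "lp-production-configs"
    shared = Path(environ["LP_SHARED_REPO"]).resolve()
    resolved = repo.resolve()
    if resolved == shared or shared in resolved.parents:
        raise ValueError(f"{repo} is inside {shared}; put it elsewhere")
    return repo
```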

Even if you don't create a repository, you'll have to add a section for the location of your lp-production-configs branches to ~/.bazaar/locations.conf to make things work:

[/home/me/canonical/lp-production-configs]
pqm_email = launchpad@pqm.canonical.com
submit_branch = bzr+ssh://bazaar.launchpad.net/~launchpad-pqm/lp-production-configs/trunk
public_branch = bzr+ssh://bazaar.launchpad.net/~me/lp-production-configs
public_branch:policy = appendpath
push_location = bzr+ssh://me@bazaar.launchpad.net/%7Eme/lp-production-configs
push_location:policy = appendpath
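A quick way to confirm the section is in place is to parse the file and look for the keys. This sketch uses Python's configparser as an approximation (Bazaar uses its own config parser); `missing_branch_settings` and its list of required keys are assumptions based on the snippet above. Restricting delimiters to "=" keeps keys such as `public_branch:policy` intact, and interpolation is disabled because values like `%7Eme` contain percent signs.

```python
import configparser

REQUIRED_KEYS = ("pqm_email", "submit_branch", "public_branch", "push_location")


def missing_branch_settings(locations_conf_text, branch_dir):
    """Return the required keys absent from branch_dir's section."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(locations_conf_text)
    if not parser.has_section(branch_dir):
        return list(REQUIRED_KEYS)
    section = parser[branch_dir]
    return [key for key in REQUIRED_KEYS if key not in section]
```

To check your real file, pass it the contents of ~/.bazaar/locations.conf and the section path you used.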

Branching lp-production-configs

You'll need to branch off lp:~launchpad-pqm/lp-production-configs/trunk. If you set up the repository as suggested above:

cd "$LP_PROJECT_ROOT/lp-production-configs"
bzr branch trunk production-configs-bug-999999
cd production-configs-bug-999999

TODO: Make the setup use the appropriate LP setup variables so it's no longer based on my local directory structures.

Now make your changes in production-configs-bug-999999. Add the following snippet to staging-lazr.conf, in the config section for your script. If the section does not exist yet, create an empty one first.

oops_prefix: SFROB
error_dir: /srv/staging.launchpad.net/staging-logs/frobnicate

TODO: A different hostname may be appropriate.

To production/launchpad-lazr.conf, add this to the frobnicate section:

oops_prefix: FROB
error_dir: /srv/launchpad.net/production-logs/frobnicate
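The prefixes on this page follow a pattern: the bare code for development and production, "T" for the testrunner, "S" for staging. If you want to double-check your values before proposing the branch, a small helper like this one spells the pattern out. It is a hypothetical convenience, not part of Launchpad, and the environment names are just labels.

```python
# Prefix letters as used on this page: none for development and
# production, "T" for the testrunner, "S" for staging.
ENVIRONMENT_PREFIXES = {
    "development": "",
    "testrunner": "T",
    "staging": "S",
    "production": "",
}


def check_prefixes(code, configured):
    """Return environments whose configured oops_prefix breaks the pattern.

    `configured` maps environment names to the oops_prefix values you set.
    """
    expected = {env: letter + code for env, letter in ENVIRONMENT_PREFIXES.items()}
    return sorted(env for env, prefix in configured.items()
                  if expected.get(env) != prefix)


print(check_prefixes("FROB", {
    "development": "FROB",
    "testrunner": "TFROB",
    "staging": "SFROB",
    "production": "FROB",
}))  # []
```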

Commit the changes and push your branch:

bzr push lp:~me/lp-production-configs/production-configs-bug-999999

Now propose this branch for merging into lp:~launchpad-pqm/lp-production-configs; this is the default target for the merge proposal. The branch needs to be reviewed by one of the LOSAs ("canonical-losas"), which is also the default reviewer.

Make sure it works

Before you go to PQM, have the change cowboy-patched onto staging for testing. Then trigger the oops there.

After a while, the oops report you generated should be mirrored to one of the directories in devpad:/srv/launchpad.net-logs/staging/. It's typically the subdirectory with the same name as the server the script ran on, in another subdirectory with today's date.

At the time of writing, the oopses from staging are synced to devpad every 3 minutes.
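Rather than checking back by hand, you could poll for the report. This sketch assumes you run it on devpad itself against /srv/launchpad.net-logs/staging/ and that the staging oops code (SFROB in the examples) appears in the report's file name; `wait_for_oops` is a hypothetical name, and the default timeout simply allows for a few 3-minute sync cycles.

```python
import time
from pathlib import Path


def wait_for_oops(directory, oops_prefix, timeout=600.0, interval=30.0):
    """Poll `directory` until a file whose name contains oops_prefix appears.

    Searches all subdirectories (server name, then date) and returns the
    first match in sorted order, or None if nothing appears within
    `timeout` seconds.
    """
    root = Path(directory)
    deadline = time.monotonic() + timeout
    while True:
        if root.is_dir():
            matches = sorted(path for path in root.rglob("*")
                             if path.is_file() and oops_prefix in path.name)
            if matches:
                return matches[0]
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Because older reports with the same code also match, you may want to point it at today's date subdirectory once you know which server the script ran on.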

Land

Once you're satisfied that the oopses are logged as expected, submit your lp-production-configs branch to PQM for landing. Use the regular bzr pqm-submit for this:

bzr pqm-submit -m "[r=myreviewer][bug=999999] Configure oops logging for frobnicate."
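The commit message carries the reviewer and bug number as bracketed tags. If you script your submissions, a tiny helper keeps the tags consistent; `pqm_message` is a hypothetical name, and the format is copied from the example command above.

```python
def pqm_message(reviewer, bug, summary):
    """Build a commit message in the "[r=...][bug=...] summary" form."""
    return "[r={}][bug={}] {}".format(reviewer, bug, summary)


print(pqm_message("myreviewer", 999999, "Configure oops logging for frobnicate."))
# [r=myreviewer][bug=999999] Configure oops logging for frobnicate.
```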

This uses the settings you added to ~/.bazaar/locations.conf for your lp-production-configs branches.

And now... wait!