Diff for "LoggingOopses"


Differences between revisions 12 and 31 (spanning 19 versions)
Revision 12 as of 2009-10-29 07:30:27
Size: 5194
Editor: jtv
Comment:
Revision 31 as of 2011-10-31 10:43:45
Size: 7466
Editor: lifeless
Comment: bit of a face lift
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
||<tablestyle="float:right; font-size: 0.9em; width:30%; background:#F1F1ED; margin: 0 0 1em 1em;" style="padding:0.5em;"><<TableOfContents>>||
Line 3: Line 5:
'''WARNING: Work in progress. Needs improvement. If you think you know better, you probably do—so hit Edit!''' '''WARNING: Needs improvement. If you think you know better, you probably do—so hit Edit!'''
Line 16: Line 18:
== Choosing an oops code == '''You'll want to run this before you start with any of this, to hide the differences between local setups:'''
{{{
. ~/.rocketfuel-env.sh
}}}
Line 18: Line 23:
'''TODO: How exactly are oops codes chosen?'''
Line 20: Line 24:
The first thing you do is pick a code for your oopses. The code is included in the file names for oops reports, and will be the first indication that an oops was generated by your script. == Choosing an oops prefix ==
Line 22: Line 26:
Oops codes should be: The first thing you do is pick a prefix for your oopses. The prefix is used to distinguish oopses from different scripts.

The prefix can be anything, but we prefer:
Line 27: Line 33:
Line 30: Line 35:
The oops logging machinery comes from Zope, and is lightly extended in Lazr. It was designed for web applications, so expect some twists and turns as you try to squeeze a script error into the mould of a web request. There are two oops stacks - a native oops stack in python-oops and friends, and the web stack which layers on this in the canonical.launchpad.webapp.errorlog module.
Line 32: Line 37:
'''TODO: Verify that bit of possible nonsense.''' In LP scripts, the easiest way is to log a warning or error:
{{{
logging.getLogger('foo').warning('bar')
}}}
Line 34: Line 42:
'''TODO: Describe what's needed in the code.''' If you have the LP zope environment and are not in an LP script then you can use the ErrorReportingUtility:
{{{
import sys

# Import oops logging support.
from canonical.launchpad.webapp import errorlog

# ...

# Get traceback information.
exception_info = sys.exc_info()

# Describe the failure as a list of key/value pairs.
description = [
    ('key1', value1),
    ('key2', value2),
    ]

# Create a request to hold your failure description, as if
# the failure happened while servicing a web request.
request = errorlog.ScriptRequest(description)

# Report the oops.
errorlog.globalErrorUtility.raising(exception_info, request)
}}}

Finally, outside of those environments, you can write to the oops stack directly:
{{{
from functools import partial

from amqplib import client_0_8 as amqp
from oops import (
    Config,
    publish_new_only,
    )
from oops_amqp import Publisher
from oops_datedir_repo import DateDirRepo

config = Config()
# Get all the parameters from options or a config file or whatever.
factory = partial(amqp.Connection, host=xxx, userid=xxx, password=xxx, virtual_host=xxxx)
config.publishers.append(Publisher(factory, "oopses", ""))
datedirrepo = DateDirRepo('path-to-output')
config.publishers.append(publish_new_only(datedirrepo))

context = {}
oops = config.create(context)
oops_ids = config.publish(oops)
}}}

Your tests should wire up your oops config back to a recipient (e.g. amqp and listen to that) and generate a single oops, to be sure your codepath works. This can be a little tedious, but it is straightforward. Production config shouldn't be tested by your code tests though!

== In twistd / twisted daemons ==

{{{
from lp.services.twistedsupport.loggingsupport import set_up_oops_reporting
set_up_oops_reporting('loggername', 'configsection')
}}}
The config section needs its schema changed to have an oops prefix, just like any other process.
Line 37: Line 103:

The new type of oops needs to be configured with its own oops prefix code, as well as a storage location for its oops reports. We have separate configurations for test runs, local runs, staging, and production. It's important to get all of these right: the test suite can't catch configuration mistakes except in its own config.
Line 41: Line 109:
You'll need to add some configuration items to the configuration files in {{{configs/}}}. This is in the {{{bug-999999}}} branch.

'''TODO: Refer to configs documentation.'''
Still in the {{{bug-999999}}} branch, you'll be adding items to two configuration files in {{{configs/}}}.
Line 64: Line 130:
'''TODO: Can map multiple scripts' oopses to one error_dir.''' '''TODO: Multiple scripts' oopses can map to one error_dir.'''
Line 82: Line 148:
error_dir: /var/tmp/frobnicate
Line 84: Line 151:
The {{{testrunner}}} configuration is derived from the {{{development}}} one, so no need to specify the same {{{error_dir}}} again here. The {{{testrunner}}} configuration is derived from the {{{development}}} one, so any settings made there but not here are inherited. But keeping separate {{{error_dir}}} settings for these two configurations puts the oops reports from test runs and from manual local runs separate.
Line 91: Line 158:
You can also write tests to test OOPS generation explicitly. Unfortunately, the infrastructure for that is still rather primitive, but it can be done: see bug Bug:567257 and bug Bug:567689 for more information.
Line 94: Line 163:
You need to add configuration items to the production configs as well.  Make your changes in a branch of {{{lp-production-configs}}}, separate from the {{{bug-999999}}} branch. You need to add configuration items to the production configs as well.
Line 96: Line 165:
This is not a regular {{{launchpad}}} branch but once from a separate project, so create your local branch somewhere ''outside'' the directory (usually called {{{lp-branches}}}) where you keep your regular Launchpad branches! So: To do this, create a branch of the [[WorkingWithProductionConfigs|Launchpad production configs]]. We'll call the branch {{{production-configs-bug-999999}}}.
Line 98: Line 167:
{{{
cd ~/canonical
# (Or wherever, as long as it isn't where your Launchpad branches are.)

# Set up a place for lp-production-configs branches.
mkdir -p lp-production-configs
cd lp-production-configs
bzr branch lp:~launchpad-pqm/lp-production-configs/trunk

# Create your working branch.
bzr branch trunk production-configs-bug-999999
cd production-configs-bug-999999
}}}

Now make your changes in production-configs-bug-999999. Add the following snippet to {{{staging-lazr.conf}}}, in the config section for your script. If the section did not exist, create an empty one.
Make your changes in production-configs-bug-999999. Add the following snippet to {{{staging-lazr.conf}}}, in the config section for your script. If the section did not exist, create an empty one.
Line 118: Line 173:
'''TODO: A different hostname may be appropriate.''' If your script runs on another server, e.g. {{{bazaar.staging.launchpad.net}}}, then you may want a directory in {{{/srv}}} that's named after a different host.
Line 131: Line 186:
Now propose this branch for merging into {{{lp:~launchpad-pqm/lp-production-configs}}}. This is the default target for the merge proposal.

'''TODO: Who needs to review this branch?'''

Once the merge proposal has been approved, submit to PQM for landing.

'''TODO: How to submit to PQM?'''

'''TODO: What's the loadtest config for?'''
Get this branch through review & landed (see WorkingWithProductionConfigs). You'll definitely want to Q/A on staging before you land.
Line 142: Line 189:
==== Make sure it works === ==== Make sure it works ====
Line 144: Line 191:
'''TODO: What is the Q/A procedure? Can we get this branch on staging first?''' During QA try and trigger an OOPS (easy if you have a test mechanism, hopefully hard otherwise!). It should sync over amqp instantly. If it doesn't there is an issue ;).

== Oops reports ==

New prefixes need to be added to oops reports - see [[QA/OopsToolsSetup|the docs]].

Logging Oopses

WARNING: Needs improvement. If you think you know better, you probably do—so hit Edit!

This page describes how to add oops logging to your script. All fatal errors that need operator or developer attention should be logged as oopses.

Assumptions for examples:

  • Your Launchpad login name is "me".
  • The script you're working on is called frobnicate.
  • Your oopses have a prefix code of FROB.
  • Adding the oops logging is fixing bug 999999.
  • You're working in a branch of devel or db-devel called bug-999999.

Where you see these in the examples, replace them with whatever you've got.

You'll want to run this before you start with any of this, to hide the differences between local setups:

. ~/.rocketfuel-env.sh

Choosing an oops prefix

The first thing you do is pick a prefix for your oopses. The prefix is used to distinguish oopses from different scripts.

The prefix can be anything, but we prefer:

  • All upper-case ASCII letters: [A-Z]+
  • Short, typically 2-4 letters.
  • Unique.
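These preferences can be checked mechanically. The following is only a minimal sketch: check_oops_prefix and its prefixes_in_use argument are hypothetical helpers invented for illustration, not part of Launchpad. Since the rules are preferences rather than hard requirements, the function reports complaints instead of rejecting a prefix.

```python
import re

# One or more upper-case ASCII letters, as in the [A-Z]+ rule above.
ASCII_UPPER = re.compile(r"[A-Z]+")


def check_oops_prefix(prefix, prefixes_in_use=()):
    """Return a list of complaints about `prefix`; an empty list means none.

    `prefixes_in_use` is a hypothetical collection of prefixes that other
    scripts already use; where you get it from is up to you.
    """
    complaints = []
    if not ASCII_UPPER.fullmatch(prefix):
        complaints.append("not all upper-case ASCII letters")
    if not 2 <= len(prefix) <= 4:
        complaints.append("outside the typical 2 to 4 letters")
    if prefix in prefixes_in_use:
        complaints.append("already in use")
    return complaints


print(check_oops_prefix("FROB"))
print(check_oops_prefix("TFROB"))
```

A longer prefix such as TFROB only draws the length complaint, which matches the "typically" in the list above.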

Code

There are two oops stacks: a native oops stack in python-oops and friends, and the web stack, which layers on top of it in the canonical.launchpad.webapp.errorlog module.

In LP scripts, the easiest way is to log a warning or error:

logging.getLogger('foo').warning('bar')
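As a rough illustration of the logging route, the sketch below uses only the standard logging module. The logger name, the frobnicate function, and the ListHandler class are all hypothetical. In a real LP script it is the script's own logging setup, not this stand-in handler, that turns the record into an oops.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical stand-in handler that just remembers records."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.records = []

    def emit(self, record):
        self.records.append(record)


def frobnicate():
    """Placeholder for the real work; it always fails here."""
    raise ValueError("could not frobnicate")


logger = logging.getLogger("frobnicate")
handler = ListHandler()
logger.addHandler(handler)

try:
    frobnicate()
except ValueError:
    # Logger.exception logs at ERROR level and attaches the traceback,
    # so the failure details travel with the log record.
    logger.exception("frobnicate failed")
```

Using logger.exception inside the except block keeps the traceback with the message, which a plain warning('bar') call would not.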

If you have the LP zope environment and are not in an LP script, then you can use the ErrorReportingUtility:

import sys

# Import oops logging support.
from canonical.launchpad.webapp import errorlog

# ...

# Get traceback information. Call this inside an except block,
# while the exception is still being handled.
exception_info = sys.exc_info()

# Describe the failure as a list of key/value pairs.
description = [
    ('key1', value1),
    ('key2', value2),
    ]

# Create a request to hold your failure description, as if
# the failure happened while servicing a web request.
request = errorlog.ScriptRequest(description)

# Report the oops.
errorlog.globalErrorUtility.raising(exception_info, request)

Finally, outside of those environments, you can write to the oops stack directly:

from functools import partial

from amqplib import client_0_8 as amqp
from oops import (
    Config,
    publish_new_only,
    )
from oops_amqp import Publisher
from oops_datedir_repo import DateDirRepo

config = Config()
# Get all the parameters from options or a config file or whatever.
factory = partial(amqp.Connection, host=xxx, userid=xxx, password=xxx, virtual_host=xxxx)
config.publishers.append(Publisher(factory, "oopses", ""))
datedirrepo = DateDirRepo('path-to-output')
config.publishers.append(publish_new_only(datedirrepo))

context = {}
oops = config.create(context)
oops_ids = config.publish(oops)

Your tests should wire up your oops config back to a recipient (e.g. amqp and listen to that) and generate a single oops, to be sure your codepath works. This can be a little tedious, but it is straightforward. Production config shouldn't be tested by your code tests though!
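One way to picture such a test, sketched without python-oops so that it runs anywhere: the recipient is an in-memory list, and the test asserts that exactly one report arrived. Both capture and report_failure, and the report's keys, are hypothetical stand-ins for your script's code path. They are not the python-oops API.

```python
received = []


def capture(report):
    """Hypothetical recipient: remember the report and hand back an id."""
    received.append(report)
    return "OOPS-%d" % len(received)


def report_failure(publishers, context):
    """Stand-in for the script's code path: build one report, publish it."""
    report = {"context": dict(context)}
    return [publish(report) for publish in publishers]


# The "test": trigger the code path once and check what the recipient saw.
oops_ids = report_failure([capture], {"script": "frobnicate"})
assert len(received) == 1
assert received[0]["context"]["script"] == "frobnicate"
```

In a real test the recipient would be whatever your oops config publishes to, with the test listening on that end.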

In twistd / twisted daemons

from lp.services.twistedsupport.loggingsupport import set_up_oops_reporting
set_up_oops_reporting('loggername', 'configsection')

The config section needs its schema changed to have an oops prefix, just like any other process.

Configuration

The new type of oops needs to be configured with its own oops prefix code, as well as a storage location for its oops reports. We have separate configurations for test runs, local runs, staging, and production. It's important to get all of these right: the test suite can't catch configuration mistakes except in its own config.

In your branch

Still in the bug-999999 branch, you'll be adding items to two configuration files in configs/.

Your script must have a configuration section in configs/schema-lazr.conf. It should probably look a lot like this:

[frobnicate]
dbuser: frobnicate
storm_cache: generational
storm_cache_size: 500

If the section does not exist yet, this snippet should be a good start for creating it.

To this section, add blank default config items for your oopses:

oops_prefix: none
error_dir: none
copy_to_zlog: false

TODO: Explain what these do.

TODO: Multiple scripts' oopses can map to one error_dir.

The specific values will go into specific configs for various setups. Here, if the configuration files do not have frobnicate sections, just create them by adding the header:

[frobnicate]

In configs/development/launchpad-lazr.conf add these to the frobnicate section:

oops_prefix: FROB
error_dir: /var/tmp/frobnicate.test

Note the "T" prefix to the oops code, and the ".test" suffix to the oops directory.

Similarly, in configs/testrunner/launchpad-lazr.conf, under the frobnicate header, add:

oops_prefix: TFROB
error_dir: /var/tmp/frobnicate

The testrunner configuration is derived from the development one, so any settings made there but not here are inherited. But keeping separate error_dir settings for these two configurations keeps the oops reports from test runs and from manual local runs apart.
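The inheritance described here can be pictured with a small sketch. Launchpad reads these files with its own config machinery. The code below uses the standard library's configparser purely as a stand-in to show the override order. The layered helper is hypothetical, and the values are copied from the snippets above.

```python
import configparser

# Defaults from the schema section.
SCHEMA = """
[frobnicate]
oops_prefix: none
error_dir: none
copy_to_zlog: false
"""

# Values from the development snippet.
DEVELOPMENT = """
[frobnicate]
oops_prefix: FROB
error_dir: /var/tmp/frobnicate.test
"""

# Values from the testrunner snippet.
TESTRUNNER = """
[frobnicate]
oops_prefix: TFROB
error_dir: /var/tmp/frobnicate
"""


def layered(*layers):
    """Read each layer in order; later layers override earlier ones."""
    parser = configparser.ConfigParser()
    for layer in layers:
        parser.read_string(layer)
    return dict(parser["frobnicate"])


development = layered(SCHEMA, DEVELOPMENT)
testrunner = layered(SCHEMA, DEVELOPMENT, TESTRUNNER)
print(testrunner)
```

Here testrunner overrides oops_prefix and error_dir, while copy_to_zlog is never overridden and so keeps its schema default in both configurations.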

Make sure it works

Once you've configured and programmed your oopses, you can run your tests to trigger them and you will see the oops reports appearing in /var/tmp/frobnicate.test/, neatly categorized by date. Since they're in /var/tmp/, they'll be cleaned up on reboot.
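To check the output directory from a script rather than by eye, something like the following might do. It is a sketch that assumes one subdirectory per date under the error directory. The reports_by_date helper is hypothetical, and the directory and file names in the demonstration are made up.

```python
import os
import tempfile


def reports_by_date(error_dir):
    """Map each subdirectory of error_dir to the sorted file names in it."""
    found = {}
    for name in sorted(os.listdir(error_dir)):
        path = os.path.join(error_dir, name)
        if os.path.isdir(path):
            found[name] = sorted(os.listdir(path))
    return found


# Build a throwaway directory tree to demonstrate; the names are made up.
with tempfile.TemporaryDirectory() as error_dir:
    day = os.path.join(error_dir, "some-date")
    os.mkdir(day)
    open(os.path.join(day, "report-1"), "w").close()
    listing = reports_by_date(error_dir)

print(listing)
```

Against a real setup you would pass the error_dir value from your configuration instead of a temporary directory.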

You can also write tests to test OOPS generation explicitly. Unfortunately, the infrastructure for that is still rather primitive, but it can be done: see bug 567257 and bug 567689 for more information.

In the production world

You need to add configuration items to the production configs as well.

To do this, create a branch of the Launchpad production configs. We'll call the branch production-configs-bug-999999.

Make your changes in production-configs-bug-999999. Add the following snippet to staging-lazr.conf, in the config section for your script. If the section does not exist, create an empty one.

oops_prefix: SFROB
error_dir: /srv/staging.launchpad.net/staging-logs/frobnicate

If your script runs on another server, e.g. bazaar.staging.launchpad.net, then you may want a directory in /srv that's named after a different host.

To production/launchpad-lazr.conf, add this to the frobnicate section:

oops_prefix: FROB
error_dir: /srv/launchpad.net/production-logs/frobnicate

Commit the changes and push your branch:

bzr push lp:~me/lp-production-configs/production-configs-bug-999999

Get this branch reviewed and landed (see WorkingWithProductionConfigs). You'll definitely want to Q/A on staging before you land.

Make sure it works

During QA, try to trigger an OOPS (easy if you have a test mechanism, hopefully hard otherwise!). It should sync over amqp instantly. If it doesn't, there is an issue ;).

Oops reports

New prefixes need to be added to oops reports - see the docs.

LoggingOopses (last edited 2011-10-31 10:43:45 by lifeless)