Diff for "ParallelTests"


Differences between revisions 19 and 21 (spanning 2 versions)
Revision 19 as of 2012-01-03 13:14:59
Size: 3892
Editor: bac
Comment:
Revision 21 as of 2012-01-09 14:57:21
Size: 4149
Editor: gary
Comment:
Line 7: Line 7:
= Design sketch =
 * bin/test --parallel runs N test runners with subunit, where N is the number of cores, and the tests are partitioned across runners. (implemented)
 * Layers dynamically allocate/deallocate resources such as:
  * dbnames (implemented)
  * config files (implemented)
  * librarian work dir
  * librarian ports
  * keyserver work area
  * soyuz work area

Things that need specialist knowledge:
 * dynamically allocating ports for zope - port 8085 and 9025 specifically, which can then be fed back into e.g. zcml files/launchpad.conf.
 * Buildmaster slave tests hard code the xmlrpc port to 8221 everywhere.
Line 23: Line 9:
LXC containers combined with [[http://en.wikipedia.org/wiki/Aufs|aufs]] offer a pretty cheap way to get solid isolation - a great big hammer of a workaround for our existing globals (shared work dirs etc). William has put together a proof of concept, and Robert has made that [[https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/807351/+attachment/2251192/+files/lxc-start-aufs|generic]]. That combined with an updated .testr.conf (a TODO is to offer profiles for testr) like: {{{
[DEFAULT]
test_command=lxc-start-aufs $LP_LXC_BASE $PWD xvfb-run $PWD/bin/test --subunit $IDOPTION $LISTOPT
test_id_option=--load-list $IDFILE
test_list_option=--list
}}}
will let testr run tests in a temporary container. (e.g. testr -- -t stories/gpg will fire up an aufs container and run the stories/gpg tests inside it).
LXC containers combined with [[http://en.wikipedia.org/wiki/Aufs|aufs]] offer a pretty cheap way to get solid isolation - a great big hammer of a workaround for our existing globals (shared work dirs etc). William put together a proof of concept, Robert made that [[https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/807351/+attachment/2251192/+files/lxc-start-aufs|generic]], and it is now available in lxc for Oneiric and later (as "lxc-start-ephemeral"). That, combined with an updated .testr.conf given in the instructions below (a TODO is to offer profiles for testr) will let testr run tests in a temporary container. (e.g. testr -- -t stories/gpg will fire up an aufs container and run the stories/gpg tests inside it).
Line 56: Line 36:
=== One-time ===

 * [[Running/LXC|Set up an lxc instance]].
 * {{{sudo apt-get install testrepository}}} in your host instance.
 * In your source tree, run {{{testr init}}}.
 * Change your source tree's .testr.conf to the following:
 {{{
[DEFAULT]
test_command=lxc-start-ephemeral $LP_LXC_BASE $PWD xvfb-run $PWD/bin/test --subunit $IDOPTION $LISTOPT
test_id_option=--load-list $IDFILE
test_list_option=--list
}}}
Line 60: Line 52:

=== Working ===
Line 68: Line 63:
  * shut it down (e.g. with lxc-stop-n <name>, or poweroff -n, or your preferred method).
    Note that lxc is fragile at the moment, you may need to manually shutdown postgresql before stopping lxc, to get it to shutdown cleanly.
 * Run tests with testr:
  * shut it down. In theory, {{{sudo poweroff}}} in your container should be sufficient. Experience shows that sometimes this hangs. Therefore, follow the poweroff with {{{lxc-stop -n <name>}}}.
    * Note that, also because of fragility, you may need to manually shutdown postgresql before stopping lxc, to get it to shutdown cleanly.

    * We may investigate creating a second test-only base instance in order to make this easier.

=== Running tests ===

 * Run tests with testr. These commands assume that your lxc base instance (as created in the initial steps) is named lpdev. If it is named something else, replace "lpdev" with that other name. This also assumes that the base instance is shut down, as described in the previous section.
Line 73: Line 73:
TEMP=$(pwd)/temp testr run --parallel
LP_LXC_BASE=lpdev TEMP=$(pwd)/temp testr run --parallel
Line 77: Line 77:
TEMP=$(pwd)/temp testr run --parallel -- -t stories/gpg
LP_LXC_BASE=lpdev TEMP=$(pwd)/temp testr run --parallel -- -t stories/gpg

Overview

Parallel testing would be nice. There's a bunch of things to do to make it work. See the LEP for constraints/goals/resourcing.

Known bugs/issues: parallel test bugs

LXC containers and parallel testing

LXC containers combined with aufs offer a pretty cheap way to get solid isolation - a great big hammer of a workaround for our existing globals (shared work dirs etc). William put together a proof of concept, Robert made that generic, and it is now available in lxc for Oneiric and later (as "lxc-start-ephemeral"). That, combined with an updated .testr.conf given in the instructions below (a TODO is to offer profiles for testr) will let testr run tests in a temporary container. (e.g. testr -- -t stories/gpg will fire up an aufs container and run the stories/gpg tests inside it).

Be sure to export LP_LXC_BASE with the name of your lxc base container.

See Running/LXC for info on setting up a base container.

Caveats

  • Do not run tests while the base container is running; it will be a disaster.
  • aufs does not seem to permit deletes in some circumstances (bug 729338), so test fixtures which start by deleting a directory tree will fail if the directory tree exists. Known cases:

    • /var/tmp/testkeyserver.test
    • /var/lib/postgresql/8.4/main/postmaster.pid
    • /var/tmp/bazaar.launchpad.dev/mirrors
  • Conversely, some fixtures need a directory tree to exist already:
      File "/home/robertc/source/launchpad/lp-branches/working/lib/canonical/testing/layers.py", line 1775, in startSMTPServer
        handler = logging.FileHandler(log_file)
      File "/usr/lib/python2.6/logging/__init__.py", line 819, in __init__
        StreamHandler.__init__(self, self._open())
      File "/usr/lib/python2.6/logging/__init__.py", line 838, in _open
        stream = open(self.baseFilename, self.mode)
    IOError: [Errno 2] No such file or directory: '/var/tmp/mailman/logs/smtpd'
    This was because buildmailman had not been run in the base container.
  • If we leak a child process with a shared stdout/stderr, sshd will not terminate, which makes the testr test runner look like it has hung (bug 820726). sudo pkill memcached can be used to work around this.
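
A minimal pre-flight sketch for the first caveat and for the LP_LXC_BASE requirement above. It assumes that lxc-info -n NAME prints a line containing the word RUNNING for a running container; the exact output format differs between lxc versions, so check it against yours. The names preflight, base_state and the LXC_INFO override are hypothetical, introduced here only for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; not part of lxc, testr or Launchpad.

# Print the state report for the named container.
# LXC_INFO is an assumed override hook so the check can be tried
# without a real container; it defaults to the lxc-info command.
base_state() {
    ${LXC_INFO:-lxc-info} -n "$1" 2>/dev/null
}

# Return 0 only if LP_LXC_BASE is set and the base container does
# not report RUNNING.
preflight() {
    if [ -z "${LP_LXC_BASE:-}" ]; then
        echo "LP_LXC_BASE is not set" >&2
        return 1
    fi
    # Assumption: a running container is reported with the word RUNNING.
    if base_state "$LP_LXC_BASE" | grep -q RUNNING; then
        echo "base container $LP_LXC_BASE is running; shut it down first" >&2
        return 1
    fi
    return 0
}
```

Calling preflight before a testr run would refuse to continue while the base container is up. It does not detect the aufs or leaked-process problems listed above.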

Workflow

One-time

  • Set up an lxc instance.

  • sudo apt-get install testrepository in your host instance.

  • In your source tree, run testr init.

  • Change your source tree's .testr.conf to the following:
    [DEFAULT]
    test_command=lxc-start-ephemeral $LP_LXC_BASE $PWD xvfb-run $PWD/bin/test --subunit $IDOPTION $LISTOPT
    test_id_option=--load-list $IDFILE
    test_list_option=--list
  • You need a temp directory in your source tree to work around bug 808557:

    mkdir temp
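
The file-writing part of the one-time steps can be sketched as a small shell function. This assumes testrepository is already installed and that testr init has been run in the tree; setup_tree is a hypothetical helper name, not something shipped with Launchpad or testr.

```shell
#!/bin/sh
# Hypothetical helper: write .testr.conf and create the temp directory
# in the given source tree (defaults to the current directory).
setup_tree() {
    tree=${1:-.}
    # The quoted heredoc delimiter keeps $LP_LXC_BASE, $PWD, $IDOPTION,
    # $LISTOPT and $IDFILE literal, so they are expanded when testr
    # runs the command, not when this file is written.
    cat > "$tree/.testr.conf" <<'EOF'
[DEFAULT]
test_command=lxc-start-ephemeral $LP_LXC_BASE $PWD xvfb-run $PWD/bin/test --subunit $IDOPTION $LISTOPT
test_id_option=--load-list $IDFILE
test_list_option=--list
EOF
    # Temp directory needed to work around bug 808557.
    mkdir -p "$tree/temp"
}
```

Run setup_tree from the root of the source tree. It overwrites any existing .testr.conf, so keep a copy if you have local changes.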

Working

  • Edit outside the container in your normal work area
  • Start the base container to do maintenance: make schema, bin/buildout
    lxc-start -n $basename -d
    • ssh to it
    • make schema
    • bin/buildout
    • Shut it down. In theory, sudo poweroff inside the container is sufficient, but in practice it sometimes hangs, so follow the poweroff with lxc-stop -n <name>.

      • Because lxc is fragile, you may also need to shut down postgresql manually before stopping lxc so that it shuts down cleanly.
      • We may investigate creating a second test-only base instance in order to make this easier.

Running tests

  • Run tests with testr. These commands assume that your lxc base instance (as created in the initial steps) is named lpdev. If it is named something else, replace "lpdev" with that other name. This also assumes that the base instance is shut down, as described in the previous section.
    All tests:
    LP_LXC_BASE=lpdev TEMP=$(pwd)/temp testr run --parallel
    Some tests:
    LP_LXC_BASE=lpdev TEMP=$(pwd)/temp testr run --parallel -- -t stories/gpg
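
The two invocations above can be wrapped in one function so the base name and temp directory are not retyped each time. This is only a convenience sketch: run_lp_tests and the TESTR override are hypothetical names, and the lpdev default simply mirrors the assumption made above.

```shell
#!/bin/sh
# Hypothetical wrapper around the two testr invocations shown above.
# With no argument it runs all tests; with one argument it passes that
# argument to bin/test as a -t filter.
run_lp_tests() {
    base=${LP_LXC_BASE:-lpdev}   # assumed default base container name
    mkdir -p temp                # temp directory for bug 808557
    # TESTR is an assumed override hook (e.g. TESTR=echo for a dry run).
    if [ $# -gt 0 ]; then
        LP_LXC_BASE=$base TEMP=$(pwd)/temp ${TESTR:-testr} run --parallel -- -t "$1"
    else
        LP_LXC_BASE=$base TEMP=$(pwd)/temp ${TESTR:-testr} run --parallel
    fi
}
```

For example, run_lp_tests stories/gpg runs only the stories/gpg tests, and TESTR=echo run_lp_tests stories/gpg prints the testr arguments without running anything.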

ParallelTests (last edited 2012-05-24 11:24:29 by bac)