Diff for "ParallelTests"

Differences between revisions 1 and 21 (spanning 20 versions)
Revision 1 as of 2010-10-20 00:32:44
Size: 680
Editor: lifeless
Comment: sketch
Revision 21 as of 2012-01-09 14:57:21
Size: 4149
Editor: gary
Comment:
Line 1: Line 1:
Parallel testing would be nice. There's a bunch of things to do to make it work.
= Overview =
Parallel testing would be nice. There's a bunch of things to do to make it work. See [[LEP/ParallelTesting|the LEP]] for constraints/goals/resourcing.
Line 4: Line 5:
[https://bugs.launchpad.net/launchpad-project/+bugs?field.timeout=paralleltest|parallel test bugs]
[[https://bugs.launchpad.net/launchpad-project/+bugs?field.tag=paralleltest|parallel test bugs]]
Line 6: Line 7:
Design sketch:
 - bin/test --parallel runs N test runners with subunit, where N is the number of cores, and the tests are partitioned across runners.
 - Layers dynamically allocate/deallocate resources such as:
  - dbnames
  - config files
  - librarian work dir
  - librarian ports
= LXC containers and parallel testing =
Line 14: Line 9:
Things that need specialist knowledge:
 - dynamically allocating ports for zope - port 8085 and 9025 specifically, which can then be fed back into e.g. zcml files/launchpad.conf.
LXC containers combined with [[http://en.wikipedia.org/wiki/Aufs|aufs]] offer a pretty cheap way to get solid isolation: a great big hammer of a workaround for our existing globals (shared work dirs etc.). William put together a proof of concept, Robert made that [[https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/807351/+attachment/2251192/+files/lxc-start-aufs|generic]], and it is now available in lxc for Oneiric and later (as "lxc-start-ephemeral"). That, combined with the updated .testr.conf given in the instructions below (a TODO is to offer profiles for testr), lets testr run tests in a temporary container. For example, {{{testr run -- -t stories/gpg}}} will fire up an aufs container and run the stories/gpg tests inside it.

Be sure to export LP_LXC_BASE with the name of your lxc base container.

See [[Running/LXC]] for info on setting up a base container.

== Caveats ==

 * Do not run tests while the base container is running; the result will be a disaster.
 * aufs does not seem to permit deletes in some circumstances (Bug:729338), so test fixtures that start by deleting a directory tree will fail if the directory tree exists. Known cases:
   * /var/tmp/testkeyserver.test
   * /var/lib/postgresql/8.4/main/postmaster.pid
   * /var/tmp/bazaar.launchpad.dev/mirrors
 * Conversely, some fixtures need a directory tree to exist already:
 {{{
  File "/home/robertc/source/launchpad/lp-branches/working/lib/canonical/testing/layers.py", line 1775, in startSMTPServer
    handler = logging.FileHandler(log_file)
  File "/usr/lib/python2.6/logging/__init__.py", line 819, in __init__
    StreamHandler.__init__(self, self._open())
  File "/usr/lib/python2.6/logging/__init__.py", line 838, in _open
    stream = open(self.baseFilename, self.mode)
IOError: [Errno 2] No such file or directory: '/var/tmp/mailman/logs/smtpd'
}}}
 This failure occurred because buildmailman had not been run in the base container.
 * If we leak a child process with a shared stdout/stderr, sshd will not terminate, which makes the testr test runner look like it has hung (Bug:820726). {{{sudo pkill memcached}}} can be used to work around this.

== Workflow ==

=== One-time ===

 * [[Running/LXC|Set up an lxc instance]].
 * {{{sudo apt-get install testrepository}}} in your host instance.
 * In your source tree, run {{{testr init}}}.
 * Change your source tree's .testr.conf to the following:
 {{{
[DEFAULT]
test_command=lxc-start-ephemeral $LP_LXC_BASE $PWD xvfb-run $PWD/bin/test --subunit $IDOPTION $LISTOPT
test_id_option=--load-list $IDFILE
test_list_option=--list
}}}
 * You need a temp directory in your source tree to work around Bug:808557:
 {{{
mkdir temp
}}}

=== Working ===

 * Edit outside the container in your normal work area
 * Start the base container to do maintenance: make schema, bin/buildout
 {{{
lxc-start -n $basename -d
}}}
  * ssh to it
  * make schema
  * bin/buildout
  * Shut it down. In theory, {{{sudo poweroff}}} in your container should be sufficient, but experience shows that this sometimes hangs. Therefore, follow the poweroff with {{{lxc-stop -n <name>}}}.
    * Note that, also because of fragility, you may need to manually shut down postgresql before stopping lxc, to get it to shut down cleanly.

    * We may investigate creating a second test-only base instance in order to make this easier.

=== Running tests ===

 * Run tests with testr. These commands assume that your lxc base instance (as created in the initial steps) is named lpdev. If it is named something else, replace "lpdev" with that other name. This also assumes that the base instance is shut down, as described in the previous section.
 All tests
 {{{
LP_LXC_BASE=lpdev TEMP=$(pwd)/temp testr run --parallel
}}}
 Some tests
 {{{
LP_LXC_BASE=lpdev TEMP=$(pwd)/temp testr run --parallel -- -t stories/gpg
}}}

Overview

Parallel testing would be nice. There's a bunch of things to do to make it work. See the LEP for constraints/goals/resourcing.

Known bugs/issues: parallel test bugs

LXC containers and parallel testing

LXC containers combined with aufs offer a pretty cheap way to get solid isolation: a great big hammer of a workaround for our existing globals (shared work dirs etc.). William put together a proof of concept, Robert made that generic, and it is now available in lxc for Oneiric and later (as "lxc-start-ephemeral"). That, combined with the updated .testr.conf given in the instructions below (a TODO is to offer profiles for testr), lets testr run tests in a temporary container. For example, testr run -- -t stories/gpg will fire up an aufs container and run the stories/gpg tests inside it.

Be sure to export LP_LXC_BASE with the name of your lxc base container.

See Running/LXC for info on setting up a base container.

Caveats

  • Do not run tests while the base container is running; the result will be a disaster.
  • aufs does not seem to permit deletes in some circumstances (bug 729338), so test fixtures that start by deleting a directory tree will fail if the directory tree exists. Known cases:

    • /var/tmp/testkeyserver.test
    • /var/lib/postgresql/8.4/main/postmaster.pid
    • /var/tmp/bazaar.launchpad.dev/mirrors
  • Conversely, some fixtures need a directory tree to exist already:
      File "/home/robertc/source/launchpad/lp-branches/working/lib/canonical/testing/layers.py", line 1775, in startSMTPServer
        handler = logging.FileHandler(log_file)
      File "/usr/lib/python2.6/logging/__init__.py", line 819, in __init__
        StreamHandler.__init__(self, self._open())
      File "/usr/lib/python2.6/logging/__init__.py", line 838, in _open
        stream = open(self.baseFilename, self.mode)
    IOError: [Errno 2] No such file or directory: '/var/tmp/mailman/logs/smtpd'
    This failure occurred because buildmailman had not been run in the base container.
  • If we leak a child process with a shared stdout/stderr, sshd will not terminate, which makes the testr test runner look like it has hung (bug 820726). sudo pkill memcached can be used to work around this.
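
The missing-directory failure in the traceback above could be avoided in a fixture by creating the log directory before opening the handler. The following is a minimal Python 3 sketch under that assumption, not the code in layers.py; the helper name make_file_handler is hypothetical.

```python
import logging
import os


def make_file_handler(log_file):
    """Create the parent directory of log_file, then open a FileHandler.

    Hypothetical helper: logging.FileHandler raises an IOError/OSError
    (Errno 2) when the directory does not exist, as in the traceback above.
    """
    log_dir = os.path.dirname(os.path.abspath(log_file))
    # exist_ok=True makes this a no-op when the tree is already there.
    os.makedirs(log_dir, exist_ok=True)
    return logging.FileHandler(log_file)
```

Running buildmailman in the base container, as noted above, remains the fix that was actually used; this only shows how a fixture could be made less dependent on it.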

Workflow

One-time

  • Set up an lxc instance.

  • sudo apt-get install testrepository in your host instance.

  • In your source tree, run testr init.

  • Change your source tree's .testr.conf to the following:
    [DEFAULT]
    test_command=lxc-start-ephemeral $LP_LXC_BASE $PWD xvfb-run $PWD/bin/test --subunit $IDOPTION $LISTOPT
    test_id_option=--load-list $IDFILE
    test_list_option=--list
  • You need a temp directory in your source tree to work around bug 808557:

    mkdir temp
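
As a rough illustration of how the placeholders in test_command get filled in: testr substitutes $IDOPTION and $LISTOPT itself, while $LP_LXC_BASE and $PWD are expected to come from the environment when the command is run. The sketch below only approximates that substitution and is not testrepository's implementation; expand_test_command and the example id file path are hypothetical.

```python
import re

# Values copied from the .testr.conf shown above.
TESTR_CONF = {
    "test_command": "lxc-start-ephemeral $LP_LXC_BASE $PWD xvfb-run "
                    "$PWD/bin/test --subunit $IDOPTION $LISTOPT",
    "test_id_option": "--load-list $IDFILE",
    "test_list_option": "--list",
}


def expand_test_command(conf, id_file=None, listing=False):
    """Approximate the testr placeholder substitution (illustration only)."""
    id_option = ""
    if id_file is not None:
        # $IDFILE names the file holding the test ids to run.
        id_option = conf["test_id_option"].replace("$IDFILE", id_file)
    values = {
        "IDOPTION": id_option,
        "LISTOPT": conf["test_list_option"] if listing else "",
    }
    expanded = re.sub(
        r"\$(IDOPTION|LISTOPT)",
        lambda match: values[match.group(1)],
        conf["test_command"],
    )
    # Collapse the gaps left behind by empty placeholders.
    return " ".join(expanded.split())


print(expand_test_command(TESTR_CONF, listing=True))
print(expand_test_command(TESTR_CONF, id_file="/tmp/ids"))
```

The environment variables are deliberately left unexpanded here, since they are resolved outside of testr.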

Working

  • Edit outside the container in your normal work area
  • Start the base container to do maintenance: make schema, bin/buildout
    lxc-start -n $basename -d
    • ssh to it
    • make schema
    • bin/buildout
    • Shut it down. In theory, sudo poweroff in your container should be sufficient, but experience shows that this sometimes hangs. Therefore, follow the poweroff with lxc-stop -n <name>.

      • Note that, also because of fragility, you may need to manually shut down postgresql before stopping lxc, to get it to shut down cleanly.
      • We may investigate creating a second test-only base instance in order to make this easier.
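
The maintenance steps above can be written down as a command sequence. This sketch only builds and prints the commands; it does not run them. It assumes the container is reachable over ssh under its own name and that the source tree sits at the same path inside the container, neither of which this page states. maintenance_commands, its parameters, and the example path are hypothetical, and the optional manual postgresql shutdown is not included.

```python
import shlex


def maintenance_commands(base, tree):
    """Return the host-side commands for one maintenance session, in order."""
    in_tree = "cd {} && ".format(shlex.quote(tree))
    return [
        ["lxc-start", "-n", base, "-d"],          # start the base container
        ["ssh", base, in_tree + "make schema"],   # assumption: ssh host == container name
        ["ssh", base, in_tree + "bin/buildout"],
        ["ssh", base, "sudo poweroff"],           # may hang, hence the next step
        ["lxc-stop", "-n", base],
    ]


for command in maintenance_commands("lpdev", "/home/me/launchpad"):
    print(" ".join(shlex.quote(part) for part in command))
```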

Running tests

  • Run tests with testr. These commands assume that your lxc base instance (as created in the initial steps) is named lpdev. If it is named something else, replace "lpdev" with that other name. This also assumes that the base instance is shut down, as described in the previous section.
    All tests:
    LP_LXC_BASE=lpdev TEMP=$(pwd)/temp testr run --parallel
    Some tests:
    LP_LXC_BASE=lpdev TEMP=$(pwd)/temp testr run --parallel -- -t stories/gpg

ParallelTests (last edited 2012-05-24 11:24:29 by bac)