Diff for "ParallelTests"

Differences between revisions 3 and 19 (spanning 16 versions)
Revision 3 as of 2010-10-20 00:35:38
Size: 682
Editor: lifeless
Comment: urls
Revision 19 as of 2012-01-03 13:14:59
Size: 3892
Editor: bac
Comment:
Line 1: Line 1:
Parallel testing would be nice. There's a bunch of things to do to make it work. = Overview =
Parallel testing would be nice. There's a bunch of things to do to make it work. See [[LEP/ParallelTesting|the LEP]] for constraints/goals/resourcing.
Line 4: Line 5:
[[https://bugs.launchpad.net/launchpad-project/+bugs?field.timeout=paralleltest|parallel test bugs]] [[https://bugs.launchpad.net/launchpad-project/+bugs?field.tag=paralleltest|parallel test bugs]]
Line 6: Line 7:
Design sketch:
 * bin/test --parallel runs N test runners with subunit, where N is the number of cores, and the tests are partitioned across runners.
= Design sketch =
 * bin/test --parallel runs N test runners with subunit, where N is the number of cores, and the tests are partitioned across runners. (implemented)
Line 9: Line 10:
  * dbnames
  * config files
  * dbnames (implemented)
  * config files (implemented)
Line 13: Line 14:
  * keyserver work area
  * soyuz work area
Line 16: Line 19:
 * Buildmaster slave tests hard code the xmlrpc port to 8221 everywhere.

= LXC containers and parallel testing =

LXC containers combined with [[http://en.wikipedia.org/wiki/Aufs|aufs]] offer a pretty cheap way to get solid isolation - a great big hammer of a workaround for our existing globals (shared work dirs etc). William has put together a proof of concept, and Robert has made that [[https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/807351/+attachment/2251192/+files/lxc-start-aufs|generic]]. That combined with an updated .testr.conf (a TODO is to offer profiles for testr) like: {{{
[DEFAULT]
test_command=lxc-start-aufs $LP_LXC_BASE $PWD xvfb-run $PWD/bin/test --subunit $IDOPTION $LISTOPT
test_id_option=--load-list $IDFILE
test_list_option=--list
}}}
will let testr run tests in a temporary container (e.g. testr run -- -t stories/gpg will fire up an aufs container and run the stories/gpg tests inside it).

Be sure to export LP_LXC_BASE with the name of your lxc base container.

See [[Running/LXC]] for info on setting up a base container.

== Caveats ==

 * Do not run tests this way while the base container is running; it will be a disaster.
 * aufs does not seem to permit deletes in some circumstances (Bug:729338), so test fixtures that start by deleting a directory tree will fail if that tree already exists. Known cases:
   * /var/tmp/testkeyserver.test
   * /var/lib/postgresql/8.4/main/postmaster.pid
   * /var/tmp/bazaar.launchpad.dev/mirrors
 * Conversely, some fixtures need a directory tree to exist already:
 {{{
  File "/home/robertc/source/launchpad/lp-branches/working/lib/canonical/testing/layers.py", line 1775, in startSMTPServer
    handler = logging.FileHandler(log_file)
  File "/usr/lib/python2.6/logging/__init__.py", line 819, in __init__
    StreamHandler.__init__(self, self._open())
  File "/usr/lib/python2.6/logging/__init__.py", line 838, in _open
    stream = open(self.baseFilename, self.mode)
IOError: [Errno 2] No such file or directory: '/var/tmp/mailman/logs/smtpd'
}}} - this was because buildmailman had not been run in the base container.
 * If we leak a child process with a shared stdout/stderr, sshd will not terminate, which makes the testr test runner look like it has hung (Bug:820726). Running sudo pkill memcached works around this.

== Workflow ==

 * You need a temp directory in your source tree to work around Bug:808557
 {{{
mkdir temp
}}}
 * Edit outside the container in your normal work area
 * Start the base container to do maintenance: make schema, bin/buildout
 {{{
lxc-start -n $basename -d
}}}
  * ssh to it
  * make schema
  * bin/buildout
  * shut it down (e.g. with lxc-stop -n <name>, or poweroff -n, or your preferred method).
    Note that lxc is fragile at the moment; you may need to shut down postgresql manually before stopping lxc to get it to shut down cleanly.
 * Run tests with testr:
 All tests
 {{{
TEMP=$(pwd)/temp testr run --parallel
}}}
 Some tests
 {{{
TEMP=$(pwd)/temp testr run --parallel -- -t stories/gpg
}}}

Overview

Parallel testing would be nice. There's a bunch of things to do to make it work. See the LEP for constraints/goals/resourcing.

Known bugs/issues: parallel test bugs

Design sketch

  • bin/test --parallel runs N test runners with subunit, where N is the number of cores, and the tests are partitioned across runners. (implemented)
  • Layers dynamically allocate/deallocate resources such as:
    • dbnames (implemented)
    • config files (implemented)
    • librarian work dir
    • librarian ports
    • keyserver work area
    • soyuz work area
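
The partitioning and per-runner allocation described above can be sketched roughly as follows. This is only an illustration: the round-robin split is an assumption (the real runner may balance partitions differently, e.g. by test timings), and `partition_tests`, `runner_resources` and the name patterns are hypothetical, not Launchpad's actual layer code.

```python
import multiprocessing


def partition_tests(test_ids, runner_count):
    """Split test ids round-robin into runner_count partitions."""
    partitions = [[] for _ in range(runner_count)]
    for index, test_id in enumerate(test_ids):
        partitions[index % runner_count].append(test_id)
    return partitions


def runner_resources(runner_index):
    """Return per-runner resource names so runners never share globals.

    The name patterns are hypothetical placeholders.
    """
    return {
        "dbname": "launchpad_ftest_%d" % runner_index,
        "config": "testrunner_%d" % runner_index,
    }


def plan_parallel_run(test_ids, runner_count=None):
    """Pair each partition of tests with its own set of resources."""
    if runner_count is None:
        # N defaults to the number of cores, as in the design sketch.
        runner_count = multiprocessing.cpu_count()
    partitions = partition_tests(test_ids, runner_count)
    return [
        (runner_resources(index), tests)
        for index, tests in enumerate(partitions)
    ]
```

Each runner would then be started with its own entry from `plan_parallel_run`, so that no two runners touch the same database or config.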

Things that need specialist knowledge:

  • dynamically allocating ports for zope (ports 8085 and 9025 specifically), which can then be fed back into e.g. zcml files/launchpad.conf.
  • Buildmaster slave tests hard code the xmlrpc port to 8221 everywhere.
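
One common way to get rid of hard-coded ports is to ask the kernel for a free one by binding to port 0. The sketch below shows that technique in isolation; `allocate_free_port` and `render_port_config` are hypothetical helpers, and the `name_port: value` output format is made up for illustration rather than taken from launchpad.conf or any zcml file.

```python
import socket


def allocate_free_port(host="127.0.0.1"):
    """Ask the kernel for an unused TCP port by binding to port 0."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, 0))
        # getsockname() reports the port the kernel actually picked.
        return sock.getsockname()[1]
    finally:
        sock.close()


def render_port_config(ports):
    """Render allocated ports as lines a config template could include.

    The output format is a placeholder, not a real Launchpad format.
    """
    return "\n".join(
        "%s_port: %d" % (name, port) for name, port in sorted(ports.items())
    )
```

Note that the port is released before the real server binds it, so another process could grab it in between. Passing an already-bound socket to the server avoids that race where the server supports it.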

LXC containers and parallel testing

LXC containers combined with aufs offer a pretty cheap way to get solid isolation - a great big hammer of a workaround for our existing globals (shared work dirs etc). William has put together a proof of concept, and Robert has made that generic. That combined with an updated .testr.conf (a TODO is to offer profiles for testr) like:

[DEFAULT]
test_command=lxc-start-aufs $LP_LXC_BASE $PWD xvfb-run $PWD/bin/test --subunit $IDOPTION $LISTOPT
test_id_option=--load-list $IDFILE
test_list_option=--list

will let testr run tests in a temporary container (e.g. testr run -- -t stories/gpg will fire up an aufs container and run the stories/gpg tests inside it).

Be sure to export LP_LXC_BASE with the name of your lxc base container.

See Running/LXC for info on setting up a base container.

Caveats

  • Do not run tests this way while the base container is running; it will be a disaster.
  • aufs does not seem to permit deletes in some circumstances (bug 729338), so test fixtures that start by deleting a directory tree will fail if that tree already exists. Known cases:

    • /var/tmp/testkeyserver.test
    • /var/lib/postgresql/8.4/main/postmaster.pid
    • /var/tmp/bazaar.launchpad.dev/mirrors
  • Conversely, some fixtures need a directory tree to exist already:
      File "/home/robertc/source/launchpad/lp-branches/working/lib/canonical/testing/layers.py", line 1775, in startSMTPServer
        handler = logging.FileHandler(log_file)
      File "/usr/lib/python2.6/logging/__init__.py", line 819, in __init__
        StreamHandler.__init__(self, self._open())
      File "/usr/lib/python2.6/logging/__init__.py", line 838, in _open
        stream = open(self.baseFilename, self.mode)
    IOError: [Errno 2] No such file or directory: '/var/tmp/mailman/logs/smtpd'
    This happened because buildmailman had not been run in the base container.
  • If we leak a child process with a shared stdout/stderr, sshd will not terminate, which makes the testr test runner look like it has hung (bug 820726). Running sudo pkill memcached works around this.
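
The missing-directory failure in the traceback above was fixed by running buildmailman in the base container. A fixture could also guard against it by creating the log directory before opening the handler. The following is a defensive sketch of that idea, not the code in layers.py; `ensure_parent_dir`, `open_log_handler` and `demo` are hypothetical names.

```python
import logging
import os
import shutil
import tempfile


def ensure_parent_dir(path):
    """Create the directory that will hold path if it does not exist yet."""
    parent = os.path.dirname(path)
    if parent and not os.path.isdir(parent):
        os.makedirs(parent)


def open_log_handler(log_file):
    """Open a FileHandler without assuming the log directory exists."""
    ensure_parent_dir(log_file)
    return logging.FileHandler(log_file)


def demo():
    """Exercise the helper in a throwaway directory."""
    base = tempfile.mkdtemp()
    try:
        log_file = os.path.join(base, "mailman", "logs", "smtpd")
        handler = open_log_handler(log_file)
        handler.close()
        # FileHandler opens the file eagerly, so it exists by now.
        return os.path.isfile(log_file)
    finally:
        shutil.rmtree(base)
```

This only helps with fixtures that need a tree to exist. It does nothing for the aufs delete problem, where the tree exists and cannot be removed.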

Workflow

  • You need a temp directory in your source tree to work around bug 808557

    mkdir temp
  • Edit outside the container in your normal work area
  • Start the base container to do maintenance: make schema, bin/buildout
    lxc-start -n $basename -d
    • ssh to it
    • make schema
    • bin/buildout
    • shut it down (e.g. with lxc-stop -n <name>, or poweroff -n, or your preferred method).

      • Note that lxc is fragile at the moment; you may need to shut down postgresql manually before stopping lxc to get it to shut down cleanly.
  • Run tests with testr:
    • All tests:
        TEMP=$(pwd)/temp testr run --parallel
    • Some tests:
        TEMP=$(pwd)/temp testr run --parallel -- -t stories/gpg
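
If you run these commands from a script, the final step can be wrapped in a small helper that refuses to start when LP_LXC_BASE is missing. This is a hedged sketch only: `build_testr_invocation` is a hypothetical name, and it assumes the temp directory lives at `temp` under the source tree as in the workflow above.

```python
import os


def build_testr_invocation(source_tree, environ, test_filter=None):
    """Return (command, env) for a parallel testr run.

    Raises ValueError if LP_LXC_BASE is not set, since .testr.conf
    needs it to find the base container.
    """
    if not environ.get("LP_LXC_BASE"):
        raise ValueError(
            "export LP_LXC_BASE with the name of your lxc base container")
    env = dict(environ)
    # TEMP points into the source tree to work around bug 808557.
    env["TEMP"] = os.path.join(source_tree, "temp")
    command = ["testr", "run", "--parallel"]
    if test_filter:
        # Everything after "--" is passed through to bin/test.
        command += ["--", "-t", test_filter]
    return command, env
```

The result could then be handed to something like `subprocess.call(command, env=env, cwd=source_tree)`, after making sure the `temp` directory exists.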

ParallelTests (last edited 2012-05-24 11:24:29 by bac)