Diff for "ParallelTests"


Differences between revisions 9 and 29 (spanning 20 versions)
Revision 9 as of 2011-07-10 22:49:36
Size: 2562
Editor: lifeless
Comment: fix
Revision 29 as of 2012-05-24 11:24:29
Size: 5168
Editor: bac
Comment:
Line 7: Line 7:
= Design sketch =
 * bin/test --parallel runs N test runners with subunit, where N is the number of cores, and the tests are partitioned across runners. (implemented)
 * Layers dynamically allocate/deallocate resources such as:
  * dbnames (implemented)
  * config files (implemented)
  * librarian work dir
  * librarian ports
  * keyserver work area
  * soyuz work area

Things that need specialist knowledge:
 * dynamically allocating ports for zope - port 8085 and 9025 specifically, which can then be fed back into e.g. zcml files/launchpad.conf.
 * Buildmaster slave tests hard code the xmlrpc port to 8221 everywhere.
Line 23: Line 9:
LXC containers combined with aufs offer a pretty cheap way to get solid isolation - a great big hammer of a workaround for our existing globals (shared work dirs etc). William has put together a proof of concept, and Robert has made that [[https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/807351/+attachment/2196930/+files/lxc-start-aufs|generic]]. That combined with an updated .testr.conf (a TODO is to offer profiles for testr) like: {{{
LXC containers combined with overlayfs (was [[http://en.wikipedia.org/wiki/Aufs|aufs]]) offer a pretty cheap way to get solid isolation - a great big hammer of a workaround for our existing globals (shared work dirs etc). William put together a proof of concept, Robert made that [[https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/807351/+attachment/2251192/+files/lxc-start-aufs|generic]], and it is now available, significantly refactored by Serge Hallyn, in lxc for Oneiric and later as "lxc-start-ephemeral". That, combined with an updated .testr.conf given in the instructions below (a TODO is to offer profiles for testr) will let testr run tests in a temporary container. (e.g. testr -- -t stories/gpg will fire up an aufs container and run the stories/gpg tests inside it).

Be sure to export LP_LXC_BASE with the name of your lxc base container.

See [[Running/LXC]] for info on setting up a base container.

== Caveats ==

 * [[Bug:914365|If the base container is running it will be a disaster.]] Don't try.
 * aufs does not seem to permit deletes in some circumstances Bug:729338, so test fixtures which start by deleting a directory tree will fail if the directory tree exists. Known cases:
   * /var/tmp/testkeyserver.test
   * /var/lib/postgresql/8.4/main/postmaster.pid
   * /var/tmp/bazaar.launchpad.dev/mirrors
 * and conversely some need a tree:
 {{{
  File "/home/robertc/source/launchpad/lp-branches/working/lib/canonical/testing/layers.py", line 1775, in startSMTPServer
    handler = logging.FileHandler(log_file)
  File "/usr/lib/python2.6/logging/__init__.py", line 819, in __init__
    StreamHandler.__init__(self, self._open())
  File "/usr/lib/python2.6/logging/__init__.py", line 838, in _open
    stream = open(self.baseFilename, self.mode)
IOError: [Errno 2] No such file or directory: '/var/tmp/mailman/logs/smtpd'
}}} - this was because buildmailman had not been run in the base container.
 * If we leak a child process with a shared stdout/stderr sshd will not terminate, which will cause the testr test runner to look like it has hung. Bug:820726. sudo pkill memcached can be used to work around this.

== Workflow ==

=== One-time ===

 * [[Running/LXC|Set up an lxc instance]].
 * {{{sudo apt-get install testrepository}}} in your host instance.
 * In your source tree, run {{{testr init}}}.
 * Change your source tree's .testr.conf to the following:
 {{{
Line 25: Line 44:
test_command=lxc-start-aufs $LP_LXC_BASE $PWD xvfb-run $PWD/bin/test --subunit $IDOPTION $LISTOPT
test_command=lxc-start-ephemeral -o $LP_LXC_BASE -b $PWD -- xvfb-run -a $PWD/bin/test --subunit $IDOPTION $LISTOPT
Line 29: Line 48:
will let testr run tests in a temporary container. (e.g. testr -- -t stories/gpg will fire up an aufs container and run the stories/gpg tests inside it). We need a new bin/test though, as --list --subunit currently outputs subunit rather than just a list of tests, which isn't what we need.
 * You need a temp directory in your source tree to work around bug Bug:808557
 {{{
mkdir temp
}}}
Line 31: Line 53:
== Caveats ==

 * If the base container is running it will be a disaster. Don't try.
 * aufs does not seem to permit deletes in some circumstances, so test fixtures which start by deleting a directory tree will fail if the directory tree exists. Known cases:
   * /var/tmp/testkeyserver.test

== Workflow ==
=== Working ===
Line 41: Line 57:
{{{  {{{
Line 45: Line 61:
  * and within it - {{{sudo poweroff -n}}}
 * Run tests with testr:
All tests
  * make schema
  * bin/buildout
  * shut it down. In theory, {{{sudo poweroff}}} in your container should be sufficient. Experience shows that sometimes this hangs. Therefore, follow the poweroff with {{{lxc-stop -n <name>}}}.
    * Note that, also because of fragility, you may need to manually shutdown postgresql before stopping lxc, to get it to shutdown cleanly.

    * We may investigate creating a second test-only base instance in order to make this easier.

=== Running tests ===

 * Run tests with testr. These commands assume that your lxc base instance (as created in the initial steps) is named lpdev. If it is named something else, replace "lpdev" with that other name. This also assumes that the base instance is shut down, as described in the previous section.
 All tests
 {{{
LP_LXC_BASE=lpdev TEMP=$(pwd)/temp testr run --parallel
}}}
 Some tests
 {{{
LP_LXC_BASE=lpdev TEMP=$(pwd)/temp testr run --parallel -- -t stories/gpg
}}}

 * XXX Note that the test count lies (layer setup and teardown confuses things in particular). The yellow kanban board tracks a number of bugs in testr and related code pertinent to this.

== Tips ==

=== Accessing a LXC container from a buildbot slave ===
The default lxc container does not have any users you can use to login. To get around this you can set a password for root in the host and copy the corresponding line from the host's {{{/etc/shadow}}} to that of the container. You can then log into the container using the same password. The steps are:
Line 49: Line 87:
testr run --parallel
root@host> passwd root # set password to, say, 'foo'
root@host> grep root /etc/shadow >> /var/lib/lxc/lptests/rootfs/etc/shadow
root@host> vi /var/lib/lxc/lptests/rootfs/etc/shadow # Remove the original 'root' line with blank password information
Line 51: Line 91:
Some tests
{{{
testr run --parallel -- -t stories/gpg
}}}

Now, when you {{{lxc-start -n lptests}}} you can login as {{{root}}} using the password {{{foo}}}.

Overview

Parallel testing would be nice. There's a bunch of things to do to make it work. See the LEP for constraints/goals/resourcing.

Known bugs/issues: parallel test bugs

LXC containers and parallel testing

LXC containers combined with overlayfs (was aufs) offer a pretty cheap way to get solid isolation - a great big hammer of a workaround for our existing globals (shared work dirs etc). William put together a proof of concept, Robert made that generic, and it is now available, significantly refactored by Serge Hallyn, in lxc for Oneiric and later as "lxc-start-ephemeral". Combined with the updated .testr.conf given in the instructions below (a TODO is to offer profiles for testr), that lets testr run tests in a temporary container. For example, testr run -- -t stories/gpg will fire up an ephemeral container and run the stories/gpg tests inside it.

Be sure to export LP_LXC_BASE with the name of your lxc base container.

See Running/LXC for info on setting up a base container.

Caveats

  • Do not run the tests while the base container is running: it will be a disaster (bug 914365). Don't try.

  • aufs does not seem to permit deletes in some circumstances (bug 729338), so test fixtures which start by deleting a directory tree will fail if the directory tree exists. Known cases:

    • /var/tmp/testkeyserver.test
    • /var/lib/postgresql/8.4/main/postmaster.pid
    • /var/tmp/bazaar.launchpad.dev/mirrors
  • Conversely, some fixtures need a directory tree to exist already:
      File "/home/robertc/source/launchpad/lp-branches/working/lib/canonical/testing/layers.py", line 1775, in startSMTPServer
        handler = logging.FileHandler(log_file)
      File "/usr/lib/python2.6/logging/__init__.py", line 819, in __init__
        StreamHandler.__init__(self, self._open())
      File "/usr/lib/python2.6/logging/__init__.py", line 838, in _open
        stream = open(self.baseFilename, self.mode)
    IOError: [Errno 2] No such file or directory: '/var/tmp/mailman/logs/smtpd'
    This failure occurred because buildmailman had not been run in the base container.
  • If we leak a child process that shares stdout/stderr, sshd will not terminate, which makes the testr test runner look like it has hung (bug 820726). sudo pkill memcached can be used to work around this.
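
The path-related caveats above can be checked before a test run. What follows is only a sketch, not part of the Launchpad tooling: check_base_rootfs is a hypothetical helper name, and it assumes the base container's root filesystem is reachable from the host as a plain directory (for example under /var/lib/lxc/<name>/rootfs, which may differ on your setup).

```shell
# Hypothetical pre-flight check for a base container's root filesystem.
# Usage: check_base_rootfs ROOTFS
# Prints one line per problem and returns non-zero if any was found.
check_base_rootfs() {
    rootfs=$1
    status=0

    # Paths that fixtures delete on startup. Under aufs the delete may
    # fail, so they should not exist in the base container.
    for stale in \
        /var/tmp/testkeyserver.test \
        /var/lib/postgresql/8.4/main/postmaster.pid \
        /var/tmp/bazaar.launchpad.dev/mirrors
    do
        if [ -e "$rootfs$stale" ]; then
            echo "stale: $stale"
            status=1
        fi
    done

    # Directory that must already exist (missing when buildmailman
    # has not been run in the base container).
    for needed in /var/tmp/mailman/logs; do
        if [ ! -d "$rootfs$needed" ]; then
            echo "missing: $needed"
            status=1
        fi
    done

    return $status
}
```

For example, check_base_rootfs /var/lib/lxc/lpdev/rootfs (run with enough privileges to read that directory) lists anything that needs cleaning up or creating before testr is started.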

Workflow

One-time

  • Set up an lxc instance.

  • sudo apt-get install testrepository in your host instance.

  • In your source tree, run testr init.

  • Change your source tree's .testr.conf to the following:
    [DEFAULT]
    test_command=lxc-start-ephemeral -o $LP_LXC_BASE -b $PWD -- xvfb-run -a $PWD/bin/test --subunit $IDOPTION $LISTOPT
    test_id_option=--load-list $IDFILE
    test_list_option=--list
  • You need a temp directory in your source tree to work around bug 808557:

    mkdir temp
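
The last two one-time steps can be scripted. This is a minimal sketch: write_testr_conf is a hypothetical helper name, it must be run from the root of the source tree, and it overwrites any existing .testr.conf. It does not run testr init or install anything.

```shell
# Write the .testr.conf shown above and create the temp directory.
# Run from the root of the source tree.
write_testr_conf() {
    # The quoted 'EOF' keeps $LP_LXC_BASE, $PWD and the testr placeholders
    # literal, so they are expanded when testr runs the command rather
    # than when this file is written.
    cat > .testr.conf <<'EOF'
[DEFAULT]
test_command=lxc-start-ephemeral -o $LP_LXC_BASE -b $PWD -- xvfb-run -a $PWD/bin/test --subunit $IDOPTION $LISTOPT
test_id_option=--load-list $IDFILE
test_list_option=--list
EOF
    # Temp directory needed to work around bug 808557.
    mkdir -p temp
}
```

The quoted heredoc delimiter is the important detail: with an unquoted EOF the shell would substitute the variables immediately and the resulting file would contain empty or host-specific values.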

Working

  • Edit outside the container in your normal work area
  • Start the base container to do maintenance: make schema, bin/buildout
    lxc-start -n $basename -d
    • ssh to it
    • make schema
    • bin/buildout
    • Shut it down. In theory, sudo poweroff in your container should be sufficient, but experience shows that it sometimes hangs, so follow the poweroff with lxc-stop -n <name>.

      • Because of the same fragility, you may need to shut down postgresql manually before stopping lxc so that it shuts down cleanly.
      • We may investigate creating a second test-only base instance in order to make this easier.
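
The maintenance cycle above could be wrapped up roughly as follows. This is a sketch under several assumptions that are not stated in the steps themselves: the container is reachable over ssh under its own name, sshd is already accepting connections by the time the ssh command runs (in practice a short wait or retry may be needed), and the source tree is visible inside the container at the path you pass in. The names run and maintain_base are hypothetical. It does not stop postgresql first, which may still need doing by hand.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: maintain_base BASE_NAME TREE_PATH_INSIDE_CONTAINER
maintain_base() {
    base=$1   # name of the base container, e.g. lpdev
    tree=$2   # source tree path as seen inside the container (assumption)

    run lxc-start -n "$base" -d
    # Assumes "ssh $base" reaches the container.
    run ssh "$base" "cd $tree && make schema && bin/buildout"
    # poweroff sometimes hangs, so it is always followed by lxc-stop.
    run ssh "$base" "sudo poweroff" || true
    run lxc-stop -n "$base"
}
```

Setting DRY_RUN=1 before calling maintain_base prints the four commands without touching any container, which is a cheap way to confirm the names and paths first.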

Running tests

  • Run tests with testr. These commands assume that your lxc base instance (as created in the initial steps) is named lpdev. If it is named something else, replace "lpdev" with that other name. This also assumes that the base instance is shut down, as described in the previous section.
    All tests:
    LP_LXC_BASE=lpdev TEMP=$(pwd)/temp testr run --parallel
    Some tests:
    LP_LXC_BASE=lpdev TEMP=$(pwd)/temp testr run --parallel -- -t stories/gpg
  • XXX Note that the test count lies (layer setup and teardown in particular confuse things). The yellow kanban board tracks a number of bugs in testr and related code pertinent to this.
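
Since running the tests against a live base container is the worst of the caveats, a small wrapper can refuse to start in that case. This is a sketch only: run_parallel_tests is a hypothetical name, and matching the word RUNNING in the output of lxc-info -n is an assumption, because the exact output format differs between lxc versions.

```shell
# Usage: run_parallel_tests BASE_NAME [bin/test arguments...]
# Run from the root of the source tree.
run_parallel_tests() {
    base=$1
    shift

    # Refuse to run while the base container is up.
    if lxc-info -n "$base" 2>/dev/null | grep -q RUNNING; then
        echo "base container $base is running; shut it down first" >&2
        return 1
    fi

    # Temp directory needed to work around bug 808557.
    mkdir -p temp

    if [ $# -gt 0 ]; then
        # Extra arguments are passed through to bin/test after "--".
        LP_LXC_BASE=$base TEMP=$(pwd)/temp testr run --parallel -- "$@"
    else
        LP_LXC_BASE=$base TEMP=$(pwd)/temp testr run --parallel
    fi
}
```

With this, run_parallel_tests lpdev corresponds to the "All tests" command above and run_parallel_tests lpdev -t stories/gpg to the "Some tests" command.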

Tips

Accessing an LXC container from a buildbot slave

The default lxc container does not have any users you can use to log in. To get around this, set a password for root on the host and copy the corresponding line from the host's /etc/shadow to the container's. You can then log into the container using the same password. The steps are:

root@host> passwd root  # set password to, say, 'foo'
root@host> grep root /etc/shadow >> /var/lib/lxc/lptests/rootfs/etc/shadow
root@host> vi /var/lib/lxc/lptests/rootfs/etc/shadow  # Remove the original 'root' line with blank password information

Now, when you lxc-start -n lptests you can log in as root using the password foo.
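
The grep and vi steps can be combined into one non-interactive step. The following is a sketch rather than an established tool: copy_root_shadow is a hypothetical name, and unlike the plain grep root above it matches only the line that starts with "root:", so other entries that merely contain the word root are left alone. It needs to be run as root to read and write the real shadow files.

```shell
# Usage: copy_root_shadow HOST_SHADOW CONTAINER_SHADOW
# Replaces the container's root entry with the host's root entry.
copy_root_shadow() {
    host_shadow=$1
    container_shadow=$2
    shadow_tmp=$(mktemp) || return 1

    # Keep every container entry except root's...
    grep -v '^root:' "$container_shadow" > "$shadow_tmp"
    # ...then append root's entry from the host; give up if there is none.
    if ! grep '^root:' "$host_shadow" >> "$shadow_tmp"; then
        rm -f "$shadow_tmp"
        return 1
    fi

    # Write back in place so the file keeps its owner and permissions.
    cat "$shadow_tmp" > "$container_shadow"
    rm -f "$shadow_tmp"
}
```

After passwd root on the host, this would be called as copy_root_shadow /etc/shadow /var/lib/lxc/lptests/rootfs/etc/shadow.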

ParallelTests (last edited 2012-05-24 11:24:29 by bac)