Differences between revisions 11 and 12

Python Style Guide

This document describes expected practices when writing Python code. There are occasions when you can break these rules, but be prepared to justify doing so when your code gets reviewed.

Contents

Python Style Guide

Existing Conventions

There are well-established conventions in the Python community, and in general we should follow these. General Python conventions, and required reading:

PEP 8: Style Guide for Python Code
PEP 257: Docstring Conventions
The Zen of Python: python -c "import this"

Note that our standards differ slightly from PEP-8 in some cases.

Coding standards other projects use:

Whitespace and Wrapping

Code should fit within 78 columns, so as to fit nicely in an 80 column terminal, even when quoted in an email.
Indents should be 4 spaces.
No tabs. This is not negotiable.
For statements that span multiple lines, use () rather than \.

Multiline braces

There are lots of cases where you might have a list, tuple or dictionary literal that spans multiple lines. In these cases, you should always format these literals as such:

   1     mydict = {
   2         'first': 1,
   3         'second': 2,
   4         'third': 3,
   5         }
   6 
   7     mylist = [
   8         'this is the first line',
   9         'this is the second line',
  10         'this is the third line',
  11         ]

Things to note:

The opening brace (or parenthesis or bracket) sits at the right end of the first line. Nothing follows this open brace.
Every subsequent item in the sequence is on a line by itself, and every line ends in a comma. This includes the last item in the sequence.
The closing brace lives on a line by itself and is indented to be under the first non-whitespace character of the last item line.

This style holds for __all__ declarations, such that this is correct:

   1 __all__ = [
   2     'HeldMessageDetails',
   3     'MailingList',
   4     'MailingListSet',
   5     'MailingListSubscription',
   6     'MessageApproval',
   7     'MessageApprovalSet',
   8     ]

Note however one exception: multiline imports. In this case, we scrunch all the imports into as few lines as possible, and of course we sort the names alphabetically. See Imports for more details about our import style.

Multiline function definitions

When your code defines a function or method with lots of parameters, it could very well exceed the 79 character maximum. In that case, you have two options for wrapping the parameter list. The first one is preferred.

   1         def do_something(the_first_argument, the_second_argument,
   2                          the_third_argument, the_fourth_argument,
   3                          the fifth_argument):

If indention or length of the function name make first form impractical, use this second form.

   1         def do_something_that_needs_a_lot_explaining(
   2             the_first_argument, the_second_argument, the_third_argument,
   3             the_fourth_argument, the fifth_argument):

It might be a good idea to leave a blank line after the parameter list when using this style. Remember, it's all about readability.

Multiline function calls

When your code calls a function with lots of arguments, it could very well exceed the 79 character maximum. In that case, you should wrap your function call to fit, like so:

   1         my_thing = some_object.do_something_to(
   2             the_first_argument, the_second_argument, the_third_argument,
   3             the_fourth_argument, the fifth_argument)

Do not indent your code like this:

   1         my_thing = some_object.do_something_to(the_first_argument,
   2                                                the_second_argument,
   3                                                the_third_argument,
   4                                                the_fourth_argument,
   5                                                the fifth_argument)

or like this:

   1         my_thing = some_object.do_something_to(the_first_argument,
   2             the_second_argument, the_third_argument, the_fourth_argument,
   3             the fifth_argument)

Vim Configuration

To make wrapping and tabs fit the above standard, you can add the following to your .vimrc:

autocmd BufNewFile,BufRead *.py set tw=78 ts=4 sts=4 sw=4 et

To make trailing whitespace visible:

set list
set listchars=tab:>.,trail:-

This will also make it obvious if you accidentally introduce a tab.

To make long lines show up:

match Error /\%>79v.\+/

For an even more in-depth Vim configuration, have a look at UltimateVimPythonSetup for a complete vim file you can copy to your local setup.

Emacs Configuration

There are actually two Emacs Python modes. Emacs comes with python.el which (IMO) has some quirks and does not seem to be as popular among hardcore Python programmers. python-mode.el comes with XEmacs and is supported by a group of hardcore Python programmers. Even though it's an add-on, it works with Emacs just fine.

Naming

Classes should be in CamelCase
Methods should be initialCamelCase
Functions should use_underscores
Non-method attributes and properties should use_underscores
Local variables should use_underscores
Modules should be in lowercase
Constants should be in ALL_CAPS
Interface class names should always begin with I
Single character names should be avoided for both variables and methods. Even list comprehensions and generator expressions should not use them.

However, these rules may not apply when modifying non-Launchpad code -- e.g. Zope. In those cases, consistency with existing code should be a priority. When working on older Launchpad code that doesn't meet this standard, converting it is preferable but not required.

As PEP-8 states, consistency is the top priority.

Shadowing Builtins

Take care not to use names that shadow builtins. This includes names such as str, list or dict. This can cause subtle bugs; for example:

   1 def func(str):
   2     ...
   3     # Convert number to a string
   4     s = str(number)  # BOOM!

It is okay to shadow builtins that we don't use, such as filter and reduce.

Private Name Mangling

In general, don't mangle attribute names using two leading underscores, e.g. __very_specific_attribute:

http://www.python.org/doc/2.4.4/ref/atom-identifiers.html

Just use a single underscore to create a pseudo-private attribute and name it appropriately.

BarryWarsaw says: double-underscore private names are really only there to assist in classes designed for inheritance. If class A wants to make sure some of its private attributes won't collide with subclass B's attributes, you can use double-underscores. This is pretty rare, and since we control all the code, is almost never appropriate for Launchpad.

Private names are private

You should never call a non-public attribute or method from another class. In other words, if class A has a method _foo(), don't call it from anywhere outside class A.

Since we control all the Launchpad code, it's almost always better to just rename A._foo() to A.foo(), making it public, and then it's fine to call it from outside class A.

If you need to keep the old name for backward compatibility (because you don't want to change all call sites right now), create an alias and make sure you indicate that the old name is deprecated.

Docstrings

If you haven't already, read PEP 257
In general, everything that can have a docstring should: modules, classes, methods, functions.
Docstrings should always be enclosed in triple double quotes: """Like this."""
When a class or a method implements an interface, the docstring should say """See `IFoo`."""

Docstrings should be valid reST (with all the painful indentation rules that implies) so that tools such as pydoctor can be used to automatically generate API documentation.

You should use fields to describe arguments, return values, and exceptions raised, as documented in epydoc. Avoid using synonym fields such as "returns" and "raises".

Using `name` outputs a link to the documentation of the named object, if pydoctor can figure out what it is. For an example of pydoctor's output, see http://starship.python.net/crew/mwh/hacks/example.html.

Here are some examples:

   1 def example1(a, b):
   2     """Perform some calculation.
   3 
   4     It is a **very** complicated calculation.
   5     """

   1 def example2(a, b):
   2     """Perform some calculation.
   3 
   4     It is a **very** complicated calculation.
   5 
   6     :param a: The number of gadget you think this
   7               function should frobnozzle.
   8     :type a: ``int``
   9     :param b: The name of the thing.
  10     :type b: ``str``
  11     :return: The answer!
  12     :rtype: ``str``.
  13     :raise ZeroDivisionError: when ``a`` is 0.
  14     """

   1 def example3():
   2     """Call `example2` with sensible arguments."""

There is some controversy about whether interface methods should get a docstring or not. OT1H, we want everything that can have a docstring to have a docstring. OTOH, this messes up pydoctor which can extract the method's docstring from the interface if the implementation doesn't have a docstring. We have no resolution on this yet, so for now, include the docstrings on such methods.

Modules

Each module should look like this:

   1 # Copyright 2004-2007 Canonical Ltd.  All rights reserved.
   2 
   3 """Module docstring goes here."""
   4 
   5 __metaclass__ = type
   6 __all__ = [
   7     ...
   8     ]

The file standard_template.py has most of this already, so save yourself time by copying that when starting a new module. The "..." should be filled in with a list of public names in the module. See [#multiline the section on multiline braces] for more details.

Note that although PEP-8 says to "put any relevant __all__ specification after the imports", Launchpad code should have the __all__ before the imports. This makes it easy to see what a module contains and exports, and avoids the problem that differing amounts of imports among files means that the __all__ list is in a different place each time. Given that we have the Import Fascist utility (see Imports), we use __all__ more often than in general Python code.

Importing a module should not have side effects. This means that any code other than a function/class/constant declaration should be guarded by an if __name__ == '__main__': line.

Imports

Restrictions

There are restrictions on which imports can happen in Launchpad. Namely:

Database code cannot be imported directly
import * cannot be used if the module being imported from does not have an __all__
Database code may not import zope.exceptions.NotFoundError -- it must instead use canonical.launchpad.interfaces.NotFoundError

These restrictions are enforced by the Import Fascist, which will cause your tests not to pass if you don't abide by the rules.

Imports should be fully qualified. Good:

   1 # foo/bar.py
   2 import foo.baz

Bad:

   1 # foo/bar.py
   2 import baz

I.e. if foo.bar imports foo.baz, it should say import foo.baz, not import baz.

Imports should not be circular. Bad:

   1 # foo.py
   2 import bar

   1 # bar.py
   2 import foo

This causes weird bugs. Find a different way to structure your code.

Multiline imports

Sometimes import lines must span multiple lines, either because the package path is very long or because there are multiple names inside the module that you want to import.

Never use backslashes in import statements! Use parenthesized imports:

   1 from foo import (
   2    That TheOther This)

Our style guide for multiline import statements differs from our general guideline for multiline braces, as a compromise to keep our import sections to a reasonable size. Our imports should be scrunched together, and of course sorted alphabetically, like so:

   1 from canonical.database.sqlbase import (
   2     cursor, flush_database_caches, flush_database_updates, quote, quote_like,
   3     sqlvalues, SQLBase)

Also, 'make lint' is a very good tool for helping you maintain your imports.

Import scope

We now discourage importing names from packages, where the package's __init__.py has sub-module import-* statements. We used to do this quite a bit for canonical.launchpad.interfaces and many other packages. This style is discouraged now because

import-*'s obscure the actual location of names, making finding the
- objects much more difficult
their use tends to encourage hugantically ginormous multiline import
- statements, increasing the likelihood of merge conflicts
you also have to maintain the __init__.py files
this style can lead to circular import problems

Instead, we now recommend that you do not add import-*'s in a package's __init__.py, and instead, use the more fully qualified import statements in your code. E.g. instead of

   1 from canonical.launchpad.interfaces import (
   2     CannotChangeSubscription, CannotSubscribe, CannotUnsubscribe,
   3     EmailAddressStatus, IEmailAddressSet, IHeldMessageDetails,
   4     ILaunchpadCelebrities, IMailingList, IMailingListSet,
   5     IMailingListSubscription, IMessageApproval, IMessageApprovalSet,
   6     IMessageSet, MailingListStatus, PostedMessageStatus)

Use

   1 from canonical.launchpad.interfaces.emailaddress import (
   2     EmailAddressStatus, IEmailAddressSet)
   3 from canonical.launchpad.interfaces.launchpad import ILaunchpadCelebrities
   4 from canonical.launchpad.interfaces.mailinglist import (
   5     CannotChangeSubscription, CannotSubscribe, CannotUnsubscribe,
   6     IHeldMessageDetails, IMailingList, IMailingListSet,
   7     IMailingListSubscription, IMessageApproval, IMessageApprovalSet,
   8     MailingListStatus)
   9 from canonical.launchpad.interfaces.message import IMessageSet

Circular imports

With the increased use of native Storm APIs, you may encounter more circular import situations. For example, a MailingList method may need a reference to the EmailAddress class for a query, and vice versa. The classic way to solve this is to put one of the imports inside a method instead of at module global scope (a "nested import").

Short of adopting something like Zope's lazy imports (which has issues of its own), you can't avoid this, so here are some tips to make it less painful.

Do the nested import in the least common case. For example, if 5 methods
in database/mailinglist.py need access to EmailAddress but only one method in database/emailaddress.py needs access to MailingList, put the import inside the emailaddres.py method, so you have fewer overall nested imports.
Clearly comment that the nested import is for avoiding a circular import, using the example below.
Put the nested import at the top of the method.

   1         def doFooWithBar(self, ...):
   2             # Import this here to avoid circular imports.
   3             from canonical.launchpad.database.bar import Bar
   4             # ...
   5             return store.find((Foo, Bar), ...)

Circular imports and webservice exports

One of the largest sources of pain from circular imports is caused when you need to export an interface on the webservice. Generally, the only way around this is to specify generic types (like the plain old Interface) at declaration time and then later patch the webservice's data structures at the bottom of the interface file.

Fortunately there are some helper functions to make this less painful, in components/apihelpers.py. These are simple functions where you can some info about your exported class/method/parameters and they do the rest for you.

For example:

   1     from canonical.launchpad.components.apihelpers import (
   2         patch_entry_return_type, patch_collection_return_type)
   3     patch_collection_return_type(
   4         IArchive, 'getComponentsForQueueAdmin', IArchivePermission)
   5     patch_entry_return_type(
   6         IArchive, 'newPackageUploader', IArchivePermission)

Attributes

Trivial attribute getter/setter functions are not necessary. Python takes a "We're all consenting adults" view on encapsulation, so if you want to access an attribute on an instance, just do it.

Be careful when accessing attributes of attributes. E.g.:

   1 x = foo.bar.baz * 2

If foo.bar is None, Bad Things will happen, so always check if you're not sure. This is sometimes referred to as the "two dots rule".

If you need to dynamically access an attribute, use getattr rather than eval. It is clearer and safer. For example:

   1 callback = getattr(self, "handler_" + state)

Properties

When you need to add a property, ensure that __metaclass__ = type is used as mentioned above. For a read-only property, use the following pattern:

   1 class SomeClass:
   2     @property
   3     def foo(self):
   4         # Do some processing.
   5         return bar

You should rarely need to add a property that has a getter and a setter, but on those occasions, this is how to do it. The getter is the same as for the simple case. The setter is called _setfoo. The name _setfoo starts with an underscore to show that it is not meant to be used except via the property.

   1 class SomeClass:
   2     def foo(self):
   3         """Docstrings are helpful."""
   4         # Do some processing.
   5         return bar
   6         
   7     def _setfoo(self, value):
   8         # Do something with 'value'.
   9 
  10     foo = property(foo, _setfoo, doc=foo.__doc__)

Note the order: getter, setter, make them into a property.

Truth conditionals

Launchpad style tests for boolean True only. False, None, and 0 are not the same, and tests that rely on them being the same are ambiguous. Tests of objects that may be None or 0 must be written as truth tests because either value will be interpreted as False by the test.

Let's say you have a sequence (e.g. a list or tuple) and you want test whether it's empty or not. Standard Python convention is just to use Python's rules that empty sequences are false:

   1     if not mylist:
   2         # sequence mylist is empty
   3         blah()

However, Launchpad hackers consider it to be more explicit to test against the sequence's length:

   1     if len(mylist) == 0:
   2         # sequence mylist is empty
   3         blah()

If the value could be None though, please check explicitly against that using an identity check. In other words, don't do this:

   1     if not foo:

and definitely don't do this:

   1     if foo == None:

}

do this instead:

   1     if foo is None:

Note though that is and is not comparisions should only be used for comparing singletons, which include built-ins like None but also singleton objects you create in your program:

   1 missing = object()
   2 thing = mydict.get('thing', missing)
   3 if thing is missing:
   4    # etc...

You wouldn't use == in that case, but similarly, if you're not comparing against singletons, you should not use is, but ==:

   1 if thing == 'foo':

You might think that because of string interning, it would be safe to use is here, but remember that interning is an implementation detail and you should not count on any particular string being interned. Always use == for comparison against strings and numbers. Note too that because of the way we use SQLObject, there may be some cases where you have to use == in places you'd normally think to use `is'.

If statements

The most commonly used patterns like

   1     if something:
   2         do_something()

and

   1     if something:
   2         do_something()
   3     else:
   4         do_something_else()

are perfectly fine.

However, if you do have some elif statements, we consider it a good practice to always include an else statement, even if it contains only a comment and a pass statement, or an assert statement with an appropriate failure message.

   1     if isinstance(fruit, apple):
   2         eat(fruit)
   3     elif isinstance(fruit, pineapple):
   4         eat(peel(fruit))
   5     else:
   6         # We only eat apples and pineapples.
   7         pass

Similarly you may have empty elif blocks, like

   1     if 'foo' in request.form:
   2         foo()
   3     elif 'bar' in request.form:
   4         bar()
   5     elif 'goback' in request.form:
   6         # No need to do anything; we'll simply redirect the user.
   7         pass
   8     else:
   9         raise UnexpectedFormData()
  10     redirect()

This may be considered an example of the "Explicit is better than implicit." principle.

Chaining method calls

Since in some cases (e.g. class methods and other objects that rely on descriptor get() behaviour) it's not possible to use the old style of chaining method calls (SuperClass.method(self, ...)), we should always use the super() builtin when we want that.

$/!\$ The exception to this rule is when we have class hierarchies outside of our control that are known not to use super() and that we want to use for diamond-shaped inheritance.

Use of lambda, and operator.attrgetter

Avoid using lambda expressions. It is usually better to define a small function and use that instead; this ensures that your function has a meaningful name, allows you to write a docstring, and allows the function to be tested more easily.

One exception to this rule are callbacks within the Twisted framework where the use of lambdas to ignore a parameter is common practice.

A common reason to use lambda is when you want to sort a list of objects according to one of the object's attributes. Such code will often look like this:

   1     # You have a list of objects called 'results' from some database method.
   2     return sorted(results, key=lambda obj: obj.title)

In this case, you can use operator.attrgetter instead.

   1     from operator import attrgetter
   2     ...
   3     # You have a list of objects called 'results' from some database method.
   4     return sorted(results, key=attrgetter('title'))

Note that you can use attrgetter only when you want to sort on an attribute. You cannot use it like this when you want to sort on the result of calling a method, or when you want to sort on more than a single attribute.

Use of hasattr

Use of the built-in hasattr function should be avoided since it swallows exceptions. Instead use:

    if getattr(obj, 'attrname', None) is not None:

However, if None is a valid value for the attribute, you have to do this instead:

    missing = object()
    if getattr(obj, 'attrname', missing) is not missing:

Database related

Storm

We use two database ORM (object-relational mapper) APIs in Launchpad, the older and deprecated SQLObject API and the new and improved Storm API. All new code should use the Storm API, and you are encourages to convert existing code to Storm as part of your tech-debt payments.

Note:

The SQLObject and Storm ResultSet interfaces are not compatible, so e.g. if you need to UNION between these two, you will run into trouble. We are looking into ways to address this.
Using the native Storm API can be verbose and inconvenient for some folks. There is a (somewhat) experimental Sugar base class in lib/canonical/launchpad/database/stormsugar.py which provides many convenient APIs to make your use of Storm easier.

Field attributes

When you need to add ID attributes to your database class, use field_id as the attribute name instead of fieldID.

Multi-line SQL

SQL doesn't care about whitespace, so use triple quotes for large SQL queries or fragments, e.g.:

   1     query = """
   2         SELECT TeamParticipation.team, Person.name, Person.displayname
   3         FROM TeamParticipation
   4         INNER JOIN Person ON TeamParticipation.team = Person.id
   5         WHERE TeamParticipation.person = %s
   6         """ % sqlvalues(personID)

This also easy to cut-and-paste into psql for interactive testing, unlike if you use several lines of single quoted strings.

Unpacking assignment

   1   val_1, val_2 = multival_func()

And for 1-tuples

   1   assert (len(vals) == 1,
   2       "Expected 1 value, but received %s" % len(vals))
   3   val_1 = vals[0]

Creating temporary files

We should use the most convenient method of the tempfile module, never taint '/tmp/' or any other 'supposed to be there' path.

Despite of being developed and deployed on Ubuntu systems, turning it into restriction might not be a good idea.

When using tempfile.mkstemp remember it returns an open file-descriptor which has to be closed or bound to the open file, otherwise they will leak and eventually hit the default Linux limit (1024).

There are 2 good variations according to the scope of the temporary file.

   1     fd, filename = mkstemp()
   2     os.close(fd)
   3     ...
   4     act_on_filename(filename)

Or:

   1     fd, filename = mkstemp()
   2     temp_file = os.fdopen(fd, 'w')
   3     ...
   4     temp_file.write('foo')
   5     temp_file.close()

Never use:

   1     fd, filename = mkstemp()
   2     temp_file = open(filename)
   3     temp_file.write('foo')
   4     temp_file.close()
   5     # BOOM! 'fd' leaked.

It's also important to mention that in testing context, specially if you are using the lp.testing.TestCase (or one of its specializations) you can simply create a brand new temporary directory (using mkdtemp). Create as many files you need within it and register a cleanup function to purge the temporary directory recursively.

   1 class TestFoo(TestCase):
   2 ...
   3     def test_foo(self):
   4         tempdir = mkdtemp()
   5         self.addCleanup(shutils.rmtree, tempdir)
   6         ...
   7         do_something(os.path.join(tempdir, 'test.log'))
   8         ...

-  ⇤ ← Revision 11 as of 2010-04-21 14:16:59 → 
  Size: 27438
  Editor: allenap
  Comment: Move the function definition whitespace rules near to the other whitespace rules.
+   ← Revision 12 as of 2010-04-21 14:42:00 → ⇥
  Size: 27878
  Editor: henninge
  Comment: Expanded rule for function definition as per last weeks' reviewers meeting.
-Deletions are marked like this.
+Additions are marked like this.
 Line 83:
+=== Multiline function definitions ===

When your code ''defines'' a function or method with lots of parameters, it
could very well exceed the 79 character maximum.  In that case, you have two
options for wrapping the parameter list. The first one is preferred.

{{{#!python
        def do_something(the_first_argument, the_second_argument,
                         the_third_argument, the_fourth_argument,
                         the fifth_argument):
}}}

If indention or length of the function name make first form impractical, use
this second form.

{{{#!python
        def do_something_that_needs_a_lot_explaining(
            the_first_argument, the_second_argument, the_third_argument,
            the_fourth_argument, the fifth_argument):
}}}

It might be a good idea to leave a blank line after the parameter list when
using this style. Remember, it's all about readability.
-Line 85:
+Line 110:
-When your code calls a function with lots of arguments, it could very well
+When your code ''calls'' a function with lots of arguments, it could very well
-Line 112:
+Line 137:
-=== Multiline function definitions ===

If you need to write a function or method with a long list of arguments, you should format it thus:

{{{
#!python
def function_with_many_args(arg1, arg2, arg3, arg4, arg5,
                            spam, eggs, ham, jam, lamb):
    """Docstring..."""
    # Some code...
}}}

 * Why does this differ from the function call definition? What happens_when_the_function_name_is_really_quite_long? -- ''allenap, 2010-04-21''

launchpad development

Python Style Guide

Existing Conventions

Related Documents on This Wiki

Whitespace and Wrapping

Multiline braces

Multiline function definitions

Multiline function calls

Vim Configuration

Emacs Configuration

Naming

Shadowing Builtins

Private Name Mangling

Private names are private

Docstrings

Modules

Imports

Restrictions

Multiline imports

Import scope

Circular imports

Circular imports and webservice exports

Attributes

Properties

Truth conditionals

If statements

Chaining method calls

Use of lambda, and operator.attrgetter

Use of hasattr

Database related

Storm

Field attributes

Multi-line SQL

Unpacking assignment

Creating temporary files

launchpad development

Diff for "PythonStyleGuide"

Python Style Guide

Existing Conventions

Related Documents on This Wiki

Whitespace and Wrapping

Multiline braces

Multiline function definitions

Multiline function calls

Vim Configuration

Emacs Configuration

Naming

Shadowing Builtins

Private Name Mangling

Private names are private

Docstrings

Modules

Imports

Restrictions

Multiline imports

Import scope

Circular imports

Circular imports and webservice exports

Attributes

Properties

Truth conditionals

If statements

Chaining method calls

Use of lambda, and operator.attrgetter

Use of hasattr

Database related

Storm

Field attributes

Multi-line SQL

Unpacking assignment

Creating temporary files