Revision 6 as of 2010-09-01 22:12:36

Architectural Guide

In this guide you will find some expansion and clarification on the architectural values I presented in the Launchpad Architectural Vision 2010. Be sure to look at the speaker notes: they are the juicy bits.

All the code we write will meet these values to a greater or lesser degree. Where you can, please make choices that make your code meet these values more strongly.

Some existing code does not meet them well; this is simply an opportunity to get big improvements - by increasing e.g. the transparency of existing code, operational issues and debugging headaches can be reduced without a great deal of work.

This guide is intended as a living resource: all Launchpad developers, and other interested parties, are welcome to join in and improve it.

Goals

The goal of the recommendations and suggestions in this guide is to help us reach a number of big picture goals. We want Launchpad to be:

(See the presentation for more details).

However, it's hard when making any particular design choice to be confident that it drives us towards these goals: they are quite specific, and not directly related to code structure or quality.

Of particular note, the PythonStyleGuide specifies coding style guidelines.

Values

There are a number of things that are more closely related to code, which do help drive us towards our goals. These are values I (RobertCollins) hold dear; the more our code meets these values, the easier it will be to meet our goals.

The values are:

Transparency

Transparency speaks to the ability to analyse the system without dropping into pdb or taking guesses.

Some specific things that aid transparency:

We already have a lot of transparency. We can use more.

Aim for automation, developer usability, minimal losa intervention, and on-demand access.

When adding code to the system, ask yourself 'how will I troubleshoot this when it goes wrong, without access to the machine it is running on?'
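One way to make that question answerable is to have the code record what it was doing, so the record can be attached to an error report. The following is a minimal sketch only: the `Timeline` class and its `record` method are hypothetical names invented for this illustration, not an existing Launchpad API.

```python
import time
from contextlib import contextmanager


class Timeline:
    """Hypothetical helper: collects (category, detail, duration) entries."""

    def __init__(self):
        self.actions = []

    @contextmanager
    def record(self, category, detail):
        # Time the wrapped block and keep the entry even if it raises,
        # so a failure report still shows what was in flight.
        start = time.monotonic()
        try:
            yield
        finally:
            self.actions.append((category, detail, time.monotonic() - start))

    def summary(self):
        """Return one human-readable line per recorded action."""
        return [
            "%s: %s (%.3fs)" % (category, detail, duration)
            for category, detail, duration in self.actions
        ]


timeline = Timeline()
with timeline.record("sql", "SELECT 1"):
    pass  # the real query would run here
try:
    with timeline.record("http", "GET remote tracker status"):
        raise IOError("remote bug tracker timed out")
except IOError:
    pass  # the failed call is still present in the timeline
```

Because the entry is appended in a `finally` clause, the action that failed appears in `timeline.summary()` alongside the ones that succeeded, which is the information a developer needs when they cannot log in to the machine.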

Loose coupling

The looser the coupling between different parts of the system, the easier it is to change them. Launchpad is pretty good about this in some ways due to the component architecture.

But it's not the complete story, and I think decreasing the coupling more will help the system.

I've seen some recent work on this such as the jobs system and the buildd queue refactoring, which is excellent - generic pieces that can be used and reused.

The acid test for the coupling of a component is 'how hard is it to reuse?'

And we can go further. For instance, the job system is nice, but it's tightly coupled to the Launchpad DB. Perhaps we could make it possible to use it for other Canonical projects, or other Zope projects. Or perhaps move it to MQ and just have an adapter instead? Tasks running in a job could still talk to the DB.
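To make the adapter idea concrete, here is a rough sketch in which the runner only knows about a two-method queue interface. Everything here is an assumption made for illustration: `InMemoryQueue`, `JobRunner`, `put` and `get` are invented names, and the real job system looks different.

```python
from collections import deque


class InMemoryQueue:
    """Hypothetical backend; a DB- or MQ-backed adapter would
    provide the same put() and get() methods."""

    def __init__(self):
        self._items = deque()

    def put(self, job):
        self._items.append(job)

    def get(self):
        # Return the next job, or None when the queue is empty.
        return self._items.popleft() if self._items else None


class JobRunner:
    """Runs jobs from any object that provides put() and get()."""

    def __init__(self, queue):
        self.queue = queue

    def run_all(self):
        results = []
        while True:
            job = self.queue.get()
            if job is None:
                return results
            # Jobs are plain callables in this sketch.
            results.append(job())


queue = InMemoryQueue()
queue.put(lambda: "expired 3 questions")
queue.put(lambda: "sent 2 mails")
results = JobRunner(queue).run_all()
```

The runner never imports a database layer, so swapping the backend means writing another class with the same two methods. That is the 'how hard is it to reuse?' test applied to a small case.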

Of particular note, many changes in one area of the system (e.g. bugs) break tests in other areas (e.g. blueprints) - this adds a lot of developer friction and is a strong sign of overly tight coupling.

Highly cohesive

The more things a component does, the harder it is to reason about it and performance tune it.

So this is "Do one thing well" in another setting.

The way I like to assess this is to look inside the component and see if it is doing one thing, or many things.

One common sign for a problem in this area is attributes (or persistent data) that are not used in many methods - that often indicates there is a separate component embedded in this one.

There are tradeoffs here due to database efficiency and normalisation, but it's still worth thinking about: narrower tables can perform better and use less memory, even if we do add extra tables to support them.

On a related note, the more clients using a given component, the wider its responsibilities and the more critical it becomes. That's an easy way to end up with too much in one component (lots of clients wanting things decreases the cohesion), and then we have a large, unwieldy, critical component - not an ideal situation.

Testable

We write lots of unit and integration tests at the moment. However, it's not always easy to test specific components - and the coupling of the components drives this.

The looser the coupling, the better in terms of having a very testable system. However, loose coupling isn't enough on its own, so we should consider testability from a few angles:

Can it be tested in isolation? If it can, it can be tested more easily by developers, locally, without needing a large testbed environment every time.

Can we load test it? Not everything needs this, but if we can't load test a component that we reasonably expect to see a lot of use, we may have unpleasant surprises down the track.

Can we test it with broken backends/broken data? It is very nice to be confident that when a dependency breaks (not if) the component will behave nicely.

It's also good to make sure that someone else maintaining the component later can repeat these tests and is able to assess the impact of their changes.

Automation of this stuff rocks!
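As an illustration of the broken-backend question above, a test can hand the component a dependency that always fails and check that it degrades gracefully. This is a sketch under assumptions: `BrokenTracker`, `fetch_status` and `remote_status` are hypothetical names, not Launchpad code.

```python
import unittest


class BrokenTracker:
    """Hypothetical stand-in for a remote bug tracker that is down."""

    def fetch_status(self, bug_id):
        raise ConnectionError("tracker unreachable")


def remote_status(tracker, bug_id):
    """Return the remote status, or 'UNKNOWN' when the backend fails."""
    try:
        return tracker.fetch_status(bug_id)
    except ConnectionError:
        # Degrade to a safe default instead of failing the whole page.
        return "UNKNOWN"


class TestRemoteStatus(unittest.TestCase):
    def test_broken_backend_degrades(self):
        self.assertEqual(remote_status(BrokenTracker(), 42), "UNKNOWN")


# Run the test in-process so the sketch is self-checking.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestRemoteStatus)
result = unittest.TestResult()
suite.run(result)
```

Because the tracker is passed in rather than looked up globally, the broken case needs no testbed at all, and anyone maintaining the component later can repeat it.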

Predictable

An extension of stability - servers should stay up, database load should be what it was yesterday, rollouts should move metrics in an expected direction.

Predictability isn't very sexy, but it's very useful: useful for capacity planning, useful for changing safely, useful for being highly available, and useful for letting us get on and do new/better things.

The closer to a steady state we can get, the more obvious it is when something is wrong.
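A small sketch of why a steady state helps: with a stable history, even a crude comparison against recent values flags a problem. The function name, the sample numbers and the three-standard-deviation threshold are all assumptions chosen for illustration, not a recommended monitoring policy.

```python
from statistics import mean, stdev


def is_unusual(history, today, tolerance=3.0):
    """Return True if `today` is more than `tolerance` standard
    deviations away from the mean of `history` (needs >= 2 samples)."""
    baseline = mean(history)
    spread = stdev(history)
    return abs(today - baseline) > tolerance * spread


# Hypothetical daily query counts (in thousands) for one service.
history = [100, 102, 98, 101, 99]
normal_day = is_unusual(history, 103)
bad_rollout = is_unusual(history, 140)
```

The noisier the history, the larger the spread and the bigger a regression has to be before it stands out, which is the practical cost of an unpredictable system.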

Design metrics

This is an experiment, an attempt to set a measurable figure on some metrics that hopefully relate well to the goals and values above. The Launchpad review team will be asking about these metrics in reviews - if your code doesn't meet one, that's *OK*: this is an experiment. Please note in the review that the metric seemed nuts/inapplicable, and we'll fold that into evolving things.

Performance

Document how components are expected to perform. Docstrings or doctests are great places to put this. E.g. "This component is expected to deal with < 100 bug tracker types; if we have more this will need to be redesigned."
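For instance, a docstring can carry the expectation next to the code it describes. The function below is a made-up example (`pick_tracker_type` is not a real Launchpad function); only the "< 100 bug tracker types" figure comes from the sentence above.

```python
def pick_tracker_type(tracker_types, name):
    """Return the tracker type called `name`, or None if there is none.

    Performance: this is a linear scan. It is expected to deal with
    < 100 bug tracker types; if there are more, it needs a redesign.

    >>> pick_tracker_type(["bugzilla", "trac"], "trac")
    'trac'
    """
    for tracker_type in tracker_types:
        if tracker_type == name:
            return tracker_type
    return None
```

A reviewer can then judge a change against the stated limit rather than guessing what load the author had in mind.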

Testing

Tests for a class should complete in under 2 seconds. If they don't, spend at least a little time determining why.
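A quick way to check a single test class against this figure is to time it with the standard `unittest` machinery. This is a sketch only: `timed_run` and `TestArithmetic` are invented for the example, and the Launchpad test runner may well report timings in its own way.

```python
import time
import unittest

TIME_BUDGET_SECONDS = 2.0  # the per-class figure from this guide


def timed_run(test_case_class):
    """Run every test in one TestCase class.

    Returns a (TestResult, elapsed seconds) pair.
    """
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case_class)
    result = unittest.TestResult()
    start = time.monotonic()
    suite.run(result)
    return result, time.monotonic() - start


class TestArithmetic(unittest.TestCase):
    """Trivial stand-in for a real test class."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)


result, elapsed = timed_run(TestArithmetic)
over_budget = elapsed > TIME_BUDGET_SECONDS
```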

Transparency

Behaviour of components should be analysable in lpnet without doing a 'losa ping': that is, if a sysadmin is needed to determine what's wrong with something, we've designed it wrong. Let's make sure there is a bug for that particular case, or if possible Just Fix It.

Coupling

A component should have no more than 5 dependencies.
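The guide does not define what counts as a dependency. As one rough approximation, assuming a component is a single module and a dependency is a distinct module it imports, the count can be taken from the source with the standard `ast` module. `imported_modules` is a hypothetical helper written for this sketch.

```python
import ast

MAX_DEPENDENCIES = 5  # the figure from this guide


def imported_modules(source):
    """Return the set of dotted module names that `source` imports.

    Relative imports without a module name (from . import x) are
    skipped in this sketch.
    """
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module)
    return modules


SOURCE = """
import os
import json
from collections import deque
from os import path
"""
deps = imported_modules(SOURCE)
too_coupled = len(deps) > MAX_DEPENDENCIES
```

Counting imports misses dependencies reached through adapters or utilities looked up at runtime, so treat the number as a prompt for discussion in review rather than a verdict.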

Cohesion

Attributes should be used in more than 1/3 of interactions. If they are used less often than that, consider deleting or splitting into separate components.
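"Interactions" is not defined precisely here. A crude stand-in, assumed purely for this sketch, is the share of a class's methods that touch each `self` attribute. The helper `attribute_usage` and the sample `Bug` class are hypothetical.

```python
import ast
import textwrap


def attribute_usage(class_source):
    """Map each self.<attr> name to the fraction of methods touching it."""
    tree = ast.parse(textwrap.dedent(class_source))
    class_node = next(n for n in ast.walk(tree) if isinstance(n, ast.ClassDef))
    methods = [n for n in class_node.body if isinstance(n, ast.FunctionDef)]
    method_names = {m.name for m in methods}
    counts = {}
    for method in methods:
        used = {
            node.attr
            for node in ast.walk(method)
            if isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == "self"
            and node.attr not in method_names  # ignore self.method() calls
        }
        for name in used:
            counts[name] = counts.get(name, 0) + 1
    return {name: count / len(methods) for name, count in counts.items()}


SOURCE = '''
class Bug:
    def __init__(self, title, owner, mirror_url):
        self.title = title
        self.owner = owner
        self.mirror_url = mirror_url

    def rename(self, title):
        self.title = title

    def describe(self):
        return "%s (%s)" % (self.title, self.owner)

    def reassign(self, owner):
        self.owner = owner
'''
usage = attribute_usage(SOURCE)
rarely_used = sorted(name for name, share in usage.items() if share <= 1 / 3)
```

Here `mirror_url` is only touched by the constructor, one method in four, so it falls below the one-third line and is a candidate for deletion or for moving into a separate component.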