ArchitectureGuide/ServicesAnalysis

Revision 25 as of 2011-05-17 10:55:57

Credits

This document is a synthesis of many discussions held about Launchpad over the mid 2010-mid 2011 year; all good ideas go to those participants (including but not limited to poolie, gary, jml, deryck, sinzui, bigjools, wgrant, flacoste, statik, stub) - all mistakes are mine (lifeless).

Call to arms

We should invest in splitting our template and public RESTful API code to a dedicated source tree, running as a front end controller with no SQL access: written entirely as a client of backend APIs. This bootstrap step would enable us to take a much more service based approach over the next few years, simplifying our data models and the code that we use to access data. The initial step would be written up as a LEP and tackled incrementally by feature squads. Over the course of the project our deployment, landing and testing stories would all be updated to fit a design where Launchpad is composed of multiple small services (rather than being primarily monolithic). The initial splits would be on components rather than domain lines: e.g. template processing, REST API, graph processing rather than bugs/registry/translations/code.

Current state

Launchpad is currently designed as a single large Python project + PostgreSQL database with component libraries, and a very few additional services: loggerhead, mhonarc.

Some parts of the project run different stacks (the librarian, buildd-manager), and some are deployed very differently (buildd-slave).

Friction attributable to this design

Code coupling

Because optimisations within the data model require complex queries, different domains within the Launchpad code tree get quite tangled. These optimisations are only possible when such domains are available in one database (and conversely only needed when they are in one database - if they are in different databases other problems and solutions are needed).

Test suite

As a result of high code coupling and a lack of reliable contracts many changes have unexpectedly wide consequences. Because the code base is very large the test suite is very large. Because the layers are not amenable to substitution unit tests are very tricky to write and we usually exercise most of the entire stack. Additionally finding the right tests to work on is hard - partly because what one might consider layers are all smooshed together, and partly due to having many different styles of test which are not consistently testing at the same layers - it is hard to tell where to start.

Monolithic downtime

With a single database and deployed tree, most changes require complete downtime.

Poor integration with non-included services

Things like mhonarc and loggerhead have a very poor story in LP because we don't have a good story for skinning them and our monolithic approach drives the story we do have.

Benefits from this design

Schema and code coherency

We never have to deal with a schema from a different tree - all the code that knows about the plumbing is in one place. As such it is (relatively) easy to refactor the system and build small code components.

Atomic relational integrity

All our data is in one place, we can use foreign keys to track deletes and things are never inconsistent.

Fewer moving parts to learn

As all the code is in one tree, we generally have one set of idioms, one language, one database engine to work with. This helps keep things approachable (and as bugs in-tree generally get more rapid attention than bugs in related trees we have some anecdata to support this benefit).

It's relatively difficult to have cascading failures

Right now when the librarian breaks much of LP goes down. This is an example of a cascading failure - a failed backend fails frontend services. The single-stack of homogeneous servers model avoids this (but we don't have a pure implementation of this because we have the librarian). The more services the more care it takes to monitor and avoid such cascading situations.

Dealing with unexpectedly large fallout from a change is contained

If a change has unexpected consequences they are generally contained to just the one service (because it's all monolithic). This sets a reasonable bound on how hard doing a change can be.

Few styles of parts to monitor

Because we are (mostly) homogeneous monitoring and deployment is a solved problem.

A service based Launchpad

In changing the tradeoff we make we should be clear about the things we want to optimise for, so we can evaluate new tradeoff points.

As a team we want to achieve three key things:

Better isolation of changes permits confident changes with smaller numbers of tests run, and decreases the WTF factor - so we need to look at how we can increase isolation of changes. Improving the overall speed with which requests are serviced helps with latency - so we need to consider whether we will add (or remove) intrinsic limits to performance. The busier a subsystem is the harder it is to take it down for schema changes (and the larger the subsystem is the wider the outage caused by taking it down).

For change isolation we need to look at the entire stack - previously we've only considered contracts within the one code base. But actually sitting on one schema implies one large bucket within which changes can propagate: we see this regularly when we deal with optimisations and changes to the implementation of common layers (such as the team participation graph-caching optimisation).

One way we could improve isolation is to (gradually, not radically) convert Launchpad to be built from smaller services, each of which has a crisp contract, its own storage and schema. This would permit testing of just the changes within one service - or only the services that talk to that service.

While there are many ways one might try to slice up Launchpad into smaller services, we need to avoid creating silos between components we want to deeply integrate. One way to avoid that when we don't know what we want to integrate (yet) is to focus on layers which can be reused across Launchpad rather than on (for instance) breaking out user-visible components. This doesn't preclude vertical splits but it is a lot easier to reason about and prepare for.

Anatomy of a service

Contracts and protocols

Having clear contracts between services implies that services are mostly self contained. It could also be taken to suggest that we need a homogeneous protocol for accessing different services, with API docs and so forth. Given that we have heterogeneous components to integrate (already - see mhonarc, loggerhead, gpg keyservice, bzr codehosting,...) the benefits of a single protocol would need to be balanced against the overhead of having to write thunks for any existing service that happens to use a different protocol.

We can put a few guidelines in place to support having clear boundaries though:

And we can evaluate and choose a 'sensible default' protocol for new services we write. One possibility is a restful JSON based API, but we couldn't use launchpadlib as a client for that without fixing it to be suitable for use inside a web server. (The use of runtime code generation, disk cache, wadl parsing are all significant obstacles for a server). At this point there is no clear winner, though xmlrpc may well be a reasonable compromise between speed, ease of updates and mocking, existing server-suitable client libraries, existing deployment and so forth.

Technology

We already have a heterogeneous collection of service implementations: pastedeploy (loggerhead), twisted (buildd, librarian), zope (lp core), Perl (mhonarc), bzr's service implementation (custom, with adapters to ssh (via twisted) and wsgi). If we make each service as simple to maintain as possible, we can offset to some degree the costs inherent in having multiple stacks in play. For new services we should spend a little time asking the question 'how big will this get, how responsive does it need to be, and how fast do we need to change it', but for existing services we should focus on making them easier to maintain (because we already know how much load they get). If we bring in a new service with an existing implementation we need to ask whether we can effectively deal with bugs in that service: there is no point having the world's best bread slicer if we can't fix it when it breaks. This applies to the entire stack: database, language, network protocol.

Identified service opportunities

These are potential things we could pull out to services - they are examples only - detailed analysis of each has not been done, so it is not possible to say that they are all definites: they are merely opportunities.

team participation / directory service

The largest teams-per-person is ~300, the largest persons-per-team is 18K, but discounting the top two drops it to 3.7K, and top ten gets down to 1.6K. The 18K case can be serialised and passed over the network in 300ms though, which makes it feasible to grab and pass between systems. Smaller cases like a 2K membership team can be handled in 40ms (using psql and postgres to assess).

We have a number of significant use cases around the directory service which are poorly satisfied at the moment - we don't permit non-membership relations like 'administers' or 'audits' (e.g. is granted view privileges but not mutation privileges).

Running (minimally) the person-in-team, teams-for-person, persons-for-team facilities as a service would aid the separation of SSO (by providing a high availability service that the SSO web service could back end onto).
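The three facilities named above can be sketched as a small in-memory class. This is only an illustration of the contract such a service might offer: the class name `TeamDirectory`, its method names and the sample team names are assumptions, not existing Launchpad code, and a real service would sit in front of persistent storage.

```python
class TeamDirectory:
    """In-memory stand-in for a directory service (hypothetical API)."""

    def __init__(self):
        # team name -> set of direct members (people or nested teams)
        self._members = {}

    def add_member(self, team, member):
        self._members.setdefault(team, set()).add(member)

    def _participants(self, team):
        # Walk nested teams; the 'seen' set also guards against cycles.
        seen, stack = set(), [team]
        while stack:
            for member in self._members.get(stack.pop(), ()):
                if member not in seen:
                    seen.add(member)
                    stack.append(member)
        return seen

    def person_in_team(self, person, team):
        return person in self._participants(team)

    def persons_for_team(self, team):
        # Anything that has members of its own is a team, not a person.
        return {m for m in self._participants(team) if m not in self._members}

    def teams_for_person(self, person):
        return {t for t in self._members if person in self._participants(t)}
```

Callers only ever see the three query methods, so the storage behind them (SQL, a graph server, a cache) could change without touching call sites.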

blob storage (the librarian)

The librarian stores upwards of 14M distinct files (after coalescing by hash) - but it is tightly coupled to the Launchpad schema. It suffers from cold-cache effects on a regular basis, and we have explicit mechanisms in the schema to let us have weaker-than-actual links (for instance we can delete the blob but keep the reference, and delete the reference but retain the blob for a while).

We could build/bring in a simpler blob store and layer our special needs on top or as an extension to it: for instance, needs such as the public-restricted librarian, size-calculations for many objects, or even aggregates (e.g. model a ppa as a bucket of blobs and we can get size data directly aggregated by the blob store).

The current service is difficult to evolve because it is tightly coupled: any attempt to modify the schema runs into the slow-patch-application + high-change-friction issue which primarily exists because dozens of call sites talk directly to the storage schema even though most of them just want url generation.
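A minimal sketch of what a narrower contract could look like, assuming call sites only need to store content, generate URLs and ask for aggregate sizes. The `BlobStore` class, its methods and the URL layout are hypothetical, and the in-memory dictionary stands in for real storage.

```python
import hashlib


class BlobStore:
    """Content-addressed blob store sketch (hypothetical, in-memory)."""

    def __init__(self, base_url):
        self._base_url = base_url.rstrip("/")
        self._blobs = {}  # sha256 hex digest -> bytes

    def put(self, content):
        digest = hashlib.sha256(content).hexdigest()
        # Identical content coalesces onto one stored blob.
        self._blobs.setdefault(digest, content)
        return digest

    def url_for(self, digest, filename):
        # URL generation needs no access to the storage schema.
        return f"{self._base_url}/{digest}/{filename}"

    def total_size(self, digests):
        # Aggregate size of a bucket of blobs, counting each blob once.
        return sum(len(self._blobs[d]) for d in set(digests))
```

With an interface like this, the dozens of call sites that only want a URL would depend on `url_for` alone, and the storage schema could evolve behind it.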

mhonarc (the lists.launchpad.net UI)

This runs as an external service but our appservers do not use it as a backend - instead end users use it directly. If we modified it to write its archived information as machine readable metadata (e.g. a json file per message, per index page, per list) then our template servers which know about facets and menus and so on could efficiently grab that and render a nice page.

This would be easier than retrofitting an event system to update individual archives with different menus as LP policy and metadata change. It would even - if we want it to - permit robust renames of mailing lists and so on.
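As a rough illustration of the idea, a template server could render a message page from a per-message JSON file like this. The JSON field names (`subject`, `sender`, `body`) and the page template are assumptions; mhonarc does not emit this format today.

```python
import json
from html import escape
from string import Template

# Hypothetical page skeleton; real pages would add facets and menus.
PAGE = Template(
    "<h1>$list_name</h1><h2>$subject</h2><p>From: $sender</p><pre>$body</pre>"
)


def render_message(metadata_json, list_name):
    """Render one archived message from its JSON metadata."""
    message = json.loads(metadata_json)
    return PAGE.substitute(
        list_name=escape(list_name),
        subject=escape(message["subject"]),
        sender=escape(message["sender"]),
        body=escape(message["body"]),
    )
```

Because the list name is supplied at render time rather than baked into the archive, renaming a list would not require regenerating the stored messages.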

loggerhead (bazaar.launchpad.net web UI)

Like with mhonarc this already runs as a separate service; however we don't present any of its content in the main UI, and it is a constant source of poor user experience. Happily it already has a minimal json API we can use and extend.

bzr+ssh (bazaar.launchpad.net bzr protocol)

This doesn't talk to the database at all and is a shoo-in to be split out now.

distribution source package names

The package names that can be used in the bugtracker depend on what packages are present in the distribution - currently this is a non-trivial query, but it could easily be delivered via a web service to the bugtracker UI. Whether a deeper split in the packaging metadata is needed or desirable is a related but as yet unanalyzed question.

A graph database

We have many places in the system that work with graphs - team participation, branch merged-status, package set traversals and probably more. A generic high performance graph database supporting caching, reachability oracles and parallelism could be used to simplify much of the graph-using code in Launchpad. While we could use a postgresql datatype, previous searches found these to be generally less capable than dedicated graph servers.

A subscription service

There are some things we have subscriptions to (pillars, branches, bugs, questions), all of which are implemented differently; beyond that we'd like to have ephemeral subscriptions to named objects (e.g. when someone has a browser page open on a given url and we want to push updates to the page to them), and possibly even ephemeral subscriptions to anonymous things (which we might use to implement hand-off based callbacks for event driven api scripts).

A subscription service which offers filtering, durations and callbacks on changes could provide a key piece of functionality for implementing lower latency backend services (like browser notifications when a branch has been scanned) all in a single module which can be highly optimised.
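The filtering, durations and callbacks mentioned above could be modelled roughly as follows. This is a single-process sketch under stated assumptions: the class name, method names and the string naming of targets are all hypothetical, and a real service would deliver callbacks over the network.

```python
import time


class SubscriptionService:
    """Sketch of subscriptions with filters, durations and callbacks."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so tests can control time
        self._subs = []  # (target, expires_at, predicate, callback)

    def subscribe(self, target, callback, duration=None, predicate=None):
        # duration=None means a durable subscription; otherwise ephemeral.
        expires_at = None if duration is None else self._clock() + duration
        self._subs.append((target, expires_at, predicate, callback))

    def publish(self, target, event):
        """Deliver event to live, matching subscribers; return the count."""
        now = self._clock()
        # Drop expired ephemeral subscriptions before delivering.
        self._subs = [s for s in self._subs if s[1] is None or s[1] > now]
        delivered = 0
        for sub_target, _, predicate, callback in list(self._subs):
            if sub_target == target and (predicate is None or predicate(event)):
                callback(event)
                delivered += 1
        return delivered
```

A browser page open on a branch would map to a short-duration subscription, while a bug subscription would be durable with a predicate expressing the user's filter.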

A discussions/microlist service

Similar to the subscription service we also have many objects that can be commented on, and we want the same facilities on them all - spam management, searching and indexing, reporting (what has person X commented on); but at the moment we have to do schema changes on every new object we want a discussion facility around. It might even be possible (if we wanted to) to converge on one discussion facility with mailman/mhonarc backending & optimising things.

Reporting / data warehousing

We may well need to build multiple parallel schemas for our site - one cheap-transaction schema to support changing data rapidly, one search schema for fast lookups, and one data warehousing schema for reporting. While we could place all these in the same DB schema, the different constraints and requirements (query schemas want denormalised data, cheap-transaction schemas want maximum orthogonality, warehouse schemas want data aggregated into fact tables) mean that we would benefit if we used dedicated tools for these. For instance, we'd avoid contention on memory footprint and be able to use a dedicated warehouse DB if appropriate.

Template/API service (internet facing)

This is probably the key service: the service our users (both browser and launchpadlib) talk to. Currently our least reliable tests are involved with actual service delivery - and probably always will be (the nature of the beast when we're driving browsers programmatically). We could look at a number of possible splitups, but any changes made to this service are likely to be very visible. Our public API serves two masters: the web site (where it does template rendering into fragments for page updates) and launchpadlib, our programmatic interface for users to drive Launchpad. The launchpadlib API depends heavily on the WADL and lazr.restful zope stack - changing that (for any reason) is going to require considerable care as we have users on stable LTS releases of Ubuntu to cater to.

However, if we treat the templating and api engine as the entry-service rather than as part of the core data access service, we can dramatically simplify the testing story: a clean contract between template rendering/public api and model manipulation/optimisation/refactoring. If care is taken around how information disclosure is managed, this front end service could dispense with the entire zope security model, and with database access also removed, would have no *correctness* related thread-local information: we could use scatter-gather techniques to gather all the needed information for a page upfront concurrently rather than serially. For instance, bug page rendering would (in terms of data gathering) change from sum(time to get tasks, time to get messages, time to get questions, time to get attachments, time to render) into sum(max(time to get tasks, time to get messages, time to get questions, time to get attachments), time to render) because we can parallelise obtaining data but not rendering (at least today).
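The two cost formulas above, and one way to do the gathering, can be sketched in Python. The function names and the idea of passing a dictionary of zero-argument fetchers are assumptions made for illustration; the thread pool is just one possible scatter-gather mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_cost(fetch_times, render_time):
    # Today: every backend call is made one after another, then we render.
    return sum(fetch_times.values()) + render_time


def scatter_gather_cost(fetch_times, render_time):
    # Fetches overlap, so only the slowest one counts; rendering stays serial.
    return max(fetch_times.values()) + render_time


def gather_page_data(fetchers):
    """Run every fetcher concurrently and return {name: result}."""
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        futures = {name: pool.submit(fetch) for name, fetch in fetchers.items()}
        return {name: future.result() for name, future in futures.items()}
```

With made-up timings of 120, 300, 80 and 50 milliseconds for tasks, messages, questions and attachments plus 100 milliseconds to render, the serial cost is 650 milliseconds while the scatter-gather cost is 400 milliseconds.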

Another possibility is to move all our UI into javascript generated elements on the client as proposed by Gary at the Epic. This requires its own separate analysis because of the interactions between browser compatibility, network latency and concurrent request limits, server access for bug reporting and logging in etc. Such a migration is compatible with a template/API front end service, particularly as we probably need incremental migration.

One thing that would make this service easier to implement is to stop rendering templates in API calls (at all) - and instead generate those things client side if they are being served out in an API response.

Evolving the system: how to make a change in a service based world

As with most code, we would start with a bug - let's say that the change requires both ui and primary database changes: we need to change tables in the big ball of twine we call Postgres, and we need to change the javascript and html pages that users are shown. As a for instance, let's say we're working on 323000 - we're going to add a link to a canned search of 'bugs that affect me' in the user's bug pages.

Say that this requires two changes: a UI change to add the link, and a backend change to have a search for 'bugs that affect me'. Fixing this bug then would require a change in the backend bug search logic, enough support for that added to the test fake for that service to use it to test the UI, and a UI change to include the link and check that following it works. We'd need the following tests:

Deploying this change would be a two step process: deploy the backend and then deploy the front end.
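A hedged sketch of how the test fake and the UI code might fit together for this example. The `FakeBugSearch` class, the `affects_user` parameter and the `+affectingbugs` URL segment are hypothetical names chosen for illustration, not the actual Launchpad contract.

```python
class FakeBugSearch:
    """Test fake for the backend bug search service (hypothetical contract)."""

    def __init__(self, affected):
        self._affected = affected  # user name -> list of bug ids

    def search(self, affects_user=None):
        # The fake only supports the one query the UI tests need.
        return list(self._affected.get(affects_user, []))


def affecting_bugs_link(user):
    """UI change: the new link shown on the user's bug pages."""
    return f'<a href="/~{user}/+affectingbugs">Bugs that affect me</a>'


def affecting_bugs_page(user, search_service):
    """UI change: the page the link leads to, built only from the service API."""
    bugs = search_service.search(affects_user=user)
    return [f"Bug #{bug_id}" for bug_id in bugs]
```

The UI tests run against the fake, the backend tests exercise the real search, and a shared contract test would keep the fake honest about the `affects_user` parameter.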

What about our scripts, job system etc

Case by case analysis of course, but in principle all our scripts would become internal API consumers rather than direct database users. This would prevent all the hung-transaction situations we regularly run into with cron scripts, utilities and so forth.

Reporting of resource usage

We would need to make some changes in our DB user management - specifically our mid-layer appservers would need to support both api impersonation (query on behalf of user fred) and api categorisation (connect to the db as db user 'rosetta-stats'). Or we could give up this (very useful) metric of db utilisation and the related security that having different users can offer.

Friction in this design

Many trees

Single user facing changes may require commits to multiple services. Though we can version our apis quite easily to avoid needing coordinated deploys, it will require more thought.
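One simple way to version an API so that deploys need not be coordinated is to keep old and new handlers side by side, keyed by version. The handler names, the response shapes and the `dispatch` helper below are illustrative assumptions only.

```python
def bug_search_v1(query):
    # Original contract: a bare list of bug ids (canned data for the sketch).
    return {"bugs": [1, 2]}


def bug_search_v2(query):
    # Newer contract: richer records plus a total count.
    return {"bugs": [{"id": 1}, {"id": 2}], "total": 2}


# The backend keeps serving v1 after v2 lands, so front ends upgrade later.
HANDLERS = {
    ("bug_search", 1): bug_search_v1,
    ("bug_search", 2): bug_search_v2,
}


def dispatch(method, version, *args):
    try:
        handler = HANDLERS[(method, version)]
    except KeyError:
        raise LookupError(f"{method} v{version} is not supported") from None
    return handler(*args)
```

The extra thought goes into deciding when an old version can be retired, which requires knowing that no deployed client still calls it.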

More testing overhead for developers

The explicit contracts and test fakes mean that developers may need to write very similar code more than once - something that might be seen as a DRY violation.

More visible components to be aware of

Running a number of microservices will increase our monitoring overhead and add complexity to our deployment and QA processes. We can mitigate the deployment and QA issue by keeping an aggressively small window between land+deploy or land+rollback (which a fast testing turnaround should support).

Benefits from this design

Clear model for integration of third party (both in code or actual service) services

Having an explicit design where we integrate services into the UI makes it easier to reason about how we should integrate new services, as well as helping with the existing integration we have today.

Testing

Individual services can be tested in isolation and a small number of end to end tests - possibly maintained in a separate dedicated tree - used to ensure the validity of the overall integration.

The robust layering will make it hard to write overly-tall tests and encourage testing within the layer affected.

Potential for easier integration with other projects

We could choose to expose some of the back end services directly to other projects, both within Canonical/Ubuntu and externally. The clear surface area of our microservices will make it easier to audit and assess such integration projects.

Incremental schema migration

Once a service is split out, the scope that needs to be considered when doing schema migration is shrunk substantially; we may be able to do schema migrations more easily. Certainly we can do them with less extensive planning.

Smaller hardware

By running smaller dedicated services, we can likely (for many) step down to smaller capacity machines in the datacentre, which gives us more leeway to grow - if we can shrink the primary database down closer to 128GB, we can get closer to fitting it all in RAM again. (The specific graph database service is one that would support that goal).

Big picture migration strategy - pay as we go

There is possibly / probably some years of work just to refactor LP from what it is today to having the services that are identified as possibilities here. When and how should we do that?

The reason to do it is to make maintaining and developing Launchpad more efficient: we could just throw money and time at any given problem and eventually get there, but if we can get there quicker with our current resources, that would be better.

We need to balance the needs of our users, those of our stakeholders and our own needs.

One way is to consider this overall design change a blueprint and then fund individual changes on it on the basis of achieving some goal more efficiently or cheaply. The risk with this approach is that we may be in a local optimum where any change we make to how we do things will make us less efficient in the short term, even as it makes us more efficient in the long term when it is combined with other changes.

An alternative funding strategy is to put some small percentage of our maintenance time into migration.

That said, we have performance goals that are important to us and our stakeholders: both site performance and delivery-of-change performance. While we cannot justify an investment based on short term delivery-of-change performance, we can based on site performance: some of our functionality will be more efficiently improved by factoring them to be layered on top of high performance dedicated services. Specific examples of this are our notification and graph traversal functions.

We may well need a bootstrap step, though, to open the door to doing additional service split outs - right now any attempt to use a middle tier service from the template/api stack is nearly guaranteed to lead to timeouts (because we cannot safely parallelise things). With that done, the risk of a new backend service will be significantly reduced.

So, breaking out the public api + template stack (or alternatively but less attractively fixing thread-localness to permit scatter-gather from views in the current stack) seems to be a key enabler. Similarly having a robust queuing system in the datacentre is a must (but this is a relatively cheap thing to do).