Diff for "ArchitectureGuide/ServicesRoadmap"

Differences between revisions 1 and 2
Revision 1 as of 2011-05-23 05:15:09
Size: 6308
Editor: lifeless
Comment: skkkketchy
Revision 2 as of 2011-05-30 10:57:26
Size: 8111
Editor: lifeless
Comment: meta
Line 14: Line 35:

= Structure =

This roadmap is laid out in a few sections. Each section is broadly implementable separately. Each thing that we might do has a micro-dump of its possibilities: not quite a LEP. The next step to working on any given thing is either to JFDI it or to LEP it depending on the apparent complexity.

About this document

Migrating to a services based design is a large project. We can't do it in one hit, and per the analysis we have a lot of different things we can do. We want to make sure that the project is a net win: we shouldn't put more effort into it than we will save in efficiency on future changes / headaches. One way we can do this is by making sure each individual transition we do will either pay for itself in the short term, or be part of a larger set which we expect to pay for itself in aggregate.

Structure

This document provides an overview of the basic decisions we need to reach before starting widespread work on using services, and provides some top level categories we can use to assess service projects.

Each service possibility we are considering should be included in this document: if you encounter service opportunities not listed here, please add them. Each service needs to be described: basically what it would do and why.

As bringing up these services is by definition the creation of a new subproject under /launchpad-project, we don't strictly have a sensible place to file bugs. However, for any service possibility which is a clear-cut improvement and involves extracting code from LP itself, please feel free to file that as a bug on Launchpad.

Level of detail

We don't need a huge level of detail in the roadmap: the description of each service possibility needs to be enough to let any of the following decisions be made:

  • A developer could scratch an itch and JFDI
  • A squad needs to work on this for (a time period)
  • Some other defect is best fixed by implementing this service (be that scaling, responsiveness, resilience...)

Using LEPs

All services need to meet the minimum requirements for new services. Some will be very straightforward technical implementations. Others will need some refinement around what the service needs to accomplish. For instance, the GPG service has questions around key management and security: a checking-only service is a no-brainer, but a service which might create keys or do signing needs more thought.

If a service looks complex, or its needs are hard to pin down with high confidence, then a LEP is needed. The normal LEP process will be used, but rather than an end-user UI, the UI is the API within the datacentre, and rather than the product strategist approving the LEP, the technical architect will.

Considerations

  • We have a lot of learning to do - HA and deployment will be significantly more complex. Our monitoring needs to get a lot better.
  • Some things will be dependent on object-sync facilities (e.g. RabbitMQ) in the data centre.
  • Adding backend services will increase latency if done poorly.

Defaults

We need to choose various defaults for backend services.

Some have already been chosen.

Message queues

We're using RabbitMQ, as decided in mid 2010. Not because it's the best, but because it's already in use in Canonical, and we're very unlikely to gain enough from using a different MQ for now: once we're solidly service based we can revisit this.

In-datacentre backend authentication

This is not decided upon as yet. Some options are:

  • Microservices run behind Apache and use IP address + basic auth (a request must come from a known IP address and carry a basic auth password to work).
  • OAuth. Possibly with IP limits as per basic auth.

The big webapp today runs behind HAProxy, not Apache (Apache is only at the outer edge), so we should expect to implement whatever we choose directly for it (but not for other microservices).

-- RobertCollins: I'm strongly leaning towards IP address + basic auth. This will permit easy debugging and an extremely lightweight client with low per-request overhead, and we can add OAuth into Apache whenever we want.
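
As a minimal sketch of what the IP address + basic auth option could look like where we have to implement the check ourselves (as for the big webapp), assuming Python: the network range, user name, password and function names below are all hypothetical, and behind Apache this logic would normally live in the Apache configuration instead.

```python
import base64
import hmac
import ipaddress

# Hypothetical values for illustration; not taken from any real deployment.
ALLOWED_NETWORKS = [ipaddress.ip_network("10.0.0.0/24")]
CREDENTIALS = {"lp-webapp": "s3cret"}


def is_authorised(remote_addr, authorization_header):
    """Return True only if the caller's IP is known AND basic auth matches."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False
    if not any(addr in net for net in ALLOWED_NETWORKS):
        return False
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False
    try:
        raw = base64.b64decode(authorization_header[6:], validate=True)
        decoded = raw.decode("utf-8")
    except ValueError:  # covers bad base64 and bad UTF-8
        return False
    user, sep, password = decoded.partition(":")
    expected = CREDENTIALS.get(user)
    if not sep or expected is None:
        return False
    # Constant-time comparison of the supplied and expected passwords.
    return hmac.compare_digest(password.encode("utf-8"), expected.encode("utf-8"))


def basic_auth_header(user, password):
    """Build the Authorization header value a client would send."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")
```

A client only needs to add one header per request, which is where the low per-request overhead comes from; both checks must pass, so a correct password from an unknown address is still rejected.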

In-datacentre network protocol

In the datacentre we have no latency to worry about, but we do need to worry about efficiency and ease of development. While some services already have protocols, any new service we make will need us to choose a protocol for it. No decisions yet but some options are:

  • XMLRPC: pros: already deployed, batteries included in Python and many other languages. cons: XML, RPC model rather than RESTful - no opportunity for caching, URLs can be opaque when debugging.
  • Ad hoc RESTful JSON-based APIs: pros: nice to look at by hand, easy to interact with manually. cons: not included in the Python standard library, optimises for things that don't really affect us.
  • Google protobufs: pros: clear contracts, wire-level upgrading built in. cons: not well understood within the LP team, currently somewhat slow [in Python].

Excluded options:

  • lazr.restful : Not suitable for rapid development or consumption by other servers.
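
To make the difference between the first two options concrete, here is a rough sketch of the same call encoded both ways, assuming Python. The method name, URL layout and payload are hypothetical and chosen only for illustration; no decision is implied.

```python
import json
import xmlrpc.client

# Hypothetical method name and argument, for illustration only.
method = "validate_signature"
params = ("signed text goes here",)

# XML-RPC: the method name travels inside an XML body that is POSTed to a
# single endpoint, so the URL alone says little about what was called.
xmlrpc_body = xmlrpc.client.dumps(params, methodname=method)

# Ad hoc JSON: the URL names the resource and the body carries only data.
json_url = "/signatures/validate"  # hypothetical URL layout
json_body = json.dumps({"content": params[0]})

# Decoding the XML-RPC body recovers the parameters and the method name.
decoded_params, decoded_method = xmlrpc.client.loads(xmlrpc_body)
```

Printing `xmlrpc_body` and `json_body` side by side shows why URLs can be opaque when debugging XML-RPC, and why the JSON form is easier to read and interact with by hand.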

High availability of backend services

The building blocks are HAProxy, Apache and Linux Virtual Server (LVS), chained as: LVS per datacentre -> HAProxy -> backends with Apache in each datacentre -> local connection to the microservice.

Independent Services

Things like a GPG service go here. Such services will have various benefits and need separate analysis per service.

GPGHandler

Launchpad has to manage a number of GPG keys in a few different ways:

  • Create new keys for PPA signatures
  • Validate signatures (text in, (signer, status, cleartext) out)
  • Store prepared key revocation certificates in the event of a compromise.
  • Sign new package binaries built in the build cluster.

We may not want to expose all of these as a web service (because a single lying client in the datacentre could get a hostile binary package signed). However, exposing the validation of signatures is a no-brainer and would save some significant operational complexity. We can iterate to add more as wanted.

Benefits: avoid cold cache warmup for GPG validation (by having a long running gpg cache dir), make GPG validation available to non-lp-zope services without the headache of direct integration. Costs: Migrate over the GPG handler code and create a test fake.
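
As a rough sketch of the validation-only interface and the test fake mentioned under costs, assuming Python: the class name, method names and status strings below are hypothetical, and the fake deliberately does no cryptography.

```python
from collections import namedtuple

# Result shape follows the "(signer, status, cleartext)" description above.
ValidationResult = namedtuple("ValidationResult", "signer status cleartext")


class FakeGPGValidator:
    """In-memory stand-in for the validation service, for use in tests.

    It performs no cryptography: it returns canned results keyed by input.
    """

    def __init__(self):
        self._canned = {}

    def add_result(self, text, signer, status, cleartext):
        """Register the result to return when `text` is validated."""
        self._canned[text] = ValidationResult(signer, status, cleartext)

    def validate(self, text):
        """Text in, (signer, status, cleartext) out."""
        try:
            return self._canned[text]
        except KeyError:
            # Hypothetical status value for input the fake knows nothing about.
            return ValidationResult(None, "unknown", None)


fake = FakeGPGValidator()
fake.add_result("SIGNED:hello", signer="alice", status="good", cleartext="hello")
result = fake.validate("SIGNED:hello")
```

Code that consumes the service can be tested against the fake without a gpg home directory, which is the point of paying the cost of writing it.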

Existing backend services

Things like splitting out the codehosting server, the importd engine, and the rosetta translation import/export services go here. The benefit of splitting these things out will be shorter test run times for the main application; as they are generally already service based, the operational change should be modest. Consider these low-hanging fruit: easy to do, some benefits, low risk.

Script migration

Our scripts all need internal APIs set up, and need to be migrated to them. They belong here. Many of these scripts routinely cause headaches. We should expect to identify a raft of performance problems and data integrity / access control issues as we migrate them. (Our current scripts have a lobotomised security model in place which we would not want to keep.)

Optimisations

We have a number of potential optimisations best done using services: examples include

  • search
  • graph traversals/reachability queries
  • batch/aggregate workloads (map reduce)
  • long poll callbacks

These are best done using services because they either need custom databases, layer on top of a SQL database, or will be long running and not subject to our normal transaction timeliness constraints.

Integrations

We have other services which are already poorly integrated into Launchpad. Specific examples are:

  • mhonarc
  • mailman
  • loggerhead

Overhauling these so that their UI is entirely done in the LP template engine and they act as pure data sources with real-time lookups would make a significant improvement to user experience.

Decoupling

We have functionality that is currently tightly coupled which we would benefit from splitting into dedicated services:

  • directory services
  • subscriptions
  • discussion/messaging
  • rendering/UI (with API bundled in because we render in the public API)

ArchitectureGuide/ServicesRoadmap (last edited 2011-06-27 16:36:30 by mbp)