Persistence Layer

Launchpad suffers greatly from having no simple persistence layer. It has an SQL layer (Storm), but its regular business logic is intermingled with storage: we cannot test business logic without exercising the storage layer, we cannot test the storage layer without mixing in business logic tests, and we cannot migrate components of our model to different storage systems. Performance suffers: there is no differentiation between object access and data retrieval, so any failure to preload data results in thousands of backend queries. Our test performance suffers because we exercise the full stack every time we do anything. Lastly, our business logic and storage logic have to change in lockstep because there is no separation between them.

As a Technical Architect
I want a dedicated, substitutable persistence layer
so that writing fast code, testing, and evolving our schema are easier than they are today.

This is neither an ORM nor a separate library.

Rationale

Solving the performance problems endemic in Launchpad requires a code structure that fits the needs of our data storage layer. Much of the Zope machinery we use is hostile to that structure: it assumes that individual Python object and attribute access is cheap or free, which it is not in our current conflated structure.
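
To make the cost of "cheap" attribute access concrete, here is a minimal sketch. All names in it (`CountingBackend`, `LazyBug`, `owners_lazy`, `owners_bulk`) are hypothetical and are not Launchpad or Storm APIs; the backend is a stand-in that only counts round trips. It contrasts an attribute that hides one query per object with a single bulk load.

```python
class CountingBackend:
    """Hypothetical stand-in for a database; it only counts round trips."""

    def __init__(self, owners_by_bug):
        self._owners_by_bug = dict(owners_by_bug)
        self.queries = 0

    def fetch_owner(self, bug_id):
        # One round trip per call.
        self.queries += 1
        return self._owners_by_bug[bug_id]

    def fetch_owners(self, bug_ids):
        # One round trip for the whole batch.
        self.queries += 1
        return {bug_id: self._owners_by_bug[bug_id] for bug_id in bug_ids}


class LazyBug:
    """Object whose innocent-looking attribute access hides a query."""

    def __init__(self, backend, bug_id):
        self._backend = backend
        self.id = bug_id

    @property
    def owner(self):
        return self._backend.fetch_owner(self.id)


def owners_lazy(backend, bug_ids):
    # Looks like plain attribute access, costs one query per bug.
    return [LazyBug(backend, bug_id).owner for bug_id in bug_ids]


def owners_bulk(backend, bug_ids):
    # Data retrieval is explicit: one query, then plain dictionary lookups.
    owners = backend.fetch_owners(bug_ids)
    return [owners[bug_id] for bug_id in bug_ids]
```

With 1000 bugs the lazy version makes 1000 round trips and the bulk version makes one, while both return the same owners.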

We need to achieve a high performance Launchpad to:

  1. Satisfy the needs of our users and stakeholders.
  2. Make good use of the substantial hardware investment made in running Launchpad.

And we need to fix our testing story to be able to test much more rapidly than we do today.

The lack of a persistence layer is one driving factor behind the problems we suffer, and working around it adds a significant burden to writing efficient code or tests.

Because of these considerations we're going to solve this infrastructure-level issue now, so that we can (eventually) focus on interesting problems.

Stakeholders

  1. Developers [folk that write code]

Constraints and Requirements

Must

  1. Provide an easy-to-use layer which exposes plain old Python objects on the top and works with Storm / pgsql [as appropriate] underneath.
  2. Provide an alternative backend for testing which we can run our developer-run tests (including integration, excluding persistence layer tests) against.
  3. Permit disabling DB queries once we hit the render pipeline.
  4. Provide a way of loading a comprehensive object graph in an optimal manner.
  5. Be incrementally adoptable.
  6. Encapsulate all details of how we persist the object graph.
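
As an illustration only, the following sketch shows one shape such a layer could take. Every name in it (`PeopleStore`, `InMemoryPeopleStore`, `rendering`, `QueriesDisabled`, `team_page`) is an assumption made for this example, not an existing Launchpad interface. Business logic sees plain Python objects and a store interface, tests plug in an in-memory backend, and loading is switched off once rendering starts. A Storm-backed class implementing the same interface would be the production counterpart and is not shown.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Protocol


@dataclass
class Person:
    name: str


@dataclass
class Team:
    name: str
    member_names: List[str] = field(default_factory=list)


class QueriesDisabled(Exception):
    """Raised when code tries to load data after loading was switched off."""


class PeopleStore(Protocol):
    """What business logic may rely on (hypothetical interface)."""

    queries_allowed: bool

    def get_team(self, name: str) -> Team: ...

    def get_people(self, names: Iterable[str]) -> Dict[str, Person]: ...


class InMemoryPeopleStore:
    """Test backend: same interface, no database."""

    def __init__(self) -> None:
        self._people: Dict[str, Person] = {}
        self._teams: Dict[str, Team] = {}
        self.queries_allowed = True

    def add_person(self, person: Person) -> None:
        self._people[person.name] = person

    def add_team(self, team: Team) -> None:
        self._teams[team.name] = team

    def _check(self) -> None:
        if not self.queries_allowed:
            raise QueriesDisabled("data must be loaded before rendering")

    def get_team(self, name: str) -> Team:
        self._check()
        return self._teams[name]

    def get_people(self, names: Iterable[str]) -> Dict[str, Person]:
        # Bulk load: one call for the whole set of names.
        self._check()
        return {name: self._people[name] for name in names}


@contextmanager
def rendering(store: PeopleStore):
    """Forbid loading for the duration of the render step."""
    store.queries_allowed = False
    try:
        yield
    finally:
        store.queries_allowed = True


def team_page(store: PeopleStore, team_name: str) -> str:
    # Load the needed part of the object graph up front ...
    team = store.get_team(team_name)
    members = store.get_people(team.member_names)
    # ... then render from plain objects with loading disabled.
    with rendering(store):
        names = ", ".join(sorted(p.name for p in members.values()))
        return "%s: %s" % (team.name, names)
```

In a test, `team_page` runs against `InMemoryPeopleStore` with no database at all, and any attempt to load data inside the `rendering` block raises `QueriesDisabled` instead of silently issuing a query.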

Nice to have

  1. Not requiring a cache in the layer.
  2. Being able to get a transcript of an entire problem web request (like our timelines, but with parameters and trivially replayable).

Must not

  1. Leak SQL concepts through the layer.
    • e.g. If we end up talking about anything other than an object graph in our business logic (tables, views, etc.) then we've failed. For example, we have failed if we say things like "get all Person objects that are teams", or talk about things like "group by", joins and so forth.
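
One possible reading of this rule, sketched with hypothetical names (`Team`, `TeamGraph`, `team_names_for`): business logic asks a question about the object graph ("which teams is this person in?") and never mentions how teams are stored, filtered or joined.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Team:
    name: str
    members: FrozenSet[str]


class TeamGraph:
    """Hypothetical facade over the persistence layer.

    How teams are stored (which table, which joins) stays behind this class.
    """

    def __init__(self, teams: Iterable[Team]) -> None:
        self._teams = list(teams)

    def teams_for(self, person_name: str) -> List[Team]:
        return [team for team in self._teams if person_name in team.members]


def team_names_for(graph: TeamGraph, person_name: str) -> List[str]:
    # Business logic: phrased purely in terms of objects and relationships.
    return sorted(team.name for team in graph.teams_for(person_name))
```

A caller of `team_names_for` cannot tell whether teams live in their own table, share one with people, or are not in an SQL database at all.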

Workflows

  1. Writing tests of UI / business logic.
  2. Changing the data storage/retrieval functions.

Success

Bugs are at: Bugs for this LEP (persistencelayer)

How will we know when we are done?

  1. None of our tests of Launchpad functionality require a real database to execute.
  2. All of our pages execute in 5-10 queries.

How will we measure how well we have done?

It Will Be Obvious.

Thoughts?

LEP/PersistenceLayer (last edited 2010-11-27 05:04:51 by lifeless)