LEP/WebservicePerformance

Status

This is a draft. The parts of the LEP template that haven't been filled in have been left as placeholders. They are in italics.

See https://dev.launchpad.net/Foundations/Webservice/ProposalQnA for background discussion on this proposal.

Contact: Gary Poster

Improved Web Service API

The existing web service and launchpadlib implementations are very easy to write code for, but difficult to write efficient code for, and difficult to understand.

As a Launchpad project participant
I want to idiomatically write efficient scripts to automate project activities
so that I can flexibly manage and observe my project.

Consider clarifying the feature by describing what it is not?

Link this from LEP

Rationale

The only way to filter a collection is to scope it to some entry, or to invoke a named operation. These methods don't cover all, or even most, of the ways clients want to restrict our various datasets. So clients end up fetching huge datasets and iterating over all of them, filtering on the client side.

The named operations we do have are not standardized in any way: they're nearly-raw glimpses into our internal Python API. This makes it difficult to learn the web service and even to find a specific thing you want. For instance, this is from Edwin Grubbs's report on the OpenStack Design Summit (warthogs list, 2010-11-15):

Retrieving an entry or collection associated with some other entry (such as a bug's owner or a team's members) requires a new HTTP request. Entries are cached, but we don't send Cache-Control directives, so even when the entry is cached we end up making a (conditional) HTTP request. It's the high-latency request, not the cost of processing it on the server side, that's painful.

Client code that crosses the network boundary (bug.owner) looks exactly like client code that doesn't (bug.id). We need to stop hiding the network boundary from the user, or at least pull back from hiding it so relentlessly. It should be obvious when you write inefficient code, and/or more difficult to write it.

Currently, clients fetch collections in batches, 75 entries at a time. This causes problems when the underlying collection changes between batch requests: entries may show up multiple times or fall through the cracks.
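The batching problem can be reproduced with a small simulation. This is only a sketch: the collection is a plain Python list, the batch size is 5 rather than 75, and `fetch_batch` and `page_through` are hypothetical names standing in for the real batched requests.

```python
def fetch_batch(collection, start, size):
    """Simulate one batch request: a slice of the live collection."""
    return collection[start:start + size]


def page_through(collection, size, mutate_after_first_batch):
    """Walk a collection batch by batch, as clients do today."""
    seen = []
    start = 0
    while True:
        batch = fetch_batch(collection, start, size)
        if not batch:
            break
        seen.extend(batch)
        start += size
        if start == size:
            # The collection changes between the first and second request.
            mutate_after_first_batch(collection)
    return seen


# An entry is removed from the range already fetched: everything shifts
# left by one, so the entry that slides into that range is never fetched.
bugs = list(range(10))
skipped = page_through(bugs, 5, lambda c: c.remove(0))

# An entry is inserted at the front: everything shifts right by one,
# so the last entry of the first batch is fetched a second time.
bugs = list(range(10))
duplicated = page_through(bugs, 5, lambda c: c.insert(0, -1))
```

In the first run entry 5 is never seen; in the second run entry 4 is seen twice.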

Stakeholders

Jonathan Lange, Product Strategist, as proxy for the users who want to script Launchpad, including

Robert Collins, Technical Architect

Robert was one of the voices that led to us working on this proposal.

Constraints

What MUST the new behaviour provide?

Increased webservice usability, via increased uniformity and performance.

What MUST it not do?

Unnecessary Desires

Hypermedia controls to indicate which fields in the object graph can be the target of a ws.restrict.* argument.

I'm not sure that we can explain the ws.restrict.* idea itself using WADL, since it's more complicated than the ws.expand idea. We may have to settle for human-readable documentation explaining how a client can pre-traverse the object graph and send an appropriate HTTP request.

Rather than hard-coding the maximum capacity of the expander resource, we plan to publish that as a bit of information. In the simplest design, a client can get the maximum capacity of the expander resource by sending it a GET request. This information would be cached for the same amount of time as the site WADL document.

Success

Retrieving a detailed object graph still requires O(N) requests. But whereas N used to be the number of objects in the collection, N is now that number divided by approximately 75 (i.e., the maximum capacity of the expander resource).
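As a rough illustration of the arithmetic, assuming an expander capacity of exactly 75 and not counting the single request that fetches the collection's membership (the function name is hypothetical):

```python
def expander_requests(collection_size, capacity=75):
    """Number of expander requests needed to expand every entry once."""
    # Ceiling division without floating point.
    return (collection_size + capacity - 1) // capacity


small = expander_requests(150)      # exactly two full batches
uneven = expander_requests(151)     # two full batches plus one entry
huge = expander_requests(100000)
```

For a collection of 100,000 entries this comes to 1,334 expander requests, compared with 100,000 requests when each entry is fetched individually.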

It allows N to be much smaller than it would otherwise be, and allows most common ways of reducing N to be done on the server instead of the client.

Get rid of many of our existing one-off named operations, simplifying the service.

Clients will no longer overlook or duplicate entries as they page through the batches of a collection.

A batch PATCH allows a write operation to proceed in O(1) requests, rather than O(N).

Implementation Proposal

Names, like "restrict" and "expand", have active counter-proposals. See the client syntax section for details.

Our solution is to effectively get rid of batching. Instead of 75, the batch size will be something huge, allowing thousands or even tens or hundreds of thousands of entries in a single response.

Don't panic. For one thing, collections with 100,000 entries will be rare, because the "restrict" operation will make it much easier than it is now to get only a desired subset of a collection.

Huge collections will only occur when client code is poorly written (in which case the incredible slowness of the code will be an obvious problem) or when well-written client code actually does need to operate on a huge collection (in which case the incredible slowness of the code is to be expected).

Besides which, you won't get full representations of all 100,000 entries. When you get a collection, you'll receive a list of collapsed representations.

So, you have 100,000 links. How do you turn those links into representations? Fortunately, the expander resource (located at /expand) is designed to do just that. If you POST it a number of links, it will return a collection of full representations. If the links you POST include ws.expand arguments, the representations will be further expanded according to their ws.expand arguments.

But, the expander resource won't accept 100,000 links. It will only accept some small number, like, say, 75.

Yes, it's a bait and switch. Small-bore batching is still happening; it's just controlled by the client rather than the server. The server dumps the entire *membership* of some collection onto the client in a single atomic operation, but then it's up to the client to get details about the membership in little chunks.

By the time the client is finished getting all those details, it's quite possible the membership has changed. But the client can be certain that the membership was accurate _as of the time of the initial request_.
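A minimal sketch of the client-controlled batching described above. It assumes the membership arrives as a list of links and that the expander capacity is 75; `expand_all`, `chunks`, and the `post_to_expander` callable are hypothetical names, and the HTTP POST itself is replaced by a stand-in function.

```python
def chunks(links, capacity):
    """Yield successive slices of at most `capacity` links."""
    for start in range(0, len(links), capacity):
        yield links[start:start + capacity]


def expand_all(links, post_to_expander, capacity=75):
    """Expand a membership snapshot in client-sized batches."""
    # The membership is fixed as of the initial request; only the
    # details are fetched later, one expander call per chunk.
    snapshot = list(links)
    representations = []
    for batch in chunks(snapshot, capacity):
        representations.extend(post_to_expander(batch))
    return representations


# Stand-in for the real POST to /expand, recording each batch size.
batch_sizes = []


def fake_expander(batch):
    batch_sizes.append(len(batch))
    return [{"self_link": link} for link in batch]


links = ["/bugs/%d" % n for n in range(200)]
full = expand_all(links, fake_expander)
```

With 200 links the stand-in expander is called three times, with batches of 75, 75, and 50 links.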

In the current system, most requests are for individual entries, each of which is cached along with its ETag. In the new system, most requests will be for large collections of entries. It's difficult to calculate an ETag for a collection, and difficult to estimate what kind of Cache-Control header to send for one--that's why we don't do those things now and have no plans to do them.

Description

The "expand" operation

The "expand" operation lets you GET an entry or collection, *plus* some of the entries or collections that it links to. The client code will make one big HTTP request and populate an entire object graph, rather than just one object. This will make it possible to access 'bug.owner' and iterate over 'bug.owner.members' as many times as you want, without causing additional HTTP requests.

Possible client-side syntax

The discussion below is valuable because it is in context with the rest of the current document. However, please see https://dev.launchpad.net/LEP/WebservicePerformance/ClientSyntax for more recent thinking on client syntax and names. Of course, when this LEP is not a draft, these will be integrated.

This code acquires a bug's owner, and the owner's members, in a single request. If the owner turns out not to be a team, the collection of members will be empty.

   print(bug.owner)                      # Raises ValueError: bug.owner is not available
                                         # on this side of the network boundary.
   bug = expand(bug, bug.owner, bug.owner.members)
   expanded_bug = GET(bug)               # Makes an HTTP request.
   expanded_bug.owner                    # Does not raise ValueError.
   if expanded_bug.owner.is_team:        # No further HTTP requests.
       for member in expanded_bug.owner.members:
           print(member.display_name)

A more conservative design would require the client to request explicitly every piece of expanded data it will use.

   bug = expand(bug, bug.owner.is_team, bug.owner.members.each.display_name)
   expanded_bug = GET(bug)               # Makes an HTTP request.
   print(expanded_bug.owner.name)        # Raises ValueError: value wasn't expanded.
   if expanded_bug.owner.is_team:        # No further HTTP requests.
       for member in expanded_bug.owner.members:
           print(member.display_name)
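One way the "value wasn't expanded" behaviour might be modelled on the client is sketched below. `PartialEntry` is a hypothetical class, not part of launchpadlib, and the links and field values are made up for illustration.

```python
class PartialEntry:
    """Entry representation that knows only the fields that were expanded."""

    def __init__(self, link, expanded=None):
        self._link = link
        self._expanded = dict(expanded or {})

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._expanded[name]
        except KeyError:
            # Never cross the network boundary implicitly.
            raise ValueError(
                "%s.%s wasn't expanded" % (self._link, name)) from None


alice = PartialEntry("/~alice", {"display_name": "Alice"})
owner = PartialEntry("/~some-team", {"is_team": True, "members": [alice]})
expanded_bug = PartialEntry("/bugs/1", {"owner": owner})

names = [m.display_name for m in expanded_bug.owner.members]

try:
    expanded_bug.owner.name
except ValueError as error:
    message = str(error)
```

Accessing an expanded field is a dictionary lookup; accessing anything else fails loudly instead of triggering a hidden HTTP request.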

Of course, these examples assume we have a specific bug we want to expand. Our problematic code makes two requests *per bug*, and plugging this code in would simply bring that number down to one request per bug.

This code takes that down to one request, period. It operates on a scoped collection instead of an individual bug, and expands every object in the collection at once.

   bugs = source_package.bugtasks
   bugs = expand(bugs, bugs.each.owner, bugs.each.owner.members)
   expanded_bugs = GET(bugs)     # Makes an HTTP request.
   for bug in expanded_bugs:     # No further HTTP requests:
       if bug.owner.is_team:
           for member in bug.owner.members:
               print(member.display_name)

Possible client-server syntax

The simplest way to support expansion is to add a general ws.expand argument to requests for entries or collections.

  GET /source_package/bugs?ws.expand=each.owner&ws.expand=each.owner.members

Specifying values for ws.expand that don't make sense will result in a 4xx response code.

Specifying values that do make sense will result in a much bigger JSON document than if you hadn't specified ws.expand. This document may take significantly longer to produce--maybe long enough that it would have timed out under the current system--but it will hopefully keep you from making lots of small HTTP requests in the future.
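A sketch of how a client could assemble such a request URL with the standard library. The `expand_url` helper is hypothetical, and the `ws.expand` parameter is the one proposed here, not something the current web service accepts.

```python
from urllib.parse import urlencode


def expand_url(path, *expand_paths):
    """Build a request path with one ws.expand argument per expansion."""
    # A list of pairs lets the same query key repeat.
    query = urlencode([("ws.expand", p) for p in expand_paths])
    return path + "?" + query if query else path


url = expand_url("/source_package/bugs", "each.owner", "each.owner.members")
```

Here `url` is `/source_package/bugs?ws.expand=each.owner&ws.expand=each.owner.members`, matching the request shown above.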

The "restrict" operation

The "expand" operation reduces the need to make an additional HTTP request to follow a link. The "restrict" operation reduces the number of links that need to be followed in the first place, by allowing general server-side filters to be placed on a collection before the data is returned.

The client may request a collection with filters applied to any number of filterable fields. Which fields are "filterable" will be specified through hypermedia: they'll probably be the fields on which we have database indices. The representation returned will be a subset of the collection: the subset that matches the filter(s).

Possible client-side syntax

The discussion below is valuable because it is in context with the rest of the current document. However, please see https://dev.launchpad.net/LEP/WebservicePerformance/ClientSyntax for more recent thinking on client syntax and names. Of course, when this LEP is not a draft, these will be integrated.

This code restricts a project's merge proposals to those with "Merged" status that were created after a certain date.

   project = launchpad.projects['launchpad']
   proposals = project.merge_proposals
   proposals = restrict(proposals.each.status, "Merged")
   proposals = restrict(proposals.each.date_created, GreaterThan(some_date))
   some_proposals = GET(proposals)
   for proposal in some_proposals:
       ...

Two great features to note:

  1. We can apply the date_created filter on the server side, reducing the time and bandwidth expenditure.
  2. We no longer need to publish the getMergeProposals named operation at all. The only purpose of that operation was to let users filter merge proposals by status, and that's now a general feature. In the aggregate, removal of this and similar named operations will greatly simplify the web service.

You're not restricted to filtering collections based on properties of their entries. You can filter based on properties of entries further down the object graph. This code filters a scoped collection of bugs based on a field of the bug's owner. (There may be better ways to do this particular thing, but this should make it very clear what's going on.)

   project = launchpad.projects['launchpad']
   bugs = project.bugs
   bugs = restrict(bugs.owner.name, 'leonardr')
   my_launchpad_bugs = GET(bugs)

Possible client-server syntax

The simplest way to do this is to add a series of ws.restrict query arguments, each of which works similarly to ws.expand.

  GET /launchpad/bugs?ws.restrict.owner.name=leonardr

If your value for a ws.restrict.* argument makes no sense, or you specify a ws.restrict.* argument that doesn't map correctly onto the object graph, you'll get a 4xx error. If your arguments do make sense, you'll get a smaller collection than you would have otherwise gotten.
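A sketch of building a restricted request on the client, with a local check that mirrors the 4xx case. The `FILTERABLE` set and `restrict_url` helper are hypothetical; in the proposal the filterable fields would be discovered from the service rather than hard-coded.

```python
from urllib.parse import urlencode

# Hypothetical stand-in for the filterable fields the service would publish.
FILTERABLE = frozenset({"status", "date_created", "owner.name"})


def restrict_url(path, restrictions, filterable=FILTERABLE):
    """Build a request path with one ws.restrict.* argument per filter."""
    unknown = sorted(set(restrictions) - set(filterable))
    if unknown:
        # The server would answer 4xx; fail early on the client instead.
        raise ValueError("not filterable: " + ", ".join(unknown))
    pairs = [("ws.restrict." + field, value)
             for field, value in sorted(restrictions.items())]
    return path + "?" + urlencode(pairs)


url = restrict_url("/launchpad/bugs", {"owner.name": "leonardr"})

try:
    restrict_url("/launchpad/bugs", {"owner.karma": "9000"})
    rejected = False
except ValueError:
    rejected = True
```

Here `url` is `/launchpad/bugs?ws.restrict.owner.name=leonardr`, and the second call is rejected before any request is made.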

Potential increments

Either of the two large pieces of functionality (i.e., getting the list of URIs for a set of objects, and expanding the details for a given set of URIs) could be useful on its own. Either could therefore be chosen as an incremental deliverable, to shorten the feedback cycle and make the feature development more granular.

Workflows

See https://dev.launchpad.net/LEP/WebservicePerformance/ClientSyntax

Risks

What are the risks associated with this implementation? Consider risks that may make the implementation fail to meet its goals, risks that may make the delivery time slip, security risks, performance risks, and usability/documentation risks.

Experiment 1

It will often be appropriate to construct one or more experiments to address the identified risks before proceeding to the implementation step. Make sure that the effort to construct experiments is significantly less than the effort expected to be expended on the implementation!

Goal

What is the goal of the experiment? What risk or risks do you intend to explore?

Design

How will the experiment work?

Result

How did it turn out?

Thoughts?

Example cases

It could be useful to look at current API uses, or to contact authors of API clients, to gather some cases that are currently difficult or slow.

One such thing is Laika which "uses the Launchpad API to get a list of all the bugs you worked on in the past week, separated by bugs you own, bugs you've commented on, and bugs you've reported".

Another is the UDD hottest100 script, which wants to gather a lot of information about many branches and packages in Ubuntu.

... more here ...

For each of these:

Questions

LEP/WebservicePerformance (last edited 2011-01-26 19:36:02 by benji)