Diff for "Foundations/Webservice/DraftProposal"

Not logged in - Log In / Register

Differences between revisions 6 and 15 (spanning 9 versions)
Revision 6 as of 2010-11-16 21:04:25
Size: 13237
Editor: leonardr
Comment:
Revision 15 as of 2010-11-17 19:00:20
Size: 17915
Editor: benji
Comment: Un-wiki-name "OpenStack"
Deletions are marked like this. Additions are marked like this.
Line 20: Line 20:
the OpenStack Design Summit (warthogs list, 2010-11-15): the Open``Stack Design Summit (warthogs list, 2010-11-15):
Line 174: Line 174:
The simplest way to show how to construct an expansion URL is to serve
expansion links alongside the links themselves:

{{{
'members_collection_link' : 'https://.../~team/members'
'members_collection_expand_link' : 'https://.../~team?ws.expand=members'
}}}

If we were serving HTML representations, we could serve an HTML form
allowing you to expand a number of the links in a representation at
one time.

The alternative is to punt, and put information about how to expand
links in the human-readable description of our media type. This would
mean we could never change from the ws.expand system to some other
system. With links or forms we can change it trivially.
Line 178: Line 195:
add new control information indicating which links have it.

What we need to do is update our human-readable description of our
media type to describe this new addition to the HTTP protocol.

Ideally the ws.expand syntax itself would be described using a
hypermedia control (allowing us to change it without changing the
client), but we're not sure how to do this with our existing
hypermedia standard (WADL). This idea effectively turns every single
link into a form.

The URI Templates standard lets us describe forms that look like
links, but the section of the standard that would let us do something
like ws.expand is not defined, and clients that don't understand URI
Templates will incorrectly interpret a URI Template as a URL.
add new control information indicating which links have it. We just
need to program every client with this information.
Line 284: Line 288:
Our solution is to get rid of batching. If you ask for a collection,
and
the collection contains 100,000 entries, you will get a
Our solution is to effectively get rid of batching. Instead of 75, the
batch size will be something huge, like 1 million. I
f you ask for a
collection that contains 100,000 entries, you will get a
Line 299: Line 304:
entries. You'll get collapsed representations.

= Collapsed representations =
entries. When you get a collection, you'll receive a list
of
''collapsed'' representations.

== Collapsed representations ==
Line 304: Line 310:
representation.

An expanded representation of a person contains scalar data about the
person, plus representations of associated entries such as the
person's team memberships.

A collapsed representation of a person contains *no* data about the
person. It's basically just a resource identifier: a URL. If you want
*any* information about the resource, you need to GET its URL, or
visit the expander resource (see below).

= The expander resource =

= PATCHing a collection =
representation. A normal representation of a person contains scalar
data about the person, plus links to related entries and
collections. An expanded representation of a person includes some of
those related entries and collection inline. A collapsed
representation of a person contains *no* data about the person.

A collapsed representation is just a resource identifier: a URL. If
you want *any* information about the resource, you need to GET its
URL, or visit the expander resource (see below).

This makes it relatively easy for us to dump huge numbers of collapsed
representations into a response--we just have to generate the
URLs. For items like bug messages, where calculating the URL is
expensive, we might consider creating an alternate URL that's easier
to calculate.

== The expander resource ==

So, you have 100,000 links. How do you turn those links into
representations? Fortunately, the expander resource (located at
`/expand`) is designed to do just that. If you POST it a number of
links, it will return a collection of full representations. If the
links you POST include ws.expand arguments, the representations will
be further expanded according to their ws.expand arguments.

But, the expander resource won't accept 100,000 links. It will only
accept some small number, like, say, 75.

Yes, it's a bait and switch. Small-bore batching is still happening;
it's just controlled by the client rather than the server. The server
dumps the entire *membership* of some collection onto the client in a
single atomic operation, but then it's up the client to get details
about the membership in little chunks.

By the time the client is finished getting all those details, it's
quite possible the membership has changed. But the client can be
certain that the membership was accurate _as of the time of the
initial request_.

Rather than hard-coding the maximum capacity of the expander resource,
we plan to publish that as a bit of information. In the simplest
design, a client can get the maximum capacity of the expander resource
by sending it a GET request. This information would be cached for the
same amount of time as the site WADL document.

= Batch PATCH =

Any collection URL that you can GET, you can also PATCH. Here's a
simple example: marking all of the bugs assigned to leonardr as fixed.

{{{
 PATCH HTTP/1.1 /launchpad/bugs?ws.restrict.owner.name=leonardr
 Content-type: application/json

 {'status' : 'Fixed'}
}}}

As you can see, this is a potentially dangerous feature. That simple
example does something that probably shouldn't be done. But it's not
much different from examples that would be very useful (such as taking
all the Fix Committed bugs for a given release, and marking them Fix
Released).

We can reduce the danger a little by shunning the normal Python-like
syntax for modifying objects, in favor of something that makes it more
clear that you're modifying an entire dataset at once.

{{{#!python
project = launchpad.projects['launchpad']
bugs = project.bugs
bugs = restrict(bugs.owner.name, 'leonardr')
PATCH(bugs, {'status' : 'Fixed'})
}}}

Unanswered questions:

 * What should the HTTP response be? What if some of the operations succeed and others fail? Should the entire PATCH be rolled back, or should we use the 207 Multi-Status response code defined by WebDAV?
 * Can the server reject a batch PATCH that it feels operates on too wide a dataset? I really don't think it should be possible to PATCH every single bug in Launchpad with a single operation. If the server can reject a PATCH, how does the client respond? Can this negotiation be done automatically? (If it can, what's the point in ever rejecting a batch PATCH? It will happen anyway, it'll just go slower.)

= Conclusion: What do we get? =

 * The "expand" operation solves the problem of "too many requests". It allows us to retrieve an object graph in O(1) requests, rather than in O(N) requests.
 * Since entries in collections are served as collapsed representations, retrieving a ''detailed'' object graph still requires O(N) requests. But whereas N used to be the number of objects in the collection, N is now that number divided by 75 (ie. the maximum capacity of the expander resource).
 * The "restrict" operation solves the problem of "too much data, too few ways to slice it". It allows N to be much smaller than it would otherwise be, and allows most common ways of reducing N to be done on the server instead of the client.
 * The "restrict" operation also allows us to get rid of many of our existing one-off named operations, simplifying the service.
 * Since the batch size is now so high to be meaningless, clients will no longer overlook or duplicate entries as they page through the batches of a collection.
 * In some cases, the batch PATCH allows a write operation to proceed in O(1) requests, rather than O(N). This is a much less common scenario than a batch GET.

Statement of the problem

The existing web service and launchpadlib implementations are very easy to write code for, but difficult to write _efficient_ code for, and difficult to understand. Our users' performance problems have two main causes:

Too much data, too few ways to slice it

The only way to filter a collection is to scope it to some entry, or to invoke a named operation. These methods don't cover all, or even most, of the ways clients want to restrict our various datasets. So clients end up getting huge datasets and iterating over the whole thing, filtering them on the client level.

The named operations we do have are not standardized in any way: they're nearly-raw glimpses into our internal Python API. This makes it difficult to learn the web service and even to find a specific thing you want. For instance, this is from Edwin Grubbs's report on the OpenStack Design Summit (warthogs list, 2010-11-15):

"I also answered some questions about searching for bugs via the API. The fact that the method is named project.searchTasks() may have caused it to be ignored when reading the API docs."

Our batching system was designed to stop clients who don't need an entire dataset from having to get the whole thing. But almost all clients do need the entirety of whatever dataset they're operating on. Unlike website users, they generally aren't satisfied with the first 75 items.

The real problem is that because of our poor filtering support, the dataset a client can actually get is much larger than the dataset they need to operate on. In real usage, the batching system actually results in more HTTP requests and lower efficiency. (Though it does reduce timeouts.)

Example of problematic client code

   1 for mp in launchpad.projects['launchpad'].getMergeProposals(status='Merged'):
   2    if mp.date_created > some_date:
   3        process(mp)

We get a huge number of merge proposals and immediately discard almost all of them. If there was some way of applying the date_created filter on the server side, we could avoid this.

Too many requests

Retrieving an entry or collection associated with some other entry (such as a bug's owner or a team's members) requires a new HTTP request. Entries are cached, but we don't send Cache-Control directives, so even when the entry is cached we end up making a (conditional) HTTP request. It's the high-latency request, not the cost of processing it on the server side, that's painful.

Client code that crosses the network boundary (bug.owner) looks exactly like client code that doesn't (bug.id). We need to stop hiding the network boundary from the user, or at least pull back from hiding it so relentlessly. It should be obvious when you write inefficient code, and/or more difficult to write it.

Sending Cache-Control directives when we serve entries will mitigate this problem somewhat. Thus far we haven't done so, out of concerns about data freshness.

Example of problematic client code

   1 for bug in source_package.getBugTasks():
   2     if bug.owner is not None:
   3         if bug.owner.is_team:
   4             for member in bug.owner.members:
   5                 ...

This code looks innocuous but it has big problems. We make a separate HTTP request for every 'bug.owner' -- that's three subsequent requests for the same data. The second two requests are conditional, but that doesn't help much.

A simple Cache-Control header saying it's okay to cache an entry for five seconds would alleviate this problem. But a Cache-Control header can't stop the need to make at least _one_ request for '.owner' and '.owner.members'. This means two HTTP requests for every bug in the 'bugs' list. Let's say there are one hundred bugs and an HTTP request takes one second. This code will run in 6:40 without Cache-Control headers, and in 3:20 if we add Cache-Control headers.

It would be nice to get the running time closer to 0:10.

Predecessor documents

The "expand" operation

The "expand" operation lets you GET an entry or collection, *plus* some of the entries or collections that it links to. The client code will make one big HTTP request and populate an entire object graph, rather than just one object. This will make it possible to access 'bug.owner' and iterate over 'bug.owner.members' as many times as you want, without causing additional HTTP requests.

Possible client-side syntax

This code acquires a bug's owner, and the owner's members, in a single request. If the owner turns out not to be a team, the collection of members will be empty.

   1 bug.owner                     # Raises ValueError: bug.owner is not available 
   2                               # on this side of the network boundary.
   3 bug = expand(bug, bug.owner, bug.owner.members)
   4 expanded_bug = GET(bug)       # Makes an HTTP request.
   5 expanded_bug.owner            # Does not raise ValueError.
   6 if bug.owner.member.is_team:  # No further HTTP requests.
   7     for member in bug.owner.members:
   8         print member.display_name

This implementation is more conservative: it must specifically request every single bit of expanded data that will be used.

   1 bug = expand(bug, bug.owner.is_team, bug.owner.members.each.display_name)
   2 expanded_bug = GET(bug)       # Makes an HTTP request.
   3 bug.owner.name                # Raises ValueError: value wasn't expanded.
   4 if bug.owner.is_team:         # No further HTTP requests.
   5     for member in bug.owner.members:
   6         print member.display_name

Of course, these examples assume we have a specific bug we want to expand. Our problematic code makes two requests *per bug*, and plugging this code in would simply bring that number down to one request per bug.

This code takes that down to one request, period. It operates on a scoped collection instead of an individual bug, and expands every object in the collection at once.

   1 bugs = source_package.bugtasks
   2 bugs = expand(bugs, bugs.each.owner, bugs.each.owner.members)
   3 expanded_bugs = GET(bugs)     # Makes an HTTP request
   4 for bug in expanded_bugs:     # No further HTTP requests:
   5     if bug.owner.is_team:
   6         for member in bug.owner.members:
   7             print member.display_name

Possible client-server syntax

The simplest way to support expansion is to add a general ws.expand argument to requests for entries or collections.

  GET /source_package/bugs?ws.expand=each.owner&ws.expand=each.owner.members

Specifying values for ws.expand that don't make sense will result in a 4xx response code.

Specifying values that do make sense will result in a much bigger JSON document than if you hadn't specified ws.expand. This document may take significantly longer to produce--maybe long enough that it would have timed out under the current system--but it will hopefully keep you from making lots of small HTTP requests in the future.

Hypermedia controls

The simplest way to show how to construct an expansion URL is to serve expansion links alongside the links themselves:

'members_collection_link' : 'https://.../~team/members'
'members_collection_expand_link' : 'https://.../~team?ws.expand=members'

If we were serving HTML representations, we could serve an HTML form allowing you to expand a number of the links in a representation at one time.

The alternative is to punt, and put information about how to expand links in the human-readable description of our media type. This would mean we could never change from the ws.expand system to some other system. With links or forms we can change it trivially.

We already use semantic indicators to distinguish between data attributes that are scalar (like person['name']) and attributes that are links which can be followed (like bug['owner_link']). Every single link-style attribute will get this new ability, so there's no need to add new control information indicating which links have it. We just need to program every client with this information.

The "restrict" operation

The "expand" operation reduces the need to make an additional HTTP request to follow a link. The "restrict" operation reduces the number of links that need to be followed in the first place, by allowing general server-side filters to be placed on a collection before the data is returned.

The client may request a collection with filters applied to any number of filterable fields. Which fields are "filterable" will be specified through hypermedia: they'll probably be the fields on which we have database indices. The representation returned will be a subset of the collection: the subset that matches the filter(s).

Possible client-side syntax

This code restricts a project's merge propoals to those with "Merged" status and created after a certain date.

   1 project = launchpad.projects['launchpad']
   2 proposals = project.merge_proposals
   3 proposals = restrict(proposals.each.status, "Merged")
   4 proposals = restrict(proposals.each.date_created, GreaterThan(some_date))
   5 some_proposals = GET(proposals)
   6 for proposal in some_proposals:
   7     ...

Two great features to note:

  1. We can apply the date_created filter on the server side, reducing the time and bandwidth expenditure.
  2. We no longer need to publish the getMergeProposals named operation at all. The only purpose of that operation was to let users filter merge proposals by status, and that's now a general feature. In the aggregate, removal of this and similar named operations will greatly simplify the web service.

You're not restricted to filtering collections based on properties of their entries. You can filter based on properties of entries further down the object graph. This code filters a scoped collection of bugs based on a field of the bug's owner. (There may be better ways to do this particular thing, but this should make it very clear what's going on.)

   1 project = launchpad.projects['launchpad']
   2 bugs = project.bugs
   3 bugs = restrict(bugs.owner.name, 'leonardr')
   4 my_launchpad_bugs = GET(bugs)

Possible client-server syntax

The simplest way to do this is to add a series of ws.restrict query arguments, each of which works similarly to ws.expand.

  GET /launchpad/bugs?ws.restrict.owner.name=leonardr

If your value for a ws.restrict.* argument makes no sense, or you specify a ws.restrict.* argument that doesn't map correctly onto the object graph, you'll get a 4xx error. If your arguments do make sense, you'll get a smaller collection than you would have otherwise gotten.

Hypermedia controls

We need to add hypermedia controls to indicate which fields in the object graph can be the target of a ws.restrict.* argument.

I'm not sure that we can explain the ws.restrict.* idea itself using WADL, since it's more complicated than the ws.expand idea. We may have to settle for human-readable documentation explaining how a client can pre-traverse the object graph and send an appropriate HTTP request.

Don't batch collections

Currently, clients fetch collections in batches, 75 entries at a time. This causes problems when the underlying collections are changing behind the scenes. As the collections change behind the scenes, entries may show up multiple times or fall through the cracks.

The entry that, five seconds ago, was item #76, may now be item #74, meaning that you'll miss it. Or the entry that used to be item #74 may now be item #76, meaning that you'll see it twice.

Our solution is to effectively get rid of batching. Instead of 75, the batch size will be something huge, like 1 million. If you ask for a collection that contains 100,000 entries, you will get a representation of all 100,000 entries.

Don't panic. For one thing, collections with 100,000 entries will be rare, because the "restrict" operation will make it much easier than it is now to get only a desired subset of a collection.

Huge collections will only occur when client code is poorly written (in which case the incredible slowness of the code will be an obvious problem) or when well-written client code actually does need to operate on a huge collection (in which case the incredible slowness of the code is to be expected).

Besides which, you won't get full representations of all 100,000 entries. When you get a collection, you'll receive a list of collapsed representations.

Collapsed representations

A collapsed representation is the opposite of an expanded representation. A normal representation of a person contains scalar data about the person, plus links to related entries and collections. An expanded representation of a person includes some of those related entries and collection inline. A collapsed representation of a person contains *no* data about the person.

A collapsed representation is just a resource identifier: a URL. If you want *any* information about the resource, you need to GET its URL, or visit the expander resource (see below).

This makes it relatively easy for us to dump huge numbers of collapsed representations into a response--we just have to generate the URLs. For items like bug messages, where calculating the URL is expensive, we might consider creating an alternate URL that's easier to calculate.

The expander resource

So, you have 100,000 links. How do you turn those links into representations? Fortunately, the expander resource (located at /expand) is designed to do just that. If you POST it a number of links, it will return a collection of full representations. If the links you POST include ws.expand arguments, the representations will be further expanded according to their ws.expand arguments.

But, the expander resource won't accept 100,000 links. It will only accept some small number, like, say, 75.

Yes, it's a bait and switch. Small-bore batching is still happening; it's just controlled by the client rather than the server. The server dumps the entire *membership* of some collection onto the client in a single atomic operation, but then it's up the client to get details about the membership in little chunks.

By the time the client is finished getting all those details, it's quite possible the membership has changed. But the client can be certain that the membership was accurate _as of the time of the initial request_.

Rather than hard-coding the maximum capacity of the expander resource, we plan to publish that as a bit of information. In the simplest design, a client can get the maximum capacity of the expander resource by sending it a GET request. This information would be cached for the same amount of time as the site WADL document.

Batch PATCH

Any collection URL that you can GET, you can also PATCH. Here's a simple example: marking all of the bugs assigned to leonardr as fixed.

 PATCH HTTP/1.1 /launchpad/bugs?ws.restrict.owner.name=leonardr
 Content-type: application/json

 {'status' : 'Fixed'}

As you can see, this is a potentially dangerous feature. That simple example does something that probably shouldn't be done. But it's not much different from examples that would be very useful (such as taking all the Fix Committed bugs for a given release, and marking them Fix Released).

We can reduce the danger a little by shunning the normal Python-like syntax for modifying objects, in favor of something that makes it more clear that you're modifying an entire dataset at once.

   1 project = launchpad.projects['launchpad']
   2 bugs = project.bugs
   3 bugs = restrict(bugs.owner.name, 'leonardr')
   4 PATCH(bugs, {'status' : 'Fixed'})

Unanswered questions:

  • What should the HTTP response be? What if some of the operations succeed and others fail? Should the entire PATCH be rolled back, or should we use the 207 Multi-Status response code defined by WebDAV?
  • Can the server reject a batch PATCH that it feels operates on too wide a dataset? I really don't think it should be possible to PATCH every single bug in Launchpad with a single operation. If the server can reject a PATCH, how does the client respond? Can this negotiation be done automatically? (If it can, what's the point in ever rejecting a batch PATCH? It will happen anyway, it'll just go slower.)

Conclusion: What do we get?

  • The "expand" operation solves the problem of "too many requests". It allows us to retrieve an object graph in O(1) requests, rather than in O(N) requests.
  • Since entries in collections are served as collapsed representations, retrieving a detailed object graph still requires O(N) requests. But whereas N used to be the number of objects in the collection, N is now that number divided by 75 (ie. the maximum capacity of the expander resource).

  • The "restrict" operation solves the problem of "too much data, too few ways to slice it". It allows N to be much smaller than it would otherwise be, and allows most common ways of reducing N to be done on the server instead of the client.
  • The "restrict" operation also allows us to get rid of many of our existing one-off named operations, simplifying the service.
  • Since the batch size is now so high to be meaningless, clients will no longer overlook or duplicate entries as they page through the batches of a collection.
  • In some cases, the batch PATCH allows a write operation to proceed in O(1) requests, rather than O(N). This is a much less common scenario than a batch GET.

Foundations/Webservice/DraftProposal (last edited 2010-11-18 17:18:57 by leonardr)