Foundations/Webservice/DraftProposal


Revision 2 as of 2010-11-15 20:28:38


Draft of Webservice Plans

This is a place where Leonard, Benji, and Gary scribble about ideas while we discuss. We will present it to a larger audience for discussion when we all think it hangs together well, with a basic syntax we agree on and science-fiction examples of how we might rewrite existing real-world code.

Goals

Proposed approach

Leonard has called it "filter and expand". Because "filter" is a Python built-in, I'm calling it "refine and expand" here. The names are up for discussion, like everything else.

Leonard's most recent draft is here: http://pastebin.ubuntu.com/529604/

Question about the proposal: why does the proposal say that we POST to the expander? The request does not mutate, so I would think GET would be a better verb. [URL length limitations. We may be asking to expand thousands of URLs at once.]

Q & A

Why refine and expand?
See leonardr's draft.
Why get identities (possibly filtered or refined) in one request and get data for batches of identities (expanded) in multiple subsequent requests?
See leonardr's draft, but in bullet points:
It makes getting a filtered set transactional.
It makes working with the result--len, walking in reverse, walking in step, and so on--easy, natural, and efficient.
It can be regular, uniform, and unlimited, if we think we can always return the set no matter how big it is, or at least allow *really big* sets (we are currently envisioning a limit of somewhere between hundreds of thousands and a million).
It continues the basic refine/expand model of encouraging users to think about working with collections, not individual entities.
Why prefer not to use strings for identifying sub-elements?
Prettier
Easier/more obvious spelling correction
help() and dir() will work on them
Tab completion
Why prefer immutable query?
Easier to reason about derived queries: always a new copy.
Unlikely to present a memory problem
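
As a rough illustration of the "always a new copy" argument (all class and method names here are hypothetical sketches, not the proposed API), a derived query can simply be a new frozen object:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Query:
    """Toy immutable query: refining returns a new object."""
    resource: str
    filters: tuple = ()

    def refine(self, condition):
        # Derived queries share nothing mutable with their parent,
        # so reasoning about them is local.
        return replace(self, filters=self.filters + (condition,))

base = Query("people/canonical-ubuntu/assigned_bugs")
refined = base.refine(("milestone.name", "milestone1"))
assert base.filters == ()        # the parent query is untouched
assert refined is not base
```

Because the parent is never mutated, any number of refinements can branch off the same base query without interfering with one another.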
Why prefer immutable results?
Snapshot of response; shows that it is not transactional
Clear parallel to a browser response
No partial updates of sets (i.e., the data in a given response collection will always be self-consistent because it was fetched inside a transaction. This is stated a little too strongly, which weakens the argument: the set membership is transactional, but the expansions are batched.)

Potential worries:

May cause memory problems. We can come up with ways to alleviate this if it becomes a problem (e.g., sharing unchanged bits between copies), but hopefully we won't have to.
People may want to union results. To alleviate this, provide an explicit function that does the union, returning a new collection. Members will be reused, so the cost should be cheap.
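
A possible shape for that explicit union function, assuming identities are simple link strings (a toy sketch, not the proposed API):

```python
def union(*responses):
    """Toy local union: returns a new collection of identity links,
    preserving order and dropping duplicates.  The member objects
    themselves are reused, so the cost is one pass over each input,
    not a deep copy."""
    seen, merged = set(), []
    for response in responses:
        for link in response:
            if link not in seen:
                seen.add(link)
                merged.append(link)
    return merged

a = ["lp://bug/1", "lp://bug/2"]
b = ["lp://bug/2", "lp://bug/3"]
assert union(a, b) == ["lp://bug/1", "lp://bug/2", "lp://bug/3"]
```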
Why do we need to be able to make a query from a response? Or (essentially the same question/same answer)...Why do we need arbitrary collections of items, instead of always filtering?
The decision on what objects to get next may need to be made after the request. Example: User chooses items from checkboxes, and we want those.
Why is it valuable to have arbitrary (local) Python callables as a filter?
We can request only the parts we need for the filter
Why do we need slicing in addition to Python callables?
Explicit statement of what ones you want, beginning or end of group or even every N items (for statistical purposes).
What syntax does benji want other than [::] for slicing?
He's not sure. Maybe it's just another server-side filter type: Slice(0, 20). His main concern is that [::] suggests too strongly that requests are list-like, which they aren't, so we want to avoid confusing users. I wonder whether results really can be list-like enough now, but I understand the concern, even though I don't yet agree.
Why would you want to filter a query based on a result?
result may be too big

you may want a sub-query. This raises a HTTP syntax worry: what if we effectively want to filter against a very large set? the query string will not support this because of practical url size limits (2K for IE, for instance; see http://www.boutell.com/newfaq/misc/urllength.html).

When might you want to get a single specific item, expanded?
You are looking for a specific project, like Ubuntu; or specific person, like yourself; and just want to expand it.
When might you want to get a bunch of specific heterogeneous items, all at once, expanded?

You are building a "page" for GroundControl or a hypothetical all-AJAX Launchpad. You need a bunch of things for the page, and you want it as efficiently as possible. Leonard's syntax proposes a generic "expand" resource, so we can do it in the webservice.

Why do we want to have links "typed"? Or (essentially the same question/same answer)...Why do we want the links in homogeneous groups?
Because that way we can use the same proposed mechanism for expanding and filtering as we always have.
Why do we prefer to make filters specify fields for their Python callables, not entire objects?

Benji needs to make this argument, because I forget what he said, possibly because I was not completely convinced.

When you do this, you may really just want to iterate a result.
Benji also asks, "Are we initially going to allow people to expand non-leaf-node values?"

My answer is, yeah, probably, or at the very least that is intended initially. I'm not sure where Benji stands on this.

Why does specifying a field in a filter not include it in the Python representation as if you had expanded it?
reduce surprise. If you want it, include it in the expansion.

Some related questions:

A number of people (martin and lifeless among them, I think) have said that it would be really nice to be able to use the same syntax for using the webservice and for writing LP code itself (in-process, not over the network). Are we considering that?
The transactional semantics of writing LP code in-process are valuable and we will probably not want to try to make transactional-ish semantics for the webservice ever. This will quite possibly (but not necessarily, granted) lead to different syntaxes.
That said, if out-of-process had certain operations that were transactional (get(), patch(), etc.) it seems that having a start_transaction() and end_transaction() that you could wrap them with when you were in-process would be doable. But we still think that even if we had it no one would want to use it. The launchpadlib API is going to be a shadow of what's possible when you have a fast connection to the DB.

Examples

These are various competing drafts of a Python syntax, reflecting the above thoughts in various ways.

For the current thoughts (and competing syntaxes) on the HTTP side, see Leonard's draft.

Immutable request, very explicit version

Approaches in this one:

request and response are immutable
network calls are as explicit as possible

As an easy-to-see convention, function calls that make network calls are in capitals.

   1 base_req = ids(launchpad.people['canonical-ubuntu'].assigned_bugs)

This creates a request for the identities of bugs in the canonical-ubuntu team. We can make refined versions of this request. Note that nothing has gone over the wire yet, so we have no validation that there is actually a team or person named "canonical-ubuntu". In fact, "base_req" is just going to be the basis of requests that we actually do send.

   1 req1 = refine(base_req.milestone.name,  OneOf('milestone1', 'milestone2'))
   2 req2 = refine(base_req.specifications.target_milestone.name, OneOf('milestone1', 'milestone2'))

req1 and req2 are refined subsets, or filtered subsets, of the "base_req". Available filters are OneOf, AnyOf, GreaterThan, LessThan, EqualTo, and XXX (would be nice to have a draft list; can we check types on the client side, and do we want to?). They can be passed specific values, as seen here, or OneOf and AnyOf can use collections generated from previous webservice interactions, as we will see below.

(Note that we "refine" the request, rather than "filter" it. This is just because "filter" is a Python built-in. Note also that, contrary to other proposals, I did not specify making expansion hints before you make the identity request. I'll get into that more below.)
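As a rough sketch of how such filter objects might work client-side (the class names come from the list above, but the wire encoding shown is purely hypothetical; the real HTTP syntax lives in Leonard's draft):

```python
class Filter:
    """Toy filter object: holds values, serializes to a query param."""
    op = None

    def __init__(self, *values):
        self.values = values

    def to_param(self, field):
        # Hypothetical wire encoding, for illustration only.
        joined = ",".join(str(v) for v in self.values)
        return f"{field}.{self.op}={joined}"

class OneOf(Filter):
    op = "one_of"

class GreaterThan(Filter):
    op = "gt"

param = OneOf("milestone1", "milestone2").to_param("milestone.name")
assert param == "milestone.name.one_of=milestone1,milestone2"
```

Keeping filters as plain objects (rather than strings) is what makes the client-side type checking mentioned above even possible.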

Now let's get some responses.

   1 response1 = GET(req1)
   2 response2 = GET(req2)

Each GET is a single network call, and the two responses are not yet expanded, so all we know about each are the identities (links) and types (which we knew because of what we requested, not because of any server response).

As an aside, we could have just gotten the first 50 items from one of those requests using slice notation on the request.

   1 truncated_response = GET(req1[:50])

In this case, req1[:50] returns another request object, which is not iterable. The truncated_response is.

(Benji has some valid concerns over using the standard slice spelling, and might prefer something like using a standard Python slice object as an argument to refine: truncated_response = GET(refine(req1, slice(50))). However, since the request can't be iterated, I think the semantics are harder to misuse than other getitem hacks; and it is very concise and convenient. I prefer it.)
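A minimal sketch of those slicing semantics (class and attribute names are hypothetical): slicing a request returns a new request rather than data, and integer indexing is refused, so iteration fails fast.

```python
class Request:
    """Toy request object: [start:stop] returns a new request,
    and a request is deliberately not iterable."""
    def __init__(self, resource, start=None, stop=None):
        self.resource, self.start, self.stop = resource, start, stop

    def __getitem__(self, key):
        if not isinstance(key, slice) or key.step is not None:
            # Integer indexing raises, so the legacy iteration
            # protocol (which falls back to __getitem__) fails fast
            # instead of yielding data.
            raise TypeError("requests support only [start:stop] slices")
        return Request(self.resource, key.start, key.stop)

req = Request("bugs")
truncated = req[:50]
assert (truncated.start, truncated.stop) == (None, 50)
assert truncated is not req
try:
    list(req)                     # a request is not list-like
except TypeError:
    pass
else:
    raise AssertionError("requests should refuse iteration")
```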

Let's get back to the main example, in which we have response1 and response2, obtained from req1 and req2. We will union the two responses in one line of Python.

   1 identities = union(response1, response2)

We unioned the two responses, showing that the union is done locally. The response is now a union of the identities (links) from the two filters.

(Note that, if we had specified expansion desires earlier as is done in other proposals, I think this union would be much trickier, because we would have to keep track of what had been requested for each merged response, and somehow do a merge of requests when they overlap. This might result in a collection that would be expanded in a heterogeneous way. Alternatively, we could enforce homogenous expansion annotations in merged sets, though that seems like it could become very annoying; or merge all expansion requests into one for the unioned set, though this seems a bit too automatic. It makes more sense to me to specify what you want to expand when you make the expansion request, as I will show below.)

Now let's make a request to expand the unioned response, so we have some actual data to work with.

   1 request = expand(identities, identities.assignee, identities.milestone)

We're saying that we are interested in the top-level data of the bugs we found, the top-level data of each bug's assignee, and the top-level data for each associated milestone. For a few more examples, expand(response) would just specify the top-level data, while expand(response.assignee) would omit the top-level data, getting only the assignees.

That's an expansion request. Unlike an identity request, you can't use refine with it.

On the other hand, you can call expand on another expansion request to make a new request that asks for additional data. You can use request.identities to get to the original response collection that makes up the request. You can also combine multiple expansion requests, as we'll show below. For now, let's get the expanded values.

   1 first_fifty = GET(request[:50])
   2 second_fifty = GET(request[50:100])

Each of those collections of fifty bugs would have objects that had all their scalar top-level data, and the scalar data for the assignee and milestone.

Something to note about any expanded collection is that the result may not end up matching the original filters; the collection did, back when it was made, but it might not now. The separate request highlights this distinction. (I thought about proposing something that would locally filter as it expands, but then the expansion would also have to request everything that was used as a filter initially. I decided that this was too tricky; however, for this and other stories, I am pretty sure that responses should remember the requests that generated them, so clever things can be built later.)

Why did we batch the request? Because expansion must be batched.

Why is this expansion separate from the original request? Because it shows that we can only guarantee to expand in batches--it's not transactional. It also clearly shows that the expansion is a separate request.

One downside to an expansion call that is separate from the initial filtering call for identity is that it doesn't allow optimizations of the server sending both the identifiers and the expansions at once, as other proposals have allowed. That's true; it's a trade off against making the API clearer. We were not sure we wanted to do that anyway.

Another potential downside is that it will probably be tedious to expand with explicit slices many times, for each batch. To help with that, we could provide a batching convenience for this, like the below.

   1 expanded = batchGET(expand(response, response.assignee, response.milestone))

That would give you a collection of the union of the two sets, with the expanded top-level data for those bugs.

The collection from a batchGET would be lazy, and expand batches as they are requested. This would give back some of the automation that people enjoy from the existing webservice API, while using the spelling to clarify expectations.
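
One way such a lazy batchGET could behave (a toy sketch: `fetch` stands in for the network call, and all names here are hypothetical):

```python
def batch_get(request, fetch, batch_size=50):
    """Toy lazy batcher: yields items one batch at a time, making
    the (simulated) network call only when iteration reaches it."""
    start = 0
    while True:
        batch = fetch(request, start, start + batch_size)
        if not batch:
            return
        yield from batch
        start += batch_size

# Simulated server that serves 120 items in slices.
data = list(range(120))
calls = []

def fake_fetch(request, start, stop):
    calls.append((start, stop))
    return data[start:stop]

result = list(batch_get("expand-request", fake_fetch, batch_size=50))
assert result == data
assert calls == [(0, 50), (50, 100), (100, 150), (150, 200)]
```

Because the generator is lazy, a consumer that stops after the first few items never triggers the later batches, which is exactly the automation-with-clear-expectations trade-off described above.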

What if you want to collect arbitrary items from a response and make them into a new expansion request? Make a collection.

   1 coll = response.collection([response[0], response[3], response[5], response[10]])
   2 coll.add(response[-10])
   3 request = expand(coll, coll.assignee)

Collections have the semantics of a set of identities of homogeneous type. The type is determined from the object from which they originate. Therefore, from a request or response of bugs, the collection is of bugs. Similarly, launchpad.people.collection() will make a collection of people.

What about a Python callable for a filter? We don't provide this. Instead, follow these steps.

  1. Get a collection of identities (e.g. identities = GET(request(launchpad.people)) or identities = GET(refine(request(launchpad.people).project_membership.name, AnyOf('Launchpad', 'Landscape', 'Storm')))).

  2. Expand the identities to get only the fields you need to filter the identities locally (e.g., data = batchGET(expand(identities.name)))

  3. Add every item that meets your requirements to a collection (e.g., wanted = identities.collection(o for o in identities if sounds_french(o.name))).

  4. Expand the wanted items as desired (e.g., data = batchGET(expand(wanted))) and do stuff with it.
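
The steps above can be sketched with the network stubbed out (everything here, including the fake_expand_names helper and the link strings, is hypothetical):

```python
# Step 1 stand-in: a collection of identities (links).
identities = ["lp://person/1", "lp://person/2", "lp://person/3"]
names = {"lp://person/1": "Benoit",
         "lp://person/2": "Gary",
         "lp://person/3": "Benji"}

def fake_expand_names(ids):
    # Step 2 stand-in: expand only the 'name' field for each identity.
    return [(link, names[link]) for link in ids]

def sounds_french(name):
    # The arbitrary local Python callable used as a filter.
    return name.startswith("Ben") and name.endswith("t")

# Step 3: build the wanted collection locally from the filter.
wanted = [link for link, name in fake_expand_names(identities)
          if sounds_french(name)]
assert wanted == ["lp://person/1"]
# Step 4 would then fully expand `wanted` over the wire.
```

The point of fetching only the 'name' field in step 2 is that the local callable never forces the server to ship whole objects.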

What if a response should be part of a filter? No problem. An identity response, an expansion response, a batchGET object, or a collection can all be passed as one of the arguments to AnyOf or OneOf (e.g., AnyOf(identities)).

What if you want to expand only one thing? I think practicality must trump purity here. launchpad.people['gary'] should give you an object that you can use as an expand request, so you can say gary = GET(launchpad.people['gary']). gary = GET(launchpad.me) should work too. You should also be able to say gary_request = launchpad.people['gary']; GET(expand(gary_request, gary_request.assigned_bugs)).

What if you want to expand heterogeneous elements efficiently? GET and batchGET can take multiple expansion requests and aggregate them. When you provide more than one argument to GET or batchGET, you get back a tuple of responses, one per request. For instance, you might write something like (person,), bugs = GET(request(launchpad.people['gary']), expand(identities, identities.assignee)).

Should we support sorting? How could we syntactically in HTTP? Would it be efficient enough on the server? Let's assume the answers to the above are "yes," "some solvable way," and "yes," and think for a moment about Python syntax. So then we are left with more questions. What should the Python syntax for sorting be? Can you union/merge sorted items and keep the sorting if they were sorted the same way? How do you get only the first N/last N/step-by N when the request is for a sorted response?

Would this work?

identities = GET(sorted(ids(launchpad.people).name)[:50])

What if you want to patch something? You use the PATCH function. It takes an identity request, or a response from an identity or expand request, along with pairs of field references and new values:

   1 items = ids(launchpad.people['canonical-ubuntu'].assigned_bugs)
   2 items = refine(items.assignee, ...)
   3 result = PATCH(items.status, "Won't Fix", items.assignee, None) # N pairs
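
One way the "N pairs" argument convention could be handled client-side (a toy sketch; the function name and the dict payload are assumptions, not the decided wire format):

```python
def build_patch(*pairs):
    """Toy helper: turn PATCH's alternating field-reference/value
    arguments into a field -> new-value mapping."""
    if len(pairs) % 2:
        raise TypeError("PATCH takes field/value pairs")
    # Even positions are field references, odd positions are values.
    return {field: value
            for field, value in zip(pairs[::2], pairs[1::2])}

payload = build_patch("status", "Won't Fix", "assignee", None)
assert payload == {"status": "Won't Fix", "assignee": None}
```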

You can use a .collection() too, and you can use items from the query tree:

(Did not think this through clearly; I think this would work.)

   1 result = PATCH(launchpad.people['gary'].name, "Yrag Retsop")

(What is the result of a PATCH? I forget what we return now.)

DELETE would work very similarly.

What if you want to create something? This is entirely separate from filter, expand, patch, and all that. It's probably some API that the collection provides, like launchpad.people.create(...) or something.

Mutable request version

XXX

Mutable response version

XXX