LEP/WebservicePerformance/ClientSyntax

See https://dev.launchpad.net/Foundations/Webservice/ProposalQnA for background discussion on this proposal.

Writing scripts against Launchpad

The basic pattern for using the Launchpad scripting interface (launchpadlib) is to "fetch" references to what you want, usually via a query, and then to act on those references. You can dereference them with "deref" (that is, get their details), change the objects to which they refer ("update" them), or "delete" them.

In this introduction, the quoted function names ("fetch", "deref", "update", "delete") denote code that connects to Launchpad over the network. Efficient code does this as infrequently as possible, batching work together. For example, calling a network-traversing function inside a loop is often the wrong approach.
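To make the cost of a network call inside a loop concrete, here is a minimal sketch that uses a stand-in connection which only counts simulated round trips. CountingConnection and its deref method are hypothetical; they mimic the shape of the calls described above and are not part of launchpadlib.

```python
class CountingConnection:
    """Hypothetical stand-in that counts simulated network round trips."""

    def __init__(self):
        self.round_trips = 0

    def deref(self, refs):
        # Each call stands for one HTTP request, however many refs it carries.
        self.round_trips += 1
        return [{"ref": ref} for ref in refs]


refs = ["bug/%d" % number for number in range(1, 101)]

# Anti-pattern: one round trip per reference.
slow = CountingConnection()
for ref in refs:
    slow.deref([ref])

# Preferred: one round trip for the whole list.
fast = CountingConnection()
fast.deref(refs)

print(slow.round_trips, fast.round_trips)
```

The looped version pays for one request per reference, while the batched version pays for a single request regardless of the list's length.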

Querying for references

SUMMARY: This section shows how to query Launchpad for collections of references. You can refine your query, ask for sorted results, and ask for only a subset of the results.

Launchpadlib exposes several top-level collections of objects (bugs, people, etc.) that you can query.

For example, here's how we could get references to all the bugs in Launchpad:

    from launchpadlib import Launchpad, Connection

    connection = Connection(...)
    launchpad = Launchpad(connection)
    # Building the query does not touch the network; "fetch" does.
    bug_query = launchpad.bugs
    bug_refs = connection.fetch(bug_query)

Note how "fetch" accepts a query object and returns a list of references.

The bug_refs variable will contain references to all Launchpad bugs, provided there are fewer than 10,000. [We will provide a channel to "bless" users so they can get more. Perhaps Canonical employees are auto-blessed.] Otherwise the request will raise an error (you will see a traceback), and you need to change your code to make a smaller query.

References are returned in a Python list.

    count = len(bug_refs)
    first_fifty = bug_refs[:50]
    first = bug_refs[0]

We may want to pare down the list of bugs to just the ones that have been marked as "won't fix". Query objects (e.g., launchpad.bugs) can have "restrictions" applied to them that select which items will be included when the query is executed (via "fetch"):

    from launchpadlib import restrict

    bug_query = launchpad.bugs
    bug_query = restrict(bug_query.status, "won't fix")
    bug_refs = connection.fetch(bug_query)

Since launchpadlib knows that bug status can only have one of a small set of valid values, using an invalid value will generate an error.
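The kind of client-side check described above could look like the following sketch. The VALID_VALUES table and the checked_restriction helper are hypothetical stand-ins for whatever launchpadlib would actually do, and the status names are an illustrative subset rather than the authoritative list.

```python
# Hypothetical table of enumerated fields and their permitted values.
# The statuses below are an illustrative subset only.
VALID_VALUES = {
    "status": ("new", "incomplete", "confirmed", "won't fix"),
}


def checked_restriction(field, value):
    """Return a (field, value) pair, rejecting values the field cannot hold."""
    allowed = VALID_VALUES.get(field)
    if allowed is not None and value.lower() not in allowed:
        raise ValueError(
            "%r is not a valid %s; expected one of: %s"
            % (value, field, ", ".join(allowed)))
    return (field, value)


print(checked_restriction("status", "won't fix"))
try:
    checked_restriction("status", "wont-fix")
except ValueError as error:
    print("rejected:", error)
```

Failing before any request is sent means a misspelled status costs no network round trip.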

You can use the "AnyOf" modifier to make a more inclusive filter.

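    # AnyOf is assumed to be importable from launchpadlib, like restrict.
    from launchpadlib import AnyOf, restrict

    bug_query = launchpad.bugs
    bug_query = restrict(bug_query.status, AnyOf("won't fix", "Incomplete"))
    bug_refs = connection.fetch(bug_query)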

[How about "refine" instead of "restrict". We talk/think about refining queries pretty often.] [I also prefer "refine". I forget what Leonard's argument was for "restrict": he had one.]

Other modifiers (such as LessThan, GreaterThan, LessThanOrEqualTo, GreaterThanOrEqualTo, Between, and MostRecent) are also available.
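One way to think about modifiers is as small predicate objects that are tested against a field's value. The sketch below evaluates three of them locally over plain dictionaries. The class bodies, the matches method, and the select_matching helper are assumptions made for illustration (as is the inclusive behaviour of Between); in the proposal the matching would happen on the server.

```python
class AnyOf:
    """Matches when the value equals any of the given choices."""

    def __init__(self, *choices):
        self.choices = choices

    def matches(self, value):
        return value in self.choices


class LessThan:
    """Matches when the value is strictly below the bound."""

    def __init__(self, bound):
        self.bound = bound

    def matches(self, value):
        return value < self.bound


class Between:
    """Matches low <= value <= high (inclusive bounds are an assumption)."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def matches(self, value):
        return self.low <= value <= self.high


def select_matching(records, field, modifier):
    """Return the records whose field satisfies the modifier."""
    return [record for record in records if modifier.matches(record[field])]


# Illustrative sample data, not real Launchpad bugs.
bugs = [
    {"id": 1, "status": "won't fix", "heat": 4},
    {"id": 2, "status": "Incomplete", "heat": 12},
    {"id": 3, "status": "New", "heat": 30},
]
closed_or_stalled = select_matching(
    bugs, "status", AnyOf("won't fix", "Incomplete"))
print([bug["id"] for bug in closed_or_stalled])
```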

The fetch function supports two further features. First, you can pass a field to sort by; second, you can pass a start and a limit to retrieve only part of the results.

Note that sorting is limited to particular attributes. See the documentation to determine which fields are supported.

This gets references to the 50 most recently created bugs:

    from launchpadlib import Descending

    bug_refs = connection.fetch(
        bug_query, sort_on=Descending(bug_query.creation_date), limit=50)

[Future iterations of the webservice may allow sub-sorting (e.g., sort_on=[Descending(bug_query.creation_date), Ascending(bug_query.title)]); for now, only sorting at one level is supported.]

[We're assuming sorting on indexed columns is cheap. If not, we might not be able to do this.]
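The intended meaning of sorting combined with a start and a limit can be illustrated locally: order by one field, then take a slice. The sorted_page helper below is hypothetical, and it assumes that "start" is a zero-based offset into the sorted results. In the proposal the server would do this work, not the client.

```python
from datetime import date
from operator import itemgetter


def sorted_page(records, sort_field, descending=False, start=0, limit=None):
    """Order records by one field, then return the slice [start, start + limit)."""
    ordered = sorted(records, key=itemgetter(sort_field), reverse=descending)
    stop = None if limit is None else start + limit
    return ordered[start:stop]


# Illustrative sample data, not real Launchpad bugs.
bugs = [
    {"id": 1, "creation_date": date(2010, 5, 1)},
    {"id": 2, "creation_date": date(2011, 1, 20)},
    {"id": 3, "creation_date": date(2010, 11, 3)},
]
newest_two = sorted_page(bugs, "creation_date", descending=True, limit=2)
print([bug["id"] for bug in newest_two])
```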

The only other way of generating references is described below in the section titled "Named Objects".

Dereferencing

SUMMARY: This section shows how to dereference collections of references. You can also ask for attributes of the returned objects to be dereferenced in turn, so that a single dereferencing call over the network gets more of what you need at once, potentially increasing your program's efficiency and speed.

References to bugs are nice, but not something we can work with directly. To get all the data about the bugs, we dereference the references using the connection's "deref" method.

    bugs = connection.deref(bug_refs)

Now we can iterate over some bugs and inspect their values:

    for bug in bugs:
        print(bug.title)

The "deref" method represents a single web call. However, you typically won't use it, because it is not required to return the data for all of the references, only the first "batch," where the size of the batch is determined by the server.

[As an internal optimization, only the first N (where N is around 1000) references will actually be sent to the server in the above example.]

Since this is a hassle, the "batchderef" function (built on top of the primitive "deref") is normally used instead. It requests batches of results* [reference is to note about non-transactionality] but hides the batching, so client code sees only an iterable of results.

    from launchpadlib import batchderef

    for bug in batchderef(bug_refs, connection):
        print(bug.title)

Note that batchderef is a function that takes a Connection instance as an argument. That's because the methods of Connection instances are restricted (by convention) to doing only a single HTTP request. This helps script writers better understand the performance characteristics of their applications (e.g., using a Connection method in a loop is likely to perform poorly). Higher-level functionality (like batchderef) is then built on top of Connection methods.
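The layering described above, a multi-request helper built on a single-request primitive, might be sketched as follows. FakeConnection is a hypothetical stand-in (not the launchpadlib class), and this batchderef is only one possible implementation. It assumes the server returns the first batch of the references it was sent, in order.

```python
class FakeConnection:
    """Hypothetical stand-in for a connection; not the launchpadlib class."""

    def __init__(self, batch_size):
        self.batch_size = batch_size  # server-determined in the proposal
        self.requests = 0

    def deref(self, refs):
        # One simulated HTTP request; only the first batch is returned.
        self.requests += 1
        return [{"ref": ref} for ref in refs[:self.batch_size]]


def batchderef(refs, connection):
    """Yield every dereferenced object, calling deref once per batch."""
    remaining = list(refs)
    while remaining:
        batch = connection.deref(remaining)
        if not batch:
            break  # defensive: avoid looping forever on an empty response
        for obj in batch:
            yield obj
        remaining = remaining[len(batch):]


connection = FakeConnection(batch_size=3)
objects = list(batchderef(range(7), connection))
print(len(objects), connection.requests)
```

Because batchderef is a generator, a caller that stops iterating early also stops issuing requests.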

In the above example all the attributes of the bug objects were returned. It would be better to ask only for the attributes we're interested in. That's done by passing a "select" argument to batchderef.

    # Keyword arguments must follow positional ones, so connection comes first.
    for bug in batchderef(bug_refs, connection, select=(bug_query.title,)):
        print(bug.title)

[We've also talked about a "structderef". Not described here.]

In some situations there may not be a top-level collection of the items we're interested in. We may instead want "deep" information about an item. For example, if we wanted to know the names and two-letter (ISO 3166-1 alpha-2) codes of all the countries in which there exist mirrors for any distribution:

    distro_query = launchpad.distributions
    distro_refs = connection.fetch(distro_query)
    # Ask for the nested country attributes up front, in the same request.
    distros = connection.deref(distro_refs, select=(
        distro_query.cdimage_mirrors_collection.country.name,
        distro_query.cdimage_mirrors_collection.country.iso3166code2))
    for distro in distros:
        print(distro.cdimage_mirrors_collection.country.iso3166code2,
              distro.cdimage_mirrors_collection.country.name)

Note that the collection of items to be dereferenced may be heterogeneous, in which case the selection requests may be heterogeneous. They will be applied as appropriate. [We think heterogeneous requests will actually be easier to implement than enforcing homogeneity, and they can encourage fewer dereferencing requests, and we believe that we will still be able to make efficient queries for them.]
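Applying selections "as appropriate" to a mixed collection could work roughly as in the sketch below, which runs locally over plain dictionaries. The apply_selection helper, the "kind" key, and the rule that unmentioned kinds keep all their fields are assumptions made for illustration, not part of the proposal.

```python
def apply_selection(objects, selection):
    """Keep, for each object, only the fields selected for its kind.

    `selection` maps a kind name to the field names wanted for that kind.
    Objects whose kind is not mentioned keep all of their fields.
    """
    results = []
    for obj in objects:
        wanted = selection.get(obj["kind"])
        if wanted is None:
            results.append(dict(obj))
        else:
            results.append({name: obj[name] for name in wanted if name in obj})
    return results


# Illustrative sample data mixing two kinds of object.
mixed = [
    {"kind": "bug", "id": 1, "title": "Crash on start", "status": "New"},
    {"kind": "person", "name": "benji", "time_zone": "UTC"},
]
trimmed = apply_selection(mixed, {"bug": ("title",), "person": ("name",)})
print(trimmed)
```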

The last dereferencing feature is described in the section below titled "Named Objects."

Updating

SUMMARY: both queries and references can be passed to update, along with descriptions of what to change.

XXX Limit?

Deleting

SUMMARY: both queries and references can be passed to delete.

XXX Limit?

Named objects

Sometimes objects have well-known names (users have user names, bugs have bug numbers, etc.) and we want to refer to those objects by name. We can do that:

    me_ref = launchpad.people['benji']

Note that launchpad.people['benji'] results in a reference, while launchpad.people is a query (which you can "fetch" to turn into references).

[The phrasing above suggests to me that "run" might be a better name than "fetch". On the other hand, "run" implies an arbitrary operation, while "fetch" implies actually getting a result.]

If we deref one of these single items, we get all of its top-level attributes.

    me = connection.deref(me_ref)
    print(me.name)

You can also pass any iterable of these references, even combined with references from a query response if so desired, to deref, update, or delete.

Re-constraining a collection for deref, update, and delete

SUMMARY: deref, update, and delete can accept an alternate query argument when you pass in a collection. This means that only members of the collection that match the query will be modified.

[XXX Probably not implemented in first iteration; this just records an interesting idea.]
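As a rough illustration of the idea, the sketch below applies a change only to those members of a collection that satisfy an extra condition. The update function, its "where" parameter, and the use of a local predicate instead of a server-side query object are all assumptions. Nothing here reflects a decided API.

```python
def update(objects, changes, where=None):
    """Apply `changes` to each object that satisfies `where`; return those objects."""
    changed = []
    for obj in objects:
        if where is None or where(obj):
            obj.update(changes)
            changed.append(obj)
    return changed


# Illustrative sample data, not real Launchpad bugs.
bugs = [
    {"id": 1, "status": "New"},
    {"id": 2, "status": "Incomplete"},
    {"id": 3, "status": "New"},
]
touched = update(bugs, {"status": "Confirmed"},
                 where=lambda bug: bug["status"] == "New")
print([bug["id"] for bug in touched])
```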

Change log

2011-01-27

LEP/WebservicePerformance/ClientSyntax (last edited 2011-01-26 19:28:01 by benji)