Diff for "LEP/WebservicePerformance/ClientSyntax"


Differences between revisions 3 and 4
Revision 3 as of 2011-01-26 16:10:01
Size: 8609
Editor: benji
Comment: Reformat very long lines and superfluous whitespace prior to editing.
Revision 4 as of 2011-01-26 19:28:01
Size: 10027
Editor: benji
Comment: Updated with the results of a conversation between Martin Pool, Gary Poster, and Benji York.
Line 7: Line 7:
to GET references to what you want, usually via a query; and then to act on
those references. You can DEREFerence them (that is, get their details),
change the objects to which they refer (PATCH them), or DELETE them.

In this introduction, note that functions in all-capitals ("GET", "DEREF",
"PATCH", "DELETE
") denote code that connects to Launchpad over the network.
to "fetch" references to what you want, usually via a query; and then to act on
those references. You can "deref"erence them (that is, get their details),
change the objects to which they refer ("update" them), or "delete" them.

In this introduction, note that functions in all-capitals ("fetch", ""deref"",
"update", "delete
") denote code that connects to Launchpad over the network.
Line 29: Line 29:
    from launchpadlib import Launchpad, GET
    launchpad = Launchpad(...)
    from launchpadlib import Launchpad, Connection
    connection = Connection(...)

    launchpad = Launchpad(connection)
Line 32: Line 33:
    bug_refs = GET(bug_query)
}}}

Note how GET accepts a query object and returns a list of references.
    bug_refs = connection.fetch(bug_query)
}}}

Note how "fetch" accepts a query object and returns a list of references.
Line 52: Line 53:
marked as "won't fix". Query objects (e.g., laucnhpad.bugs) can have marked as "won't fix". Query objects (e.g., Laucnhpad.bugs) can have
Line 54: Line 55:
the query is executed (via GET): the query is executed (via "fetch"):
Line 60: Line 61:
    bug_refs = GET(bug_query)     bug_refs = connection.fetch(bug_query)
Line 71: Line 72:
    bug_refs = GET(bug_query)     bug_refs = connection.fetch(bug_query)
Line 79: Line 80:
GreaterThanOrEqualTo, and Between) are also available.

There are two other features to making queries within the GET function. First,
GreaterThanOrEqualTo, and Between, MostRecent) are also available.

There are two other features to making queries within the fetch function. First,
Line 92: Line 93:
    bug_refs = GET(bug_query, sort_on=Descending(bug_query.creation_date), limit=50)     bug_refs = connection.fetch(
        
bug_query, sort_on=Descending(bug_query.creation_date), limit=50)
Line 114: Line 116:
get all the data about the bugs we can call DEREF on the references. There's a
primitive for this: DEREF.

{{{#!python
    bugs = DEREF(bug_refs)
get all the data about the bugs we can call "deref" on the references. There's a
connection method for this: "deref".

{{{#!python
    bugs = connection.deref(bug_refs)
Line 128: Line 130:
DEREF represents a single web call. However, you typically won't use it,
because it is not required to return the data for all of the references, only
the first "batch," where the size of the batch is determined by the server.
The "deref" method represents a single web call. However, you typically won't
use it, because it is not required to return the data for all of the
references, only the first "batch," where the size of the batch is determined
by the server.
Line 135: Line 138:
Since this is a hassle, the DEREF function is rarely used. The batchDEREF
function (which is built on top of the primitive DEREF) is used instead. The
batchDEREF function requests batches of results* [reference is to note about
Since this is a hassle, the "deref" function is rarely used. The "batchderef"
function (which is built on top of the primitive "deref") is used instead. The
batchderef function requests batches of results* [reference is to note about
Line 142: Line 145:
    from launchpadlib import batchDEREF
    for bugs in batchDEREF(bug_refs):
    from launchpadlib import batchderef
    for bugs in batchderef(bug_refs, connection):
Line 146: Line 149:

Note that batchderef is a function that takes a Connection instance as an
argument. That's because the methods of Connection instances are restricted
(by convention) to doing only a single HTTP request. This helps script writers
better understand the performance characteristics of their applications (e.g.,
using a Connection method in a loop is likely to perform poorly). Higher-level
functionality (like batchderef) is then built on top of Connection methods.
Line 149: Line 159:
done by passing an "select" argument to batchDEREF.

{{{#!python
    for bugs in batchDEREF(bug_refs, select=(bug_query.title,)):
done by passing an "select" argument to batchderef.

{{{#!python
    for bugs in batchderef(bug_refs, select=(bug_query.title,), connection):
Line 156: Line 166:
[We've also talked about a structDEREF. Not described here.] [We've also talked about a "structderef". Not described here.]
Line 165: Line 175:
   distro_refs = GET(distro_query)
   distros = DEREF(distro_refs, select=(
   distro_refs = connection.fetch(distro_query)
   distros = connection.deref(distro_refs, select=(
Line 185: Line 195:
== Patching ==

''SUMMARY: both queries and references can be passed to PATCH, along with
== Updating ==

''SUMMARY: both queries and references can be passed to update, along with
Line 195: Line 205:
''SUMMARY: both queries and references can be passed to DELETE.'' ''SUMMARY: both queries and references can be passed to delete.''
Line 210: Line 220:
launchpadlib.people is a query (which you can GET to turn into references).

[The phrasing above suggests to me that RUN might be a better name than GET.
On the other hand, RUN implies an arbitrary operation, while GET implies
actually getting a result.]

If we DEREF one of these single items, we get all of its top-level attributes.

{{{#!python
    me = DEREF(me_ref)
launchpadlib.people is a query (which you can "fetch" to turn into references).

[The phrasing above suggests to me that "run" might be a better name than
"fetch".
On the other hand, "run" implies an arbitrary operation, while
"fetch" implies
actually getting a result.]

If we deref one of these single items, we get all of its top-level attributes.

{{{#!python
    me = connection.deref(me_ref)
Line 224: Line 234:
references from a query response if so desired, to DEREF, PATCH, or DELETE.


== R
e-constraining a collection for DEREF, PATCH, and DELETE ==

''SUMMARY: DEREF, PATCH, and DELETE can accept an alternate ``query`` argument
references from a query response if so desired, to deref, update, or delete.


==
Re-constraining a collection for deref, update, and delete ==

''SUMMARY: deref, update, and delete can accept an alternate ``query`` argument
Line 235: Line 245:

= Change log =

= 2011-01-27 =

 * add a Connection object which does authentication, and is the facade network requests

 * change "GET" to "fetch"

 * change "DEREF" to "deref"

 * change "PATCH" to "update"

 * change "DELETE" to "DELETE"

 * make "fetch", "deref", "update", and "delete" methods of Connection

 * the Launchpad now takes a connection object (this is so it can fetch the WADL without having to get the user credentials again)

 * added a note about how Connection methods are the building-blocks on which higher-level functionality can be built

 * added an optional filter type: MostRecent that returns the N most recent items

See https://dev.launchpad.net/Foundations/Webservice/ProposalQnA for background discussion on this proposal.

Writing scripts against Launchpad

The basic pattern for using the Launchpad scripting interface (launchpadlib) is to "fetch" references to what you want, usually via a query, and then to act on those references. You can dereference them with "deref" (that is, get their details), change the objects to which they refer ("update" them), or "delete" them.

In this introduction, note that the Connection methods ("fetch", "deref", "update", and "delete") denote code that connects to Launchpad over the network. Efficient code will do this as infrequently as possible, batching work together. For example, calling a network-traversing function inside a loop is often the wrong approach.
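To make the cost of looping concrete, here is a minimal sketch that uses a stand-in connection which only counts simulated round trips. `CountingConnection` is hypothetical and not part of launchpadlib; only the call pattern matters.

```python
class CountingConnection:
    """Hypothetical stand-in for a Connection: counts simulated round trips."""

    def __init__(self):
        self.calls = 0

    def deref(self, refs):
        # One simulated network round trip, however many references are sent.
        self.calls += 1
        return [{"ref": ref} for ref in refs]


refs = ["bug-%d" % n for n in range(100)]

# Anti-pattern: one round trip per reference.
slow = CountingConnection()
for ref in refs:
    slow.deref([ref])

# Preferred: one round trip for the whole collection.
fast = CountingConnection()
fast.deref(refs)

print(slow.calls, fast.calls)
```

The looped version makes one call per reference, while the batched version makes a single call for the same data.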

Querying for references

SUMMARY: This section shows how to query Launchpad for collections of references. You can refine your query, ask for sorted results, and ask for only a subset of the query.

Launchpadlib exposes several top-level collections of objects (bugs, people, etc.) that you can query.

For example, here's how we could get references to all the bugs in Launchpad:

    from launchpadlib import Launchpad, Connection
    connection = Connection(...)
    launchpad = Launchpad(connection)
    bug_query = launchpad.bugs
    bug_refs = connection.fetch(bug_query)

Note how "fetch" accepts a query object and returns a list of references.

The bug_refs variable will contain references to all Launchpad bugs, if there are fewer than 10,000. [We will provide a channel to "bless" users so they can get more. Perhaps Canonical employees are auto-blessed.] Otherwise the request will generate a traceback, and you will need to change your code to make a smaller query.
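The size limit could behave roughly as in the sketch below. The `TooManyResults` exception and this local `fetch` function are assumptions made for illustration. The proposal does not name the error that would be raised, and the real check would happen on the server.

```python
MAX_REFS = 10000  # figure quoted in the text; the real limit is the server's choice


class TooManyResults(Exception):
    """Hypothetical error for a query that matches too many objects."""


def fetch(matching_ids, max_refs=MAX_REFS):
    """Return a list of references, or fail if the result would be too large."""
    matching_ids = list(matching_ids)
    if len(matching_ids) >= max_refs:
        raise TooManyResults(
            "query matches %d objects; make a smaller query" % len(matching_ids))
    return matching_ids


small = fetch(range(50))

try:
    fetch(range(20000))
    too_big = False
except TooManyResults:
    # A script would react here by adding restrictions or a limit and retrying.
    too_big = True

print(len(small), too_big)
```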

References are returned in a Python list.

    count = len(bug_refs)
    first_fifty = bug_refs[:50]
    first = bug_refs[0]

We may want to pare down the list of bugs to just the ones that have been marked as "won't fix". Query objects (e.g., launchpad.bugs) can have "restrictions" applied to them that select which items will be included when the query is executed (via "fetch"):

    from launchpadlib import restrict
    bug_query = launchpad.bugs
    bug_query = restrict(bug_query.status, "won't fix")
    bug_refs = connection.fetch(bug_query)

Since launchpadlib knows that bug status can only have one of a small set of valid values, using an invalid value will generate an error.

You can use the "AnyOf" modifier to make a more inclusive filter.

    bug_query = launchpad.bugs
    bug_query = restrict(bug_query.status, AnyOf("won't fix", "Incomplete"))
    bug_refs = connection.fetch(bug_query)

[How about "refine" instead of "restrict". We talk/think about refining queries pretty often.] [I also prefer "refine". I forget what Leonard's argument was for "restrict": he had one.]

Other modifiers (such as LessThan, GreaterThan, LessThanOrEqualTo, GreaterThanOrEqualTo, Between, and MostRecent) are also available.
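The sketch below shows how a status restriction with value validation and an "AnyOf" modifier might behave. It evaluates the restriction locally on plain dictionaries, whereas the proposal would evaluate it on the server. `restrict_status`, the `AnyOf` class shown here, and the list of valid statuses are illustrative assumptions, not the launchpadlib implementation.

```python
# Illustrative subset of statuses; the real set is defined by Launchpad.
VALID_STATUSES = {"new", "incomplete", "won't fix", "fix released"}


class AnyOf:
    """Matches when the attribute equals any one of the given values."""

    def __init__(self, *values):
        self.values = values


def restrict_status(value):
    """Validate a status restriction and return a predicate over bug dicts."""
    wanted = value.values if isinstance(value, AnyOf) else (value,)
    wanted = {v.lower() for v in wanted}
    unknown = wanted - VALID_STATUSES
    if unknown:
        # Invalid values are rejected up front rather than matching nothing.
        raise ValueError("invalid bug status: %s" % ", ".join(sorted(unknown)))
    return lambda bug: bug["status"].lower() in wanted


bugs = [
    {"id": 1, "status": "New"},
    {"id": 2, "status": "Won't Fix"},
    {"id": 3, "status": "Incomplete"},
]

matches = restrict_status(AnyOf("won't fix", "Incomplete"))
selected = [bug["id"] for bug in bugs if matches(bug)]
print(selected)
```

Rejecting an unknown status when the restriction is built means the mistake surfaces before any network call is made.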

There are two other features available when making queries with the fetch method. First, you can pass a field name to sort by; and, second, you can pass a start and a limit.

Note that the sorting functionality is limited to particular attributes. See the Fine Documentation to determine what fields are supported.

This would get references to the most recently created 50 bugs.

    from launchpadlib import Descending
    bug_refs = connection.fetch(
        bug_query, sort_on=Descending(bug_query.creation_date), limit=50)

[Future iterations of the webservice may allow sub-sorting (e.g., sort_on=[Descending(bug_query.creation_date), Ascending(bug_query.title)]); for now, only sorting at one level is supported.]

[We're assuming sorting on indexed columns is cheap. If not we might not be able to do this.]
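As a rough local model of single-level sorting combined with a start and a limit, consider the sketch below. `fetch_sorted` and its parameter names are hypothetical, and in the proposal the work would be done by the server rather than in the client.

```python
from datetime import date

bugs = [
    {"id": 1, "creation_date": date(2011, 1, 3)},
    {"id": 2, "creation_date": date(2011, 1, 20)},
    {"id": 3, "creation_date": date(2010, 12, 25)},
    {"id": 4, "creation_date": date(2011, 1, 11)},
]


def fetch_sorted(records, sort_on, descending=False, start=0, limit=None):
    """Sort on one field, then return the window [start, start + limit)."""
    ordered = sorted(records, key=lambda record: record[sort_on],
                     reverse=descending)
    end = None if limit is None else start + limit
    return ordered[start:end]


newest_two = fetch_sorted(bugs, "creation_date", descending=True, limit=2)
next_two = fetch_sorted(bugs, "creation_date", descending=True,
                        start=2, limit=2)
print([bug["id"] for bug in newest_two], [bug["id"] for bug in next_two])
```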

The only other way of generating references is described below in the section titled "Named Objects".

Dereferencing

SUMMARY: This section shows how to dereference collections of references. You can ask for further dereferencing of the returned objects, so that a single dereferencing call over the network gets more of what you need at once, potentially increasing your program's efficiency and speed.

References to bugs are nice, but not something we can work with directly. To get all the data about the bugs, we dereference the references using the connection's "deref" method.

    bugs = connection.deref(bug_refs)

Now we can iterate over some bugs and inspect their values:

    for bug in bugs:
        print(bug.title)

The "deref" method represents a single web call. However, you typically won't use it, because it is not required to return the data for all of the references, only the first "batch," where the size of the batch is determined by the server.

[As an internal optimization, only the first N (where N is around 1000) references will actually be sent to the server in the above example.]

Since this is a hassle, the "deref" method is rarely used directly. The "batchderef" function (which is built on top of the primitive "deref") is used instead. The batchderef function requests batches of results* [reference is to note about non-transactionality] but hides the batching operation so client code sees only an iterable of results.

    from launchpadlib import batchderef
    # batchderef yields one dereferenced object at a time.
    for bug in batchderef(bug_refs, connection):
        print(bug.title)

Note that batchderef is a function that takes a Connection instance as an argument. That's because the methods of Connection instances are restricted (by convention) to doing only a single HTTP request. This helps script writers better understand the performance characteristics of their applications (e.g., using a Connection method in a loop is likely to perform poorly). Higher-level functionality (like batchderef) is then built on top of Connection methods.
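One way a batching helper could be layered on a single-request "deref" is sketched below. `FakeConnection` and its batch size are stand-ins invented for this example, and this `batchderef` is an assumption about how such a function might work, not the actual implementation.

```python
class FakeConnection:
    """Hypothetical stand-in: deref returns at most BATCH_SIZE results per call."""

    BATCH_SIZE = 3  # the real batch size would be chosen by the server

    def __init__(self):
        self.calls = 0

    def deref(self, refs):
        # Each call models exactly one HTTP request.
        self.calls += 1
        return [{"title": "Bug %s" % ref} for ref in refs[:self.BATCH_SIZE]]


def batchderef(refs, connection):
    """Yield dereferenced objects one at a time, fetching batch by batch."""
    remaining = list(refs)
    while remaining:
        batch = connection.deref(remaining)
        if not batch:
            break  # defensive: avoid looping forever on an empty reply
        for obj in batch:
            yield obj
        # Drop the references that the server has already answered.
        remaining = remaining[len(batch):]


connection = FakeConnection()
titles = [bug["title"] for bug in batchderef(range(1, 8), connection)]
print(len(titles), connection.calls)
```

Because the helper is a generator, requests are made only as the caller consumes results, and the caller never sees the batch boundaries.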

In the above example all the attributes of the bug objects were returned. It would be better to ask only for the attributes we're interested in. That's done by passing a "select" argument to batchderef.

    # Keyword arguments must follow positional ones, so connection comes first.
    for bug in batchderef(bug_refs, connection, select=(bug_query.title,)):
        print(bug.title)
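The effect of "select" can be pictured as a projection applied before results are sent back. The sketch below models that locally; `deref_with_select` and the sample record are hypothetical, and the proposal would perform the projection on the server.

```python
FULL_BUG = {
    "id": 1,
    "title": "Crash on start",
    "description": "A long description that the script never reads.",
    "status": "New",
}


def deref_with_select(records, select=None):
    """Return full records, or only the selected attributes if select is given."""
    if select is None:
        return [dict(record) for record in records]
    return [{name: record[name] for name in select} for record in records]


everything = deref_with_select([FULL_BUG])
titles_only = deref_with_select([FULL_BUG], select=("title",))
print(titles_only)
```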

[We've also talked about a "structderef". Not described here.]

In some situations there may not be a top-level collection of the items we're interested in. We may instead want "deep" information about an item. For example, if we wanted to know the names and two-letter ISO 3166-1 codes of all the countries in which there exist mirrors for any distribution:

    distro_query = launchpad.distributions
    distro_refs = connection.fetch(distro_query)
    # Select only the two "deep" attributes we need from each mirror's country.
    distros = connection.deref(distro_refs, select=(
        distro_query.cdimage_mirrors_collection.country.name,
        distro_query.cdimage_mirrors_collection.country.iso3166code2))
    for distro in distros:
        print(distro.cdimage_mirrors_collection.country.iso3166code2,
              distro.cdimage_mirrors_collection.country.name)

Note that the collection of items to be dereferenced may be heterogeneous, in which case the selection requests may be heterogeneous. They will be applied as appropriate. [We think heterogeneous requests will actually be easier to implement than enforcing homogeneity, and they can encourage fewer dereferencing requests, and we believe that we will still be able to make efficient queries for them.]

The last dereferencing feature is described in the section below titled "Named Objects."

Updating

SUMMARY: both queries and references can be passed to update, along with descriptions of what to change.

XXX Limit?

Deleting

SUMMARY: both queries and references can be passed to delete.

XXX Limit?

Named objects

Sometimes objects have well-known names (users have user names, bugs have bug numbers, etc.) and we want to refer to those objects by name. We can do that:

    me_ref = launchpad.people['benji']

Note that launchpad.people['benji'] results in a reference, while launchpad.people is a query (which you can "fetch" to turn into references).
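The distinction between a query and a named reference can be sketched with two small classes. `Query` and `Reference` below are hypothetical illustrations, not launchpadlib types; the point is that indexing by name builds a reference without touching the network.

```python
class Reference:
    """Hypothetical reference: just the address of one object."""

    def __init__(self, collection, name):
        self.collection = collection
        self.name = name


class Query:
    """Hypothetical top-level collection: indexing by name gives a Reference."""

    def __init__(self, collection):
        self.collection = collection

    def __getitem__(self, name):
        # No network access happens here; a reference is only an address.
        return Reference(self.collection, name)


people = Query("people")
me_ref = people["benji"]
print(type(people).__name__, type(me_ref).__name__, me_ref.name)
```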

[The phrasing above suggests to me that "run" might be a better name than "fetch". On the other hand, "run" implies an arbitrary operation, while "fetch" implies actually getting a result.]

If we deref one of these single items, we get all of its top-level attributes.

    me = connection.deref(me_ref)
    print(me.name)

You can also pass any iterable of these references, even combined with references from a query response if so desired, to deref, update, or delete.

Re-constraining a collection for deref, update, and delete

SUMMARY: deref, update, and delete can accept an alternate query argument when you pass in a collection. This means that only members of the collection that match the query will be modified.

[XXX Probably not implemented in first iteration; this just records an interesting idea.]

Change log

2011-01-27

  • add a Connection object, which does authentication and is the facade for network requests
  • change "GET" to "fetch"
  • change "DEREF" to "deref"
  • change "PATCH" to "update"
  • change "DELETE" to "delete"
  • make "fetch", "deref", "update", and "delete" methods of Connection
  • the Launchpad now takes a connection object (this is so it can fetch the WADL without having to get the user credentials again)
  • added a note about how Connection methods are the building-blocks on which higher-level functionality can be built
  • added an optional filter type: MostRecent that returns the N most recent items

LEP/WebservicePerformance/ClientSyntax (last edited 2011-01-26 19:28:01 by benji)