Diff for "Foundations/Webservice/Performance"

Not logged in - Log In / Register

Differences between revisions 18 and 19
Revision 18 as of 2010-05-03 18:39:31
Size: 14256
Editor: leonardr
Comment:
Revision 19 as of 2010-05-07 13:32:24
Size: 12290
Editor: leonardr
Comment:
Deletions are marked like this. Additions are marked like this.
This page tracks the work to discover and implement performance improvements to the Launchpad web service.

= Solutions implemented =

== Request service root conditionally ==

Due to a bug in httplib2, launchpadlib was never making conditional requests for the service root even though the lazr.restfulclient tests worked. We changed the headers Launchpad serves and the problem went away.

Benefit: launchpadlib now downloads WADL only in very rare cases (when we upgrade Launchpad). Benefit accrues to existing launchpadlib installations.

In a live test, this reduced startup time from 3.9 seconds to 0.8 seconds.
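
For context, here is a minimal sketch of the client-side mechanism, assuming only httplib2 and a local cache directory (the URL is illustrative). Once the first response is cached along with its validators, the second request goes out as a conditional GET and is answered from the cache.

{{{
# Sketch: httplib2 caches the first response with its ETag/Last-Modified,
# then revalidates it with a conditional GET on the next request.
import httplib2

http = httplib2.Http('.cache')          # responses are cached in this directory
url = 'https://api.launchpad.net/1.0/'  # illustrative service root

response, content = http.request(url)   # full download the first time
response, content = http.request(url)   # conditional GET; a 304 is a cache hit
print(response.fromcache)               # True once conditional requests work
}}}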

== Cache the service root client-side ==

We changed lazr.restful to serve Cache-Control headers along with the service root (WADL and JSON). For frozen versions of the web service (beta and 1.0) the Cache-Control max-age is one week; for devel it's one hour. We can tweak this further in the future.

Benefit: launchpadlib now makes HTTP requests on startup only once a week (or hour). Due to a bug in httplib2, benefit only accrues to installations with an up-to-date lazr.restfulclient.
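
As a sketch of the policy just described (the max-age values come from the text above, but the helper name and how it would be wired into lazr.restful are assumptions):

{{{
# Hypothetical helper expressing the Cache-Control policy for service roots.
FROZEN_VERSIONS = ('beta', '1.0')

def service_root_cache_control(version):
    if version in FROZEN_VERSIONS:
        max_age = 7 * 24 * 60 * 60   # one week for frozen web service versions
    else:
        max_age = 60 * 60            # one hour for 'devel'
    return 'max-age=%d' % max_age

# e.g. response.setHeader('Cache-Control', service_root_cache_control('1.0'))
}}}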

== Remove lazr.restfulclient's dependency on lazr.restful ==

This wasn't done for performance reasons, but it seems to be what brought launchpadlib import time from 0.36 seconds to 0.20 seconds due to time saved in pkg_resources.

== Fix bug 568301 ==

Named operations on collections were fetching the collections before invoking the operation. In a test against /edge, it took 13-19 seconds to invoke getByUniqueName on /branches. With the fix in place, it took about 1.3 seconds.

(This might not be completely fixed yet.)
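
The usage pattern in question looks roughly like this (a hedged sketch: the branch name is made up, and it assumes a launchpadlib new enough to accept the 'edge' alias). Before the fix, the named-operation call below also fetched a page of the /branches collection.

{{{
# Named operation on a top-level collection; with the fix, only the
# operation itself runs on the server.
from launchpadlib.launchpad import Launchpad

launchpad = Launchpad.login_anonymously('perf-test', 'edge')
branch = launchpad.branches.getByUniqueName(
    unique_name='~example-user/example-project/trunk')  # illustrative name
}}}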

= Worthwhile but not implemented =

== Reinstate mod_compress on the 'api' vhost ==

This will greatly reduce bandwidth usage, especially when serving WADL documents (1.2M to 100k). mod_compress was disabled in 2008 due to a bug which has since been fixed. We had a workaround in place, but it never worked.

== Store representations in memcached ==

After constructing a JSON representation, we can store it in memory.

=== Initial memcached experiment ===

I hacked lazr.restful to serve representations directly from memcached, without any consideration for permission control. These results give a theoretical lower bound on the response time we can achieve by using memcached in lazr.restful. I used the [[https://dev.launchpad.net/Foundations/Webservice?action=AttachFile&do=view&target=performance_test.py||launchpadlib test script]] throughout. In most cases I'll only report the final number from that test: the mean time for 30 fetches.

For entries, the 30-fetch mean time was 0.11 seconds without memcached and 0.09 seconds with memcached. So fetching a single entry took about the same time whether or not the entry's representation was cached. I also found no benefit from using memcached to cache entry ETags. But for collections, the 30-fetch mean time was 0.61 seconds without memcached and 0.18 seconds with it. There may be an unmeasurably small single-entry benefit that is multiplied out over the large number of entries in a collection page.
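
In outline, the hack looked something like this (a sketch, not the real lazr.restful code; the key scheme and the build_representation callable are illustrative), using the python-memcached client:

{{{
# Sketch of the experiment: serve representations straight from memcached,
# skipping all permission checks, to measure the best case.
import memcache

cache = memcache.Client(['127.0.0.1:11211'])

def representation_for(obj_url, build_representation):
    key = 'representation:' + obj_url
    cached = cache.get(key)
    if cached is not None:
        return cached                    # served with no further work
    representation = build_representation()
    cache.set(key, representation)
    return representation
}}}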

=== Real (non-memcached) implementation ===

I've implemented most of a real representation caching system within lazr.restful, but I'm currently using a Python dict instead of memcached. Here's how it works: if it turns out that a client can see the full representation of an entry (there are no redacted fields), the entry's string representation is stored in the cache. Upon subsequent requests, the string will be pulled from the cache and either served whole or incorporated into a JSON representation of a collection. If the client cannot see the full representation of a cached entry, we pull the cached string, simplejson.loads() it, redact the appropriate fields, and dump it again. A collection is built from (possibly cached) entry representations, using string operations rather than simplejson.dumps().
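
A sketch of those rules, with hypothetical helpers (redacted_fields, build_entry_dict) and a placeholder marker for redacted values; the real lazr.restful code differs in its details:

{{{
import simplejson

entry_cache = {}   # currently a plain dict rather than memcached

def entry_representation(entry, user):
    redacted = redacted_fields(entry, user)   # hypothetical permission check
    cached = entry_cache.get(entry.url)
    if cached is None:
        cached = simplejson.dumps(build_entry_dict(entry))
        if not redacted:
            entry_cache[entry.url] = cached   # only full views are cached
    if redacted:
        # The client can't see everything: load, redact the fields, dump again.
        data = simplejson.loads(cached)
        for field in redacted:
            data[field] = 'redacted'          # placeholder marker value
        return simplejson.dumps(data)
    return cached                             # serve the cached string whole

def collection_representation(entries, user):
    # Collections are assembled with string operations, not one big dumps().
    return '{"entries": [%s]}' % ', '.join(
        entry_representation(entry, user) for entry in entries)
}}}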

With no representation cache present, the 30-fetch mean time was 0.09 seconds for the entry test and 0.74 seconds for the collection test. With a Python dict serving as the representation cache, the 30-fetch mean time was 0.09 seconds for the entry test and 0.11 seconds for the collection test. In other words, it's about as fast to get one page of a collection as it is to get a single bug. These times will go up a little when I implement a memcached representation cache, since keeping data in a Python dict is faster than keeping it in memcached.

The main problem is invalidation. My implementation invalidates an object's representation cache on an ObjectModifiedEvent (though this also needs a little work), but Launchpad isn't good about sending out ObjectModifiedEvents. We might get good results by doing cleanup on the Launchpad side and then tagging the corresponding objects as being okay to cache. Do this for bugs+bugtasks+people+branches and we'll have covered the most common access scenarios.
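
Conceptually, the invalidation hook is a zope event subscriber along these lines (entry_cache and object_url are the hypothetical names from the sketch above; registration would happen via ZCML):

{{{
# Sketch: drop an object's cached representation when it is modified.
from zope.component import adapter
from zope.interface import Interface
from zope.lifecycleevent.interfaces import IObjectModifiedEvent

@adapter(Interface, IObjectModifiedEvent)
def invalidate_representation(obj, event):
    # The representation will be rebuilt on the next GET.
    entry_cache.pop(object_url(obj), None)
}}}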

= Not worthwhile/too much work =

== Speed up launchpadlib startup time ==

This is dominated by pkg_resources setup, so there's not that much we can do. We did improve this a bit by accident (see above).

== Speed up wadllib parse time ==

I ran an experiment to see whether it would be faster to load the wadllib Application object from a pickle rather than parsing it every time. To get pickle to work I had to use elementtree.ElementTree (pure Python) instead of cElementTree. This made the initial parse go from about 0.3 seconds to about 3 seconds. Unpickling the pickle took about 0.63 seconds, twice the time it took to just parse the XML. It doesn't seem worth it. (Though I don't really see how it can be faster to create the Application from XML than from a pickle--maybe cElementTree is just really, really fast.)
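
The experiment boils down to something like this (a sketch; the file path is illustrative, and wadllib's Application takes the WADL's URL plus its markup):

{{{
# Sketch of the parse-vs-unpickle timing comparison.
import pickle
import time
from wadllib.application import Application

markup = open('launchpad-wadl.xml').read()

start = time.time()
app = Application('https://api.launchpad.net/1.0/', markup)
print('parse: %.2fs' % (time.time() - start))

# Pickling requires the pure-Python ElementTree (cElementTree objects
# don't pickle), which is what slowed the initial parse to ~3 seconds.
data = pickle.dumps(app)
start = time.time()
app = pickle.loads(data)
print('unpickle: %.2fs' % (time.time() - start))
}}}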

== Cache collections in a noSQL database ==

Like MongoDB. The point of this story is to keep collection queries from hitting postgres; such queries are much more expensive than just getting the values for a single row. If we can get collections very fast from a noSQL db, that might be a big win. It would also support answering "nested" requests (see the idea below) quickly. The proposed implementation is similar to the memcached story, except that triggers in postgres would completely maintain the pre-rendered data in the persistent noSQL db, rather than merely invalidating cached data. We would then use indexes in MongoDB to get the nested collections back. (The problem with this is that we don't have good rules for collection cache invalidation.)
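
Roughly, the read path would look like this (pymongo, with illustrative database, collection, and field names); the postgres triggers that keep the pre-rendered data current are the hard part and are not shown:

{{{
# Sketch: assemble one page of a collection from pre-rendered entry
# representations kept in MongoDB, via an indexed query.
from pymongo import Connection  # pymongo's client class circa 2010

db = Connection('localhost')['launchpad_cache']
page = (db.bug_representations.find({'project': 'ubuntu'})
        .sort('id').skip(0).limit(75))
body = '{"entries": [%s]}' % ', '.join(doc['json'] for doc in page)
}}}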

== Use HTTP 1.1 KeepAlive ==

According to Gary, getting the Launchpad stack to support HTTP 1.1 is too risky: it fails in production under as-yet-unknown circumstances.

= Ideas not tested yet =

== Cache entries and/or collections on the client side ==

When serving a representation of an entry or collection, send a Cache-Control header that specifies max-age. The client will not make subsequent requests for the entry until the max-age expires. Since the client requested this representation once, it's at least somewhat likely to do it again later. How much time this saves depends on what we choose for max-age.

Collections are currently never cached, but I think that's just because we don't serve any information that would let httplib2 make a conditional request or know when the cache would expire.

This is easy to implement, but to benefit we must accept some level of cache staleness on the client side. It has to be okay for a client to spend a while ignorant of some change to an entry, and (for a collection) ignorant of entries' addition to or removal from the collection.

Caching bug comments client-side is a clear win, since they never change, but it's rare for a client to request a specific bug comment, so it's a very small win. Caching the _collection_ of a bug's comments would be a much bigger win, but then clients would go a while without knowing that a new comment had been added to a bug, which we don't like.

Because we are so insistent on providing up-to-date data, I believe the scope for this solution is very small. It's possible some of the HWDB resources could use this.

== Collection-specific ETags ==

We have no general way of calculating an ETag or a Last-Modified for a collection, but it might be possible to set up a hook method that calculates these values for specific collections. This would let launchpadlib make conditional requests for collections. _This_ would let us cache a bug's comments on the client side.
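
For example, a hook for the bug-comments case might look like this (the method name and accessors are hypothetical). Since comments are never edited, the count plus the newest comment's date pins down the collection's state:

{{{
# Hypothetical per-collection ETag hook for a bug's comments.
def comments_collection_etag(bug):
    comments = list(bug.messages)   # illustrative accessor
    newest = max(comment.date_created for comment in comments)
    return '"%d-%s"' % (len(comments), newest.isoformat())
}}}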

== Others ==

 * [Network] Switch many requests to HTTP, to avoid SSL handshake costs. Since Launchpad is doing this, we should see how much time this would save and how much work it would be to piggyback on Launchpad's success.
 * [Network] Support "nested" requests: e.g., get a list of bugs, each with its 'bugtask' field expanded to contain the actual bugtask rather than a link (i.e. "bugtask" instead of "bugtask_link"); see the sketch after this list. This would save one HTTP request for every bug when looking up its bugtask. A less important case: get a person and a page of her bugs in a single request ("assigned_bugs_collection" instead of "assigned_bugs_collection_link"). This would save one HTTP request, period (and you'd have to make more HTTP requests to get the bugtasks, unless you could expand those inline as well).
 * Examine actual usage of launchpadlib in popular scripts to find broken abstractions, cost savings through named operations, etc.
 * [Client] Profile the client code and examine the results.
 * [Server] Profile the server code and examine the results. Maybe add zc.zservertracelog notes for when lazr.restful code starts, when it passes off to Launchpad code, when it gets the result from Launchpad, and when it hands the result back to the publisher. The first and last values may be unnecessary--equivalent to already-existing values.
 * [Client?] Bug 274074 - It's not only more annoying but also slower to get the size of a collection returned by a named operation. This might be a client- or a server-side fix.
 * [Client] The WADL for Launchpad is 1.2 M (106 K gzipped). Without the <doc> tags, it's only 750 K (40 K gzipped). By serving a stripped-down version of the WADL by default, we could save a bit of startup time.
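
To illustrate the nested-representation idea from the list above (the field names follow the text; the exact wire format is not pinned down there):

{{{
# Today: the bug representation carries a link, costing a second request.
bug = {
    'self_link': 'https://api.launchpad.net/1.0/bugs/1',
    'bugtask_link': 'https://api.launchpad.net/1.0/ubuntu/+bug/1',
}

# Proposed: the same field expanded inline, saving that round trip.
bug_nested = {
    'self_link': 'https://api.launchpad.net/1.0/bugs/1',
    'bugtask': {
        'self_link': 'https://api.launchpad.net/1.0/ubuntu/+bug/1',
        'status': 'Triaged',
    },
}
}}}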

= Process =

== First step: quantify performance ==

We want to be able to measure our performance. Ideally, this would be measured both end-to-end and subdivided into network performance, client performance, and server performance. These measurements have four goals.

 * Help us more accurately guess the potential effectiveness of a given solution, to help us winnow and prioritize the list.
 * Help us evaluate the effectiveness of a given solution after a full or partial implementation, to validate our efforts.
 * Help us determine what quantifiable performance level gives our users a qualitatively positive experience.
 * Help us move quickly.

The last goal means we need to find a balance between thoroughness and expediency in our construction of tests.

 [[https://dev.launchpad.net/Foundations/Webservice?action=AttachFile&do=view&target=performance_test.py||launchpadlib script to test performance of individual requests.]]

== Second step: collect, evaluate, winnow, and prioritize possible solutions ==

We are particularly responsible for the systemic performance of the webservice. This means that we want the average performance to be good. We need to work with the Launchpad team to create good performance within individual requests, but we are more interested here in things that can make the whole webservice faster. Tools that let developers make individual pages faster with modest effort and customization are also of interest.

Again, our solutions will focus on different aspects of the end-to-end performance of the webservice. We then have three basic areas to attack.

 * Reduce and speed up network requests.
 * Make launchpadlib requests faster systemically on the server.
 * Make the launchpadlib client faster.

== Third step: implement the next solution ==

The next solution is TBD.

== Next... ==

Rinse and repeat: return to the first step, trying to determine whether our quantifiable performance gives a qualitative experience that we find acceptable.
