This page tracks the work to discover and implement the best performance improvements to the Launchpad web service.
Process
First step: quantify performance
We want to be able to measure our performance. Ideally, we would measure both end-to-end performance and its components: network performance, client performance, and server performance. These measurements have four goals.
- Help us more accurately guess the potential effectiveness of a given solution, to help us winnow and prioritize the list.
- Help us evaluate the effectiveness of a given solution after a full or partial implementation, to validate our efforts.
- Help us determine what quantifiable performance level gives our users a qualitatively positive experience.
- Help us move quickly.
The last goal means we need to find a balance between thoroughness and expediency in our construction of tests.
Second step: collect, evaluate, winnow, and prioritize possible solutions
We are particularly responsible for the systemic performance of the webservice. This means that we want the average performance to be good. We need to work with the Launchpad team to create good performance within individual requests, but we are more interested here in things that can make the whole webservice faster. Tools that let developers make individual pages faster, perhaps with some per-page effort and customization, are also of interest.
Again, our solutions will focus on different aspects of the end-to-end performance of the webservice. We then have three basic areas to attack.
- Reduce and speed network requests.
- Make the launchpadlib requests faster systemically on the server.
- Make the launchpadlib client faster.
Third step: implement the next solution
The next solution is TBD.
Next...
Rinse and repeat from the first step, trying to determine whether our quantifiable performance gives users a qualitative experience that we find acceptable.
Solutions implemented
Request service root conditionally
Due to a bug in httplib2, launchpadlib was never making conditional requests for the service root even though the lazr.restfulclient tests worked. We changed the headers Launchpad serves and the problem went away.
Benefit: launchpadlib now downloads WADL only in very rare cases (when we upgrade Launchpad). Benefit accrues to existing launchpadlib installations.
In a live test, this reduced startup time from 3.9 seconds to 0.8 seconds.
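For reference, here is a minimal sketch of the behaviour we rely on, using httplib2 directly; the cache directory and URL are illustrative, and launchpadlib wires this up for you.

{{{#!python
# Minimal sketch of a conditional request for the service root, assuming
# httplib2's file cache; the cache path and URL are illustrative.
import httplib2

http = httplib2.Http('.launchpadlib-cache')
url = 'https://api.launchpad.net/1.0/'

# First request downloads the full body and stores it with its validators.
response, content = http.request(url, headers={'Accept': 'application/json'})

# With suitable ETag/Cache-Control headers from the server, the repeat request
# revalidates (or skips the network entirely) instead of re-downloading.
response, content = http.request(url, headers={'Accept': 'application/json'})
print('fromcache=%s status=%s' % (response.fromcache, response.status))
}}}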
Cache the service root client-side
We changed lazr.restful to serve Cache-Control headers along with the service root (WADL and JSON). For frozen versions of the web service (beta and 1.0) the Cache-Control max-age is one week; for devel it's one hour. We can tweak this further in the future.
Benefit: launchpadlib now makes HTTP requests on startup only once a week (or hour). Due to a bug in httplib2, benefit only accrues to installations with an up-to-date lazr.restfulclient.
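As a rough illustration of the policy (not the actual lazr.restful code), the max-age choice amounts to something like this:

{{{#!python
# Illustrative only: the Cache-Control policy described above, with one week
# for frozen web service versions and one hour for devel.
ONE_HOUR = 60 * 60
ONE_WEEK = 7 * 24 * ONE_HOUR

def cache_control_for(version):
    # 'beta' and '1.0' are frozen; 'devel' changes too often for a long max-age.
    max_age = ONE_WEEK if version in ('beta', '1.0') else ONE_HOUR
    return 'max-age=%d' % max_age
}}}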
Remove lazr.restfulclient's dependency on lazr.restful
This wasn't done for performance reasons, but it seems to be what brought launchpadlib import time from 0.36 seconds to 0.20 seconds due to time saved in pkg_resources.
Worthwhile but not implemented
Store representations in memcached
I hacked lazr.restful to cache completed representations in memcached, and to use them if they were cached. This would not work in a real situation, but it provides an upper bound on how much time we can possibly save by using memcached. I used the performance_test.py script (https://dev.launchpad.net/Foundations/Webservice?action=AttachFile&do=view&target=performance_test.py) throughout.
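The hack boiled down to a get-or-compute pattern keyed on the object's URL. A minimal sketch, assuming the python-memcached client; the key scheme and function names are mine, not lazr.restful's:

{{{#!python
import memcache

mc = memcache.Client(['127.0.0.1:11211'])

def get_representation(obj_url, build_representation):
    """Return the cached JSON representation, building and caching it on a miss."""
    key = 'json:' + obj_url
    cached = mc.get(key)
    if cached is None:
        cached = build_representation()   # the expensive part we want to skip
        mc.set(key, cached, time=3600)    # assumption: one-hour expiry
    return cached
}}}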
Entries
Here's the script retrieving an entry 30 times. (I had to disable conditional requests.)
{{{
Import cost: 0.44 sec
Startup cost: 1.27 sec
First fetch took 0.18 sec
First five fetches took 0.66 sec (mean: 0.13 sec)
All 30 fetches took 3.13 sec (mean: 0.10 sec)

Import cost: 0.44 sec
Startup cost: 0.84 sec
First fetch took 0.10 sec
First five fetches took 0.50 sec (mean: 0.10 sec)
All 30 fetches took 3.31 sec (mean: 0.11 sec)
}}}
I introduced memcached; here are the results:
{{{
Import cost: 0.47 sec
Startup cost: 1.27 sec
First fetch took 0.17 sec
First five fetches took 0.58 sec (mean: 0.12 sec)
All 30 fetches took 2.80 sec (mean: 0.09 sec)

Import cost: 0.44 sec
Startup cost: 0.86 sec
First fetch took 0.08 sec
First five fetches took 0.43 sec (mean: 0.09 sec)
All 30 fetches took 2.86 sec (mean: 0.10 sec)
}}}
As you can see, there's no significant benefit to caching a single entry representation over not caching it.
Collections
Here's the script retrieving the first page of a collection 30 times.
{{{
Import cost: 1.34 sec
Startup cost: 2.73 sec
First fetch took 0.77 sec
First five fetches took 3.01 sec (mean: 0.60 sec)
All 30 fetches took 18.28 sec (mean: 0.61 sec)
}}}
I introduced memcached; here are the results:
{{{
Import cost: 0.99 sec
Startup cost: 2.67 sec
First fetch took 0.91 sec
First five fetches took 1.98 sec (mean: 0.40 sec)
All 30 fetches took 5.26 sec (mean: 0.18 sec)
}}}
Here there is a very significant benefit to using memcached.
ETags
Then I wanted to see how much benefit would flow from caching entry ETags. I reinstated the conditional GET code and ran another entry test. This time I did 300 fetches.
{{{
Import cost: 0.42 sec
Startup cost: 1.22 sec
First fetch took 0.17 sec
First five fetches took 0.62 sec (mean: 0.12 sec)
All 300 fetches took 31.22 sec (mean: 0.10 sec)
}}}
Then I added code that would store the calculated ETag in memcached. The result:
{{{
Import cost: 0.42 sec
Startup cost: 0.81 sec
First fetch took 0.13 sec
First five fetches took 0.56 sec (mean: 0.11 sec)
All 300 fetches took 32.85 sec (mean: 0.11 sec)
}}}
Again, there was no significant difference on the level of individual entries.
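For reference, the ETag experiment was roughly the following shape (hypothetical names; not the real lazr.restful hooks):

{{{#!python
import memcache

mc = memcache.Client(['127.0.0.1:11211'])

def conditional_get_status(obj_url, if_none_match, calculate_etag):
    """Answer a conditional GET from a cached ETag when possible."""
    key = 'etag:' + obj_url
    etag = mc.get(key)
    if etag is None:
        etag = calculate_etag()        # still pays the full ETag calculation
        mc.set(key, etag, time=3600)   # assumption: one-hour expiry
    if if_none_match == etag:
        return 304                     # the client's copy is still good
    return 200                         # fall through to building the body
}}}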
Conclusion
If we're going to get benefits from using memcached it will have to be small benefits multiplied across the large number of entries found in a collection.
The test was intended to show the maximum possible benefit of using memcached. Because of field-level permissions, we can't actually serve the same representation to everybody. I had been planning to store field-level representations in memcached and assemble them at runtime, but that would cut the benefits of memcached to almost nothing. In the vast majority of cases, though, it will turn out that people can see all of an object's fields.
So, we can run all the permission checks and, if they all succeed, grab a preformatted JSON representation out of memcached and send it off, just as in this example. If some permission checks fail, we load the JSON representation into a dict (probably faster than building the dict from scratch), redact the fields whose checks failed, and dump the dict back into JSON for delivery. This will give us benefits similar to the best-case scenario seen in this test.
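A minimal sketch of that flow, assuming a python-memcached client and the standard json module; failed_fields, the key scheme, and the redaction marker are all illustrative:

{{{#!python
import json
import memcache

mc = memcache.Client(['127.0.0.1:11211'])

def representation_for(obj_url, failed_fields, build_representation):
    """Serve a cached representation, redacting any fields the user may not see."""
    key = 'json:' + obj_url
    cached = mc.get(key)
    if cached is None:
        cached = build_representation()     # slow path: build from scratch
        mc.set(key, cached, time=3600)
    if not failed_fields:
        return cached                       # common case: send it as-is
    # Rare case: load, redact the forbidden fields, and re-serialize.
    data = json.loads(cached)
    for field in failed_fields:
        data[field] = 'redacted'            # illustrative marker value
    return json.dumps(data)
}}}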
Not worthwhile/too much work
Speed up launchpadlib startup time
This is dominated by pkg_resources setup, so there's not that much we can do. We did improve this a bit by accident (see above).
Speed up wadllib parse time
I ran an experiment to see whether it would be faster to load the wadllib Application object from a pickle rather than parsing it every time. To get pickle to work I had to use elementtree.ElementTree (pure Python) instead of cElementTree. This made the initial parse go from about 0.3 seconds to about 3 seconds. Unpickling the pickle took about 0.63 seconds, twice the time it took to just parse the XML. It doesn't seem worth it. (Though I don't really see how it can be faster to create the Application from XML than from a pickle--maybe cElementTree is just really, really fast.)
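The experiment looked roughly like this (assuming wadllib's Application takes a URL and the WADL markup; the file name is illustrative):

{{{#!python
import pickle
import time
from wadllib.application import Application

wadl = open('launchpad-wadl.xml').read()

start = time.time()
app = Application('https://api.launchpad.net/1.0/', wadl)
print('parse: %.2f sec' % (time.time() - start))

# Pickling only works with the pure-Python ElementTree; forcing that is what
# made the initial parse so much slower in the numbers above.
blob = pickle.dumps(app, protocol=2)
start = time.time()
app = pickle.loads(blob)
print('unpickle: %.2f sec' % (time.time() - start))
}}}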
Cache collections in a noSQL database
Like MongoDB. The point of this story is to keep queries about collections from hitting postgres; those are much more expensive than just getting the values for a single row. If we can get collections very fast from a noSQL db, that might be a big win. It would also support getting "nested" requests (see idea below) quickly. The proposed implementation is similar to the memcached story, except that triggers in postgres would completely maintain the pre-rendered data in the persistent noSQL db, rather than invalidating cached data. We would then use indexes in MongoDB to get the nested collections back. (The problem with this is that we don't have good rules for collection cache invalidation.)
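On the read side, the lookup might look something like the following sketch, assuming pymongo and a hypothetical collection_pages collection whose documents are maintained by the postgres triggers; field names are illustrative:

{{{#!python
import pymongo

client = pymongo.MongoClient('localhost', 27017)
db = client['lp_cache']

def get_collection_page(collection_url, start=0, size=75):
    """Fetch pre-rendered entry representations for one page of a collection."""
    cursor = (db.collection_pages
              .find({'collection': collection_url})
              .sort('sort_key', pymongo.ASCENDING)
              .skip(start)
              .limit(size))
    return [doc['json'] for doc in cursor]
}}}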
Use HTTP 1.1 KeepAlive
According to Gary, getting the Launchpad stack to support HTTP 1.1 is too risky: it fails in production under as-yet-unknown circumstances.
Ideas not tested yet
- [Network] Switch many requests to HTTP, to avoid SSL handshake costs. Since Launchpad is doing this, we should see how much time this would save and how much work it would be to piggyback on Launchpad's success.
- [Network] Support "nested" requests: e.g., get a list of bugs, each with its 'bugtask' field expanded to contain the actual bugtask rather than a link (i.e. "bugtask" instead of "bugtask_link"); see the sketch after this list. This would save one HTTP request for every bug when looking up its bugtask. Less important case: get a person and a page of her bugs in a single request ("assigned_bugs_collection" instead of "assigned_bugs_collection_link"). This would save one HTTP request, period (and you'd still have to make more HTTP requests to get the bugtasks, unless you could expand those inline as well).
- Examine actual usage of launchpadlib in popular scripts to find broken abstractions, cost savings through named operations, etc.
- [Client] Profile the client code and examine the results.
- [Server] Profile the server code and examine the results. Maybe add zc.zservertracelog notes for when lazr.restful code starts, when it passes off to Launchpad code, when it gets the result from Launchpad, and when it hands the result back to the publisher. The first and last values may be unnecessary--equivalent to already-existing values.
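As a purely hypothetical illustration of the "nested" request idea above, compare the linked and expanded forms of a bug representation (field names follow the existing *_link convention; the URLs and values are made up):

{{{#!python
# Today: the client must follow bugtask_link with another HTTP request.
bug_with_link = {
    'self_link': 'https://api.launchpad.net/1.0/bugs/1',
    'bugtask_link': 'https://api.launchpad.net/1.0/ubuntu/+bug/1',
}

# Nested: the bugtask representation is expanded inline, saving that request.
bug_with_nested_bugtask = {
    'self_link': 'https://api.launchpad.net/1.0/bugs/1',
    'bugtask': {
        'self_link': 'https://api.launchpad.net/1.0/ubuntu/+bug/1',
        'status': 'New',
    },
}
}}}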