Foundations/SystemPerformance/MeasuringPagePerformance

Not logged in - Log In / Register

Measuring Page Performance

This page provides an overview of tools and techniques for measuring page load times in Launchpad.

The Ultimate Measure: Time To Interact

We evaluate our performance in terms of Time to Interact, or TTI. This value represents the soonest that a person can use a Launchpad page.

TTI has three components: network time, spent on the wire; generative time, spent on the server; and render time, spent in the browser. This is the approach used by Facebook when they optimize their pages.

How Do I Measure Time To Interact?

Time To Interact can be roughly measured as the time from when the first byte of the first connection is sent to a site, until the time that the page's DOM load event has fired and all of the JavaScript page enhancements have been installed. Since most pages have minimal JavaScript, the load event works well as the end point of the measure.

The simplest way to measure TTI is to test the page in webpagetest.org using IE8 as the browser, or to use Firebug to measure the total page load time on your local system.

Keep in mind that there are a few key factors that have a major effect on Time To Interact for an individual user:

The Launchpad.net web site statistics, Google Analytics, and #pingdom can help us segment the user base, and to gauge the impact that a particular change will have.

Measuring Network Time

Network time is any time spent waiting for the network connection to be negotiated so that a HTTP request can be sent to the server, and time spent actively returning bytes of data to us across the network. Network time often includes DNS lookups, the SSL handshake protocol, and time responding to 301 and 302 response codes. Network time does not include the time spent waiting for a server to return the first byte for a request. The webpagetest.org Waterfall and Connection graphs are very useful for calculating this metric.

Network time is the most difficult metric to improve because so many of the factors involved are not under our control.

Modern browsers allow up to six simultaneous connections per domain when loading a page. The time spent waiting for DNS and SSL negotiation on each of those connections may be time that could have been spent downloading resources or rendering the page. The first connection to a server always involves connection setup time, and thus always counts towards network time. Visual inspection of the Connection or Waterfall graphs is often needed to determine if the following five connections blocked the page load or not.

A note on SSL negotiation: the first connection takes the longest to set up due to the handshake procedure. The SSL negotiation for the following five connections to the server happen much faster because the client already has some of server information it needs. In testing we observed that the initial SSL handshake can take 750ms, but the following five handshakes only take 250ms.

Measuring Generation Time

Generation time is time spent waiting for the application's server to process a request and return the first byte of the response to us. Generation time is directly under our control, and therefore easy to improve.

Generation time can be measured with the zserver trace logs. This is the most accurate measurement technique as it does not suffer from network effects.

On webpagetest.org Generation Time is the bright green bar in the Waterfall graph. It is labelled "Time to First Byte".

When doing client-side measurements of Generation Time, keep in mind that anonymous requests will be cached by Squid, so running multiple times will result in the first test registering a large page generation time, say 3.5 seconds, and the following requests will have very very small generation times: 200ms. Tests with an authenticated user will always hit the application server itself: no caching will take place.

Measuring Render Time

Render Time is time spent in the browser processing the server's response to our request. It is the time spent waiting for a page's CSS and JavaScript to finish adding presentation and behaviour to the page markup.

Render Time is under our control but can vary greatly between the modern browsers and the ancient ones. It is also sensitive to network conditions. When testing, be sure to execute multiple test runs, and if you are using webpagetest.org, be sure that you have selected IE8 as the test browser.

On most Launchpad pages the DOM load event can serve as a proxy for the page being ready for interaction. The JavaScript on most pages executes after the load event and only takes a few hundred milliseconds to run. The delay is not noticeable to the user. As a result the Render Time of most Launchpad pages is negligible: a few hundred milliseconds at most.

We do not have a good way to measure the time spent waiting for dynamic JavaScript to make a Launchpad page usable. That time is technically render time, and it does lengthen the page's Time To Interact.

What Pages Should I Measure?

We have picked out a list of ten key pages that represent the most trafficed parts of Launchpad.

When measuring a page we prefer to take the "First View" numbers. The first view represents a new user hitting the page, and represents between 50% and 75% of the total page visits to Launchpad.net.

If there is a page that you feel is more appropriate than the ones on this list, then feel free to bring it up with the Foundations team.

Tools for Measuring Performance

PySlow

PySlow is a tool for automated collection and recording of YSlow test results. We have gathered the YSlow scores for the key launchpad pages together in one report:

webpagetest.org

It is best to use the following settings with http://webpagetest.org:

Note: Keep in mind that Launchpad has a Squid caching proxy server in front of the application servers, and the cache will capture any anonymous requests and replay them. This means that the Generation Time of the first test run using webpagetest.org will often hit the app server itself, and take a while to return, but following requests will hit the cache, and the Generation Time will be very very small: 200ms.

We have a script that generates webpagetest.org stats for the Launchpad.net key pages. The results can be found at:

Wireshark

Wireshark is one of the most powerful network traffic analysis tools available - a standard part of every sysadmin's toolkit. We can use it to dissect the Network Time and Generation Time portions of the page load.

There is a simple script, parse-ssl-times.py, that can extract the network connection times from a Packet Details Markup Language file, or PDML. The procedure is simple:

  1. Shut down all browsers that may be talking to launchpad.net
  2. Start wireshark: Alt-F2 -> gksudo wireshark

  3. Start capturing data on Port 443 (SSL) to launchpad.net (Capture filter: host launchpad.net and tcp port https)

  4. Start your browser on a blank page: about:blank (if Firebug is not doing it for you, you will also want to clear the browser cache)

  5. Load the page in the browser
  6. Stop the capture in Wireshark
  7. Export the captured packets as a PDML file: File -> Export -> 'as XML - PDML (packet details) file'

  8. Run the PDML file through the SSL stream parser script: http://people.canonical.com/~mars/ssl-study/parse-ssl-times.py

  9. (Optional) You will probably want to copy the results to a spreadsheet

Here is a spreadsheet containing the results for a number of tests: http://spreadsheets.google.com/pub?key=tXgibx8Ux70EjOMgDhrq6ZQ&output=html

zc.servertracelog

The zc.servertracelog package logs per-request execution times on the Zope application server. We have a script for parsing these log times, picking out requests to interesting URLs, and producing a histogram of the results.

Example results are here: http://people.canonical.com/~stub/lpperf.html

The script lives as utilities/page-performance-report.py along with a configuration file, page-performance-report.ini. More information about the script is available by running page-performance-report.py --help.

To write a new report configuration file, ask a LOSA for the location of the trace logs for one or a few servers. Next, look for a line that begins with an E and ends with the URL you are interested in. You want to write your regular expression so that it finds the URLs on these E lines. I suggest firing up the Python interpreter and using the re module to test your search string (be sure to escape the \+ characters!)

It is also possible to modify the report program so that individual request times are printed instead of an aggregate number.

pingdom

Pingdom can be used to test response times to Launchpad.net pages from Europe and North America. According to Pingdom's documentation, the times given are to download the HTML for the given page.

Pingdom can be used to gauge Network Time from around the globe. For example, testing of the response times for the Launchpad.net homepage reveals an interesting trend:

    London, UK
    Avg. Response Time
    45 ms

    Frankfurt, Germany
    Avg. Response Time
    160 ms

    Stockholm, Sweden
    Avg. Response Time
    304 ms

    New York, NY
    Avg. Response Time
    550 ms

    Herndon, VA
    Avg. Response Time
    570 ms

    Montreal, Canada
    Avg. Response Time
    624 ms

    Chicago, IL
    Avg. Response Time
    714 ms

    Dallas 4, TX
    Avg. Response Time
    855 ms

    Houston 3, TX
    Avg. Response Time
    888 ms

    Los Angeles, CA
    Avg. Response Time
    1,183 ms 

Zope Server Log Files

The Zope application servers have a special log format that includes information such as the time spent on SQL requests, rendering, and request count.

The exact details of the log line format may be found under lib/canonical/launchpad/webapp/servers.py, in the LaunchpadAccessLogger class.

Here is the current format, taken from the class' documentation string:

        task IP address
        X_FORWARDED_FOR
        HOST
        datetime task started
        request string  (1st line of request)
        response status
        response bytes written
        number of sql statements
        request duration
        number of ticks during traversal
        number of ticks during publication
        launchpad user id
        launchpad page id
        REFERER
        USER_AGENT

lpstats

There are various lpstats graphs available about our production environment. These can give you a general idea about the performance of systems such as our database servers. lpstats can also tell you about application server loads.

Foundations/SystemPerformance/MeasuringPagePerformance (last edited 2010-04-01 18:05:12 by mars)