Diff for "LEP/DisklessArchives"

Differences between revisions 1 and 2
Revision 1 as of 2012-06-12 07:35:21
Size: 4444
Editor: wgrant
Comment: first stab
Revision 2 as of 2012-06-12 08:28:13
Size: 4900
Editor: lifeless
Comment: humanise
Line 1: Line 1:
= Diskless Soyuz Archives = = Diskless Apt Archives =
Line 6: Line 6:
'''On Launchpad:''' ''Link to a blueprint, milestone or (best) a bug tag search across launchpad-project''

''Consider clarifying the feature by describing what it is not?''

''Link this from [[LEP]]''
'''On Launchpad:''' https://bugs.launchpad.net/launchpad-project/+bugs?field.tag=diskless-archives
Line 14: Line 10:
All downloads from `ppa.launchpad.net` are presently served by a single machine, `germanium`, which also runs the uploader, the publisher, and other PPA functions. The load has recently become too much for a single old machine to handle, so we'd like to separate the roles and split the download load across multiple machines. This project aims to solve scaling and latency problems that impact the user experience for publishing and consuming software for a fee.
Line 16: Line 12:
Due to the current design, the publishing and download serving roles require access to a multi-terabyte filesystem tree of archives. A shared SAN volume is an easy way out, but it's expensive and harder to scale as well as we'd like. The librarian is already reliable, capable of handling reasonable load, and contains all the package files that make up the majority of the archive size, so serving downloads from it conveniently eliminates the need to share a synchronized multi-terabyte tree to several machines. Such software is built into ''commercial'' PPAs, which like all PPAs are currently hosted on a single machine `germanium`. Access to a specific piece of software is granted by the system writing a separate access control file in each archive, which occurs an arbitrary amount of time after the API call to grant access completes, leading to delays and a poor experience.
Line 18: Line 14:
The Apache-served static tree design also creates difficulties with granting access to private PPAs. We currently maintain a textual htpasswd file in each private archive, and use htaccess to limit access. This file can become slow to generate with tens of thousands of subscriptions, and the cron-based updates mean there is always some delay before new subscribers can actually download packages. This is most visible when users purchase apps from Software Center, as there may be a delay of several minutes before they are able to install the application. The current architecture leads to a very large footprint for any machine wanting to be `ppa.launchpad.net`: it needs multiple TB of disk, and enough IO and CPU bandwidth to run all the PPA maintenance functions (uploading, publication, access control, log analysis). We are struggling with load today, and while a newer machine would defer that struggle, the lack of a scaling story means it would be only a moderate amount of time before we face the same problem again, but without the ability to fix it by upgrading. Uploading is not (at present) a scaling problem for us, though it is bound to the same hostname, which means we need to change how uploading is handled to be able to scale `ppa.launchpad.net`.

The project will be successful if our sysadmins can easily and effectively add capacity to handle rapid and substantial increases in the number of PPAs and number of users of PPAs, and software centre users get their purchases immediately without hassles (introduced by Launchpad).
Line 37: Line 35:
'''so that ''' my users and I aren't waiting unnecessarily.<<BR>> '''so that ''' my users can use my new package as soon as possible.<<BR>>
Line 44: Line 42:
'''I want ''' to add new PPA download capacity without requiring terabytes of disk<<BR>> '''I want ''' to add new PPA download capacity easily and rapidly on modest hardware<<BR>>
Line 53: Line 51:
 * Permit people access to private PPAs within 10 seconds of activating their subscription.  * Let private PPAs scale to tens of thousands of additional archives.
 * Permit people access to private PPAs immediately after activating their subscription.
 * Commission and activate a new scalable `ppa.launchpad.net` node in less than one hour (after base OS install).
 * Run scalable nodes on our stock hardware build without requiring special RAM or disk configuration.
Line 57: Line 58:
 * Redundantly spread archive publication load across multiple machines.
 * Redundantly distribute uploading and upload processing between machines.
 * Reduce package publication delay.
 * Scale PPA archive publication by adding machines.
 * Scale PPA uploading and upload processing by adding machines.
 * Reduce PPA package publication delay.
 * Decrease or eliminate downtime for `ppa.launchpad.net`. Ideally it becomes a regular nodowntime target.
Line 64: Line 66:
 * Break (S)FTP uploads to the existing overloaded `ppa.launchpad.net` hostname.
 * Require inordinate amounts of local disk or data copying to commission a new frontend.
 * Require O(subscriber) operations for granting access to a private PPA.
 * Break collection of PPA download statistics.
 * Stop the Ubuntu archive from being published as a classical on-disk archive.
 * Increase downtime of `ppa.launchpad.net` unnecessarily.
 * Interfere with other parts of Launchpad (e.g. PPA statistics, Ubuntu main and universe being regular archives on disk)

=== Undesirable ===

 * Breaking (S)FTP uploads to the existing overloaded `ppa.launchpad.net` hostname. There are 4000 distinct uploaders over all of 2011, so contacting them is doable if we need to.
Line 77: Line 78:
Users can download packages from private PPAs seconds after activating their subscription. Users can download packages from private PPAs immediately after activating their subscription.
Line 81: Line 82:
We have load graphs for frontend servers and publication delay, SCA probably has graphs of subscription latency,  * SC stops seeing user pain related to the performance (download rate, subscription activation latency) of ppa.launchpad.net.
   E.g. no more bug reports or questions.
 * Publication latency for all PPAs drops down to <= 60 seconds 99% of the time.
 * Upload latency remains constant or decreases.

Diskless Apt Archives

Serve PPA files directly from the librarian, rather than from a single machine's multi-terabyte filesystem

Contact: William Grant
On Launchpad: https://bugs.launchpad.net/launchpad-project/+bugs?field.tag=diskless-archives

Rationale

This project aims to solve scaling and latency problems that impact the user experience for publishing and consuming software for a fee.

Such software is built into commercial PPAs, which, like all PPAs, are currently hosted on a single machine, germanium. Access to a specific piece of software is granted by the system writing a separate access control file in each archive. This happens an arbitrary amount of time after the API call to grant access completes, leading to delays and a poor experience.

The current architecture leads to a very large footprint for any machine wanting to be ppa.launchpad.net: it needs multiple TB of disk, and enough IO and CPU bandwidth to run all the PPA maintenance functions (uploading, publication, access control, log analysis). We are struggling with load today, and while a newer machine would defer that struggle, the lack of a scaling story means it would be only a moderate amount of time before we face the same problem again, but without the ability to fix it by upgrading. Uploading is not (at present) a scaling problem for us, though it is bound to the same hostname, which means we need to change how uploading is handled to be able to scale ppa.launchpad.net.

The project will be successful if our sysadmins can easily and effectively add capacity to handle rapid and substantial increases in the number of PPAs and number of users of PPAs, and Software Center users get their purchases immediately without hassles (introduced by Launchpad).

Stakeholders

  • Consumer Apps
  • IS

User stories

As a Software Center customer
I want my download to start immediately after purchase
so that I can use my new application as soon as possible.

As a Software Center customer
I want my downloads to be quick
so that I can use my new application as soon as possible.

As a package uploader
I want my archive to be updated quickly
so that my users can use my new package as soon as possible.

As a commercial application provider
I want Launchpad PPAs to scale easily to cope with my app's downloads
so that I can worry about more important things than distribution.

As a Launchpad sysadmin
I want to add new PPA download capacity easily and rapidly on modest hardware
so that I can quickly respond to and mitigate high load situations.

Constraints and Requirements

Must

  • Allow ppa.launchpad.net HTTP(S) download frontends to scale to handle additional load without service disruption.

  • Let private PPAs scale to millions of subscribers.
  • Let private PPAs scale to tens of thousands of additional archives.
  • Permit people access to private PPAs immediately after activating their subscription.
  • Commission and activate a new scalable ppa.launchpad.net node in less than one hour (after base OS install).

  • Run scalable nodes on our stock hardware build without requiring special RAM or disk configuration.

Nice to have

  • Scale PPA archive publication by adding machines.
  • Scale PPA uploading and upload processing by adding machines.
  • Reduce PPA package publication delay.
  • Decrease or eliminate downtime for ppa.launchpad.net. Ideally it becomes a regular nodowntime target.

Must not

  • Break compatibility with existing apt sources.list entries, including private PPA credentials.

  • Interfere with other parts of Launchpad (e.g. PPA statistics, or Ubuntu main and universe being published as regular archives on disk).

Undesirable

  • Breaking (S)FTP uploads to the existing overloaded ppa.launchpad.net hostname. There are 4000 distinct uploaders over all of 2011, so contacting them is doable if we need to.

Success

How will we know when we are done?

We can seamlessly increase capacity to handle additional PPA downloads, without downtime or other service disruption.

Users can download packages from private PPAs immediately after activating their subscription.

How will we measure how well we have done?

  • SC stops seeing user pain related to the performance (download rate, subscription activation latency) of ppa.launchpad.net.
    • E.g. no more bug reports or questions.
  • Publication latency for all PPAs drops to 60 seconds or less 99% of the time.

  • Upload latency remains constant or decreases.
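
The publication latency target in the list above could be checked mechanically against collected samples. The following is a minimal sketch only: the function name, the idea of feeding it a flat list of per-publication latencies in seconds, and the default thresholds as parameters are all assumptions made for illustration, not part of any existing Launchpad tooling.

```python
def meets_latency_target(latencies_s, limit_s=60.0, fraction=0.99):
    """Return True if at least `fraction` of the samples are <= `limit_s`.

    `latencies_s` is a hypothetical list of publication latencies in
    seconds (time from upload acceptance to the package being
    downloadable). How these are collected is outside this sketch.
    """
    if not latencies_s:
        raise ValueError("no latency samples to evaluate")
    # Count the samples that finished within the limit.
    within = sum(1 for s in latencies_s if s <= limit_s)
    return within / len(latencies_s) >= fraction


if __name__ == "__main__":
    # 995 fast publications and 5 slow ones: 99.5% are within 60 seconds.
    mostly_fast = [12.0] * 995 + [300.0] * 5
    # 950 fast publications and 50 slow ones: only 95% are within 60 seconds.
    too_many_slow = [12.0] * 950 + [300.0] * 50
    print(meets_latency_target(mostly_fast))
    print(meets_latency_target(too_many_slow))
```

Counting the share of samples under the limit matches the wording of the target directly and avoids having to pick a percentile interpolation method.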

Thoughts?

Put everything else here. Better out than in.

LEP/DisklessArchives (last edited 2012-07-02 02:53:14 by lifeless)