Diskless Apt Archives

Serve PPA files directly from the librarian, rather than from a single machine's multi-terabyte filesystem

Contact: William Grant
On Launchpad: https://bugs.launchpad.net/launchpad-project/+bugs?field.tag=diskless-archives

Rationale

This project aims to solve scaling and latency problems that affect the user experience of publishing and consuming paid-for software.

Such software is built into commercial PPAs, which, like all PPAs, are currently hosted on a single machine, germanium. Access to a specific piece of software is granted by writing a separate access control file in each archive. This happens an arbitrary amount of time after the API call that grants access completes, which causes delays and a poor experience.

The current architecture leads to a very large footprint for any machine that is to serve ppa.launchpad.net. It needs multiple TB of disk, plus enough IO and CPU bandwidth to run all the PPA maintenance functions (uploading, publication, access control, log analysis). We are struggling with load today. A newer machine would postpone that struggle, but without a scaling story it would be only a moderate amount of time before we faced the same problem again, this time without the option of fixing it by upgrading. Uploading is not (at present) a scaling problem for us, but it is bound to the same hostname, so we need to change how uploading is handled in order to scale ppa.launchpad.net.

The project will be successful if our sysadmins can easily and effectively add capacity to handle rapid and substantial increases in the number of PPAs and the number of PPA users, and if Software Center users get their purchases immediately, without hassles introduced by Launchpad.

Stakeholders

User stories

As a Software Center customer
I want my download to start immediately after purchase
so that I can use my new application as soon as possible.

As a Software Center customer
I want my downloads to be quick
so that I can use my new application as soon as possible.

As a package uploader
I want my archive to be updated quickly
so that my users can use my new package as soon as possible.

As a commercial application provider
I want Launchpad PPAs to scale easily to cope with my app's downloads
so that I can worry about more important things than distribution.

As a Launchpad sysadmin
I want to add new PPA download capacity easily and rapidly on modest hardware
so that I can quickly respond to and mitigate high load situations.

Constraints and Requirements

Must

Nice to have

Must not

Undesirable

Success

How will we know when we are done?

We can seamlessly increase capacity to handle additional PPA downloads, without downtime or other service disruption.

Users can download packages from private PPAs immediately after activating their subscription.

How will we measure how well we have done?

Thoughts?

LEP/DisklessArchives (last edited 2012-07-02 02:53:14 by lifeless)