DisklessArchives

We want to serve PPAs via a virtual file system rather than materialising them into a persistent cache on germanium.

See The LEP for the project constraints. The implementation approach presented here is (as usual) subject to evolution if surprises are encountered. A prototype of the highest risk components has been done (and thrown away for good measure).

Diskless Archives design and implementation

ppa.launchpad.net will become a horizontally scaling collection of HTTP/HTTPS/SFTP/FTP nodes. Each node will have a txpkgupload instance, a squid instance with Launchpad-specific helpers that can serve both launchpadlibrarian.net and ppa/private-ppa.launchpad.net, and cron tasks for PPA log file analysis. The Launchpad script servers will take care of PPA archive publishing, publishing all the data into the librarian. Upload processing may be centralised, or execute on each machine.

A new API daemon may be introduced, depending on the cheapest way to deliver the two small internal APIs the project will need. If introduced, it will run on the existing Launchpad appserver nodes, with haproxy frontending it.

Architecture diagram of this design (blue/white boxes are new components).

Frontends

Squid will receive requests on HTTP or HTTPS and process them.

Squid will:

We will need squid 3.1 with the EXT_TAG patch backported. This is available at lp:~lifeless/squid/3.1-ext-tag: http://bazaar.launchpad.net/~lifeless/squid/3.1-ext-tag/revision/10456

Implementation

We need custom Squid addon helpers, and a small set of webservice API calls to Launchpad.

Squid will operate in reverse proxy ("accelerator") mode, using external_acl_type and url_rewrite_program helpers to provide decoupled authentication/authorization and path resolution.

Initially, squid will run with no synchronisation between the instances: things will be cached on multiple nodes. If load testing or scaling gives reason for concern, we'll introduce some form of consistent hashing: either a non-caching squid, or haproxy as a front tier (on the same hardware) that routes to a single node (while it's up) for a given URL, removing all duplication from the cache farm and providing more effective horizontal scaling, at the cost of routing requests within the cluster.
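
As a rough illustration of the routing idea only (not a committed design), something like rendezvous hashing gives each URL a stable home node while that node is up; the node names here are made up:

{{{#!python
# Sketch only: rendezvous (highest-random-weight) hashing.  Each URL maps to a
# stable node while it is up, with minimal reshuffling when nodes change.
import hashlib

NODES = ['ppa-fe-01', 'ppa-fe-02', 'ppa-fe-03']  # Illustrative node names.

def node_for_url(url, nodes=NODES):
    """Pick the node responsible for caching this URL."""
    def weight(node):
        return hashlib.sha1(('%s|%s' % (node, url)).encode('utf-8')).hexdigest()
    return max(nodes, key=weight)

# e.g. node_for_url('/someuser/someppa/ubuntu/dists/precise/Release')
}}}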

PPA requests

For ppa and private-ppa requests, every request will go through a mapper to determine the PPA from the URL. This mapping is purely local (no DB access), and its results are cached by squid, keyed on the URL of the request.
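
To illustrate how little this step needs (this is not the actual helper, and the exact URL layout handled is only an assumption), the local mapping is pure string parsing:

{{{#!python
# Sketch only: map a request path to (owner, ppa_name, distribution).
# Purely local string parsing; no database access.
def ppa_from_path(path):
    # e.g. /someuser/someppa/ubuntu/dists/precise/main/binary-i386/Packages.gz
    parts = path.strip('/').split('/')
    if len(parts) < 3:
        return None  # Not a well-formed PPA URL.
    owner, ppa_name, distribution = parts[0], parts[1], parts[2]
    return owner, ppa_name, distribution
}}}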

All requests then go through a helper to check authentication details are correct for that PPA:

The results of both of these helpers are cached by squid, so we only do one backend lookup per cache timeout per PPA+credentials tuple; the common case will be one lookup per session per user. Auth failures won't be cached at all, so attempting 'too early' will not prevent subsequent logins.
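
A minimal sketch of what the authentication helper could look like, assuming squid's external_acl_type helper protocol (one lookup per input line, answered with OK or ERR) and a placeholder check_credentials() standing in for the internal API call; the input format shown is an assumption about the configured format string:

{{{#!python
#!/usr/bin/python
# Sketch only: squid external_acl_type helper.  Squid passes one line per
# lookup (assumed here to be "PPA-reference username password", depending on
# the configured format); we answer "OK" or "ERR".  Squid caches OK answers
# per its configured TTL; ERR answers carry no caching hints.
import sys

def check_credentials(ppa, user, password):
    """Placeholder for the internal Launchpad API call."""
    raise NotImplementedError

def main():
    for line in iter(sys.stdin.readline, ''):
        try:
            ppa, user, password = line.strip().split(' ', 2)
            allowed = check_credentials(ppa, user, password)
        except Exception:
            allowed = False
        sys.stdout.write('OK\n' if allowed else 'ERR\n')
        sys.stdout.flush()

if __name__ == '__main__':
    main()
}}}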

Then, for all requests, our url_rewrite_program helper will map the requested URL, via a Launchpad API call, to the corresponding internal librarian URL (which will take the LFC id or possibly even the file path), and Squid will stream that file (or return a 404 if the file has been deleted or does not exist).
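
For flavour, a sketch of the rewrite helper's shape under squid's url_rewrite_program protocol (the requested URL is the first token of each input line; the answer is the rewritten URL, or a blank line for "no change"); librarian_url_for() is a placeholder for the Launchpad API lookup, not a real function:

{{{#!python
#!/usr/bin/python
# Sketch only: squid url_rewrite_program helper.
import sys

def librarian_url_for(url):
    """Placeholder: resolve the public URL to an internal librarian URL,
    or return None if the file has been deleted / does not exist."""
    raise NotImplementedError

def main():
    for line in iter(sys.stdin.readline, ''):
        url = line.split(' ', 1)[0]
        target = librarian_url_for(url)
        if target is None:
            # No mapping: pass the URL through unchanged (how the eventual
            # 404 is produced is a separate policy detail).
            sys.stdout.write('\n')
        else:
            sys.stdout.write(target + '\n')
        sys.stdout.flush()

if __name__ == '__main__':
    main()
}}}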

There's a proof of concept Squid config and corresponding trivial helpers.

launchpadlibrarian requests

The request will go through a url rewrite mapper that does an LFA->LFC lookup and determines the backend data to use. This will be in the same format as for the PPA requests (in fact it will be the same helper, just a different code path).

For restricted.launchpadlibrarian.net requests, a separate helper will be used to validate the time-limited-token on the request. That acl helper will have caching disabled.
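
The token check might be shaped roughly like this; the lookup function, lifetime, and keying on (token, path) are all placeholders rather than the real scheme:

{{{#!python
# Sketch only: validating a time-limited token for a restricted librarian URL.
# look_up_token() is a placeholder for the real (uncached) database lookup.
from datetime import datetime, timedelta

TOKEN_LIFETIME = timedelta(days=1)  # Illustrative value only.

def look_up_token(token, path):
    """Placeholder: return the datetime the token was issued for this path,
    or None if no such token exists."""
    raise NotImplementedError

def token_is_valid(token, path, now=None):
    now = now or datetime.utcnow()
    issued = look_up_token(token, path)
    return issued is not None and now - issued <= TOKEN_LIFETIME
}}}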

Librarian

A new listening port will be added to the librarian, serving data for the new backend API call (e.g. taking an LFC id only, or a path + file hash). This will be used by the new service as it comes up, and the two current download ports will be entirely removed once the new system takes over full operation of the launchpadlibrarian domain.
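
Purely as an illustration of the two candidate request shapes (the host, port, and path layout below are invented, not decided):

{{{#!python
# Sketch only: how a frontend helper could construct requests against the new
# librarian port, for either of the two proposed call shapes.
LIBRARIAN_BACKEND = 'http://librarian.internal:9090'  # Illustrative.

def internal_url_by_lfc(lfc_id):
    return '%s/content/%s' % (LIBRARIAN_BACKEND, lfc_id)

def internal_url_by_path(path, sha1_hex):
    return '%s/byhash/%s/%s' % (LIBRARIAN_BACKEND, sha1_hex, path.lstrip('/'))
}}}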

Caching (all sorts)

The responses delivered from Squid will need their cache control headers removed or replaced, to ensure that downstream HTTP proxies do not cache content in the archive URL hierarchy inappropriately. This is a problem we are well versed in for archive.ubuntu.com, and the same set of rules we use there should work well.

However, the url rewrite mapper will also be taking the user supplied URL and determining a Librarian URL from it. This mapping, if cached, would have the same net effect as poor HTTP caching would, and so needs to be aware of our policies around these files:

Expected load

HTTP caching and horizontal scaling of frontends should resolve most issues with scaling to enormous volumes of data. What the new design doesn't handle well -- in fact, what it handles far worse than the existing static disk archives -- is enormous numbers of requests: where it was previously a filesystem path traversal, it's now a set of remote database queries. The caching points above are aimed at reducing the amount of redundant mapping work we do.

It's also important to note that Launchpad's database slaves have ample spare capacity, so they should be able to handle a lot of lookup requests. Most commonly hit lookups will be cached, but the Soyuz publication schema wasn't designed for efficient path resolution, so it's likely that we'll need some denormalisation or at least some creative new queries to achieve excellent performance.

Backend (API service)

The frontends will talk to the database via a webservice API of some kind. While ideally we'd use our existing lazr.restful or XML-RPC infrastructure, it's likely that the Zope stack is unacceptably slow for the request volume we expect and performance we desire (100ms response time to users 99% of the time). An independent lightweight WSGI service which just exposes the relevant API methods, directly implemented as SQL without our slow infrastructure, is an effective option to achieve this, given the narrow schema and needs of the system.
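
A minimal sketch of what such a service could look like, using wsgiref and psycopg2; the table name (archive_file_map), URL scheme, and connection handling are all invented placeholders, and the real query would join the Soyuz publication tables:

{{{#!python
# Sketch only: a stripped-down WSGI service answering a path -> librarian
# lookup directly with SQL, bypassing the Zope stack.
import json
from urlparse import parse_qs   # urllib.parse on Python 3
from wsgiref.simple_server import make_server

import psycopg2

DSN = 'dbname=launchpad_standby'  # Illustrative.

def app(environ, start_response):
    # e.g. GET /map?path=/someuser/someppa/ubuntu/pool/main/f/foo/foo_1.0.deb
    params = parse_qs(environ.get('QUERY_STRING', ''))
    path = params.get('path', [''])[0]
    conn = psycopg2.connect(DSN)
    try:
        cur = conn.cursor()
        cur.execute(
            'SELECT libraryfilecontent FROM archive_file_map WHERE path = %s',
            (path,))
        row = cur.fetchone()
    finally:
        conn.close()
    if row is None:
        start_response('404 Not Found', [('Content-Type', 'application/json')])
        return ['{}']
    start_response('200 OK', [('Content-Type', 'application/json')])
    return [json.dumps({'lfc': row[0]})]

if __name__ == '__main__':
    make_server('0.0.0.0', 8888, app).serve_forever()
}}}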

There are three calls to make:

We can run arbitrarily many instances, caching is not required, and it's read-only, so it can balance across the slave databases in a fault-tolerant fashion. It will want to check missing auth creds against the master DB to handle replication latency, but such misses will be rare (except in a DOS situation). Even then, such lookups will be extraordinarily cheap, and our regular concurrency capping in haproxy will protect the core infrastructure from meltdown.
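
The slave-first, master-fallback behaviour amounts to something like the following; find_credentials() and the credentials object are placeholders, not real Launchpad code:

{{{#!python
# Sketch only: check the slave first, and fall back to the master only when
# the credentials are missing, to paper over replication latency for
# freshly-created tokens.
def find_credentials(store, ppa, user):
    """Placeholder for the real database query against the given store."""
    raise NotImplementedError

def credentials_valid(ppa, user, password, slave_store, master_store):
    creds = find_credentials(slave_store, ppa, user)
    if creds is None:
        # Not on the slave yet?  Check the master before rejecting; misses
        # are rare, so this extra load is negligible.
        creds = find_credentials(master_store, ppa, user)
    return creds is not None and creds.matches(password)  # matches() is a placeholder.
}}}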

Backend (non-interactive)

The main backend components implicated in these changes are the Soyuz publisher jobs: process-accepted, publish-distro, and process-death-row.

These scripts currently serve both PPAs and the Ubuntu primary archive. Any changes done to them to support the new PPA system need to:

One approach is to hybridise these scripts so they can be controlled in a fine-grained manner: we can turn off the disk updates when we decommission germanium, and enable PPA publishing in a new home (e.g. a Launchpad scripts server). Another approach is to produce a new (perhaps celery-based) variant of these scripts dedicated to the new publishing data, and have them run even while germanium's existing publisher keeps publishing content to disk. This would avoid adding load to germanium and make the migration require less ops coordination, as well as immediately reducing latency for the new system, since LP script startup time could be eliminated (this last point is irrelevant for process-death-row). A third, recommended approach is to split the work in these scripts between on-disk logic and non-disk logic, and migrate the non-disk logic off of germanium early in the development process, reducing its current load and giving a single source of logic for the new schema as it matures. Some complexity is entailed by the primary archive using apt-ftparchive, but the special cases needed are fewer than those required to handle two different publication mechanisms working on the same data for PPAs.

There are a couple of other jobs of interest:

expire-archive-files is unaffected, since process-death-row or its replacement will still set dateremoved, just like now.

Uploads

PPA uploads are currently over FTP or SFTP to ppa.launchpad.net, the same hostname used by public HTTP downloads. We would like to avoid breaking this, so as not to disrupt the ~4000 people that use it. This means we have to either run upload daemons on each frontend or forward the services to other hosts, at least until we can get updated versions of e.g. dput to all users.

In terms of concurrency, it's safe to run the (S)FTP daemon (txpkgupload) on each machine. It's less safe to run process-upload against a separate upload queue on each machine, since there's no archive lock, so two uploads could be accepted concurrently that consistency checks would have rejected had they been processed serially; this is easily fixed.
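
One possible fix, sketched here only as an illustration and not existing code, is to serialise process-upload per archive across machines with a PostgreSQL advisory lock keyed on the archive id:

{{{#!python
# Sketch only: a cross-machine per-archive lock for process-upload, using a
# PostgreSQL advisory lock keyed on the archive id.
from contextlib import contextmanager

@contextmanager
def archive_lock(connection, archive_id):
    cur = connection.cursor()
    cur.execute('SELECT pg_advisory_lock(%s)', (archive_id,))
    try:
        yield
    finally:
        cur.execute('SELECT pg_advisory_unlock(%s)', (archive_id,))

# Usage: with archive_lock(conn, archive_id): process_upload(upload)
}}}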

Availability

ppa.launchpad.net's availability has traditionally depended only on germanium's Apache and the network between there and the Internet. Diskless archives change all that. As shown in the earlier architectural diagram, we will now depend on not just the frontends but also the full Launchpad librarian and database stacks, with all the service, machine, and network dependencies that they entail.

Fault-tolerance of all non-HTTP/HTTPS PPA services will be increased, as redundant instances can be brought up on multiple machines.

PPA services should be able to move into the nodowntime set, eliminating another source of LP production variation.

New Hostname

Idea from James Troup:

Run this stuff on a new hostname. software-center-agent starts handing out the new hostname to new subscriptions when we are happy to go live. Old clients still hit the current setup on ppa.launchpad.net.

This makes testing easy, and we can roll back with a DNS change.

Once it is proven bulletproof and all stakeholders are happy with the implications, {private-,}ppa.launchpad.net could change in DNS.

Concerns:

Are there implications on the LP side?
