Soyuz/TechnicalDetails

This page gives you a technical overview of Soyuz.

Soyuz Technical Overview

Soyuz is a distribution package management system for Launchpad, encompassing the build system, package management and archive publishing. It allows users to upload packages, have them built on a variety of processor architectures and then published for others to download. (Recently, the build system was generalised and it is now also used to build source code recipes into source packages and process translation template imports.)

Whenever you upload a package to Ubuntu, or need build information for that package, or download a package from the archive, you are using Soyuz. PPAs are also built around Soyuz.

UPLOAD + BUILD + PUBLISH = SOYUZ 

Workflow

This is the lifecycle of a typical package through Soyuz.

Soyuz/TechnicalDetails/Uploading

Details of txpkgupload

Soyuz/TechnicalDetails/UploadProcessor

How the upload processor and distroseries queues work

Soyuz/TechnicalDetails/Building

The build farm, chroots and build processing

Soyuz/TechnicalDetails/Publishing

Info about the publisher and related processes

Other important concepts

Soyuz/TechnicalDetails/Components

Components and how they are used

Soyuz/TechnicalDetails/Pockets

Pockets and how they are used

Soyuz/TechnicalDetails/PackageCopier

The package copier, why and where it's used

Soyuz/TechnicalDetails/DerivativeDistroSyncing

How syncing packages across derivatives works

Database Overview

Soyuz Model

(If you want to edit this diagram, the source Dia file is here: SoyuzDatabase.dia)

Uploading

Uploading is done in several discrete steps:

Uploading Stages

txpkgupload Server

lp:txpkgupload

This is a Twisted SFTP and FTP service. It takes an upload and creates a directory containing the upload's content.

Upload Parsing

scripts/process-upload.py

Run via cron every 5 minutes. This takes the content in the directory created by txpkgupload and parses it as a package upload. Various things are validated, such as the presence of a changes file, the GPG key, etc. process-upload.py is itself a small file; the real work is done in changesfile.py, customupload.py, ddtp_tarball.py, debian_installer.py, dist_upgrader.py, dscfile.py, nascentupload.py, nascentuploadfile.py, uploadpolicy.py and uploadprocessor.py.
This stage creates some database entries: a PackageUpload paired with one of PackageUpload{Build,Custom,Binary}, depending on the type of upload. (For a PPA upload, the PackageUpload is automatically in the ACCEPTED state; otherwise it moves through NEW -> ACCEPTED -> DONE.) Also created are SourcePackageRelease, BinaryPackageRelease, SourcePackageReleaseFile, BinaryPackageReleaseFile, SourcePackageReleaseName and BinaryPackageReleaseName. Which of these are created depends on the type of package that was uploaded and what was in it (sources and/or binaries).
Any uploaded files are added to the librarian.
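
As a rough illustration of the kind of validation described above, the sketch below parses a changes-style control file and checks that the expected fields are present. It is not the code from changesfile.py: the function names, the sample data and the set of required fields are assumptions made for this example, and signature checking is left out entirely.

```python
# Hypothetical sketch: none of these names come from the Launchpad tree.

# Assumed minimum set of fields; the real checks may differ.
REQUIRED_FIELDS = ("Source", "Version", "Maintainer", "Changed-By", "Files")


def parse_changes(text):
    """Parse 'Name: value' fields, folding indented continuation lines."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t":
            if current is None:
                raise ValueError("continuation line before any field")
            fields[current] += "\n" + line.strip()
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError("not a field line: %r" % line)
            current = name.strip()
            fields[current] = value.strip()
    return fields


def validate_changes(text):
    """Return the listed file names, or raise ValueError if a field is missing."""
    fields = parse_changes(text)
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    # Each Files line is: checksum, size, section, priority, file name.
    return [entry.split()[-1] for entry in fields["Files"].splitlines() if entry]


# Made-up sample data, for illustration only.
SAMPLE = """\
Source: hello
Version: 2.10-1
Maintainer: Jane Doe <jane@example.com>
Changed-By: Jane Doe <jane@example.com>
Files:
 0123456789abcdef0123456789abcdef 1234 devel optional hello_2.10-1.dsc
 fedcba9876543210fedcba9876543210 5678 devel optional hello_2.10.orig.tar.gz
"""

if __name__ == "__main__":
    print(validate_changes(SAMPLE))
```

An upload that lacks one of the assumed fields raises ValueError, which loosely corresponds to the upload being rejected at this stage.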

Vetting

lib/lp/soyuz/scripts/queue.py

This is a manual process. It allows a real person, known as an Archive Admin, to check that an uploaded package is valid. Once the check is done, the upload's queue state changes from NEW to ACCEPTED. In certain circumstances process-upload.py will have set the status to ACCEPTED immediately, e.g. when the submitter/package is already trusted. Archive admins without shell access to the data centre can also use the web UI, for example at http://launchpad.net/ubuntu/natty/+queue.

Final acceptance

scripts/process-accepted.py

This is an hourly cron job script that takes PackageUpload rows with a status of ACCEPTED and creates corresponding SourcePackagePublishingHistory and BinaryPackagePublishingHistory rows (depending on the upload) in the PENDING state, and sets the PackageUpload status to DONE. This script will also take a custom upload, unpack its tar file and add resulting files to the archive.

publish-distro.py will look for PENDING publishing history rows, take the corresponding files from the librarian and publish them in the archive (setting the publishing history to 'PUBLISHED').
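
The status changes described in the stages above can be summarised with a small model. This is only a sketch: the class and function names are invented for illustration and do not correspond to the Launchpad code, and the real scripts work on database rows and librarian files rather than in-memory objects.

```python
from dataclasses import dataclass
from enum import Enum


class UploadStatus(Enum):
    NEW = "NEW"
    ACCEPTED = "ACCEPTED"
    DONE = "DONE"


class PublishingStatus(Enum):
    PENDING = "PENDING"
    PUBLISHED = "PUBLISHED"


@dataclass
class Upload:  # hypothetical stand-in for a PackageUpload row
    package: str
    status: UploadStatus = UploadStatus.NEW


@dataclass
class PublishingRecord:  # hypothetical stand-in for a publishing history row
    package: str
    status: PublishingStatus = PublishingStatus.PENDING


def vet(upload):
    """Vetting: an archive admin moves a NEW upload to ACCEPTED."""
    if upload.status is UploadStatus.NEW:
        upload.status = UploadStatus.ACCEPTED


def process_accepted(uploads):
    """Create a PENDING record per ACCEPTED upload and mark the upload DONE."""
    records = []
    for upload in uploads:
        if upload.status is UploadStatus.ACCEPTED:
            records.append(PublishingRecord(upload.package))
            upload.status = UploadStatus.DONE
    return records


def publish(records):
    """Mark every PENDING record PUBLISHED (copying files is omitted)."""
    for record in records:
        if record.status is PublishingStatus.PENDING:
            record.status = PublishingStatus.PUBLISHED


if __name__ == "__main__":
    # The second upload starts out ACCEPTED, as a PPA upload would.
    uploads = [Upload("hello"), Upload("ppa-tool", UploadStatus.ACCEPTED)]
    vet(uploads[0])
    records = process_accepted(uploads)
    publish(records)
    print([u.status.name for u in uploads], [r.status.name for r in records])
```

An upload that has not been vetted stays NEW and is skipped by process_accepted, which mirrors the manual gate in the Vetting stage.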

Building

A build is handed off to one of a number of builders on separate machines, which will perform the build in a chroot on the appropriate processor architecture.

At the top level, building is controlled by a "buildd manager" daemon (daemons/buildd-manager.tac), which is a Twisted application. It is responsible for choosing the next build item (build items are "queued" in the BuildQueue table), sending all the files to the builder and initiating the build. It also polls the builders to get a log tail for display on the build and builder pages. When the build is finished, it downloads the resulting files from the builder and places them in a staging area for later processing by process-upload.py.

The builders for PPAs are all virtual machines that are ripped out and restarted before each build, since we run untrusted code in the builds. The buildd-manager is responsible for this operation: if it determines that the build is virtual, it ssh-es to the virtual machine's host and calls a script there.
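
The dispatch step can be pictured with the following sketch. It is not taken from buildd-manager: all names are hypothetical, the queue is served in list order (the text above does not say how the next item is chosen), and matching virtualized jobs only to virtualized builders is an assumption of this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:  # hypothetical stand-in for a BuildQueue entry
    name: str
    processor: str
    virtualized: bool = False


@dataclass
class Builder:  # hypothetical stand-in for a build machine
    name: str
    processor: str
    virtualized: bool = False
    current: Optional[Job] = None
    log: list = field(default_factory=list)

    def reset(self):
        # Placeholder for restoring a clean virtual machine.
        self.log.append("reset")

    def start(self, job):
        self.current = job
        self.log.append("build " + job.name)


def dispatch(queue, builders):
    """Give each idle builder the first queued job it is able to run."""
    started = []
    for builder in builders:
        if builder.current is not None:
            continue  # builder is busy
        for job in queue:
            if (job.processor, job.virtualized) == (
                builder.processor,
                builder.virtualized,
            ):
                queue.remove(job)  # safe: we stop iterating right after
                if builder.virtualized:
                    builder.reset()  # clean machine before untrusted code
                builder.start(job)
                started.append((builder.name, job.name))
                break
    return started
```

Jobs with no idle builder of the right processor architecture stay in the queue and are considered again on the next call.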

Ubuntu Publishing

Ubuntu publishing is run from cron at intervals of 1 hour using the file cronscripts/publishing/cron.publish. This will do the following:

PPA Publishing

This is largely similar to Ubuntu except:

Glossary

Terms used in Soyuz that might confuse outsiders!

Term

Description

BPR

BinaryPackageRelease. A database table that stores details of a binary package at a particular version.

BPB

BinaryPackageBuild. Stores the relation between a SourcePackageRelease, a PackageBuild and a DistroArchSeries.

BFJ

BuildFarmJob. A table that stores the generic details of every job processed on the build farm.

BPRF

BinaryPackageReleaseFile. A database table that links a BinaryPackageRelease to all its files in the librarian.

BPPH

BinaryPackagePublishingHistory. See SPPH, except this is for binary packages.

ChangesFile

Part of a package upload, this file describes the upload (e.g. the file list, checksums, maintainer/changer names).

Deathrow

The process that handles all files for publications marked as SUPERSEDED or DELETED. If the files are not used by another package, they are deleted from the repository.

Domination

The act of modifying publishing records to a state of SUPERSEDED if more recent publications exist.

ogre-model

A concept that controls build-dependencies in a layered model, i.e. sources published in the 'main' component can only fetch build-dependencies from 'main'; sources published in 'universe' only have access to 'main' and 'universe', and so on.

p-a

process-accepted.py - a script that takes accepted uploads and creates publishing records for them.

p-u

process-upload.py - a script that scans the files stored by txpkgupload and attempts to load them into the database.

PU

PackageUpload. A database table that stores details from the changes file for an upload.

SPR

SourcePackageRelease. A database table that holds details of a source package at a particular version. Also could be SourcePackageRecipe depending on the context!

SPRF

SourcePackageReleaseFile. A database table that links a SourcePackageRelease to all its files in the librarian.

SPPH

SourcePackagePublishingHistory. A database table that records current and historical publishing status for SourcePackageRelease. The non-secure variant is just a view on the secure table, where embargoed is not false.

txpkgupload

An (S)FTP server for package uploads.
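
The ogre-model entry in the glossary above can be read as a lookup from the component a source is published in to the components it may take build-dependencies from. The sketch below is an illustration only: the names are hypothetical, the sample packages are made up, and only the two cases spelled out in the glossary are encoded.

```python
# Components a source may draw build-dependencies from, keyed by the
# component the source is published in. Only the two cases given in the
# glossary are listed; other components would need their own entries.
ALLOWED_COMPONENTS = {
    "main": {"main"},
    "universe": {"main", "universe"},
}


def disallowed_build_deps(source_component, build_deps):
    """Return, sorted, the build-dependencies outside the allowed components.

    build_deps maps a package name to the component it is published in.
    """
    allowed = ALLOWED_COMPONENTS[source_component]
    return sorted(
        name for name, component in build_deps.items() if component not in allowed
    )


if __name__ == "__main__":
    # Made-up dependency data for illustration.
    deps = {"debhelper": "main", "libfoo-dev": "universe"}
    print(disallowed_build_deps("main", deps))      # ['libfoo-dev']
    print(disallowed_build_deps("universe", deps))  # []
```

A source in 'main' that build-depends on a 'universe' package is flagged, while the same dependencies are acceptable for a source in 'universe'.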

Soyuz/TechnicalDetails (last edited 2015-01-21 11:36:30 by cjwatson)