This page gives you a technical overview of Soyuz.
Soyuz Technical Overview
Soyuz is a distribution package management system for Launchpad, encompassing the build system, package management and archive publishing. It allows users to upload packages, have them built on a variety of processor architectures and then published for others to download. (Recently, the build system was generalised and it is now also used to build source code recipes into source packages and process translation template imports.)
Whenever you upload a package to Ubuntu, or need build information for that package, or download a package from the archive, you are using Soyuz. PPAs are also built around Soyuz.
UPLOAD + BUILD + PUBLISH = SOYUZ
Workflow
This is the lifecycle of a typical package through Soyuz.
- Details of txpkgupload
- How the upload processor and distroseries queues work
- The build farm, chroots and build processing
- Info about the publisher and related processes
Other important concepts
- Components and how they are used
- Pockets and how they are used
- The package copier, why and where it's used
- How syncing packages across derivatives works
Database Overview
(If you want to edit this diagram, the source Dia file is here: SoyuzDatabase.dia)
Uploading
Uploading is done in several discrete steps:
Uploading Stages
| txpkgupload Server | lp:txpkgupload | This is a Twisted SFTP and FTP service. It takes an upload and creates a directory containing the upload's content. |
| Upload Parsing | scripts/process-upload.py | Run via cron every 5 minutes. This takes the content in the directory created by txpkgupload and parses it as a package upload. Various things are validated, such as the presence of a changes file, the GPG key, etc. process-upload.py is itself a small file; the real work is done in changesfile.py, customupload.py, ddtp_tarball.py, debian_installer.py, dist_upgrader.py, dscfile.py, nascentupload.py, nascentuploadfile.py, uploadpolicy.py and uploadprocessor.py. |
| Vetting | lib/lp/soyuz/scripts/queue.py | This is a manual process. It allows a real person, known as an Archive Admin, to check that an uploaded package is valid. Once done, this changes the upload's queue state from NEW to ACCEPTED. In certain circumstances process-upload.py will have set the status to ACCEPTED immediately, e.g. when the submitter/package is already trusted. Archive admins without shell access to the data centre can also use the web UI, for example at http://launchpad.net/ubuntu/natty/+queue. |
| Final acceptance | scripts/process-accepted.py | This is an hourly cron job that takes PackageUpload rows with a status of ACCEPTED, creates corresponding SourcePackagePublishingHistory and BinaryPackagePublishingHistory rows (depending on the upload) in the PENDING state, and sets the PackageUpload status to DONE. This script will also take a custom upload, unpack its tar file and add the resulting files to the archive. |
publish-distro.py will look for PENDING publishing history rows, take the corresponding files from the librarian and publish them in the archive (setting the publishing history to 'PUBLISHED').
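The status changes described in the stages above can be sketched as a small state machine. This is only an illustration: the `Upload` class, the `trusted` flag and the function names are hypothetical stand-ins, not Launchpad's actual model classes or script entry points.

```python
from enum import Enum


class UploadStatus(Enum):
    NEW = "NEW"
    ACCEPTED = "ACCEPTED"
    DONE = "DONE"


class PublishingStatus(Enum):
    PENDING = "PENDING"
    PUBLISHED = "PUBLISHED"


# Allowed queue transitions, as described in the uploading stages.
UPLOAD_TRANSITIONS = {
    UploadStatus.NEW: {UploadStatus.ACCEPTED},
    UploadStatus.ACCEPTED: {UploadStatus.DONE},
    UploadStatus.DONE: set(),
}


class Upload:
    """Hypothetical stand-in for a PackageUpload row."""

    def __init__(self, name, trusted=False):
        self.name = name
        # Trusted uploads may be ACCEPTED immediately by upload parsing.
        self.status = UploadStatus.ACCEPTED if trusted else UploadStatus.NEW
        self.publications = []

    def set_status(self, new_status):
        if new_status not in UPLOAD_TRANSITIONS[self.status]:
            raise ValueError(
                f"{self.status.name} -> {new_status.name} not allowed")
        self.status = new_status


def process_accepted(uploads):
    """Mimic final acceptance: ACCEPTED uploads get a PENDING publication."""
    for upload in uploads:
        if upload.status is UploadStatus.ACCEPTED:
            upload.publications.append(PublishingStatus.PENDING)
            upload.set_status(UploadStatus.DONE)


def publish(uploads):
    """Mimic the publisher: PENDING publications become PUBLISHED."""
    for upload in uploads:
        upload.publications = [
            PublishingStatus.PUBLISHED if p is PublishingStatus.PENDING else p
            for p in upload.publications
        ]
```

An upload that is still NEW is skipped by `process_accepted` until an Archive Admin moves it to ACCEPTED, which mirrors the manual vetting step.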
Building
A build is handed off to one of a number of builders on separate machines, which will perform the build in a chroot on the appropriate processor architecture.
At the top level, building is controlled by a "buildd manager" daemon (daemons/buildd-manager.tac), which is a Twisted application. It is responsible for choosing the next build item (items are "queued" in the BuildQueue table), sending all the files to the builder and initiating the build. It also polls the builders to get a log tail for display on the build and builder pages. When the build is finished, it downloads the resulting files from the builder and puts them into a staging area for later processing by process-upload.py.
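As a rough sketch of the "choose the next build item" step, the code below matches queued builds to idle builders by processor architecture and by whether the build needs a virtual builder. The classes, the first-come-first-served ordering and the processor names are assumptions made for illustration; the real buildd-manager is a Twisted application and its selection logic is not described on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueuedBuild:
    """Hypothetical stand-in for a row in the BuildQueue table."""
    name: str
    processor: str
    virtualized: bool


@dataclass
class Builder:
    """Hypothetical stand-in for a build machine."""
    name: str
    processor: str
    virtualized: bool
    current: Optional[QueuedBuild] = None


def pick_next_build(builder, queue):
    """Return the first queued build this idle builder can take, or None."""
    if builder.current is not None:
        return None  # builder is busy
    for build in queue:
        if (build.processor == builder.processor
                and build.virtualized == builder.virtualized):
            return build
    return None


def dispatch(builders, queue):
    """Assign queued builds to idle builders.

    Returns a list of (builder name, build name) pairs and removes the
    dispatched builds from the queue.
    """
    assignments = []
    for builder in builders:
        build = pick_next_build(builder, queue)
        if build is not None:
            queue.remove(build)
            builder.current = build
            assignments.append((builder.name, build.name))
    return assignments
```

Builds with no matching idle builder simply stay in the queue until a later pass.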
The builders for PPAs are all virtual machines that are ripped out and restarted before each build, since we run untrusted code in the builds. The buildd-manager is responsible for this operation, if it determines that the build is virtual, by ssh-ing and calling a script on the virtual machine's host.
Ubuntu Publishing
Ubuntu publishing is run from cron at intervals of 1 hour using the file cronscripts/publishing/cron.publish. This will do the following:
- Take files from the librarian and put them in the archive's pool tree.
- Generate the archive's dist tree (the archive's indexes) using apt-ftparchive.
- Expire old packages by giving them a status of SUPERSEDED. (domination.py)
- Look for packages no longer referenced by any archive index, delete their files on disk and set the package status to REMOVED. (deathrow.py)
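The last two publishing steps, domination and deathrow, can be sketched as follows. This is a simplified model under stated assumptions: `Publication` is a hypothetical record, versions are plain tuples of integers (real Debian version ordering is more involved), and the archive pool is modelled as a set of file names.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    """Hypothetical stand-in for a publishing history row."""
    package: str
    version: tuple  # simplified; not a real Debian version comparison
    files: frozenset
    status: str = "PUBLISHED"


def dominate(publications):
    """Mark every PUBLISHED record SUPERSEDED except the newest per package."""
    newest = {}
    for pub in publications:
        if pub.status != "PUBLISHED":
            continue
        best = newest.get(pub.package)
        if best is None or pub.version > best.version:
            newest[pub.package] = pub
    for pub in publications:
        if pub.status == "PUBLISHED" and newest[pub.package] is not pub:
            pub.status = "SUPERSEDED"


def deathrow(publications, pool):
    """Delete files used only by SUPERSEDED records and mark them REMOVED."""
    live_files = set()
    for pub in publications:
        if pub.status == "PUBLISHED":
            live_files |= pub.files
    for pub in publications:
        if pub.status == "SUPERSEDED":
            # Files shared with a live publication must stay in the pool.
            pool.difference_update(pub.files - live_files)
            pub.status = "REMOVED"
```

Keeping shared files matters because two versions of a source package can reference the same file, such as an unchanged upstream tarball.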
PPA Publishing
PPA publishing is largely similar to Ubuntu publishing, except:
- cronscripts/publishing/cron.ppa is run every 5 minutes.
- apt-ftparchive is not used; the publisher contains native code that generates the indexes instead.
Glossary
Terms used in Soyuz that might confuse outsiders!
| Term | Description |
| BPR | BinaryPackageRelease. A database table that stores details of a binary package at a particular version. |
| BPB | BinaryPackageBuild. Stores the relation between a SourcePackageRelease, a PackageBuild and a DistroArchSeries. |
| BFJ | BuildFarmJob. A table that stores the generic details of every job processed on the build farm. |
| BPRF | BinaryPackageReleaseFile. A database table that links a BinaryPackageRelease to all its files in the librarian. |
| BPPH | BinaryPackagePublishingHistory. See SPPH, except this is for binary packages. |
| ChangesFile | Part of a package upload, this file describes the upload (e.g. the file list, checksums, maintainer/changer names). |
| Deathrow | The process that handles all files for publications marked as SUPERSEDED or DELETED. If the files are not used by another package, they are deleted from the repository. |
| Domination | The act of modifying publishing records to a state of SUPERSEDED if more recent publications exist. |
| ogre-model | A concept that controls build-dependencies in a layered model, i.e. sources published in the 'main' component can only fetch build-dependencies from the 'main' component; sources published in 'universe' only have access to 'main' and 'universe'; and so on. |
| p-a | process-accepted.py - a script that takes accepted uploads and creates publishing records for them. |
| p-u | process-upload.py - a script that scans the files stored by txpkgupload and attempts to load them into the database. |
| PU | PackageUpload. A database table that stores details from the changes file for an upload. |
| SPR | SourcePackageRelease. A database table that holds details of a source package at a particular version. Could also be SourcePackageRecipe, depending on the context! |
| SPRF | SourcePackageReleaseFile. A database table that links a SourcePackageRelease to all its files in the librarian. |
| SPPH | SourcePackagePublishingHistory. A database table that records current and historical publishing status for SourcePackageRelease. The non-secure variant is just a view on the secure table, where embargoed is not false. |
| txpkgupload | An (S)FTP server for package uploads. |
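To make the ogre-model entry concrete, here is a minimal sketch of the layering rule. The function name and the `layers` parameter are hypothetical, and only the two components named in the glossary are listed; any further layers would be appended in order.

```python
def allowed_build_dep_components(component, layers):
    """Return the components a source in `component` may build-depend on.

    `layers` lists components in layer order; a source can use its own
    layer and every layer before it.
    """
    if component not in layers:
        raise ValueError(f"unknown component: {component}")
    return layers[: layers.index(component) + 1]


# Only the two layers named in the glossary entry are listed here.
LAYERS = ["main", "universe"]

print(allowed_build_dep_components("main", LAYERS))      # ['main']
print(allowed_build_dep_components("universe", LAYERS))  # ['main', 'universe']
```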