Overview
Launchpad entry: https://blueprints.edge.launchpad.net/soyuz/+spec/buildd-generalisation
Created: 2008-07-16 by JulianEdwards
Contributors: CelsoProvidelo, MuharemHrnjadovic
Depends on:
Overall Summary
Summary: Generalise the soyuz build system so that any job can be sent to the build farm, not just package build jobs. Create a new BuilderRequest table to link the BuildQueue to specific job types, e.g. a new table LiveImageBuild that will contain LiveCD ISOs, or BranchBuild that will contain Source Package Branch builds.
Goal/Deliverables: The initial deliverable will let us build source package branches in Launchpad.
We will know we have finished when we can build arbitrary jobs on the existing build farm, e.g. source package branches.
Release Note
Rationale
- Replace a very manual task with an automated one
- Better use of the infrastructure
- Results are stored in the librarian, which is visible and managed through Launchpad.
Use cases
- James the Ubuntu developer has a source package branch in Launchpad. He currently has to download the branch and run bzr builddeb on it to create a Debian source package, and then manually upload the result to get a binary built. With the buildd generalisations he can request that the whole operation take place in Launchpad.
- When Buzz Lightyear wishes to make a live CD image, he currently has to log into a spare machine, run a script manually, wait for it to complete and then copy the generated ISO to a safe place. With the buildd generalisations, he can request the generation via Launchpad's web UI and the generated ISO will be stored in the librarian for easy access via the web.
Assumptions
User Interface
Not strictly related to buildd-generalisation, but for source package branch building:
- The PPA page/source package page should have a link to the source package branch somewhere and record the revision that was built
- The source package branch page should have a PPA picker (initially, we can add distro pickers later) and a "build me" button
High-level implementation points
- The current build farm manager is a twistd app and dispatches jobs asynchronously. However, completed jobs are uploaded synchronously, which can take a minute to complete. This makes it impossible to scale the build farm to the required number of builders without fixing this first. The fix is complicated because it depends on process-upload.py being able to run in parallel; it currently cannot, as it makes all sorts of assumptions about the state of the archive (yes, this is woolly; we need concrete issues to fix).
- Alternatively, de-couple the build upload from its post-processing in the buildd-manager.
- The build-start ETA calculations will have to be revised now that there are new job types. The new jobs need predictable run times based on previous runs, so they need to be categorised in the same way that we group packages that generally take the same time to build from version to version.
- Generalised build jobs are required to always run on a virtual builder.
- The protocol between the manager and builder has to change to cope with more job types, or perhaps just one extra job type to start a "general" job and the manager passes a command to run.
- The current IBuilder has a _dispatchBuildToSlave() method which is very package-build specific. It should be refactored and generalised, with the package-specific code moved out. We could have IPackageBuilder, IBranchBuilder, etc. classes that drive the generic IBuilder.
If we write the system to run generic jobs (should we?) then "builder" seems like the wrong term. -- jml
I would much rather keep the existing nomenclature for builders, at least for now, to reduce confusion and keep my sanity -- Julian
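A minimal sketch of the "one extra job type" option for the manager/builder protocol: the manager sends a request whose `job_type` is `generic` together with the command to run, and the builder looks up a handler for that type. The request shape, the `JOB_HANDLERS` registry and the function names are hypothetical and do not describe the existing buildd protocol; a real builder would also need to confine the command, which is one reason generalised jobs must run on a virtual builder.

```python
import subprocess
import sys
from typing import Callable, Dict, Tuple

# (exit code, captured stdout) returned to the manager.
JobResult = Tuple[int, str]


def run_generic_job(request: Dict[str, object]) -> JobResult:
    """Run the command supplied by the manager and capture its output."""
    command = request.get("command")
    if not isinstance(command, list) or not all(isinstance(arg, str) for arg in command):
        raise ValueError("'command' must be a list of strings")
    proc = subprocess.run(command, capture_output=True, text=True)
    return proc.returncode, proc.stdout


# Hypothetical registry: one handler per job type the builder understands.
JOB_HANDLERS: Dict[str, Callable[[Dict[str, object]], JobResult]] = {
    "generic": run_generic_job,
}


def handle_request(request: Dict[str, object]) -> JobResult:
    """Entry point for a request sent by the manager."""
    job_type = str(request.get("job_type", ""))
    handler = JOB_HANDLERS.get(job_type)
    if handler is None:
        raise ValueError(f"unsupported job type: {job_type!r}")
    return handler(request)


if __name__ == "__main__":
    request = {
        "job_type": "generic",
        "command": [sys.executable, "-c", "print('hello from the builder')"],
    }
    print(handle_request(request))  # (0, 'hello from the builder\n')
```

Adding a new job type on the builder side then means registering one more handler rather than changing the protocol again.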
Implementation Plan
Maybe a good early step here is to take the existing implementation and refactor it so that it's in terms of Jobs? -- jml
I think this is a lot of work with little benefit. If you want to use an IJob interface to it, it would be better to add that as a layer on top of the existing implementation so it pushes jobs at the build farm. -- Julian
And in fact this is exactly what I'm going to do now. -- Julian
- Decide on a schema implementation and implement the changes along with the required code. At this point there will be no functional change but the foundations are set to add more job types.
- Agree with IS on a chroot format and on what is acceptable in terms of the protocol used to run jobs.
- Either make process-upload able to run in parallel with other instances, or make the buildd manager able to dump the build in a queue and forget about it, so that it doesn't have to wait for the results to be uploaded and processed.
- Refactor the builder.py file / IBuilder class so that it's generic. Move the package build specific code to IPackageBuilder (non-model class).
- Fix the builder protocol to support the generic job type (this depends on the above point about chroots).
- Add a BranchBuildBehavior class (implementing IBuildFarmJobBehavior) to drive IBuilder with a branch build.
- Add code to *asynchronously* deal with the results of a branch build. This will probably mean dumping the results in a queue and letting something else deal with it.
A diagram would help a lot here, I think -- jml
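The "dump the results in a queue and let something else deal with it" option can be sketched as a producer/consumer queue: the manager-side `collect_result()` only enqueues a finished job and returns, while `process_results()` does the slow work separately. Everything here is hypothetical: a thread stands in for whatever separate process would really consume the queue, and the processing step is a placeholder for the upload.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CompletedJob:
    """Hypothetical record of a finished job whose results still need processing."""
    job_id: int
    result_files: List[str] = field(default_factory=list)


def collect_result(results: "queue.Queue[Optional[CompletedJob]]", job: CompletedJob) -> None:
    """Manager side: hand the finished job off and return immediately."""
    results.put(job)


def process_results(results: "queue.Queue[Optional[CompletedJob]]", processed: List[int]) -> None:
    """Consumer side: does the slow upload/post-processing, one job at a time."""
    while True:
        job = results.get()
        if job is None:  # sentinel: no more work is coming
            break
        # Placeholder for the real upload and post-processing of job.result_files.
        processed.append(job.job_id)


def run_demo() -> List[int]:
    results: "queue.Queue[Optional[CompletedJob]]" = queue.Queue()
    processed: List[int] = []
    worker = threading.Thread(target=process_results, args=(results, processed))
    worker.start()
    for job_id in (1, 2, 3):
        # The manager never waits for processing to finish here.
        collect_result(results, CompletedJob(job_id, [f"job-{job_id}.changes"]))
    results.put(None)
    worker.join()
    return processed


if __name__ == "__main__":
    print(run_demo())  # [1, 2, 3]
```

The manager's cost per completed job becomes a cheap enqueue, so upload time no longer limits how many builders it can service.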
Code Changes
BuildQueue
Remove build column.
Remove these properties, and make call sites cope:
- archseries
- urgency
- archhintlist
- name
- version
- files
- builddependsindep
Delegate score() through the IXXXJob classes so that job-specific criteria are used for scoring.
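Delegating score() could look roughly like this: BuildQueue keeps a reference to a job object and asks it for a score, so each job type supplies its own criteria. The class names, the `lastscore` attribute and the scoring rules are illustrative assumptions, not the actual Soyuz scoring logic.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class BuildFarmJob(ABC):
    """Hypothetical base for the job-specific classes BuildQueue delegates to."""

    @abstractmethod
    def score(self) -> int:
        """Return this job's priority; higher scores are dispatched first."""


@dataclass
class PackageBuildJob(BuildFarmJob):
    urgency_bonus: int
    archive_bonus: int

    def score(self) -> int:
        # Package-specific criteria (illustrative only).
        return self.urgency_bonus + self.archive_bonus


@dataclass
class BranchBuildJob(BuildFarmJob):
    requested_by_owner: bool

    def score(self) -> int:
        # Branch-specific criteria (illustrative only).
        return 1000 if self.requested_by_owner else 500


@dataclass
class BuildQueue:
    """Queue entry that holds no job-specific columns, only a job reference."""
    job: BuildFarmJob
    lastscore: int = 0

    def score(self) -> int:
        # BuildQueue no longer needs to know how any job type is scored.
        self.lastscore = self.job.score()
        return self.lastscore


if __name__ == "__main__":
    entries = [BuildQueue(BranchBuildJob(True)), BuildQueue(PackageBuildJob(1500, 200))]
    for entry in sorted(entries, key=lambda e: e.score(), reverse=True):
        print(type(entry.job).__name__, entry.lastscore)
```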
BuildQueueSet
- calculateCandidates() - Move to IPackageBuildJob - each Job class will also need one of these.
- getForBuilds() - not sure, might be able to leave it, it's only used for the UI right now.
Build
- getEstimatedBuildStartTime() / _getHeadjobDelay() both assume that only builds are in the queue. We need a new home for versions of these methods that work with all job types. IJob has start/end time columns, so consider moving the build-specific timestamps there.
- createBuildQueueEntry() - Should also create an IJob and an IPackageBuildJob.
- retryDepWaiting() calls buildqueue.score(), will need to call the right place when score() moves.
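One way to make the start-time estimate independent of job type, following the idea of categorising jobs by previous run times: average past durations per (job type, category), then spread the estimated work queued ahead of a job across the available builders. This is a deliberately naive model (identical builders, no account of jobs already running or of architecture/virtualisation restrictions) and all names are hypothetical; it is not meant to mirror what getEstimatedBuildStartTime() or _getHeadjobDelay() do today.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# (job_type, category): jobs sharing a key are expected to take similar times.
JobKey = Tuple[str, str]


@dataclass
class QueuedJob:
    job_type: str  # hypothetical values, e.g. "package", "branch", "liveimage"
    category: str


def estimate_durations(history: Dict[JobKey, List[timedelta]]) -> Dict[JobKey, timedelta]:
    """Average the durations of previous runs for each job type/category."""
    return {
        key: sum(runs, timedelta()) / len(runs)
        for key, runs in history.items()
        if runs
    }


def estimated_start_time(
    now: datetime,
    jobs_ahead: List[QueuedJob],
    estimates: Dict[JobKey, timedelta],
    builders: int,
    default: timedelta,
) -> datetime:
    """Naive ETA: total work queued ahead, shared evenly across identical builders."""
    if builders < 1:
        raise ValueError("need at least one builder")
    work = sum(
        (estimates.get((job.job_type, job.category), default) for job in jobs_ahead),
        timedelta(),
    )
    return now + work / builders


if __name__ == "__main__":
    history = {("package", "small"): [timedelta(minutes=10), timedelta(minutes=20)]}
    ahead = [QueuedJob("package", "small"), QueuedJob("liveimage", "iso")]
    print(estimated_start_time(datetime(2009, 1, 1, 12, 0), ahead,
                               estimate_durations(history), 1, timedelta(hours=1)))
```

Job types with no history fall back to `default`, which is where a new job type would start until it has accumulated previous runs.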
IBuilder refactoring
Create IBuilderPackage (created IBuildFarmJobBehavior and implemented it with BinaryPackageBuildBehavior).
- Move cachePrivateSourceOnSlave() (Done - now private method on BPBuildBehavior)
- Move checkCanBuildForDistroArchSeries() (This stayed on Builder as the builder itself has-a given arch).
- Move _verifyBuildRequest() (Done - now on IBuildFarmJobBehavior and implemented by BPBuildBehavior).
- Move _dispatchBuildToSlave() (or potentially split between them as other jobs might find some of it useful) (Done - now on IBFJB and implemented by BPBuildBehavior)
- startBuild() should do the common work and then let the specific job model class do the rest; the soyuz-specific stuff needs to move (Done - it's very simple now and calls methods which are implemented by the specific behavior).
- status() is job-specific and should do the same (Done: mixture of builder and behavior. If the builder is OK, the behavior's implementation is called).
- Move getBuildRecords() (Not done - created bug 491330)
- Move slaveStatus() (or perhaps partially refactor to IBuilderPackage, not sure yet) (Done: mix of builder and behavior responsibility, Builder adds basic info and the behavior can update the returned dict with specific info).
- [_]findBuildCandidate() - NIGHTMARE. leave for now.
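The builder/behavior split described in the list above can be summarised in plain Python: Builder does the common work and delegates the job-specific parts to a behavior object, and slaveStatus() builds a basic dict which the behavior may extend. Launchpad declares its interfaces with zope.interface; here an abstract base class stands in so the sketch is self-contained, and the method signatures, the `current_build_behavior` attribute and the status keys are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BuildFarmJobBehavior(ABC):
    """Stand-in for IBuildFarmJobBehavior: the job-specific half of a build."""

    @abstractmethod
    def verifyBuildRequest(self, log: List[str]) -> None:
        """Raise if the job cannot be dispatched."""

    @abstractmethod
    def dispatchBuildToSlave(self, build_queue_id: int, log: List[str]) -> None:
        """Send the job-specific request to the slave."""

    def updateSlaveStatus(self, raw_status: Dict[str, str], status: Dict[str, str]) -> None:
        """Add job-specific fields to the status dict; adds nothing by default."""


class BinaryPackageBuildBehavior(BuildFarmJobBehavior):
    def __init__(self, source_name: str) -> None:
        self.source_name = source_name

    def verifyBuildRequest(self, log: List[str]) -> None:
        if not self.source_name:
            raise ValueError("no source package to build")

    def dispatchBuildToSlave(self, build_queue_id: int, log: List[str]) -> None:
        log.append(f"dispatching {self.source_name} as job {build_queue_id}")

    def updateSlaveStatus(self, raw_status: Dict[str, str], status: Dict[str, str]) -> None:
        if "build_id" in raw_status:
            status["build_id"] = raw_status["build_id"]


class Builder:
    """Generic builder: common steps only; job-specific work goes to the behavior."""

    def __init__(self, name: str, behavior: BuildFarmJobBehavior) -> None:
        self.name = name
        self.current_build_behavior = behavior

    def startBuild(self, build_queue_id: int, log: List[str]) -> None:
        log.append(f"{self.name}: starting job {build_queue_id}")  # common work
        self.current_build_behavior.verifyBuildRequest(log)
        self.current_build_behavior.dispatchBuildToSlave(build_queue_id, log)

    def slaveStatus(self, raw_status: Dict[str, str]) -> Dict[str, str]:
        # Builder fills in the basic info; the behavior may add job-specific keys.
        status = {"builder_status": raw_status.get("builder_status", "UNKNOWN")}
        self.current_build_behavior.updateSlaveStatus(raw_status, status)
        return status
```

A BranchBuildBehavior would subclass the same base and leave Builder untouched.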
New model classes
- IBuildJobPackage
- IBuildJobPackageBranch
- IBuildJobPackageRecipe
- IBuildJobTranslation
Schema Changes
--- Variant One ---
We need to generalise builder jobs. In order to model different types of builder requests (while avoiding nullable foreign keys in the BuildQueue table), the following schema is proposed:
Amend BuildQueue to add one new column:
builderrequest (FK) |
... |
In light of the fact that the BuildQueue table now contains rows which are not necessarily build jobs, it should be given a more suitable name like JobQueue or BuilderQueue.
Amend Build so that its 'id' column becomes a foreign key to the BuilderRequest parent table:
id (int) (FK to BuilderRequest (on DELETE CASCADE)) |
... |
New table BuilderRequest:
id (int) |
request_type (text) |
requested_by (text) |
date_requested (date) |
duration (interval) |
date_finished (date) |
New table LiveImageBuild:
id (int) (FK to BuilderRequest (on DELETE CASCADE)) |
archive (FK) |
distroarchseries (FK) |
description (text) |
image_file (FK to LFA) |
status (int) |
Index: btree(archive, distroarchseries)
The BuilderRequest table is a parent table. It thus delegates its key values to the child tables that contain data pertinent to specific builder requests. Please note that the deletion of a parent table row should result in the deletion of the corresponding child table row as well.
Jobs for builders are queued in the BuildQueue table as they currently are, except it will not have a FK directly pointing to a Build. Instead it has a FK to a BuilderRequest row. That row contains an ID and a type. The type tells which child table to join to, and the child table contains a foreign key pointing back to the BuilderRequest.
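A trimmed-down SQLite sketch of Variant One, showing how a queue entry is resolved through BuilderRequest.request_type to the right child table, and how ON DELETE CASCADE removes the child row when its parent is deleted. Only a few columns are kept, the `request_type` values and the `CHILD_TABLES` mapping are hypothetical, and SQLite is used purely so the example runs anywhere; note that SQLite only enforces foreign keys when `PRAGMA foreign_keys` is on.

```python
import sqlite3

# Trimmed Variant One schema; only a few columns from the proposal are kept.
SCHEMA = """
CREATE TABLE BuilderRequest (
    id INTEGER PRIMARY KEY,
    request_type TEXT NOT NULL,
    requested_by TEXT
);
CREATE TABLE Build (
    id INTEGER PRIMARY KEY REFERENCES BuilderRequest(id) ON DELETE CASCADE,
    sourcepackagename TEXT
);
CREATE TABLE LiveImageBuild (
    id INTEGER PRIMARY KEY REFERENCES BuilderRequest(id) ON DELETE CASCADE,
    description TEXT
);
CREATE TABLE JobQueue (
    id INTEGER PRIMARY KEY,
    builderrequest INTEGER NOT NULL REFERENCES BuilderRequest(id)
);
"""

# Hypothetical request_type values -> child table holding the job's data.
CHILD_TABLES = {"build": "Build", "liveimage": "LiveImageBuild"}


def queued_job_details(conn: sqlite3.Connection, queue_id: int):
    """Follow JobQueue -> BuilderRequest -> child table for one queue entry."""
    request_id, request_type = conn.execute(
        "SELECT r.id, r.request_type FROM JobQueue q "
        "JOIN BuilderRequest r ON r.id = q.builderrequest WHERE q.id = ?",
        (queue_id,),
    ).fetchone()
    table = CHILD_TABLES[request_type]  # whitelisted name, safe to interpolate
    row = conn.execute(f"SELECT * FROM {table} WHERE id = ?", (request_id,)).fetchone()
    return request_type, row


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO BuilderRequest (id, request_type) VALUES (1, 'liveimage')")
    conn.execute("INSERT INTO LiveImageBuild (id, description) VALUES (1, 'daily desktop ISO')")
    conn.execute("INSERT INTO JobQueue (id, builderrequest) VALUES (10, 1)")
    details = queued_job_details(conn, 10)
    # Remove the queue entry first (it references the request), then the parent
    # request; ON DELETE CASCADE removes the LiveImageBuild row as well.
    conn.execute("DELETE FROM JobQueue WHERE id = 10")
    conn.execute("DELETE FROM BuilderRequest WHERE id = 1")
    remaining = conn.execute("SELECT COUNT(*) FROM LiveImageBuild").fetchone()[0]
    return details, remaining


if __name__ == "__main__":
    print(demo())  # (('liveimage', (1, 'daily desktop ISO')), 0)
```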
Advantages:
- avoidance of nullable foreign keys in the JobQueue table
Disadvantages:
- when deploying the new schema we need to create a BuilderRequest row for each Build row in existence.
- in conjunction with the schema change, the Python code needs to change to create BuilderRequests first and (Build|LiveImageBuild) instances second, i.e. we would have to roll out schema changes and Python code changes at the same time
--- Variant two ---
Similar to Variant One, but:
- the JobQueue table takes on the role of the parent table.
- the BuilderRequest table is not needed.
- the (Build|LiveImageBuild) tables share 'id' values with the JobQueue table without a foreign key constraint, since the JobQueue rows will be deleted once a job completes.
The 'build' FK is removed from the BuildQueue/JobQueue table and a 'job_type' column is added:
... |
job_type (text) |
... |
The Build and LiveImageBuild tables are amended to contain the following additional columns:
... |
requested_by (text) |
date_requested (date) |
duration (interval) |
date_finished (date) |
... |
Advantages:
- avoidance of nullable foreign keys in the JobQueue table
- no need to create gazillions of parent table rows for the Build rows in existence
Disadvantages:
- we would have to roll out schema changes and Python code changes at the same time
- after the schema change we would need to:
  - define a sequence (starting with max(Build.id)+1) from which (Build|LiveImageBuild) instances created from that point on jointly draw their 'id' values
  - change the Python code so that:
    - (Build|LiveImageBuild) instances are created first
    - JobQueue instances are created subsequently, with their 'id' values set to the value used for the corresponding (Build|LiveImageBuild) instance and with an appropriate job type value.
--- Variant three ---
We don't use any parent/child table techniques but add a nullable foreign key for each job type to the JobQueue table.
Advantages:
- this is a simple change
- schema changes and Python code changes can be rolled out independent of each other
Disadvantages:
- this is ugly and becomes even more so as the number of job types grows (we want to generalise the build daemons after all)
--- Variant four ---
Another parent/child relationship, but this one lets us keep the current meme that BuildQueue rows are temporary.
- New table BuildJob that represents the common job data for all jobs and has an enum "type" column for the type of job
- BuildQueue gains a new FK column to BuildJob
- BuildJob has child tables that represent the job-specific data, such as BuildJobBranch and BuildJobCDImage
- Each child row's ID is the same as that of its parent BuildJob row and is in fact a FK to BuildJob.
Advantages:
- Keeps the current meme that BuildQueue rows are temporary.
- No nullable FKs for each job type
Disadvantages:
- The current BuildQueue code/model would have to move to a new table BuildJobPackage or somesuch
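Variant four in miniature: BuildJob holds the common data plus an enum type, BuildJobBranch reuses its parent's id, and BuildQueue only points at BuildJob, so deleting a queue entry leaves the job record intact. The enum values, helper functions and branch URL are hypothetical, and SQLite stands in for the real database.

```python
import sqlite3
from enum import IntEnum
from typing import Tuple


class BuildJobType(IntEnum):
    """Hypothetical values for BuildJob's enum 'type' column."""
    PACKAGE = 1
    BRANCH = 2
    CDIMAGE = 3


SCHEMA = """
CREATE TABLE BuildJob (id INTEGER PRIMARY KEY, type INTEGER NOT NULL);
CREATE TABLE BuildJobBranch (
    id INTEGER PRIMARY KEY REFERENCES BuildJob(id),  -- same id as the parent
    branch_url TEXT NOT NULL
);
CREATE TABLE BuildQueue (
    id INTEGER PRIMARY KEY,
    buildjob INTEGER NOT NULL REFERENCES BuildJob(id)
);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def queue_branch_build(conn: sqlite3.Connection, branch_url: str) -> int:
    cur = conn.execute("INSERT INTO BuildJob (type) VALUES (?)", (int(BuildJobType.BRANCH),))
    job_id = cur.lastrowid
    # The child row's primary key is the parent's id (and a FK to it).
    conn.execute("INSERT INTO BuildJobBranch (id, branch_url) VALUES (?, ?)", (job_id, branch_url))
    conn.execute("INSERT INTO BuildQueue (buildjob) VALUES (?)", (job_id,))
    return job_id


def finish(conn: sqlite3.Connection, job_id: int) -> None:
    # Queue rows stay temporary; BuildJob and its child row survive.
    conn.execute("DELETE FROM BuildQueue WHERE buildjob = ?", (job_id,))


def job_details(conn: sqlite3.Connection, job_id: int) -> Tuple[BuildJobType, str]:
    job_type, url = conn.execute(
        "SELECT j.type, b.branch_url FROM BuildJob j "
        "JOIN BuildJobBranch b ON b.id = j.id WHERE j.id = ?",
        (job_id,),
    ).fetchone()
    return BuildJobType(job_type), url
```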
Estimations
- Schema changes & required code XXX
- Buildd changes to support new job type XXX
- slave-scanner changes to pick up the live CD requests, dispatch them, probe running jobs, collect results and upload results XXX
- UI work, UNKNOWN, need to finalise requirements. XXX
Migration
Include:
- data migration, if any
- redirects from old URLs to new ones, if any
- how users will be pointed to the new way of doing things, if necessary. (If your change is big enough, consider using the rollout template.)
Unresolved issues
- What is the desired workflow for creating live CDs? Do we only want to create images for snapshot archives?