Diff for "LEP/BuildFarmScalability"

Differences between revisions 10 and 11
Revision 10 as of 2010-11-16 17:58:16
Size: 4354
Comment:
Revision 11 as of 2010-12-07 15:34:19
Size: 4652
Editor: jml
Comment:
     * https://lpstats.canonical.com/graphs/BuilddLagPPASupportedArch/
     * https://lpstats.canonical.com/graphs/BuilddLagPrivatePPA/
     * https://lpstats.canonical.com/graphs/BuilddLagProductionSupportedArch/
     * https://lpstats.canonical.com/graphs/BuilddLagProductionUnsupportedArch/

Build Farm Scalability

This LEP describes the constraints on the build farm and what counts as an acceptable level of performance for it.

As a package uploader
I want my build to be dispatched as soon as a builder is free and I am next in the queue
so that I don't wait unnecessarily for my build to finish.

As a person who pays for new hardware
I want to see builders always busy if there are jobs in the queue
so that I know I am getting the best performance I can for my money.

As a buildd admin
I want to add new build slaves without adversely affecting dispatch times to other builders
so that the build farm scales.

This LEP does not describe implementation mechanisms such as a new messaging system.

Rationale

This LEP is intended to guide development in the right direction, so that we don't waste resources on changes we don't really need.

It is being done now because the load on the build farm is increasing quite rapidly (due to rebuilds, among other things), and the current manager does not scale: it leaves slave resources idle for long periods.

It would increase the throughput of jobs on the build farm, which would make PPA users, buildd admins and the purse-carriers happier.

Stakeholders

  • PPA users
  • Ubuntu Team
  • IS (LaMont Jones)

  • Canonical Shareholders
  • Linaro Team

Constraints and Requirements

Must

  • When a builder becomes free, we must dispatch a queued build to it within a maximum of 30 seconds.
  • Misbehaving jobs must not affect the rest of the build farm.
  • Misbehaving builders must not affect the rest of the build farm.
  • When a build is ready on a builder, it must be collected [1] and passed on to the next stage within 30 seconds of reaching the ready state.

  • Each new builder added must not degrade overall responsiveness by more than half a second.
  • The design must cater for a system with 200 builders.
  • Low-scored builds must not be starved when there are higher-scored builds in the queue (though low-scored builds may be performed at a lower rate).
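As a rough sanity check on how these figures interact, the sketch below assumes the per-builder degradation allowance adds up linearly across builders. That model is an assumption made for illustration; the list above does not say how the figures combine. All function names are hypothetical. Times are kept in whole milliseconds to avoid floating-point rounding.

```python
# Figures taken from the "Must" and "Nice to have" lists in this LEP.
# The linear (additive) model is an assumption, not part of the LEP.
DESIGN_BUILDERS = 200
DISPATCH_LIMIT_MS = 30_000       # 30-second dispatch/collection limit
MUST_PER_BUILDER_MS = 500        # "half a second per builder"
NICE_PER_BUILDER_MS = 10         # "10 millisecond response degradation"


def worst_case_degradation_ms(builders: int, per_builder_ms: int) -> int:
    """Total added delay if every builder uses its full allowance."""
    return builders * per_builder_ms


def max_builders_within(limit_ms: int, per_builder_ms: int) -> int:
    """Largest builder count whose summed allowance fits within limit_ms."""
    return limit_ms // per_builder_ms


if __name__ == "__main__":
    for label, per_builder in (("must", MUST_PER_BUILDER_MS),
                               ("nice to have", NICE_PER_BUILDER_MS)):
        total = worst_case_degradation_ms(DESIGN_BUILDERS, per_builder)
        fits = max_builders_within(DISPATCH_LIMIT_MS, per_builder)
        print(f"{label}: {total} ms at {DESIGN_BUILDERS} builders; "
              f"{fits} builders fit within {DISPATCH_LIMIT_MS} ms")
```

Under this additive reading, the half-second allowance at 200 builders sums to 100 seconds, well above the 30-second limit, whereas the 10-millisecond figure sums to 2 seconds. If degradation does not accumulate this way, the comparison does not apply.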

XXX How realistic are these numbers? I pulled numbers out of the air because it's better than the current delay of 20-30 minutes. -- Julian

Nice to have

  • Even faster dispatch and collection, say under 10 seconds
  • At most 10 milliseconds of response degradation per new builder
  • The ability to dynamically alter the queue positions of jobs based on:
    • job type
    • archive (PPA)
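One way to picture queue positions that depend on job type and archive while still avoiding starvation is a score that grows with waiting time. The sketch below is illustrative only: the bonus tables, the ageing rate, and every name in it are hypothetical, and it is not a description of how the build farm actually scores builds.

```python
from dataclasses import dataclass

# Hypothetical adjustments; real values would be a policy decision.
JOB_TYPE_BONUS = {"package": 0, "recipe": -5}
ARCHIVE_BONUS = {"primary": 10, "ppa": 0}
AGE_BONUS_PER_MINUTE = 1.0  # assumed ageing rate


@dataclass
class QueuedBuild:
    name: str
    base_score: int
    job_type: str
    archive: str
    queued_at: float  # seconds since some fixed reference point


def effective_score(build: QueuedBuild, now: float) -> float:
    """Base score plus per-type and per-archive bonuses plus an age bonus."""
    waited_minutes = (now - build.queued_at) / 60.0
    return (build.base_score
            + JOB_TYPE_BONUS.get(build.job_type, 0)
            + ARCHIVE_BONUS.get(build.archive, 0)
            + AGE_BONUS_PER_MINUTE * waited_minutes)


def next_build(queue: list, now: float):
    """Remove and return the build with the highest effective score."""
    if not queue:
        return None
    best = max(queue, key=lambda build: effective_score(build, now))
    queue.remove(best)
    return best
```

Because the age bonus grows without bound, a low-scored build eventually overtakes any newly queued high-scored build, so it is served at a lower rate rather than never. Changing an entry in either bonus table changes queue positions without touching the queued builds themselves.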

Must not

Subfeatures

None (yet).

Workflows

Success

How will we know when we are done?

Bugs for this feature: https://launchpad.net/launchpad-project/+bugs?field.tag=buildd-scalability

  • When we meet the *Must* criteria above, or get acceptably close to them.
  • We can actually process "daily builds" daily.
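A minimal sketch of how the first criterion might be checked against recorded timings follows. It assumes we can obtain, for each dispatch, the time the builder became free and the time the build was dispatched to it. The event format and function names are hypothetical. Only the 30-second limit comes from the Must list.

```python
DISPATCH_LIMIT_SECONDS = 30.0  # from the Must list


def dispatch_lags(events):
    """events: iterable of (builder_free_at, build_dispatched_at) in seconds."""
    return [dispatched_at - free_at for free_at, dispatched_at in events]


def meets_limit(lags, limit=DISPATCH_LIMIT_SECONDS):
    """True if every observed lag is within the limit."""
    return all(lag <= limit for lag in lags)


def fraction_within(lags, limit=DISPATCH_LIMIT_SECONDS):
    """Share of lags within the limit, for judging 'acceptably close'."""
    if not lags:
        return 1.0  # no observations, so nothing violates the limit
    return sum(1 for lag in lags if lag <= limit) / len(lags)
```

For example, `fraction_within(dispatch_lags([(0, 12), (100, 125), (200, 245), (300, 310)]))` gives 0.75, since one of the four dispatches took 45 seconds. The same check applies to collection if the pairs are (build ready, build collected) times instead.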

How will we measure how well we have done?

Thoughts?

  • This LEP is essential to the completion of LEP/SourcePackageRecipeBuilds -- jml

  • A nice upshot of some of these metrics is that it will make the impact of removing builders more obvious -- jml

    • Perhaps we should put the graphs in Launchpad itself, rather than lpstats? -- jml
  1. "Collected" is a term specific to the build farm; it refers to the explicit step in which the build master fetches the results from the builder.

LEP/BuildFarmScalability (last edited 2010-12-10 12:33:53 by jml)