Diff for "LEP/BuildFarmScalability"

Differences between revisions 1 and 2
Revision 1 as of 2010-08-16 15:22:35
Size: 2747
Comment:
Revision 2 as of 2010-08-17 15:31:43
Size: 2797
Comment:

Build Farm Scalability

This LEP describes the constraints on the build farm and defines an acceptable level of performance for it.

As a package uploader
I want my build to be dispatched as soon as a builder is free and I am next in the queue
so that I don't wait unnecessarily for my build to be finished

As a person who pays for new hardware
I want to see builders always busy if there are jobs in the queue
so that I know I am getting the best performance I can for my money.

As a buildd admin
I want to add new build slaves without adversely affecting dispatch times to other builders
so that the build farm scales.

This LEP does not describe a new messaging system or any similar implementation detail.

Rationale

This LEP is intended to guide development so that we don't waste resources on changes we don't really need.

It is being done now because the load on the build farm is increasing rapidly (due to rebuilds, among other things), and the current build manager does not scale: it leaves slave resources idle for long periods.

It would bring value by increasing the throughput of jobs on the build farm, which would make PPA users, buildd admins and the purse-carriers happier.

Stakeholders

  • PPA users
  • Ubuntu Team
  • IS (LaMont Jones)
  • Canonical Shareholders
  • Linaro Team

Constraints and Requirements

Must

  • When a builder becomes free, we must dispatch a queued build to it within 30 seconds.
  • It must be robust to failures. That is, failures dealing with one builder should not affect any other builder.
  • When a build is ready on a builder, it must be collected within 30 seconds of reaching the ready state.
  • Each new builder added must not degrade overall responsiveness by more than half a second.
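To see how the per-builder allowance interacts with the 30-second limits, here is a small arithmetic sketch. It assumes the degradation accumulates linearly with the number of builders (a worst-case reading of the requirement, not something this LEP states), and the base latency is a hypothetical input. The function names are made up for illustration.

```python
# Thresholds taken from the "Must" list, expressed in milliseconds.
DISPATCH_LIMIT_MS = 30_000
PER_BUILDER_DEGRADATION_MS = 500


def worst_case_latency_ms(base_latency_ms, builders,
                          per_builder_ms=PER_BUILDER_DEGRADATION_MS):
    """Latency if every builder adds the full allowed degradation (assumed linear)."""
    return base_latency_ms + builders * per_builder_ms


def max_builders_within_limit(base_latency_ms,
                              limit_ms=DISPATCH_LIMIT_MS,
                              per_builder_ms=PER_BUILDER_DEGRADATION_MS):
    """Largest builder count whose worst-case latency still meets the limit."""
    if base_latency_ms > limit_ms:
        return 0
    return (limit_ms - base_latency_ms) // per_builder_ms
```

With a hypothetical 5-second base latency, the half-second allowance leaves room for 50 builders before the 30-second limit is reached. A 10-millisecond allowance would leave room for 2500.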

XXX How realistic are these numbers? I pulled numbers out of the air because it's better than the current delay of 20-30 minutes. -- Julian

Nice to have

  • Even faster dispatch and collection, say under 10 seconds
  • At most 10 milliseconds of response degradation per new builder
  • The ability to dynamically alter the queue positions of jobs based on:
    • job type
    • archive (PPA)
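One possible shape for dynamically adjustable queue positions is a priority queue whose score is derived from job type and archive. The sketch below is illustrative only: the class name, the weight tables and the scoring rule are all assumptions, not part of the existing build manager.

```python
import heapq
import itertools


class BuildQueue:
    """Toy priority queue; a lower score is dispatched first."""

    def __init__(self, job_type_weight, archive_weight, default=100):
        # All weights are hypothetical illustration values.
        self.job_type_weight = dict(job_type_weight)
        self.archive_weight = dict(archive_weight)
        self.default = default
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: keeps FIFO order

    def _score(self, job_type, archive):
        return (self.job_type_weight.get(job_type, self.default)
                + self.archive_weight.get(archive, self.default))

    def push(self, name, job_type, archive):
        entry = (self._score(job_type, archive), next(self._seq),
                 name, job_type, archive)
        heapq.heappush(self._heap, entry)

    def rescore(self):
        """Recompute every score after the weights have been changed."""
        self._heap = [(self._score(jt, ar), seq, name, jt, ar)
                      for _, seq, name, jt, ar in self._heap]
        heapq.heapify(self._heap)

    def pop(self):
        """Return the name of the job that should be dispatched next."""
        return heapq.heappop(self._heap)[2]
```

Changing a weight and calling `rescore()` reorders the jobs already queued, which is the "dynamic" part. Jobs with equal scores keep their arrival order.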

Must not

Subfeatures

None (yet).

Workflows

Success

How will we know when we are done?

  • When we meet the *Must* criteria above, or get acceptably close to them.
  • We can actually process "daily builds" daily.

How will we measure how well we have done?

  • By graphing response times and build farm throughput, and by examining the build manager's log file.
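As a rough sketch of the measurement side, the snippet below computes dispatch delays from pairs of timestamps and summarizes them against the 30-second limit. The input format (ISO 8601 strings for "builder became free" and "build dispatched") is an assumption. The build manager's real log format is not described here, so extracting those pairs is left out.

```python
from datetime import datetime


def dispatch_delays(events):
    """events: iterable of (freed_at, dispatched_at) ISO 8601 strings (assumed format)."""
    return [
        (datetime.fromisoformat(dispatched)
         - datetime.fromisoformat(freed)).total_seconds()
        for freed, dispatched in events
    ]


def summarize(delays, limit_s=30.0):
    """Basic statistics plus the fraction of delays within the limit."""
    within = sum(1 for d in delays if d <= limit_s)
    return {
        "count": len(delays),
        "max_s": max(delays),
        "mean_s": sum(delays) / len(delays),
        "within_limit": within / len(delays),
    }
```

The same summary could be computed for collection delays, and the per-interval results fed into whatever graphing tool is used.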

Thoughts?

LEP/BuildFarmScalability (last edited 2010-12-10 12:33:53 by jml)