Differences between revisions 1 and 2

Build Farm Scalability

This LEP describes the constraints on the build farm and what counts as an acceptable level of performance for it.

As a package uploader
I want my build to be dispatched as soon as a builder is free and I am next in the queue
so that I don't wait unnecessarily for my build to be finished

As a person who pays for new hardware
I want to see builders always busy if there are jobs in the queue
so that I know I am getting the best performance I can for my money.

As a buildd admin
I want to add new build slaves without adversely affecting dispatch times to other builders
so that the build farm scales.

This LEP does not describe a new messaging system, etc.

Rationale

This LEP is to guide development in the right direction such that we don't waste resources making changes that we don't really need.

It is being done now because the load on the build farm is increasing quite rapidly (due to rebuilds, etc.), and the current manager does not scale: it leaves slave resources wastefully idle for long periods.

It would bring value by increasing the throughput of jobs on the build farm, which would make PPA users, buildd admins and the purse-carriers happier.

Stakeholders

PPA users
Ubuntu Team
IS (LaMont Jones)
Canonical Shareholders
Linaro Team

Constraints and Requirements

Must

When a builder becomes free, we must dispatch a queued build to it within 30 seconds.
It must be robust to failures. That is, failures dealing with one builder should not affect any other builder.
When a build is ready on a builder, it must be collected within 30 seconds of reaching the ready state.
Adding new builders must not degrade the overall responsiveness by more than half a second per builder added.
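
The timing limits above can be expressed as simple checks over measured timestamps. The following is only a minimal sketch: the `BuilderEvent` record, its field names and the helper functions are hypothetical and are not part of the build manager. It also assumes that "half a second per builder" means the total added latency may grow by at most 0.5 seconds for each builder added.

```python
from dataclasses import dataclass

# Limits taken from the "Must" list; everything else here is illustrative.
DISPATCH_LIMIT_S = 30.0
COLLECT_LIMIT_S = 30.0
PER_BUILDER_DEGRADATION_S = 0.5


@dataclass
class BuilderEvent:
    """Hypothetical record of one build on one builder (times in seconds)."""
    builder: str
    became_free: float   # when the builder became idle
    dispatched: float    # when a queued build was dispatched to it
    build_ready: float   # when the build reached the ready state
    collected: float     # when the result was collected


def check_event(ev):
    """Return the names of the limits that this event violates."""
    violations = []
    if ev.dispatched - ev.became_free > DISPATCH_LIMIT_S:
        violations.append("dispatch")
    if ev.collected - ev.build_ready > COLLECT_LIMIT_S:
        violations.append("collect")
    return violations


def degradation_ok(latency_before, latency_after, builders_added):
    """Check the assumed reading of the per-builder degradation limit."""
    if builders_added <= 0:
        raise ValueError("builders_added must be positive")
    allowed = PER_BUILDER_DEGRADATION_S * builders_added
    return (latency_after - latency_before) <= allowed
```

Keeping the check per event also reflects the robustness requirement: a slow or failing builder only produces violations on its own records.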

XXX How realistic are these numbers? I pulled numbers out of the air because it's better than the current delay of 20-30 minutes. -- Julian

Nice to have

Even faster dispatch and collection, say sub-10-second
At most 10 milliseconds of response degradation per new builder
The ability to dynamically alter the queue positions of jobs based on:
- job type
- archive (PPA)
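
One way to picture dynamic queue positions is to score each job from its job type and archive at the moment a builder asks for work, so that changing a weight reorders the queue immediately. This is a sketch under assumptions, not the existing queue implementation: the `BuildQueue` class, the weight tables and the example job types and archive names are all hypothetical.

```python
import itertools


class BuildQueue:
    """Toy queue whose ordering is recomputed from adjustable weights."""

    def __init__(self, type_weight, archive_weight):
        self.type_weight = dict(type_weight)        # job type -> weight
        self.archive_weight = dict(archive_weight)  # archive -> weight
        self._jobs = {}                             # job id -> (arrival, type, archive)
        self._counter = itertools.count()

    def add(self, job_id, job_type, archive):
        self._jobs[job_id] = (next(self._counter), job_type, archive)

    def score(self, job_id):
        _, job_type, archive = self._jobs[job_id]
        return (self.type_weight.get(job_type, 0)
                + self.archive_weight.get(archive, 0))

    def pop_next(self):
        """Remove and return the highest-scoring job; ties go to the oldest."""
        if not self._jobs:
            return None
        best = min(self._jobs,
                   key=lambda j: (-self.score(j), self._jobs[j][0]))
        del self._jobs[best]
        return best
```

Scoring at pop time costs a scan of the queue per dispatch, which is simple but linear in queue length; a heap with re-insertion on weight changes would be the usual alternative if the queue grows large.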

Must not

Subfeatures

None (yet).

Workflows

Success

How will we know when we are done?

When we meet the *Must* criteria above, or get acceptably close to them.
We can actually process "daily builds" daily.

How will we measure how well we have done?

By graphing response times and build farm throughput, and by examining the build manager's log file.
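
As a rough illustration of the numbers that might feed such graphs, the sketch below computes a latency percentile and an hourly throughput from plain lists of measurements. The input shape is an assumption: the format of the build manager's log file is not described here, so extracting these values from it is left out, and both function names are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, for 0 < pct <= 100."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def throughput_per_hour(completion_times, window_start, window_end):
    """Builds completed per hour within [window_start, window_end), in seconds."""
    if window_end <= window_start:
        raise ValueError("window_end must be after window_start")
    done = sum(1 for t in completion_times if window_start <= t < window_end)
    return done / ((window_end - window_start) / 3600)
```

Tracking a high percentile of dispatch latency, rather than only the mean, shows whether individual builds still wait far longer than the 30 second target.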

Revision 1 as of 2010-08-16 15:22:35 (size: 2747, editor: julian-edwards)
Revision 2 as of 2010-08-17 15:31:43 (size: 2797, editor: julian-edwards)
Added in revision 2, at line 70: We can actually process "daily builds" daily.


Diff for "LEP/BuildFarmScalability"
