Soyuz/JobDispatchTimeEstimation

Revision 10 as of 2010-03-14 10:15:30

Dispatch time estimation for build farm jobs

Introduction

Due to technical limitations (the art of writing psychic software is not very well established yet :-) job dispatch times are estimations only.

For the purpose of this description a 'platform' is considered to be the combination of a processor and a virtualization setting.

Build farm jobs can either target a specific platform (e.g. binary builds) or be platform-independent (e.g. "generate a source package from a recipe" builds). The former can only make use of build machines (or "builders" in Soyuz parlance) of the given platform while platform-independent jobs may run on any available builder.

Jobs with an unspecified virtualization setting will be dispatched to virtual builders only.
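One way to read the matching rules above, as a minimal sketch. `Builder`, `Job` and `can_run` are hypothetical names, not Soyuz identifiers, and the handling of `None` is an assumption: a missing processor is taken to mean "platform-independent", and a missing virtualization setting is taken to mean "virtual builders only".

```python
from typing import NamedTuple, Optional


class Builder(NamedTuple):
    name: str
    processor: str
    virtualized: bool


class Job(NamedTuple):
    processor: Optional[str]      # None: no specific processor required (assumption)
    virtualized: Optional[bool]   # None: virtualization setting unspecified


def can_run(builder: Builder, job: Job) -> bool:
    """Return True if `builder` may run `job` under the assumed rules."""
    # An unspecified virtualization setting is treated as "virtual only".
    wanted_virtualized = True if job.virtualized is None else job.virtualized
    if builder.virtualized != wanted_virtualized:
        return False
    # A job without a processor may use a builder of any processor.
    return job.processor is None or builder.processor == job.processor
```

Under this reading, a job with neither a processor nor a virtualization setting may run on any virtual builder, whatever its processor.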

Builders can -- roughly speaking -- either be idle or building. For any job running on a particular builder, its estimated duration and its start time are available, which allows us to estimate the job's remaining execution time.

By the way, did I already mention that job dispatch times are an estimation only?

Problem definition

Given:

Wanted: the estimated dispatch time for a specific job in the pending queue, the job of interest (JOI).

Solution overview

There are two questions we need to answer in order to come up with a dispatch time estimation for the job of interest (JOI):

  1. how long will the jobs ahead of the JOI (in the pending queue) take to run? This is the predecessor lead time (PLT).

  2. how long will it take until the job at the head of the pending queue is dispatched to a builder? This is the time to next builder (TNB).

The dispatch time estimation for the JOI is then calculated as follows: now() + PLT + TNB
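The formula above can be written down directly. This is only a sketch: the function name `estimate_dispatch_time` and its parameters are made up for illustration, and the PLT and TNB are assumed to have been computed already and passed in as `timedelta` values.

```python
from datetime import datetime, timedelta
from typing import Optional


def estimate_dispatch_time(
    plt: timedelta, tnb: timedelta, now: Optional[datetime] = None
) -> datetime:
    """Return now + PLT + TNB for the job of interest."""
    if now is None:
        now = datetime.now()
    return now + plt + tnb


# With a PLT of 4 minutes and a TNB of 6 minutes the JOI is estimated
# to be dispatched 10 minutes from now.
estimate = estimate_dispatch_time(
    plt=timedelta(minutes=4),
    tnb=timedelta(minutes=6),
    now=datetime(2010, 3, 14, 10, 0),
)
```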

Time to next builder

The time to next builder (TNB) is estimated for the head job, i.e. the job at the head of the pending queue.

Given the head job's platform (processor: P, virtualization setting: V), the TNB is taken to be the minimum remaining job execution time across all builders providing (P,V).

Example: the head job's platform is (i386,true) and we have the following builders:

builder       estimated duration   job start time
Africa        10 minutes           -2 minutes
Americas      12 minutes           -4 minutes
Antarctica    8 minutes            -2 minutes
Australia     22 minutes           -8 minutes

The resulting TNB would be 6 minutes since the Antarctica builder is estimated to finish its job in that time: its job has an estimated duration of 8 minutes and started 2 minutes ago.

Sometimes jobs overrun their estimated duration, i.e. they run longer than estimated. In such cases we assume that the corresponding builder will finish in 2 minutes. This is somewhat of a .. *cough* .. educated guess but has worked reasonably well in the past.
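The TNB rule, including the 2-minute guess for overrunning jobs, could be sketched as follows. The names `remaining_time`, `time_to_next_builder` and `OVERRUN_FALLBACK` are hypothetical. The sketch assumes every builder providing (P,V) is currently building, so idle builders are not modelled, and it takes the elapsed time of each job (the negated "job start time" of the example) as input.

```python
from datetime import timedelta

# Assumed remaining time for a job that has outrun its estimate.
OVERRUN_FALLBACK = timedelta(minutes=2)


def remaining_time(estimated_duration: timedelta, elapsed: timedelta) -> timedelta:
    """Estimate how much longer a running job will occupy its builder."""
    if elapsed > estimated_duration:
        # The job ran longer than estimated: fall back to the fixed guess.
        return OVERRUN_FALLBACK
    return estimated_duration - elapsed


def time_to_next_builder(running_jobs) -> timedelta:
    """Minimum remaining time over (estimated_duration, elapsed) pairs.

    Raises ValueError if `running_jobs` is empty.
    """
    return min(remaining_time(est, elapsed) for est, elapsed in running_jobs)


def minutes(n: int) -> timedelta:
    return timedelta(minutes=n)


# The four builders of the example: (estimated duration, elapsed time).
example = [
    (minutes(10), minutes(2)),  # Africa
    (minutes(12), minutes(4)),  # Americas
    (minutes(8), minutes(2)),   # Antarctica
    (minutes(22), minutes(8)),  # Australia
]
tnb = time_to_next_builder(example)  # 6 minutes (Antarctica)
```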

Predecessor lead time

Overview

The predecessor set consists of the jobs that fulfil the following criteria: they are

The predecessor lead time for the JOI is then estimated as follows:

  1. sum up the estimated duration of the jobs in the predecessor set. This results in a lead time total (LTT).

  2. divide the LTT by the smaller of these two values: the size of the

    1. predecessor set

    2. pool of builders available to run the jobs in the predecessor set

Example A: 10 builders can run the JOI and the predecessor set consists of jobs with estimated durations of 2, 4 and 6 minutes respectively. This results in a predecessor lead time of (2 + 4 + 6) / min(3, 10) = 4 minutes.

The idea is that although we have 10 builders, only 3 of them can be used to run the jobs in the predecessor set.

Example B: 3 builders can run the JOI and the predecessor set consists of jobs with estimated durations of 2, 3, 4 and 6 minutes respectively. This results in a predecessor lead time of (2 + 3 + 4 + 6) / min(4, 3) = 5 minutes.
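The two-step calculation and both examples can be checked with a short sketch. The function name `predecessor_lead_time` is made up for illustration, and it assumes (as was true before the generalization described next) that all jobs in the predecessor set share a single builder pool. Returning 0 for an empty predecessor set is also an assumption.

```python
def predecessor_lead_time(durations, builder_pool_size):
    """Estimate the predecessor lead time in minutes.

    `durations` holds the estimated durations (in minutes) of the jobs
    in the predecessor set; `builder_pool_size` is the number of
    builders available to run them.
    """
    if not durations:
        # Assumption: nothing ahead of the JOI means no lead time.
        return 0.0
    if builder_pool_size < 1:
        raise ValueError("at least one builder is required")
    lead_time_total = sum(durations)                     # step 1: the LTT
    divisor = min(len(durations), builder_pool_size)     # step 2
    return lead_time_total / divisor


example_a = predecessor_lead_time([2, 4, 6], builder_pool_size=10)
example_b = predecessor_lead_time([2, 3, 4, 6], builder_pool_size=3)
```

Here `example_a` evaluates to 4.0 and `example_b` to 5.0, matching Examples A and B.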

Build farm generalization

Before the build farm generalization we only had one job type (binary builds) and could hence make the assumption that all jobs in the predecessor set share the same builder pool.

With the introduction of processor-independent build farm jobs that assumption ceased to be true.

Example C:

Builders:

builder pool size   processor   virtual
4                   i386        false
3                   i386        true
2                   amd64       false
1                   hppa        false

Jobs:

Job   score   processor   virtual
J1    99      i386        true
J2    98      i386        true
J3    97      null        null
J4    96      i386        true
J5    95      hppa        true
J6    94      i386        true