Aaron's Todo
needs abel workers: one-time Launchpad init
- per job-type init
- Possibly not needed, because not expensive.
post landing Can be implemented in Job.run
- Resource limits
post landing Maybe per-celeryd, since available resources don't change...
- Oops if memory limit exceeded
- Perhaps needs to be NIHed: don't accept jobs while too much memory is in use.
- Fast lane/slow lane
- Slow lane time_limit
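The "don't accept jobs while too much memory is in use" idea could look roughly like the sketch below. It is a minimal illustration, not a Celery feature: MemoryGate, current_rss_bytes and the 750 MB limit are all hypothetical, and the /proc read only works on Linux.

```python
import os


def current_rss_bytes():
    """Current resident set size of this process, read from /proc (Linux only)."""
    with open("/proc/self/statm") as statm:
        # The second field of statm is the resident set size, in pages.
        resident_pages = int(statm.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


class MemoryGate:
    """Hypothetical gate a worker consults before taking another job."""

    def __init__(self, limit_bytes, measure=current_rss_bytes):
        self.limit_bytes = limit_bytes
        self.measure = measure  # injectable so the gate can be tested

    def can_accept_job(self):
        # Refuse new work while measured usage is at or above the limit.
        return self.measure() < self.limit_bytes


# Assumed limit for illustration only.
gate = MemoryGate(limit_bytes=750 * 1024 * 1024)
```

A worker would call gate.can_accept_job() before pulling the next job and sleep briefly when it returns False. Whether the measurement should be per-process or per-celeryd is the open point noted above.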
Resources
- Main machine is ackee
- jjo suggests we use 1.5-2 cores and 1.5 GB of memory on ackee
- therefore, probably one worker process each for fast and slow lanes
- loganberry is also available, but heavily loaded
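One way to picture the two lanes is as two named queues, each served by a single worker process. The sketch below builds a routing table in the shape that recent Celery versions accept for the task_routes setting (older versions call it CELERY_ROUTES). The job type names and the slow-lane time limit are hypothetical placeholders, not values from this plan.

```python
# Hypothetical job types; the real split between lanes is still to be decided.
FAST_JOB_TYPES = ["jobs.send_notification"]
SLOW_JOB_TYPES = ["jobs.scan_branch"]

# Assumed value, in seconds, for the slow lane's time limit.
SLOW_LANE_TIME_LIMIT = 3600


def build_routes(fast_jobs, slow_jobs):
    """Map each job type name to the queue (lane) that should run it."""
    routes = {name: {"queue": "fast"} for name in fast_jobs}
    routes.update({name: {"queue": "slow"} for name in slow_jobs})
    return routes


task_routes = build_routes(FAST_JOB_TYPES, SLOW_JOB_TYPES)
```

Each lane's worker would then be started so that it consumes only its own queue with a concurrency of one (the worker's -Q and -c options), which matches the one-process-per-lane budget above.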
Open Questions
Currently, each job type has its own config. Do we retain that? If so, how do we associate jobs with their configs?
How do we support nodowntime/fastdowntime upgrades?
- Send SIGTERM to current celeryds and start new ones?
- Less coding
- More chance of exceeding resource limits, as new workers take jobs while old workers are still working.
- On message or signal, workers stop accepting jobs, reschedule any running jobs, and exit?
- More coding
- No chance of exceeding resource limits
- Delays slow-running jobs further
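The second option (stop accepting jobs, reschedule the running job, exit) can be sketched with a toy worker loop. This uses only the standard library and is not how celeryd is implemented: DrainingWorker, request_stop and install_sigterm_handler are hypothetical names, and jobs are modelled as callables that poll a should_stop function and return False when they give up early.

```python
import signal
from collections import deque


class DrainingWorker:
    """Toy stand-in for a worker process that drains on request."""

    def __init__(self, queue):
        self.queue = queue      # deque of job callables
        self.stopping = False
        self.completed = []

    def request_stop(self, signum=None, frame=None):
        # Usable directly or as a signal handler.
        self.stopping = True

    def run(self):
        while self.queue and not self.stopping:
            job = self.queue.popleft()
            finished = job(lambda: self.stopping)
            if finished:
                self.completed.append(job)
            else:
                # Interrupted: put the job back so a new worker can run it.
                self.queue.appendleft(job)
        # Whatever remains is left for the replacement workers.
        return list(self.queue)


def install_sigterm_handler(worker):
    """Make SIGTERM trigger a drain (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, worker.request_stop)
```

The trade-off listed above shows up directly: an interrupted job goes back to the front of the queue and starts over, so slow jobs are delayed further, but the old worker never runs alongside the new one.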