Celery Jobs

Celery Jobs are used in production; however, the mechanism for automatic fallback between the 'fast' and 'slow' lanes is currently broken, so long-running jobs may hit the 300-second timeout and fail. All active job types have been migrated to support running under Celery.

Data flow

The database remains the authoritative source of information about a job. Celery is merely a mechanism for running that job. Each attempt to run a job will use its own Celery task. As that task starts, runs, and terminates the job, the status and lease fields in the database are updated as normal.
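The relationship between a Celery task attempt and the database record can be sketched as follows. This is a simplified illustration of the lifecycle described above, not Launchpad's actual classes: the `Job` class, the lease handling, and the method names here are stand-ins.

```python
import datetime
from enum import Enum


class JobStatus(Enum):
    WAITING = "Waiting"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"


class Job:
    """Simplified stand-in for the database-backed job record."""

    def __init__(self):
        self.status = JobStatus.WAITING
        self.lease_expires = None

    def acquire_lease(self, duration=datetime.timedelta(seconds=300)):
        # The runner takes a lease so concurrent runners skip this job.
        self.lease_expires = datetime.datetime.utcnow() + duration

    def start(self):
        self.acquire_lease()
        self.status = JobStatus.RUNNING

    def complete(self):
        self.status = JobStatus.COMPLETED


def run_as_celery_task(job):
    """What each Celery task attempt does: drive the database record."""
    job.start()
    # ... the job's actual work happens here ...
    job.complete()


job = Job()
run_as_celery_task(job)
print(job.status)  # JobStatus.COMPLETED
```

The key point is that Celery never owns the state: the task merely advances the status and lease fields stored in the database.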

This may change in the future, because Celery's Task class and Launchpad's Job class are very similar. They could merge, or one could replace the other.

Additional Requirements

Job factory methods should call Job.celeryRunOnCommit(). This schedules the job to run if/when the current transaction is committed. It only affects jobs whose class names are in the jobs.celery.enabled_classes feature flag.
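The commit-hook behaviour can be illustrated with a minimal sketch. Everything here is a stand-in for the real implementation: `FakeTransaction`, the `enabled_classes` set, and the `_queue` method are illustrative only (Launchpad uses the `transaction` package's after-commit hooks and the feature-flag service).

```python
# Stand-in for the jobs.celery.enabled_classes feature flag.
enabled_classes = {"DistroSeriesDifferenceJob"}


class FakeTransaction:
    """Minimal stand-in for a transaction with after-commit hooks."""

    def __init__(self):
        self._hooks = []

    def add_after_commit_hook(self, hook):
        self._hooks.append(hook)

    def commit(self):
        for hook in self._hooks:
            hook(True)  # real hooks receive the commit status


class DistroSeriesDifferenceJob:
    def __init__(self, txn):
        self.queued_to_celery = False
        self.celeryRunOnCommit(txn)

    def celeryRunOnCommit(self, txn):
        # Only schedule the job if this class is named in the flag.
        if type(self).__name__ not in enabled_classes:
            return
        txn.add_after_commit_hook(self._queue)

    def _queue(self, succeeded):
        if succeeded:
            self.queued_to_celery = True  # real code sends a Celery task here


txn = FakeTransaction()
job = DistroSeriesDifferenceJob(txn)
txn.commit()
print(job.queued_to_celery)  # True
```

Because the hook fires only on commit, a rolled-back transaction never enqueues a task for a job row that does not exist.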

To ensure that jobs cannot fall through the cracks, job sources for jobs that are (or are expected to be) run via celery should be added to lp.services.job.celeryjob.run_missing_ready().

Jobs must provide a 'config' member, which is meant to be a config section, e.g. lp.services.config.config.branchscanner. This member must have a dbuser attribute. The dbuser for each job type should be unique, to aid debugging production problems.
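A minimal sketch of this requirement; the section name and values are illustrative, not Launchpad's real config objects.

```python
from types import SimpleNamespace

# Stand-in for a section like lp.services.config.config.branchscanner.
branchscanner_config = SimpleNamespace(dbuser="branchscanner")


class BranchScanJob:
    # Each job type points at its own config section. A unique dbuser
    # per job type makes the job's database activity identifiable when
    # debugging production problems.
    config = branchscanner_config


print(BranchScanJob.config.dbuser)  # branchscanner
```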

Database job classes such as BranchJob must provide a 'makeDerived' member which returns an IRunnableJob implementation. If there are no derived classes, as with POFileStatsJob, it should simply return itself.

Base classes for specific IRunnableJob classes based on an enum, such as BranchJobDerived, should use EnumeratedSubclass as their metaclass. This will cause each of their subclasses to be registered with the base class when that subclass is declared. EnumeratedSubclass automatically provides the makeSubclass method, which returns an instance of the appropriate subclass for a given database class. This allows makeDerived to be implemented as a one-liner:

    def makeDerived(self):
        return BranchJobDerived.makeSubclass(self)
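The registration mechanism can be illustrated with a simplified metaclass. This is a generic sketch of the pattern, not Launchpad's actual EnumeratedSubclass code, and the subclass name and enum value here are hypothetical.

```python
class EnumeratedSubclass(type):
    """Register each subclass under its class_job_type value."""

    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        if not hasattr(cls, "_subclasses"):
            cls._subclasses = {}  # created once, on the base class
        job_type = attrs.get("class_job_type")
        if job_type is not None:
            cls._subclasses[job_type] = cls


class BranchJobDerived(metaclass=EnumeratedSubclass):
    def __init__(self, db_job):
        self.context = db_job

    @classmethod
    def makeSubclass(cls, db_job):
        # Look up the subclass matching the database row's job_type.
        return cls._subclasses[db_job.job_type](db_job)


class ReclaimSpaceJob(BranchJobDerived):  # hypothetical subclass
    class_job_type = "RECLAIM"


class FakeDBJob:
    """Stand-in for a database job row."""
    job_type = "RECLAIM"


derived = BranchJobDerived.makeSubclass(FakeDBJob())
print(type(derived).__name__)  # ReclaimSpaceJob
```

Declaring a subclass is enough to register it: the metaclass runs at class-creation time, so makeDerived on the database class reduces to the one-liner shown above.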

It is not clear that FooDerived classes are needed, because Storm appears to permit one table to be associated with many classes. Further testing is required.

Job classes should provide a 'context' member which is a database class.

Jobs which cannot tolerate parallelism should use lp.services.database.locking.try_advisory_lock to avoid it.
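The pattern can be illustrated in-process. A real advisory lock lives server-side in PostgreSQL (pg_try_advisory_lock), but the shape is the same: try to take the lock without blocking, and skip the run if another worker already holds it. Everything below is a sketch, not the actual try_advisory_lock API.

```python
import threading
from contextlib import contextmanager

_locks = {}  # stand-in for PostgreSQL's server-side advisory locks
_locks_guard = threading.Lock()


class AdvisoryLockHeld(Exception):
    """Raised when another runner already holds the lock."""


@contextmanager
def try_lock(lock_id):
    with _locks_guard:
        lock = _locks.setdefault(lock_id, threading.Lock())
    if not lock.acquire(blocking=False):
        raise AdvisoryLockHeld(lock_id)
    try:
        yield
    finally:
        lock.release()


def run_job(results):
    try:
        with try_lock("scan-branch-42"):
            results.append("ran")
    except AdvisoryLockHeld:
        results.append("skipped")


results = []
with try_lock("scan-branch-42"):
    run_job(results)   # lock already held, so this attempt is skipped
run_job(results)       # lock free again, so this attempt runs
print(results)  # ['skipped', 'ran']
```

Skipping rather than blocking matters under Celery: a blocked worker would tie up a queue slot and risk the lane timeout.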

Testing

Jobs should have an integration test to show that they run under Celery correctly.

Here is an example:

class TestViaCelery(TestCaseWithFactory):

    layer = CeleryJobLayer

    def test_DerivedDistroseriesDifferenceJob(self):
        self.useFixture(FeatureFixture({
            FEATURE_FLAG_ENABLE_MODULE: u'on',
            'jobs.celery.enabled_classes': 'DistroSeriesDifferenceJob',
            }))
        dsp = self.factory.makeDistroSeriesParent()
        package = self.factory.makeSourcePackageName()
        with block_on_job():
            job = create_job(dsp.derived_series, package, dsp.parent_series)
            transaction.commit()
        self.assertEqual(JobStatus.COMPLETED, job.status)

The CeleryJobLayer ensures a celeryd is running to process the job.
Configuring the feature flag jobs.celery.enabled_classes='DistroSeriesDifferenceJob' ensures that create_job will request the job to run under celery (as a commit hook).
transaction.commit() causes the commit hook to fire, requesting the job to run via celery.
lp.services.job.tests.block_on_job waits until the job that was requested to run via celery in its scope has completed.

Running celery in development

./utilities/manage-celery-workers.sh start will start celery with the configured concurrency. There is a set of workers per queue; multiple queues per worker are disallowed. Logs are in /var/tmp/celery_<queue_name>.log. You will also need to enable the job via the jobs.celery.enabled_classes feature flag.

QA

celeryd runs on gandwana, the staging host. To test your code, it must land in the db-stable branch; you can then enable the job using the jobs.celery.enabled_classes feature flag. Dogfood (labbu) runs celery in a development configuration using start-stop-daemon, with the same concurrency configuration as production. All queues are consumed.

Production

celeryd runs on ackee. There is at least one worker for each queue listed in the configuration.

CeleryJobs (last edited 2021-05-07 14:08:15 by twom)