Combine the puller and scanner
https://bugs.edge.launchpad.net/launchpad-bazaar/+bug/280578
Why
- Simplify
- Avoid double handling
- Reduce latency
Gunk
- What to call it?
  - synchronizing
  - publishing
  - bazaar2db
  - branchpopulator
  - name on the UI vs. internally (operationally vs. in code)?
- Error handling / OOPSes?
  - Just call mirrorFailed for scanner errors.
  - The traceback will make errors from the scanner part obviously distinct from errors from the puller part.
- How do we talk to the database?
  - The puller uses XML-RPC.
  - The scanner uses direct database access and transfers large amounts of data, which would likely be painful and inefficient to serialize for, e.g., XML-RPC.
  - If we keep things as-is, there will be a maximum of 12 database connections to the MASTER db.
- Separate packages still?
  - To start with, yes.
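A minimal sketch of the "just call mirrorFailed" idea, assuming a combined worker that mirrors and then scans in one step. `pull_and_scan`, `FakeEndpoint` and `scan_branch` are hypothetical names, and the `mirrorFailed(branch_id, reason)` signature is an assumption standing in for the real XML-RPC endpoint. The point it illustrates is that the formatted traceback carries the scanner's own frames, so a scanner failure reported this way stays distinguishable from a puller failure.

```python
import traceback
from typing import Callable, List, Tuple


class FakeEndpoint:
    """Stand-in for the puller's XML-RPC endpoint (hypothetical).

    Records mirrorFailed calls instead of sending them to Launchpad.
    """

    def __init__(self) -> None:
        self.failures: List[Tuple[int, str]] = []

    def mirrorFailed(self, branch_id: int, reason: str) -> None:
        self.failures.append((branch_id, reason))


def scan_branch(branch_id: int) -> None:
    """Hypothetical scanner entry point that always fails, to show the error path."""
    raise ValueError("could not read revision history")


def pull_and_scan(
    branch_id: int,
    endpoint: FakeEndpoint,
    mirror: Callable[[int], None],
    scan: Callable[[int], None],
) -> bool:
    """Mirror a branch, then scan it; report any failure through mirrorFailed."""
    try:
        mirror(branch_id)
        scan(branch_id)
    except Exception:
        # format_exc() includes every frame, so an error raised inside the
        # scanner shows up with the scanner's function names in the report.
        endpoint.mirrorFailed(branch_id, traceback.format_exc())
        return False
    return True
```

Reusing the existing failure path means no new OOPS category is needed; the cost is that "mirror failed" would also cover branches that mirrored fine but could not be scanned.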
Before combining
- Tweak the puller to parametrize the number of parallel workers by branch type: https://bugs.edge.launchpad.net/launchpad-bazaar/+bug/153779
- Talk to spm / mthaddon / stub to see if it's OK to have more database connections.
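One way the per-branch-type parametrization could look, sketched with a separate thread pool per branch type. The type names and limits in `WORKERS_PER_BRANCH_TYPE`, and the `pull_all` / `pull_one` functions, are hypothetical; the real puller's process model and configuration may differ.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical per-type limits; the real values would be a configuration
# decision, not something this sketch knows.
WORKERS_PER_BRANCH_TYPE: Dict[str, int] = {
    "HOSTED": 3,
    "MIRRORED": 5,
    "IMPORTED": 2,
}


def pull_all(
    branches: Iterable[Tuple[int, str]],
    pull_one: Callable[[int], object],
    limits: Dict[str, int] = WORKERS_PER_BRANCH_TYPE,
) -> Dict[int, object]:
    """Pull (branch_id, branch_type) pairs using one worker pool per branch type.

    Each type gets its own ThreadPoolExecutor, so at most limits[type]
    branches of that type are pulled at the same time.
    """
    by_type = defaultdict(list)
    for branch_id, branch_type in branches:
        by_type[branch_type].append(branch_id)

    executors = []
    futures = {}
    try:
        for branch_type, ids in by_type.items():
            executor = ThreadPoolExecutor(max_workers=limits[branch_type])
            executors.append(executor)
            for branch_id in ids:
                futures[branch_id] = executor.submit(pull_one, branch_id)
        # result() waits for each pull and re-raises any exception from it.
        return {branch_id: f.result() for branch_id, f in futures.items()}
    finally:
        for executor in executors:
            executor.shutdown()
```

If each worker held its own database connection (an assumption), the peak number of connections would be the sum of the per-type limits, which is the figure to check with spm / mthaddon / stub.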
Combining process
- Branch to call the scanner from the puller worker
  - The XML-RPC puller methods should also update the scanner fields.
  - Testing tweaks:
    - Combine the integration tests.
    - Make the worker take a 'scanner' object so that the puller tests don't have to use the database.
- Make another branch that removes the remaining uses of last_scanned and last_scanned_id.
- Operational issues
  - Stop running the scanner cronjob.
  - Kill the scanner OOPS report / combine it with the puller's.
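A sketch of passing a scanner object into the worker, so that puller tests can hand it a fake that never touches the database while production code passes the real, database-backed scanner. `PullerWorker`, `Scanner` and `RecordingScanner` are hypothetical names, not existing Launchpad classes.

```python
from typing import Callable, List, Protocol


class Scanner(Protocol):
    """Anything with a scan(branch_id) method can be given to the worker."""

    def scan(self, branch_id: int) -> None:
        ...


class PullerWorker:
    """Hypothetical puller worker that scans a branch after mirroring it."""

    def __init__(
        self, branch_id: int, mirror: Callable[[int], None], scanner: Scanner
    ) -> None:
        self.branch_id = branch_id
        self._mirror = mirror
        self._scanner = scanner

    def run(self) -> None:
        self._mirror(self.branch_id)
        # Only reached if mirroring did not raise.
        self._scanner.scan(self.branch_id)


class RecordingScanner:
    """Test double: records scanned branch ids instead of using the database."""

    def __init__(self) -> None:
        self.scanned: List[int] = []

    def scan(self, branch_id: int) -> None:
        self.scanned.append(branch_id)
```

Because the worker only depends on the `scan` method, the existing puller tests could assert on `RecordingScanner.scanned` rather than on database state.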
After combining
- Kill IBranchScanner.
- Delete Branch.last_scanned and Branch.last_scanned_id from the database.
- Code re-organization: combine the packages.
Orthogonal
- Extract the puller / scanner methods on IBranch to a separate *interface*.
- De-ick the scanner.
Related but out of scope
- Calling the puller/scanner from codehosting
- Splitting out the puller / scanner columns into another table
- Changing the puller to a job-based system
- Making the puller more like code imports (same as above, really)