Bug / apport BLOB processing
The problem
- Apport uploads can be big, really big. Some of them are up to 100MB in size (though I haven't done a proper examination of this, so take this with a pinch of salt).
We currently parse the upload on a line-by-line basis (breaking as appropriate) to get the headers and deal with attachments and so on. See FileBugData.parse() in lib/lp/bugs/browser/bugtarget.py:165.
Big uploads take a long time to process, so we end up with timeouts. Note: the timeouts rarely happen during the parsing itself. Instead, they tend to happen towards the end of the request, as the page is being rendered, which is a bit of a red herring. The real cause is that parsing has already used up most of the time available to the request.
- Timeouts are a Bad Thing and stop bugs being filed. This is particularly problematic during the testing phases for a new version of Ubuntu.
The solution
... this version of the parser has already been Bjornified and is about a tillenion times faster than it was previously. I guess we're hitting a new limit now. -- allenap
- We need to move the +filebug BLOB parsing code out of +filebug because:
- BLOB parsing doesn't really belong in the request anyway
- It takes too long
- Even if we improve it so that it no longer times out (if that's possible), there will eventually come a day when the problem rears its head again.
We now have a working Jobs system for Bugs (see lib/lp/bugs/interfaces/bugjob.py) and we can use that to do our BLOB processing.
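To make the idea concrete, here is a minimal sketch of taking the parsing out of the request. It does not use the real interfaces in lib/lp/bugs/interfaces/bugjob.py; BlobJob, BlobJobQueue, JobStatus and parse_headers are hypothetical names made up for illustration, and the jobs live in memory rather than in the database. The request only enqueues the BLOB and returns a job id; a separate runner (our cron script, in practice) does the slow work later.

```python
import itertools
from dataclasses import dataclass
from email import message_from_bytes
from enum import Enum
from typing import Callable, Dict, Optional


class JobStatus(Enum):
    # Hypothetical states; the real Jobs system may name these differently.
    WAITING = "waiting"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class BlobJob:
    job_id: int
    blob: bytes
    status: JobStatus = JobStatus.WAITING
    result: Optional[dict] = None
    error: Optional[str] = None


class BlobJobQueue:
    """In-memory stand-in for a job table (an assumption for this sketch)."""

    def __init__(self, parse: Callable[[bytes], dict]):
        self._parse = parse
        self._jobs: Dict[int, BlobJob] = {}
        self._ids = itertools.count(1)

    def enqueue(self, blob: bytes) -> int:
        # Called from the request: cheap, does no parsing.
        job = BlobJob(job_id=next(self._ids), blob=blob)
        self._jobs[job.job_id] = job
        return job.job_id

    def get(self, job_id: int) -> BlobJob:
        return self._jobs[job_id]

    def run_pending(self) -> int:
        # Called by the job runner, outside any request.
        ran = 0
        for job in list(self._jobs.values()):
            if job.status is not JobStatus.WAITING:
                continue
            job.status = JobStatus.RUNNING
            try:
                job.result = self._parse(job.blob)
                job.status = JobStatus.COMPLETED
            except Exception as exc:
                job.error = str(exc)
                job.status = JobStatus.FAILED
            ran += 1
        return ran


def parse_headers(blob: bytes) -> dict:
    # Toy parser: only reads the top-level headers of a MIME-style message.
    return dict(message_from_bytes(blob).items())
```

The request side would call enqueue() and redirect the user to a waiting page; everything expensive happens in run_pending().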
Requirements for a solution
- Parsing must not happen in the request.
- There must be a way that the +filebug page can query to see whether its extra data has been processed yet.
- The user must be informed that the processing is happening.
- The user shouldn't be kept waiting too long (obviously this depends upon how long processing is going to take, but it does mean that the cron job that runs the Jobs will have to run at very short intervals).
- Once processing has completed - and not before - the user must be allowed to continue with the +filebug process.
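The status query in these requirements could look something like the sketch below. It is only an illustration: the status names, the JSON shape and the function name blob_status_response are all assumptions, not an existing Launchpad API. The point is that the +filebug page gets one small answer that tells it what to show the user and whether the user may continue, and that "may continue" is true only once processing has completed.

```python
import json

# Hypothetical status names for a BLOB processing job.
WAITING = "waiting"
RUNNING = "running"
COMPLETED = "completed"
FAILED = "failed"

# What the +filebug page should tell the user in each state.
_MESSAGES = {
    WAITING: "Your data is queued for processing.",
    RUNNING: "Your data is being processed.",
    COMPLETED: "Processing has finished.",
    FAILED: "Your data could not be processed.",
}


def blob_status_response(status: str) -> str:
    """Return a JSON document that the +filebug page can poll for."""
    if status not in _MESSAGES:
        raise ValueError("unknown status: %r" % (status,))
    return json.dumps({
        "status": status,
        "message": _MESSAGES[status],
        # Only a completed job lets the user carry on, never before.
        "can_continue": status == COMPLETED,
    })
```

A failed job deliberately does not set can_continue; what the page should offer in that case (retry, file without the extra data) is a separate decision.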
Nice-to-haves for a solution
- It would be nice if Apport could query to see whether data has been parsed yet so that it could delay redirecting the user to Launchpad (which means we don't have to worry too much about having to deal with issues of polling in LP pages).
- Ideally, we'll make this change without having to patch the database. This will allow us to a) roll the change out to edge during the 10.02 cycle and b) cherry-pick it to production.
- A progress bar for the processing would be nice (but impractical in the first instance).
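For the first nice-to-have, the client side (Apport, say) would need little more than a polling loop with a timeout. The sketch below is an assumption about how that might be done, not a description of Apport's code: wait_until_processed is a made-up name, and is_processed stands in for whatever call asks Launchpad whether the BLOB has been parsed.

```python
import time
from typing import Callable


def wait_until_processed(
    is_processed: Callable[[], bool],
    interval: float = 2.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_processed() until it returns True or the timeout expires.

    Returns True if processing finished in time, False otherwise.
    sleep and clock are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if is_processed():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

If this returns True the client can redirect the user straight to +filebug; if it returns False it can redirect anyway and let the page do its own waiting.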