QA Obsolete code

Extra 1 N N 1 N 1 N ... 1 N N ...

Server-side tagger tools audit

class="anchor" id="line-3">

This is a performance audit of the QA tagger tools, focusing on tag_bugs_to_qa.py.

class="anchor" id="line-8">

Bug 274074 is fixed released in wadllib 1.5.4, so you shouldn't need length() anymore.

You can now set a bugtask's assignee/status/milestone directly (and then call lp_save) rather than having to invoke transitionToAssignee/transitionToStatus/transitionToMilestone.

HTTP requests class="anchor" id="line-17">

Here's some code from find_bugtask_and_assignee(). I've annotated the lines of code that make HTTP requests. "1" means the code makes one HTTP request. "N" means the code is inside a loop and makes one HTTP request for every loop element.

... # Checking if there's a task assigned to the committer, # and changing its status. found_it = False for bug_task in fixed_bug.bug_tasks: if not bug_task.assignee: continue if bug_task.assignee.self_link == assignee.self_link: found_it = True logger.info("Committer is assigned to the" " bugtask.") break if not found_it and project: for bug_task in fixed_bug.bug_tasks: if (bug_task.target.self_link == project.self_link): found_it = True logger.info( "Found bugtask of project %s." % project.name) break if not found_it: for bug_task in fixed_bug.bug_tasks: if (bug_task.target.self_link in self.lp_projects_links): found_it = True logger.info("Found bugtask of lp-project" " group: %s." % bug_task.target.self_link) break if not found_it: logger.error("This bug isn't ours! Not marking bug %s" " as Fix committed." % fixed_bug.id) return None, False class="anchor" id="line-56">

In the worst case, you are making 4N+3 HTTP requests, where N is the number of bug tasks the bug has. Three of those requests are for collections, which are more expensive to process than requests for individual objects (though in this case the time is probably dwarfed by the HTTP latency).

Why so many requests? Well, look at the first two instances of 'N': if bug_task.assignee is not None, it gets fetched twice, and since that happens inside a loop, you get 2N.

Basically, every time you dereference a property of a launchpadlib object and get another launchpadlib object (or a collection), launchpadlib will make an HTTP request for that object. The key is to assigne these objects to variables, unless you really need the most up-to-date data and are worried about someone else changing the data while you work.

Similarly, every time you run this code:

            for bug_task in fixed_bug.bug_tasks:

launchpadlib makes a brand new request for the bug's bugtasks. This gives you the three instances of "1". Unless you're worried about bug tasks being added in the middle of your processing, you should assign list(fixed_bug.bug_tasks) to a variable and iterate over that list.

Putting all this together, I would rewrite find_bugtask_and_assignee() like this (at least for now):

... # Checking if there's a task assigned to the committer, # and changing its status. found_it = False bug_tasks = list(fixed_bug.bug_tasks) for bug_task in bug_tasks: bug_task_assignee = bug_task.assignee if not bug_task_assignee: continue if bug_task_assignee.self_link == assignee.self_link: found_it = True logger.info("Committer is assigned to the bugtask.") break > if not found_it and project: targets = [] for bug_task in bug_tasks: target = bug_task.target if (target.self_link == project.self_link): found_it = True logger.info( "Found bugtask of project %s." % project.name) break targets.append((bug_task, target)) > if not found_it: for bug_task, target in targets: if (target.self_link in self.lp_projects_links): found_it = True logger.info("Found bugtask of lp-project" " group: %s." % bug_task.target.self_link) break class="anchor" id="line-123">

A rewrite that stores intermediary values in variables makes only 2N+1 requests instead of 4N+3. This required no changes to launchpadlib or the web service, and it only made the code a little more complicated. Note that the final loop doesn't make any HTTP requests at all, because it's working entirely on data that was stored locally in the previous loop.

Similarly, this code makes two requests for bug_task.milestone in the worst case:

        if ((bug_task.milestone is not None and self.milestone != bug_task.milestone.name) or bug_task.milestone is None):

This should do the same thing with one HTTP request:

        milestone = bug_task.milestone if milestone is None or self.milestone != milestone.name:

Something very similar happens in commit.py, in _filter_only_lp_bugs.

lessons class="anchor" id="line-151">

1. get_staging_revision() and get_edge_revision() screen-scrape the revision number. That's something we can publish on the service root (/1.0) or at the forthcoming multiversioned service root (/). We already use the revision number implicitly to force clients to fetch the WADL again when Launchpad is upgraded; I think it makes sense to also publish it explicitly.

2. This script could benefit from representation expansion. For the portion of find_bugtask_and_assignee() I looked at in detail, this would mean making one big HTTP request that got a bug's bug tasks along with each task's assignee and target. This way you could make one HTTP request instead of 2N+1.

3. When writing Python code it's natural to dereference objects very casually. That's why we wrote launchpadlib the way we did. But it leads to inefficient code that needs to be cleaned up with lots of variable assignments. But we could write launchpadlib to cache dereferenced objects rather than fetching them every time. This would allow developers to write code like find_bugtask_and_assignee() without worrying so much about performance.

The reason we didn't do this in the first place is that if launchpadlib caches dereferenced objects in memory, we have no way of knowing when to expire the cache. If we knew when the cache should be expired, Launchpad could serve the appropriate Cache-Control headers, and launchpadlib could cache those resources on disk, allowing them to persist (in some cases) even across script invocations.

Cache expiration doesn't seem like a problem when you're dealing with a quick-running script like tag_bugs_to_qa.py, but some launchpadlib scripts run for hours or are continuously running. If launchpadlib keeps bug.bug_tasks cached in memory forever, those scripts will never know that new data is available. It might make sense to implement in-memory object caching with a configurable expiration time.

Alternatively, we could implement a feature that allowed explicit markers for caches. For example, code might look something like this:

    lp.cache_start() ...do stuff that reuses cached values... lp.cache_stop()

Within the start and stop, values are cached. The "stop" flushes the cache, and outside of the start-stop markers, nothing is cached (that is, you get the current behavior).

Foundations/Webservice/ClientSidePerformance/QATaggerPerformanceAudit (last edited 2010-06-15 16:23:03 by gary)

launchpad development

Foundations/Webservice/ClientSidePerformance/QATaggerPerformanceAudit

QA Obsolete code

Server-side tagger tools audit