Base version determination for source package releases
As part of the derivation work we need to be able to automatically generate package diffs between the current version of a source in each series and their most recent common ancestor. The current versions are easily discoverable, but the common ancestor must be determined manually.
Debian changelogs (found as debian/changelog in the extracted source package) are a reliable record of the ancestry of a source package release: a new entry must be created for an upload to be accepted, and package merge tools normally handle these automatically, leaving little chance of history corruption. By comparing the version sequence in the changelogs of both packages, we can reliably determine the latest common ancestor.
python-debian makes changelog parsing trivial, resulting in a simple code snippet to determine the version of the latest common ancestor given two changelogs:
    import debian.changelog

    # Collect every version recorded in each changelog.
    my_ancestry = set(debian.changelog.Changelog(my_changelog).get_versions())
    parent_ancestry = set(debian.changelog.Changelog(parent_changelog).get_versions())

    # The base is the highest version present in both histories, if any.
    intersection = my_ancestry.intersection(parent_ancestry)
    base_version = max(intersection) if intersection else None
Although mawson can still perform several of these comparisons per second, the two librarian reads (one per changelog) make each comparison take a non-trivial amount of time. This realisation prompted exploration of methods that do not require parsing the changelog directly.
Issues arise if the same version is used for two different sources, in different distributions or archives. This could be mitigated by also comparing the author or timestamp of the version's entry.
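The mitigation above can be sketched as follows. This is a minimal illustration, not Launchpad code: the `Entry` tuple and `latest_common_entry` function are hypothetical names, and the entries are assumed to have been extracted from each changelog already (python-debian's changelog blocks expose `version`, `author` and `date` attributes that could feed it). Rather than taking the maximum version, the sketch relies on changelog order (newest first), which avoids needing Debian version comparison on plain strings.

```python
from typing import NamedTuple, Optional, Sequence


class Entry(NamedTuple):
    """One changelog entry (hypothetical structure for this sketch)."""
    version: str
    author: str
    date: str


def latest_common_entry(
    mine: Sequence[Entry], parent: Sequence[Entry]
) -> Optional[Entry]:
    """Return the newest entry of `mine` that appears identically in `parent`.

    Both sequences are assumed to be ordered newest first, as in
    debian/changelog. Two entries only match if version, author and
    date all agree, so a reused version number is not mistaken for
    shared history.
    """
    parent_entries = set(parent)
    for entry in mine:
        if entry in parent_entries:
            return entry
    return None
```

With this comparison, two unrelated uploads that both happen to be numbered 1.0-2 would not match unless their author and timestamp also agreed, and the search would fall back to the next older shared entry.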
In an ideal world, every package would be maintained in a Bazaar branch. Calculating the base version could then be done in the usual Bazaar way.
Sadly, the world is not yet perfect. Not every package is maintained in Bazaar, and those that are do not always use it consistently or have complete history.
The most basic method is to use the derived distribution's version rules to determine the base. For example, the base Debian version for an Ubuntu source can normally be determined by stripping ubuntu* or build* from the end of the version. But this is unreliable: packagers are not perfect and often make slight version mistakes, some packages do not follow the rules, and considerable additional evil abounds in this area. It's also difficult to implement, as it requires that Launchpad know each distribution's version mangling rules.
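As a rough illustration of the suffix-stripping rule for Ubuntu, consider the sketch below. The function name `guess_debian_base` and the exact regular expression are assumptions made for this example; they are not an authoritative statement of Ubuntu's versioning rules.

```python
import re

# Assumed pattern: a trailing "ubuntu" or "build" marker followed by a
# digit and any further version characters.
_DERIVED_SUFFIX = re.compile(r"(?:ubuntu|build)[0-9][0-9a-z.+~]*$")


def guess_debian_base(version: str) -> str:
    """Guess the Debian base of an Ubuntu version by stripping its suffix."""
    return _DERIVED_SUFFIX.sub("", version)


print(guess_debian_base("1.2-3ubuntu4"))  # 1.2-3
print(guess_debian_base("1.2-3build1"))   # 1.2-3
print(guess_debian_base("1.2-0ubuntu1"))  # 1.2-0
```

The last case hints at the fragility: a `-0ubuntuN` revision conventionally means there was no Debian revision to base on, so the guessed "1.2-0" need not correspond to any real Debian upload.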
All in all, this seems like a fragile and scary method.
An efficient method of ancestry determination is to examine the publication history in each series. Because it uses only the database, this is much faster than the librarian access and string parsing involved in the changelog method.
A number of possibilities were considered, some storing an explicit parent column on SPPH (the source package publishing history record) and others calculating parents on the fly. The initial idea was that each SPPH would know its immediate parent within its series and archive. We could then walk back from the current version in each archive to build up a history of versions, similar to the changelog method, except without the librarian reads.
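The walk-back idea might look something like the following sketch. The `Publication` class and its `parent` attribute are hypothetical stand-ins for an SPPH row with an explicit parent column; no such column is claimed to exist in the real schema.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Publication:
    """Hypothetical stand-in for an SPPH record with a parent link."""
    version: str
    parent: Optional["Publication"] = None


def ancestry(pub: Optional[Publication]) -> Iterator[str]:
    """Yield versions from `pub` back to the oldest known publication."""
    while pub is not None:
        yield pub.version
        pub = pub.parent


def base_version(
    mine: Optional[Publication], theirs: Optional[Publication]
) -> Optional[str]:
    """Return the newest version in `mine`'s ancestry that `theirs` shares."""
    their_versions = set(ancestry(theirs))
    for version in ancestry(mine):
        if version in their_versions:
            return version
    return None
```

Each call only follows in-memory links here; in practice every step would be a database lookup, which is still cheaper than fetching and parsing two changelogs from the librarian.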
A major issue arises, however, when we consider merges: the merged upload's immediate ancestor from the parent distribution never appears in the derived series' publication history, so it will not be found in common between the two ancestries. The changelog-based method avoids this problem because a merge integrates both lines of ancestry into the single changelog. We would again have to do some version tricks in this case, with very similar issues to the pure version analysis method.
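A small worked example may make the merge problem concrete. All versions below are invented for illustration: the derived series first copies 1.0-1, modifies it as 1.0-1ubuntu1, then merges the parent's 1.0-2 to produce 1.0-2ubuntu1. The helper `newest_common` is a hypothetical name, and both histories are assumed to be listed newest first.

```python
def newest_common(mine, parent):
    """Return the first (newest) version in `mine` that `parent` also has."""
    parent_versions = set(parent)
    return next((v for v in mine if v in parent_versions), None)


# Changelogs: the merge folds the parent's 1.0-2 entry into the
# derived package's changelog.
debian_changelog = ["1.0-2", "1.0-1"]
ubuntu_changelog = ["1.0-2ubuntu1", "1.0-2", "1.0-1ubuntu1", "1.0-1"]

# Publication history: 1.0-2 was never published in the derived series.
debian_publications = ["1.0-2", "1.0-1"]
ubuntu_publications = ["1.0-2ubuntu1", "1.0-1ubuntu1", "1.0-1"]

print(newest_common(ubuntu_changelog, debian_changelog))        # 1.0-2
print(newest_common(ubuntu_publications, debian_publications))  # 1.0-1
```

The changelog comparison finds the true base, 1.0-2, while the publication-history comparison falls back to the stale 1.0-1.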
This method collapses further when we attempt to devise strategies to track parents across pocket and archive copies. What is the ancestor of a new -security upload? It could be from the latest published version in -security, -updates or -proposed, or it could be a superseded version, or it could be another one from a PPA. What is the ancestor of a PPA upload? The latest version in the matching series of the primary archive? What if it's from another PPA? The possibilities (and consequential pain) are endless!