Diff for "Code/BranchRevisions"


Differences between revisions 6 and 7
Revision 6 as of 2010-10-15 16:46:36
Size: 5385
Editor: abentley
Comment:
Revision 7 as of 2010-10-15 18:17:16
Size: 5765
Editor: abentley
Comment:

Branches and Revisions

The links between branches and revisions are currently (Sep 2010) handled using the BranchRevision table. There is one row in this table for every revision in the branch. This is mildly insane given the number of feature branches that we encourage projects to use, as the vast majority of the revisions are common to all of the branches.

Consider the Launchpad project itself. There are over 90k revisions in the ancestry, so every branch adds 90k rows to the BranchRevision table.

Before we can simplify, reduce and clean up this relationship, we need to understand what the entries are used for.

Uses for the BranchRevision table.

  • Branch page
    • shows the recent commits for any given branch, up to 10, and includes those already in trunk
      • Is that actually wanted, though? I think people might prefer seeing only revisions created since branching. -- AaronBentley

      • I think you are right for most branches. Developers of feature branches are only concerned with those since branching, but there are trunk branches where the information is (slightly) useful. Ideally I'd like this to come from loggerhead. -- TimPenhey

  • Merge proposal page
    • unmerged revisions (up to 10; confusing UI)
    • commits since the start of the review
  • Finding the most relevant branch for any given revision (primarily used in the revision feeds)
    • What about just keeping track of which branch introduced the revision? -- AaronBentley

    • This approximation is probably fine. -- TimPenhey

  • Allocating revision karma
    • Is the branch relevant here, or just the project/package? -- AaronBentley

    • Just the project/package that the branch is connected to. -- TimPenhey

  • Merge detection in the scanner
    • Is the tip of this branch in the ancestry of the development focus branch?
    • And if scanning a series linked branch, is the tip of any unmerged branches of the same target present in my ancestry?
    • I feel that this use case is the harder one to solve if we keep a limited ancestry.
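
The two scanner checks above can be sketched as a graph walk. This is only an illustration: the `parents` mapping, the `unmerged_tips` mapping and all function names are hypothetical, not Launchpad's actual schema or scanner code.

```python
from collections import deque


def ancestry(parents, tip):
    """Return the set of revision ids reachable from tip, including tip.

    `parents` is a hypothetical dict mapping revision id -> list of
    parent revision ids; ids missing from the dict are treated as ghosts.
    """
    seen = set()
    pending = deque([tip])
    while pending:
        rev = pending.popleft()
        if rev in seen:
            continue
        seen.add(rev)
        pending.extend(parents.get(rev, ()))
    return seen


def is_merged(parents, branch_tip, target_tip):
    """True if branch_tip is in the ancestry of target_tip."""
    return branch_tip in ancestry(parents, target_tip)


def merged_branches(parents, target_tip, unmerged_tips):
    """Return names of unmerged branches whose tip is in target's ancestry.

    `unmerged_tips` is a hypothetical dict of branch name -> tip revision id.
    """
    target_ancestry = ancestry(parents, target_tip)
    return sorted(name for name, tip in unmerged_tips.items()
                  if tip in target_ancestry)


# Example graph: trunk-2 merges feature-1; other-1 is not merged.
parents = {
    "trunk-1": [],
    "feature-1": ["trunk-1"],
    "other-1": ["trunk-1"],
    "trunk-2": ["trunk-1", "feature-1"],
}
```

Both checks need the full ancestry of the target branch, which is why this use case is awkward if only a limited ancestry is kept.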

Meta: For revision feeds and karma, it seems like we're using a list of all branches containing the revision to find a single branch containing the revision-- if we just store the single branch, we can be more efficient.

Possible Solutions

Delta-compress the branch-revision table

This solution is highly compatible with our existing approach. It is a trade-off of performance for space, but with care, the performance reduction may be unobservable. It applies to all use cases.
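
A minimal sketch of the idea, assuming each branch stores only the revisions it has beyond a base branch. The `DeltaBranch` class and its fields are hypothetical, and the history is simplified to a linear mainline rather than the full ancestry that BranchRevision covers.

```python
class DeltaBranch:
    """Stores only the revisions a branch has beyond its base branch."""

    def __init__(self, name, base=None, base_revno=0):
        self.name = name
        self.base = base              # another DeltaBranch, or None
        self.base_revno = base_revno  # how much of base's history is shared
        self.own = []                 # revision ids stored for this branch

    def add(self, revision_id):
        self.own.append(revision_id)

    def history(self):
        """Expand to the full history, oldest first."""
        if self.base is None:
            return list(self.own)
        return self.base.history()[:self.base_revno] + self.own

    def stored_rows(self):
        """Number of rows this branch contributes to storage."""
        return len(self.own)


# trunk has five revisions; feature branched at revno 3 and added two.
trunk = DeltaBranch("trunk")
for n in range(1, 6):
    trunk.add("r%d" % n)

feature = DeltaBranch("feature", base=trunk, base_revno=3)
feature.add("f1")
feature.add("f2")
```

Here `feature` stores two rows instead of five, at the cost of expanding the base chain on every read, which is the space-for-performance trade-off described above.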

Use loggerhead

This applies only to display use cases: branch revision listings and possibly merge proposal revision listings.

Scan only for revisions in the current branch that merge the tips of other branches

In the common case, adding a revision to a branch does not enable detecting a merge, because the revision being added to the branch will already be in the ancestry of the merging branch. The exceptions are new branches (which generally should not be set to merged) and ghost-filling. Ghost-filling is believed to be extremely rare.
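
As a rough sketch, the scanner would only inspect the non-lefthand parents of newly added revisions. The `parents` and `tips` mappings and the `detect_merges` name are hypothetical. The sketch does not handle new branches, ghost-filling, or a branch tip being pulled directly onto the mainline.

```python
def detect_merges(parents, new_revisions, tips):
    """Return branches whose tip is merged by one of new_revisions.

    parents: hypothetical dict of revision id -> list of parent ids
    new_revisions: revision ids newly added to the scanned branch
    tips: hypothetical dict of tip revision id -> branch name
    """
    merged = []
    for rev in new_revisions:
        # The first parent is the lefthand (mainline) parent; the
        # remaining parents are the revisions that this revision merged.
        for parent in parents.get(rev, [])[1:]:
            if parent in tips:
                merged.append(tips[parent])
    return merged


# t2 merges f1 (the tip of "feature"); t3 is an ordinary commit.
parents = {"t2": ["t1", "f1"], "t3": ["t2"]}
tips = {"f1": "feature", "o1": "other"}
newly_merged = detect_merges(parents, ["t2", "t3"], tips)
```

The work is proportional to the number of new revisions rather than to the size of the ancestry.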

Store only tip revision info and do multiple DB queries

This models the underlying branches well, but has performance costs. It applies to display use cases.
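
The performance cost can be seen in a small sketch: with only the tip stored, listing recent revisions means one lookup per revision. The `fetch_parents` callable stands in for a database query and, like the other names here, is hypothetical.

```python
def recent_mainline(fetch_parents, tip, count=10):
    """Walk lefthand parents from tip, doing one lookup per revision."""
    result = []
    queries = 0
    rev = tip
    while rev is not None and len(result) < count:
        result.append(rev)
        revision_parents = fetch_parents(rev)  # stands in for one DB query
        queries += 1
        rev = revision_parents[0] if revision_parents else None
    return result, queries


# A linear history r1 <- r2 <- r3 <- r4 <- r5, held in a dict for the demo.
parents = {"r5": ["r4"], "r4": ["r3"], "r3": ["r2"], "r2": ["r1"], "r1": []}
revisions, queries = recent_mainline(lambda rev: parents.get(rev, []), "r5", 3)
```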

Store tip revision info, and group revisions by ancestry

Storing groups of, say, 100 revisions according to ancestry would allow retrieving the latest revisions in one or two database queries and then doing in-memory graph operations. This models the underlying branches well, and applies to display use cases. It could be implemented to provide fast ghost-filling.
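
A simplified sketch of the grouping, assuming the ancestry has already been linearized newest-first. The function names are hypothetical, and each group read stands in for one database query.

```python
def group_history(history, group_size=100):
    """Split a newest-first list of revision ids into fixed-size groups."""
    return [history[i:i + group_size]
            for i in range(0, len(history), group_size)]


def latest_revisions(groups, count=10):
    """Return the newest `count` revisions and the number of groups read."""
    result = []
    fetched = 0
    for group in groups:  # each group stands in for one database query
        fetched += 1
        result.extend(group)
        if len(result) >= count:
            break
    return result[:count], fetched


# 250 revisions, newest first: rev-250, rev-249, ..., rev-1.
history = ["rev-%d" % n for n in range(250, 0, -1)]
groups = group_history(history)
newest, groups_read = latest_revisions(groups)
```

With groups of 100, the ten newest revisions come from a single group, and a request for 150 revisions would read two.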

Use bzr-history-db

This is similar to "Store tip revision info, and group revisions by ancestry", but only lefthand history is included in the "groups", which are referred to as "revision ranges" in bzr-history-db.

This is a form of delta compression: for any revision, it would be possible to look up the branches that include it (by traversing revision groups) and to look up all the revisions included by a branch. However, like "Store tip revision info, and group revisions by ancestry", it is biased toward fast lookups of recent data.

Loggerhead will be switching to bzr-history-db, so it would be advantageous if it could share the database with Launchpad. (Perhaps a separate db that only LP could write to?)

Possible downside: merges are stored relatively expensively. For revisions foo, bar, baz, if foo merges bar, and bar merged baz, then baz gets two rows (one for foo, one for bar), and bar gets one row (for foo). Merges do not need to be tracked for mainlines we don't care about, but we probably care about more than 50% of mainlines for some projects (e.g. Launchpad).

Store most-relevant branch on Revision

Since there is only one most-relevant branch, we do not need the one-to-many relationship that BranchRevision provides. However, if the most-relevant branch is deleted, we would either need to accept a NULL field or find a new most-relevant branch. If we allow the field to become NULL, we can call it "introducing branch" rather than "most-relevant branch". This supports allocating revision karma and revision feeds.
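
The NULL-on-delete variant could look roughly like the following. SQLite is used only to keep the sketch self-contained, and the table and column names are assumptions rather than Launchpad's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE branch (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE revision (
        id INTEGER PRIMARY KEY,
        revision_id TEXT UNIQUE,
        introducing_branch INTEGER
            REFERENCES branch(id) ON DELETE SET NULL
    )
""")
conn.execute("INSERT INTO branch VALUES (1, 'feature')")
conn.execute("INSERT INTO revision VALUES (1, 'rev-a', 1)")


def introducing_branch(conn, revision_id):
    """Return the introducing branch id for a revision, or None."""
    row = conn.execute(
        "SELECT introducing_branch FROM revision WHERE revision_id = ?",
        (revision_id,)).fetchone()
    return row[0]


before = introducing_branch(conn, "rev-a")   # branch still exists
conn.execute("DELETE FROM branch WHERE id = 1")
after = introducing_branch(conn, "rev-a")    # field was set to NULL
```

A single nullable column on the revision replaces the one-to-many table for the karma and feed use cases, and deleting the branch leaves the revision row in place.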

Store introducing project/package on Revision

This supports the Revision Karma use case. It is not affected by branch deletion, but will not track branch moves. It is subject to project/package deletion.

Associate merge proposals with Revisions

This supports the use case of displaying unmerged revisions.

Store the last 10 revisions for a branch

This supports the use case of displaying branch revisions.
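
A minimal in-memory sketch of a bounded "last 10" list. The `RecentRevisions` class is hypothetical, and a real implementation would presumably persist this per branch.

```python
from collections import deque


class RecentRevisions:
    """Keeps only the most recent `limit` revision ids for a branch."""

    def __init__(self, limit=10):
        # A deque with maxlen silently drops the oldest entry when full.
        self._recent = deque(maxlen=limit)

    def record(self, revision_id):
        self._recent.append(revision_id)

    def newest_first(self):
        return list(reversed(self._recent))


recent = RecentRevisions()
for n in range(1, 13):
    recent.record("rev-%d" % n)
shown = recent.newest_first()
```

After twelve commits only rev-12 down to rev-3 are retained, so the storage per branch stays constant regardless of the size of the ancestry.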

Code/BranchRevisions (last edited 2010-10-26 14:45:32 by abentley)