Creating common branch ancestry is a hard problem Jan. 19th, 2009 David Strauss

Creating common branch ancestry is a hard problem

January 19th, 2009

One of the key features of distributed version control systems (DVCS) is support for divergent development (branching) and then merging. Most DVCS tools, including git and Bazaar, include rather elegant support for such workflows by embedding metadata about common ancestry into branches. In this post, I’ll be focusing on Bazaar.

“Common ancestry” means an identical revision shared by two branches. The most recent common ancestor typically indicates the point of branching, unless there has been a more recent merge. A successful merge between two branches establishes a new, more recent common ancestor. Identifying a common ancestor is a required step for performing automatic three-way merges to integrate changes from a foreign, divergent branch.

Typically, the branching metadata allows Bazaar to automatically determine the most recent common ancestor to use as the base revision for three-way merging. The problem comes when you need to merge two branches that do not share any common ancestors. (For the purpose of this post, I am not counting revision zero, the universal common ancestor, as a common ancestor. It is generally useless for merging.)

Creating common ancestry
If you try to merge two branches without common ancestry and without any revision identifiers, Bazaar will complain and do nothing. You can specify a revision range (like -r5..-1, which would be everything from the fifth revision to the latest on the foreign branch) and then apply the merge. Bazaar will then merge the changes in the specified revision range into your local branch and establish the last merged revision as the latest common ancestor, making future merges a breeze. Unfortunately, finishing this initial merge is where dragons lie. But before I can get into the difficulty of creating common ancestry by merging two unrelated branches, I have to briefly discuss how Bazaar handles files.

How Bazaar tracks files
Bazaar maps file paths to globally unique file IDs, and two branches without prior common ancestry will have different file IDs for the same paths, even if the files are really the same. Every time a file is added to a Bazaar branch, it gets a unique ID. As files are moved and renamed, they keep their unique IDs. So, if two people download Drupal, extract it, and independently “bzr init”, “bzr add”, and “bzr commit”, their respective README.txt files (for example) will have different file IDs.

Creating common ancestry (continued)
These globally unique file IDs cause trouble when merging from unrelated branches. When Bazaar merges two branches (related or not), two files with the same path but different file IDs create a conflict even if they contain identical content. Merging two unrelated branches with lots of shared files creates a mess of conflict files and conflict directories, and there is currently no convenient way to resolve these conflicts.

A concrete example with Drupal
A developer downloads Drupal and puts the project under version control in a fresh Bazaar branch. She discovers that Four Kitchens maintains a Bazaar branch of stable Drupal releases and decides she would like to use the Four Kitchens branch to automate installation of minor Drupal updates. She attempts to merge in the revision range from the Four Kitchens branch that would perform the upgrade. Because the file IDs in the Four Kitchens branch differ from hers, every Drupal core file and directory has a conflict despite having identical content for the base merge revision. After a bit of tedious conflict cleanup, she commits the merge and goes back to enjoying Bazaar’s generally elegant architecture.

I thought this blog post was going to give me a solution!
Nope, there’s not one out there yet. I’m currently looking at writing a custom merge handler (subclassed from the standard merge3 in Bazaar) that would intelligently handle merges where file paths do represent the same files, regardless of file IDs. Unfortunately, the file ID/path conflict is low-level in Bazaar and occurs before reaching the most modular part of merge conflict resolution.

Thanks to Robert Collins on the Bazaar project for walking me through the Bazaar internals necessary for me to explain this issue and, hopefully, solve it.