[Haskell-cafe] A suggestion: On why code merges should influence Haskell design

David Turner dct25-561bs at mythic-beasts.com
Wed Oct 14 08:14:21 UTC 2015

Yes, you're quite right, it's not uncommon for the result of a rebase
either not to compile or else to fail some set of tests. But then that's
also true of all code changes even by a single developer in isolation.

A dependency analysis is an interesting idea (indeed, it's one of the tools
we use to work out what the rough consequences of a change will be) but
risks both false positives and false negatives. On the false positive side,
as you observe, there'll often be something like 'main' which transitively
depends on two parallel changes. On the false negative side, you can get
conflicting changes that occur in separate programs - for instance the two
sides of a network protocol - and a naive dependency analysis will miss
this kind of problem. Heck, the changes could be in different languages
(think browser-side JS talking via AJAX to server-side Haskell). Or even in
separate repositories, possibly even developed by separate organisations,
so you can't even tell you've done a merge, let alone tell that it contains
a conflict.

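To make the false-negative case concrete, here's a minimal sketch (all names
invented) of two programs that agree on a wire format only by convention,
squeezed into one file for illustration:

```haskell
-- Hypothetical sketch: two separate programs whose only "dependency" on each
-- other is an informal wire-format convention.

-- "Server" repo: branch A changes the framing from "v1:" to "v2:".
encodeMsg :: String -> String
encodeMsg body = "v2:" ++ body

-- "Client" repo: unchanged, still expects the old "v1:" framing.
decodeMsg :: String -> Maybe String
decodeMsg s = case splitAt 3 s of
  ("v1:", body) -> Just body
  _             -> Nothing

-- No definition here depends on any other within either program, so a
-- per-program dependency analysis reports no conflict, yet the combined
-- system is broken: decoding a freshly encoded message now fails.
main :: IO ()
main = print (decodeMsg (encodeMsg "hello"))  -- prints Nothing
```

Neither compiler nor dependency graph connects the two halves, which is the
point: the conflict lives in the protocol, not in any shared code.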
Because of this sort of reasoning, I'm sceptical that the problem of
'bugs-introduced-by-merges' is really any easier than the general problem
of 'bugs'. It's a continuum, not a clearly-defined subclass of bugs.
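That said, the dependency rule you propose below is easy to prototype as a
reachability check. Here's a sketch (names and graph representation invented,
using your two scenarios as test cases):

```haskell
-- Hypothetical prototype of the proposed rule: conflict iff a definition
-- changed in one branch depends (transitively) on a definition changed in
-- the other. Names and the graph encoding are invented for illustration.

type Name = String
type DepGraph = [(Name, [Name])]  -- each definition with its direct deps

-- Everything a definition transitively depends on (excluding itself).
transDeps :: DepGraph -> Name -> [Name]
transDeps g n = go [] (directDeps n)
  where
    directDeps m = maybe [] id (lookup m g)
    go seen [] = seen
    go seen (m:ms)
      | m `elem` seen = go seen ms
      | otherwise     = go (m : seen) (directDeps m ++ ms)

conflict :: DepGraph -> [Name] -> [Name] -> Bool
conflict g changedA changedB =
  or [ a `elem` transDeps g b | b <- changedB, a <- changedA ]
    || or [ b `elem` transDeps g a | a <- changedA, b <- changedB ]

-- Scenario 1: function f is independent of variable x.
g1 :: DepGraph
g1 = [("main", ["f", "x"]), ("f", []), ("x", [])]

-- Scenario 2: f depends on x.
g2 :: DepGraph
g2 = [("main", ["f"]), ("f", ["x"]), ("x", [])]

main :: IO ()
main = do
  print (conflict g1 ["x"] ["f"])  -- False: 'main' changed in neither branch
  print (conflict g2 ["x"] ["f"])  -- True: f depends on x
```

Note that 'main' transitively depending on both changes doesn't trigger a
conflict here, since 'main' itself appears in neither changed set - which
matches the false-positive concern above.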

I realised that there's another way to justify the unreasonable
effectiveness of rebasing over merging. Assuming (conscientious) developers
A & B making parallel changes:

A1 -> A2 -> .... -> An
B1 -> B2 -> .... -> Bm

Each individual change can be seen to be correct, because that's how
conscientious developers work. But then when it comes to merging, from
their individual points of view it looks like:

A1 -> A2 -> .... -> An -> [all of B's changes in one go]
B1 -> B2 -> .... -> Bm -> [all of A's changes in one go]

It's very hard for either A or B to reason about the correctness of that
last step because it's such a big change. On the other hand, rebasing B's
changes on top of A's looks like this:

A1 -> A2 -> .... -> An -> B1' -> B2' -> .... -> Bm'

Then B can go through their rebased changes and use pretty much the same
reasoning as they were using before to check that they're still correct.


On 13 October 2015 at 21:08, Dimitri DeFigueiredo <defigueiredo at ucdavis.edu>
wrote:

> The 6 years and 100K+ commits data point is pretty impressive! It makes me
> question how much effort should be devoted to avoiding these
> "merge-introduced" bugs in the first place. I assume this data point does
> not capture bugs introduced by merges that were found by unit tests though.
> I hadn't looked at how to actually go about writing the "bug-free" merge
> tool (if it becomes possible to write one). However, I think my immediate
> approach would be different from your suggestion. I was thinking that we
> could just look at the dependency graph between the definitions.
> Consider 2 possible (previously suggested) merge scenarios where we define
> both a variable and a function. In both scenarios:
> - Branch A changes the "variable" (but does not change the function)
> - Branch B changes the function
> The scenarios are different because of the dependency between the function
> and the variable:
> 1. Function definition is independent of variable definition
> https://gist.github.com/dimitri-xyz/f0819c947ecd386b84d6
> 2. Function definition depends on variable definition
> https://gist.github.com/dimitri-xyz/81a56b0ce0929e8513e9
> Intuitively, I would like to have a merge conflict in scenario 2, but none
> in scenario 1. That suggests the following rule:
> We have a conflict iff a definition *that was changed* in branch B depends
> (transitively) on a definition *that was changed* in branch A, or vice-versa.
> In scenario 1, the definition of the function does not depend on the
> definition of the variable, so there is no conflict. Notice that 'main'
> depends on both, but it hasn't changed. In scenario 2, the definition of
> the function has changed and it depends on the definition of the variable,
> so we would have a conflict.
> I understand this may not be sufficient to ensure no bugs are introduced,
> but this is my first approximation. The rule just captures the notion that
> programmers shouldn't have their assumptions pulled out from under their feet
> as they make changes. I wonder if/how I could claim that 'main' (that
> depends on both the function and the variable) is "merge-bug free" in the
> merged version.
> Cheers,
> Dimitri
> On 10/12/15 2:16 PM, David Turner wrote:
>> Hi,
>> I work on a team of ~15 developers and we follow a merge-free process
>> using rebasing. Although in theory it sounds like you'll get the same
>> problems with rebasing as with merging, over 6 years and 100,000+ commits
>> I'm struggling to think of a single bug silently introduced due to a rebase
>> that succeeded without conflicts. It's a cunning bit of social engineering:
>> when merging the work of two developers, it's easy for both of them to
>> disclaim responsibility for any bugs introduced by the merge, but when
>> rebasing the work of developer A on top of that of developer B it's clear
>> that any bugs are dev A's responsibility as their commits come later in the
>> published history than dev B's, and this gives dev A the incentive to check
>> for semantic conflicts properly.
>> That said, it's theoretically possible to introduce bugs with a 'clean'
>> rebase, just as with merging, and I'm interested in your suggestion because
>> of that. The difficulty I see is defining what you mean by "no new bugs".
>> The two branches satisfy two different specifications, and my reading of
>> "no new bugs" is that the merge result satisfies a merged specification. It
>> sounds difficult to automatically merge two specs together in general,
>> although in practice specs are often quite informal and where a rebase
>> looks questionable a merged spec can be invented and verified with human
>> intervention, remembering that the rebasing developer has a personal
>> incentive to get this right.
>> There are two kinds of formal spec that I know of in common usage in the
>> industry: types and tests. Using a type system gives you a bunch of
>> theorems about what your program can and cannot do, checked at compile
>> time; an automated test suite is basically a bunch of
>> properties that your program satisfies which can be verified by running the
>> test suite. Neither of these can capture the real spec exactly, but in
>> practice they turn out to be good enough for many purposes. For those cases
>> where that's not enough you can bring out the heavy artillery of a theorem
>> prover, and this does let you write down the real spec exactly and verify
>> it either side of a merge or rebase. I wouldn't say that approach was in
>> common usage but it does get used where the investment is worth it.
>> Verifying that changes to the type system leave the real spec satisfied
>> sounds quite hard in general. Merging additions to the test suite sounds
>> easy but other changes to the part of the spec embodied in the test suite
>> also sound hard in general. Of the three, merging changes to an
>> automatically-proved theorem actually feels easiest - my experience is that
>> keeping a (well-written) automatic proof in sync with changes to code is
>> often not that hard, and when it is hard it's normally been because the
>> changes to the code meant the theorem was no longer true.
>> Perhaps enriching the type system to a point where you can write truly
>> useful specs in it is the answer? Thus dependent types.
>> Cheers,
>> David
