[Haskell-cafe] A suggestion: On why code merges should influence Haskell design

Tue Oct 13 20:08:14 UTC 2015

The 6 years and 100K+ commits data point is pretty impressive! It makes 
me question how much effort should be devoted to avoiding these 
"merge-introduced" bugs in the first place. I assume this data point 
does not capture bugs introduced by merges that were found by unit tests 
though.

I hadn't looked at how to actually go about writing the "bug-free" merge 
tool (if it becomes possible to write one). However, I think my 
immediate approach would be different from your suggestion. I was 
thinking that we could just look at the dependency graph between the 
definitions.

Consider 2 possible (previously suggested) merge scenarios where we 
define both a variable and a function. In both scenarios:
- Branch A changes the "variable" (but does not change the function)
- Branch B changes the function

The scenarios are different because of the dependency between the 
function and the variable:
1. Function definition is independent of variable definition
https://gist.github.com/dimitri-xyz/f0819c947ecd386b84d6

2. Function definition dependents on variable definition
https://gist.github.com/dimitri-xyz/81a56b0ce0929e8513e9

Intuitively, I would like to have a merge conflict in scenario 2, but 
none in scenario 1. That suggests the following rule:

We have a conflict iif a definition *that was changed* in branch B 
depends (transitively) on a definition *that was change* in branch A or 
vice-versa.

In scenario 1, the definition of the function does not depend on the 
definition of the variable, so there is no conflict. Notice that 'main' 
depends on both, but it hasn't changed. In scenario 2, the definition of 
the functions has changed and it depends on the definition of the 
variable, so we would have a conflict.

I understand this may not be sufficient to ensure no bugs are 
introduced, but this is my first approximation. The rule just captures 
the notion that programmer's shouldn't have their assumptions pulled 
from under their feet as they make changes. I wonder if/how I could 
claim that 'main' (that depends on both the function and the variable) 
is "merge-bug free" in the merged version.

Cheers,

Dimitri

On 10/12/15 2:16 PM, David Turner wrote:
> Hi,
>
> I work on a team of ~15 developers and we follow a merge-free process 
> using rebasing. Although in theory it sounds like you'll get the same 
> problems with rebasing as with merging, over 6 years and 100,000+ 
> commits I'm struggling to think of a single bug silently introduced 
> due to a rebase that succeeded without conflicts. It's a cunning bit 
> of social engineering: when merging the work of two developers, it's 
> easy for both of them to disclaim responsibility for any bugs 
> introduced by the merge, but when rebasing the work of developer A on 
> top of that of developer B it's clear that any bugs are dev A's 
> responsibility as their commits come later in the published history 
> than dev B's, and this gives dev A the incentive to check for semantic 
> conflicts properly.
>
> That said, it's theoretically possible to introduce bugs with a 
> 'clean' rebase, just as with merging, and I'm interested in your 
> suggestion because of that. The difficulty I see is defining what you 
> mean by "no new bugs". The two branches satisfy two different 
> specifications, and my reading of "no new bugs" is that the merge 
> result satisfies a merged specification. It sounds difficult to 
> automatically merge two specs together in general, although in 
> practice specs are often quite informal and where a rebase looks 
> questionable a merged spec can be invented and verified with human 
> intervention, remembering that the rebasing developer has a personal 
> incentive to get this right.
>
> There's two kind of formal spec that I know of in common usage in the 
> industry: types and tests. Using a type system gives you a bunch of 
> theorems about what your program can and cannot do, and this is 
> checked at compile time, and an automated test suite is basically a 
> bunch of properties that your program satisfies which can be verified 
> by running the test suite. Neither of these can capture the real spec 
> exactly, but in practice they turn out to be good enough for many 
> purposes. For those cases where that's not enough you can bring out 
> the heavy artillery of a theorem prover, and this does let you write 
> down the real spec exactly and verify it either side of a merge or 
> rebase. I wouldn't say that approach was in common usage but it does 
> get used where the investment is worth it.
>
> Verifying that changes to the type system leave the real spec 
> satisfied sounds quite hard in general. Merging additions to the test 
> suite sounds easy but other changes to the part of the spec embodied 
> in the test suite also sounds hard in general. Of the three, merging 
> changes to an automatically-proved theorem actually feels easiest - my 
> experience is that keeping a (well-written) automatic proof in sync 
> with changes to code is often not that hard, and when it is hard it's 
> normally been because the changes to the code meant the theorem was no 
> longer true.
>
> Perhaps enriching the type system to a point where you can write truly 
> useful specs in it is the answer? Thus dependent types.
>
> Cheers,
>
> David