[Haskell-cafe] V.I.P.s and the associativity of merge'

Heinrich Apfelmus apfelmus at quantentunnel.de
Thu Dec 30 12:50:25 CET 2010

Leon Smith wrote:
> Ok,  after mulling over the issues that Will Ness has brought up in
> the last few days [1],  I think I have a partial explanation for the
> apparent tension between Will's observations and Heinrich Apfelmus's
> Implicit Heaps article [2],  which both concern the implementation of
> mergeAll [3].
> [1] http://permalink.gmane.org/gmane.comp.lang.haskell.cafe/84666
> [2] http://apfelmus.nfshost.com/articles/implicit-heaps.html
> [3] 
> [...]
> This raises the question,  is there some
> combination of the shape of the merge' tree and some input for which
> using VIPs dramatically changes the efficiency of a mergeAll
> operation?   I suspect the answer is yes,  though I don't know for
> sure at this point in time.
> However,  I do tacitly believe that the current tree that mergeAll
> uses doesn't exhibit this property for any input,   and so I have
> simplified the implementations of mergeAll and unionAll in the latest
> version of data-ordlist-0.4.4 by avoiding the use of VIPs.   This has
> the nice side benefit of modestly improving performance when the
> elements of the result are highly biased towards the first list.

Will Ness
> For those who remember the discussion about this about a year ago, it turns out 
> there was a simpler version after all lurking somewhere in there (or is it 
> _out_?).
> primes = 2: primes' 
>    where
>     primes' = 3: 5: [7,9..] `minus` tfold
>                       [ [p*p,p*p+2*p..] | p <- primes' ]   
>     tfold ((x:xs):t)    = x : xs `union` tfold (pairs t)
>     pairs ((x:xs):ys:t) = (x: union xs ys) : pairs t

Unfortunately, it turns out that this program is clear, shorter ... and 
subtly wrong. :)

Here an example where the VIP merge would give a different result

     bad = tfold $ (1:10:undefined) : (2:3:5:undefined) : (4:undefined) :
                   error "bad"

We have

     ghci> bad
     [1,2*** Exception: bad

but the VIP version would give at least

     ghci> bad
     [1,2,3,4*** Exception: bad / Prelude: undefined

In other words, this new program already tries to compare the number 3 
to the fourth list when it is still clear that only the first three 
lists are relevant.

Note that this doesn't necessarily mean that the program does not work 
for prime numbers, but *proving* correctness is now considerably more 
difficult because you need estimates about the growth and distribution 
of prime numbers. The VIP version always works as long as there are 
infinitely many primes.

Also, since unnecessary comparisons are performed, it is no longer clear 
whether the time and space complexity stays the same. (Which is not as 
bad as it sounds, since we didn't know them well in the first place 
anyway). More worryingly, changing the tree shape now affects correctness.

Heinrich Apfelmus


More information about the Haskell-Cafe mailing list