[Haskell-cafe] V.I.P.s and the associativity of merge'
apfelmus at quantentunnel.de
Thu Dec 30 12:50:25 CET 2010
Leon Smith wrote:
> Ok, after mulling over the issues that Will Ness has brought up in
> the last few days , I think I have a partial explanation for the
> apparent tension between Will's observations and Heinrich Apfelmus's
> Implicit Heaps article , which both concern the implementation of
> mergeAll .
>  http://permalink.gmane.org/gmane.comp.lang.haskell.cafe/84666
>  http://apfelmus.nfshost.com/articles/implicit-heaps.html
> This raises the question, is there some
> combination of the shape of the merge' tree and some input for which
> using VIPs dramatically changes the efficiency of a mergeAll
> operation? I suspect the answer is yes, though I don't know for
> sure at this point in time.
> However, I do tacitly believe that the current tree that mergeAll
> uses doesn't exhibit this property for any input, and so I have
> simplified the implementations of mergeAll and unionAll in the latest
> version of data-ordlist-0.4.4 by avoiding the use of VIPs. This has
> the nice side benefit of modestly improving performance when the
> elements of the result are highly biased towards the first list.
> For those who remember the discussion about this about a year ago, it turns out
> there was a simpler version after all lurking somewhere in there (or is it
> primes = 2: primes'
> primes' = 3: 5: [7,9..] `minus` tfold
> [ [p*p,p*p+2*p..] | p <- primes' ]
> tfold ((x:xs):t) = x : xs `union` tfold (pairs t)
> pairs ((x:xs):ys:t) = (x: union xs ys) : pairs t
Unfortunately, it turns out that this program is clear, shorter ... and
subtly wrong. :)
Here an example where the VIP merge would give a different result
bad = tfold $ (1:10:undefined) : (2:3:5:undefined) : (4:undefined) :
[1,2*** Exception: bad
but the VIP version would give at least
[1,2,3,4*** Exception: bad / Prelude: undefined
In other words, this new program already tries to compare the number 3
to the fourth list when it is still clear that only the first three
lists are relevant.
Note that this doesn't necessarily mean that the program does not work
for prime numbers, but *proving* correctness is now considerably more
difficult because you need estimates about the growth and distribution
of prime numbers. The VIP version always works as long as there are
infinitely many primes.
Also, since unnecessary comparisons are performed, it is no longer clear
whether the time and space complexity stays the same. (Which is not as
bad as it sounds, since we didn't know them well in the first place
anyway). More worryingly, changing the tree shape now affects correctness.
More information about the Haskell-Cafe