[Haskell-cafe] Re: Implementing "unionAll"

Tue Feb 16 22:56:28 EST 2010

> I see no obvious deficiencies. :) Personally, I'd probably structure it like
>
>   http://www.haskell.org/haskellwiki/Prime_numbers#Implicit_Heap

This variant,  based on the wiki article,  is cleaner,  slightly
simpler,  appears to be just as fast,  and allocates slightly less
memory:

> import GHC.Exts(inline)
> import Data.List.Ordered(unionBy)

> union' :: People Int -> People Int -> People Int
> union' (VIP x xt) ys                    = VIP x (union' xt ys)
> union' (Crowd xs) (Crowd ys)            = Crowd (inline unionBy compare xs ys)
> union' xs@(Crowd (x:xt)) ys@(VIP y yt)  = case compare x y of
>    LT -> VIP x (union' (Crowd xt) ys)
>    EQ -> VIP x (union' (Crowd xt) yt)
>    GT -> VIP y (union' xs yt)

> foldTree :: (a -> a -> a) -> [a] -> a
> foldTree f xs = case xs of
>                   [] -> []
>                   xs -> loop xs
>      where
>            loop [x]    = x
>            loop (x:xs) = x `f` loop (pairs xs)
>
>            pairs (x:y:ys) = f x y : pairs ys
>            pairs xs = xs

>  unions xss = serve $ inline foldTree union' [ VIP x (Crowd xs) | (x:xs) <- xss ]
>     where
>     serve (VIP x xs) = x:serve xs
>     serve (Crowd xs) = xs

One of the differences is that I started with a slightly different
"foldTree",  one that was taken directly from Data.List.sort.

The only problem is that it has the same problem as I mentioned:

unionAll [[1,2],[1,2]]  == [1,1,2]

whereas unionAll is intended to be a generalization of "foldr union
[]" to an infinite number of lists,  and should thus return [1,2].
But I should be able to fix this without much difficulty.

> Your  loop  function is a strange melange of many different concerns
> (building a tree, union', adding and removing the VIP constructors).
>
>
> Note that it's currently unclear to me whether the lazy pattern match in
>
>   pairs ~(x: ~(y:ys)) = f x y : pairs ys
>
> is beneficial or not; you used a strict one
>
>   unionPairs (x:y:zs) = union' x y : unionPairs zs
>

Well,  as the library implementation must work on finite cases as
well,  the lazy pattern seems out of the question.

> If you're really concerned about time & space usage, it might even be
> worth to abandon the lazy tree altogether and use a heap to achieve the
> same effect, similar to Melissa O'Neils prime number code. It's not as
> "neat", but much more predictable. :)

Well, it is intended as a high quality, generally useful
implementation,  so of course I care about time and space usage.  :)
 Dave Bayer's original algorithm does slightly better,  but was much
larger in terms of both source code and object size.

Omar implemented something along these lines,  but it didn't perform
so well.   I did not dig into the reasons why, though;  it might not
have had anything to do with the fact an explicit heap was used.

Incidentally,  I tried implementing something like implicit heaps once
upon a time;   but it had a severe performance problem,  taking a few
minutes to produce 20-30 elements.    I didn't have a pressing reason
to figure out why though,  and didn't pursue it further.

Best,
Leon