[Haskell-cafe] Implementing "unionAll"

Leon Smith leon.p.smith at gmail.com
Sat Feb 13 16:18:12 EST 2010


With the urging and assistance of Omar Antolín Camarena,  I will be
adding two functions to data-ordlist:  mergeAll and unionAll,  which
merge (or union)  a potentially infinite list of potentially infinite
ordered lists,   under the assumption that the heads of the non-empty
lists appear in a non-decreasing sequence.

Union takes two sorted lists and produces a new sorted list;  an
element occurs in the result as many times as the maximum number of
occurrences in either list.   The unionAll function generalizes this
behavior to an infinite number of lists.

A reasonable implementation of mergeAll is:

> import Data.List.Ordered(merge, union)

> mergeAll :: Ord a => [[a]] -> [a]
> mergeAll = foldr (\(x:xs) ys -> x : merge xs ys) []

However,  for many inputs,  we can do better;   the library
implementation of mergeAll is based on H. Apfelmus's article "Implicit
Heaps",  which presents a simplification of Dave Bayer's "venturi"
algorithm.   The difference is that the foldr version uses a line of
comparisons, whereas "venturi" uses a tree of comparisons.

http://apfelmus.nfshost.com/articles/implicit-heaps.html
http://www.mail-archive.com/haskell-cafe@haskell.org/msg27612.html

However,  as Omar pointed out to me,  the following implementation of
unionAll has a flaw:

> unionAll :: Ord a => [[a]] -> [a]
> unionAll = foldr (\(x:xs) ys -> x : union xs ys) []

Namely unionAll [[1,2],[1,2]] should return [1,2],  whereas it
actually returns [1,1,2].   After some work,  I believe I have
generalized H. Apfelmus's algorithm to handle this;  however it seems
a bit complicated.   I would love feedback,  especially with regard to
simplifications,  bugs,  testing strategies,  and optimizations:

> unionAll' :: Ord a => [[a]] -> [a]
> unionAll' = unionAllBy compare

> data People a = VIP a (People a) | Crowd [a]

> unionAllBy :: (a -> a -> Ordering) -> [[a]] -> [a]
> unionAllBy cmp xss = loop [ (VIP x (Crowd xs)) | (x:xs) <- xss ]
>   where
>     loop [] = []
>     loop (  VIP x xs  :  VIP y ys  :  xss  )
>       = case cmp x y of
>           LT -> x : loop (  xs  :  VIP y ys  :  xss  )
>           EQ ->     loop (  VIP x (union' xs ys)  :  unionPairs xss  )
>           GT -> error "Data.List.Ordered.unionAll:  assumption violated!"
>     loop (  VIP x xs  :  xss  )
>       =  x : loop (xs:xss)
>     loop [Crowd xs] = xs
>     loop (xs:xss) = loop (unionPairs (xs:xss))
>
>     unionPairs [] = []
>     unionPairs [x] = [x]
>     unionPairs (x:y:zs) = union' x y : unionPairs zs
>
>     union' (VIP x xs) (VIP y ys)
>        = case cmp x y of
>            LT -> VIP x (union' xs (VIP y ys))
>            EQ -> VIP x (union' xs ys)
>            GT -> error "Data.List.Ordered.unionAll:  assumption violated!"
>     union' (VIP x xs) (Crowd ys) = VIP x (union' xs (Crowd ys))
>     union' (Crowd []) ys = ys
>     union' (Crowd xs) (Crowd ys) = Crowd (unionBy cmp xs ys)
>     union' xs@(Crowd (x:xt)) ys@(VIP y yt)
>        = case cmp x y of
>            LT -> VIP x (union' (Crowd xt) ys)
>            EQ -> VIP x (union' (Crowd xt) yt)
>            GT -> VIP y (union' xs yt)

-- Leon


More information about the Haskell-Cafe mailing list