[Haskell-cafe] Re: Implementing "unionAll"
Heinrich Apfelmus
apfelmus at quantentunnel.de
Tue Feb 16 12:21:53 EST 2010
Leon Smith wrote:
> With the urging and assistance of Omar Antolín Camarena, I will be
> adding two functions to data-ordlist: mergeAll and unionAll, which
> merge (or union) a potentially infinite list of potentially infinite
> ordered lists, under the assumption that the heads of the non-empty
> lists appear in a non-decreasing sequence.
>
> However, as Omar pointed out to me, the following implementation of
> unionAll has a flaw:
>
>> unionAll :: Ord a => [[a]] -> [a]
>> unionAll = foldr (\(x:xs) ys -> x : union xs ys) []
>
> Namely unionAll [[1,2],[1,2]] should return [1,2], whereas it
> actually returns [1,1,2]. After some work, I believe I have
> generalized H. Apfelmus's algorithm to handle this; however it seems
> a bit complicated. I would love feedback, especially with regard to
> simplifications, bugs, testing strategies, and optimizations:
>
>> unionAll' :: Ord a => [[a]] -> [a]
>> unionAll' = unionAllBy compare
>
>> data People a = VIP a (People a) | Crowd [a]
>
>> unionAllBy :: (a -> a -> Ordering) -> [[a]] -> [a]
>> unionAllBy cmp xss = loop [ (VIP x (Crowd xs)) | (x:xs) <- xss ]
>> where
>> loop [] = []
>> loop ( VIP x xs : VIP y ys : xss )
>> = case cmp x y of
>> LT -> x : loop ( xs : VIP y ys : xss )
>> EQ -> loop ( VIP x (union' xs ys) : unionPairs xss )
>> GT -> error "Data.List.Ordered.unionAll: assumption violated!"
>> loop ( VIP x xs : xss )
>> = x : loop (xs:xss)
>> loop [Crowd xs] = xs
>> loop (xs:xss) = loop (unionPairs (xs:xss))
>>
>> unionPairs [] = []
>> unionPairs [x] = [x]
>> unionPairs (x:y:zs) = union' x y : unionPairs zs
>>
>> union' (VIP x xs) (VIP y ys)
>> = case cmp x y of
>> LT -> VIP x (union' xs (VIP y ys))
>> EQ -> VIP x (union' xs ys)
>> GT -> error "Data.List.Ordered.unionAll: assumption violated!"
>> union' (VIP x xs) (Crowd ys) = VIP x (union' xs (Crowd ys))
>> union' (Crowd []) ys = ys
>> union' (Crowd xs) (Crowd ys) = Crowd (unionBy cmp xs ys)
>> union' xs@(Crowd (x:xt)) ys@(VIP y yt)
>> = case cmp x y of
>> LT -> VIP x (union' (Crowd xt) ys)
>> EQ -> VIP x (union' (Crowd xt) yt)
>> GT -> VIP y (union' xs yt)
I see no obvious deficiencies. :) Personally, I'd probably structure it like
http://www.haskell.org/haskellwiki/Prime_numbers#Implicit_Heap
so that your code becomes
unionAll = serve . foldTree union' . map vip
Your loop function is a strange melange of many different concerns
(building a tree, union', adding and removing the VIP constructors).
Note that it's currently unclear to me whether the lazy pattern match in
pairs ~(x: ~(y:ys)) = f x y : pairs ys
is beneficial or not; you used a strict one
unionPairs (x:y:zs) = union' x y : unionPairs zs
Daniel Fischer's experiments suggest that the strict one is better
http://www.mail-archive.com/haskell-cafe@haskell.org/msg69807.html
If you're really concerned about time & space usage, it might even be
worth to abandon the lazy tree altogether and use a heap to achieve the
same effect, similar to Melissa O'Neils prime number code. It's not as
"neat", but much more predictable. :)
Regards,
Heinrich Apfelmus
--
http://apfelmus.nfshost.com
More information about the Haskell-Cafe
mailing list