[Haskell-cafe] Re: Implementing "unionAll"
Heinrich Apfelmus
apfelmus at quantentunnel.de
Wed Feb 17 06:58:55 EST 2010
Leon Smith wrote:
> Heinrich Apfelmus wrote:
>> I see no obvious deficiencies. :) Personally, I'd probably structure it like
>>
>> http://www.haskell.org/haskellwiki/Prime_numbers#Implicit_Heap
>
> This variant, based on the wiki article, is cleaner, slightly
> simpler, appears to be just as fast, and allocates slightly less
> memory:
>
>> import GHC.Exts(inline)
>> import Data.List.Ordered(unionBy)
>
>> union' :: People Int -> People Int -> People Int
>> union' (VIP x xt) ys = VIP x (union' xt ys)
>> union' (Crowd xs) (Crowd ys) = Crowd (inline unionBy compare xs ys)
>> union' xs@(Crowd (x:xt)) ys@(VIP y yt) = case compare x y of
>> LT -> VIP x (union' (Crowd xt) ys)
>> EQ -> VIP x (union' (Crowd xt) yt)
>> GT -> VIP y (union' xs yt)
>
>> foldTree :: (a -> a -> a) -> [a] -> a
>> foldTree f xs = case xs of
>> [] -> []
>> xs -> loop xs
>> where
>> loop [x] = x
>> loop (x:xs) = x `f` loop (pairs xs)
>>
>> pairs (x:y:ys) = f x y : pairs ys
>> pairs xs = xs
>
>> unions xss = serve $ inline foldTree union' [ VIP x (Crowd xs) | (x:xs) <- xss ]
>> where
>> serve (VIP x xs) = x:serve xs
>> serve (Crowd xs) = xs
>
> One of the differences is that I started with a slightly different
> "foldTree", one that was taken directly from Data.List.sort.
>
> The only problem is that it has the same problem as I mentioned:
>
> unionAll [[1,2],[1,2]] == [1,1,2]
>
> whereas unionAll is intended to be a generalization of "foldr union
> []" to an infinite number of lists, and should thus return [1,2].
> But I should be able to fix this without much difficulty.
Ah, I meant to use the union' from your previous message, but I think
that doesn't work because it doesn't have the crucial property that the case
union (VIP x xs) ys = ...
does not pattern match on the second argument.
The easiest solution is simply to define
unionAll = nub . mergeAll
where
-- specialized definition of nub
nub = map head . groupBy (==)
But you're probably concerned that filtering for duplicates afterwards
will be less efficient. After all, the (implicit) tree built by
mergeAll might needlessly compare a lot of equal elements.
Fortunately, it is straightforward to fuse nub into the tree merging:
nub . serve . foldTree union'
= serve . nubP . foldTree union'
= serve . foldTree (nub' . union')
with appropriate definitions of nubP and nub' . In particular, the
definition
-- remove duplicate VIPs
nub' (Crowd xs) = Crowd xs
nub' (VIP x xs) = VIP x (guard x xs)
where
guard x (VIP y ys)
| x == y = nub' ys
| otherwise = VIP y (guard y ys)
guard x (Crowd (y:ys))
| x == y = Crowd ys
| otherwise = Crowd (y:ys)
takes advantage of the facts that
* the left and right arguments of union' can now be assumed to not
contain duplicates
* crowds do not contain duplicates thanks to the call to unionBy
Whether nub' saves more comparisons than it introduces is another
question. If you want, you can probably fuse nub' and union' as
well, but I guess the result won't be pretty.
> Incidentally, I tried implementing something like implicit heaps once
> upon a time; but it had a severe performance problem, taking a few
> minutes to produce 20-30 elements. I didn't have a pressing reason
> to figure out why though, and didn't pursue it further.
Yeah, they're tricky to get right. One pattern match too strict and it's
sucked into a black hole, two pattern matches too lazy and it will leak
space like the big bang. :)
Regards,
Heinrich Apfelmus
--
http://apfelmus.nfshost.com
More information about the Haskell-Cafe
mailing list