[Haskell-cafe] Time consumption nub
haskell at list.mightyreason.com
haskell at list.mightyreason.com
Wed Jul 18 08:43:45 EDT 2007
Arie Groeneveld wrote:
> Hi,
>
> Wondering about time space consuming: 'nub' vs 'map head.group.sort'
>
> Consider:
>
> ry = [1..10000] ++ replicate 13 5 ++ replicate 21 34
>
>
>
> *Main> length . nub $ ry
> 10000
> (5.18 secs, 1050000 bytes)
>
> *Main> length . map head . group . sort $ ry
> 10000
> (0.03 secs, 6293384 bytes)
>
> Time space
> nub --- +
> fnub +++ -
>
> + is better ;-)
>
> Thanks
>
> @@i=arie
>
nub is working on unsorted input.
If you want (nub.sort) then the best thing to use is a merge sort that discards
duplicates as it works. Copying and modifying GHC's Data.List.sort code:
> -- stolen from http://darcs.haskell.org/packages/base/Data/List.hs
> -- with 'merge' changed to discard duplicates.
> nsort l = mergesort compare l
> nsortBy cmp l = mergesort compare l
>
> mergesort :: (a -> a -> Ordering) -> [a] -> [a]
> mergesort cmp = mergesort' cmp . map wrap
>
> mergesort' :: (a -> a -> Ordering) -> [[a]] -> [a]
> mergesort' cmp [] = []
> mergesort' cmp [xs] = xs
> mergesort' cmp xss = mergesort' cmp (merge_pairs cmp xss)
>
> merge_pairs :: (a -> a -> Ordering) -> [[a]] -> [[a]]
> merge_pairs cmp [] = []
> merge_pairs cmp [xs] = [xs]
> merge_pairs cmp (xs:ys:xss) = merge cmp xs ys : merge_pairs cmp xss
>
> merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
> merge cmp xs [] = xs
> merge cmp [] ys = ys
> merge cmp (x:xs) (y:ys)
> = case x `cmp` y of
> GT -> y : merge cmp (x:xs) ys
> LT -> x : merge cmp xs (y:ys)
> EQ -> x : merge cmp xs ys
>
> wrap :: a -> [a]
> wrap x = [x]
>
Then you can use nsort or nsortBy, which benchmark (with -O2) as slightly faster
than (map head . group . sort)
Cheers,
Chris
More information about the Haskell-Cafe
mailing list