[Haskell-cafe] Function to detect duplicates

Daniel Fischer daniel.is.fischer at web.de
Wed Feb 24 16:13:00 EST 2010


On Wednesday 24 February 2010 21:25:04, Gwern Branwen wrote:
> 2010/2/23 Jonas Almström Duregård <jonas.duregard at gmail.com>:
> > Hi Rafael,
> >
> > I assume you will perform this operation on some very large lists, or
> > performance would not be an issue. Have you tested if your optimized
> > version is better than your initial one?
> >
> > You should compare your implementation against something like this:
> >
> > import qualified Data.Set as Set
> > noneRepeated :: (Ord a) => [a] -> Bool
> > noneRepeated = accum Set.empty where
> >  accum _ [] = True
> >  accum s (x:xs)
> >    | Set.member x s = False
> >    | otherwise      = accum (Set.insert x s) xs
> >
> > Also there is some discussion about the nub function that relates to
> > this topic, e.g. http://buffered.io/2008/07/28/a-better-nub/.
> >
> > /Jonas
>
> Or better yet,
> http://www.haskell.org/pipermail/libraries/2008-October/010778.html Much
> more thorough and practical w/r/t actually getting faster nubs in the
> libraries.

Umm,

using the nubOrd' code to nub a list of one million pseudo-random 
numbers takes (here) about 2.5 times the time and twice the space of the 
Set-based ordNub. It does slightly better for 100,000 elements, but still 
takes more than twice the time (and 1.6 times the space).
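
For concreteness, the Set-based ordNub I mean is essentially the usual 
accumulator version; a sketch (the code I actually timed may differ in 
minor details):

import qualified Data.Set as Set

-- keep a Set of elements already seen, emit only the first occurrence
ordNub :: Ord a => [a] -> [a]
ordNub = go Set.empty
  where
    go _ [] = []
    go seen (x:xs)
      | x `Set.member` seen = go seen xs
      | otherwise           = x : go (Set.insert x seen) xs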

In my book, that's a compelling reason to go with the Set-based 
implementation - unless we're talking about code to include directly in 
Data.List, which cannot depend on Data.Set, and even then I'd still 
_use_ the Set-based one.
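
In case anyone wants to reproduce the comparison, roughly this kind of 
driver, with the ordNub sketched above in scope, is what I have in mind 
(the seed and list length are arbitrary; compile with -O2 and read time 
and allocation off the +RTS -s output):

import System.Random (mkStdGen, randoms)

main :: IO ()
main = do
  -- one million pseudo-random Ints from a fixed seed
  let xs = take 1000000 (randoms (mkStdGen 42) :: [Int])
  -- force the whole result; swap in nubOrd' from the linked thread
  -- in place of ordNub to compare the two implementations
  print (length (ordNub xs))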

