# [Haskell-cafe] Re: Function to detect duplicates

Daniel Fischer daniel.is.fischer at web.de
Wed Feb 24 10:59:25 EST 2010

```Am Mittwoch 24 Februar 2010 14:25:20 schrieb Ertugrul Soeylemez:
> Jonas Almström Duregård <jonas.duregard at gmail.com> wrote:
> > >>noneRepeated xs = xs == nub xs
> > >
> > > Not quite as bad, nub is O(n^2)
> >
> > You are correct of course. Still, it will probably be a bit less
> > inefficient if the length of the lists are compared (as opposed to the
> > elements):
> >
> > noneRepeated xs = length xs == length (nub xs)
> >
> > [...]
> >
> > > How can you nub in O(n*log n)? Remember, you only have Eq for nub.
>
> Again note that the big advantage of my method is laziness.  The
> comparison will end on the first duplicate found.

Yes, and the suggestions Jonas and I posted had the same property :)

> Using the following nub implementation the overall time complexity should
> be O(n * log n), but may be space-intensive, because it uses O(n) space.

Data.List.nub also uses O(n) space (but has a smaller constant factor).

> Also note that it has a different context (the type needs to be Ord

Yeah, that's the catch, it has a more restricted type. If you have only Eq,
I don't think you can do better than O(n^2). That's why I was irritated by

> > I think the nub-based solution is the best one in general, but it's
> > the base library implementation of nub, which is unfortunate.  In
> > fact, with a better nub implementation, this becomes an O(n * log n)
> > time

, for the type of nub, the library implementation is rather good (perhaps
it can still be improved, but not much, I think).

>
>   import qualified Data.Set as S
>   import Data.List
>
>   myNub :: Ord a => [a] -> [a]
>   myNub = concat . snd . mapAccumL nubMap S.empty
>     where nubMap s x
>
>             | S.member x s = (s, [])
>             | otherwise    = (S.insert x s, [x])

I prefer

{-# LANGUAGE BangPatterns #-}
{-# OPTIONS_GHC -O2 #-}
module OrdNub (ordNub, ordNubRare) where

import qualified Data.Set as Set

ordNub :: Ord a => [a] -> [a]
ordNub = go Set.empty
where
go !st (x:xs)
| x `Set.member` st = go st xs
| otherwise = x : go (Set.insert x st) xs
go _ [] = []

, it's faster. If you know that duplicates are rare,

ordNubRare :: Ord a => [a] -> [a]
ordNubRare = go 0 Set.empty
where
go sz st (x:xs)
| sz1 == sz = go sz st xs
| otherwise = x : go sz1 st1 xs
where
st1 = Set.insert x st
!sz1 = Set.size st1
go _ _ [] = []

is even faster because it omits the lookups (but it sucks when there are
many duplicates, of course).

>
> Greets
> Ertugrul

Cheers,
Daniel

```