[Haskell-cafe] Re: Function to detect duplicates

Wed Feb 24 10:59:25 EST 2010

Am Mittwoch 24 Februar 2010 14:25:20 schrieb Ertugrul Soeylemez:
> Jonas Almström Duregård <jonas.duregard at gmail.com> wrote:
> > >>noneRepeated xs = xs == nub xs
> > >
> > > Not quite as bad, nub is O(n^2)
> >
> > You are correct of course. Still, it will probably be a bit less
> > inefficient if the length of the lists are compared (as opposed to the
> > elements):
> >
> > noneRepeated xs = length xs == length (nub xs)
> >
> > [...]
> >
> > > How can you nub in O(n*log n)? Remember, you only have Eq for nub.
>
> Again note that the big advantage of my method is laziness.  The
> comparison will end on the first duplicate found.

Yes, and the suggestions Jonas and I posted had the same property :)

> Using the following nub implementation the overall time complexity should
> be O(n * log n), but may be space-intensive, because it uses O(n) space.

Data.List.nub also uses O(n) space (but has a smaller constant factor).

> Also note that it has a different context (the type needs to be Ord
> instead of Eq):

Yeah, that's the catch, it has a more restricted type. If you have only Eq, 
I don't think you can do better than O(n^2). That's why I was irritated by

> > I think the nub-based solution is the best one in general, but it's
> > the base library implementation of nub, which is unfortunate.  In
> > fact, with a better nub implementation, this becomes an O(n * log n)
> > time

, for the type of nub, the library implementation is rather good (perhaps 
it can still be improved, but not much, I think).

>
>   import qualified Data.Set as S
>   import Data.List
>
>   myNub :: Ord a => [a] -> [a]
>   myNub = concat . snd . mapAccumL nubMap S.empty
>     where nubMap s x
>
>             | S.member x s = (s, [])
>             | otherwise    = (S.insert x s, [x])

I prefer

{-# LANGUAGE BangPatterns #-}
{-# OPTIONS_GHC -O2 #-}
module OrdNub (ordNub, ordNubRare) where

import qualified Data.Set as Set

ordNub :: Ord a => [a] -> [a]
ordNub = go Set.empty
      where
        go !st (x:xs)
            | x `Set.member` st = go st xs
            | otherwise = x : go (Set.insert x st) xs
        go _ [] = []

, it's faster. If you know that duplicates are rare, 

ordNubRare :: Ord a => [a] -> [a]
ordNubRare = go 0 Set.empty
      where
        go sz st (x:xs)
            | sz1 == sz = go sz st xs
            | otherwise = x : go sz1 st1 xs
              where
                st1 = Set.insert x st
                !sz1 = Set.size st1
        go _ _ [] = []

is even faster because it omits the lookups (but it sucks when there are 
many duplicates, of course).

>
> Greets
> Ertugrul

Cheers,
Daniel