[Haskell-beginners] repetition

Thu Aug 4 12:34:49 CEST 2011

On Thursday 04 August 2011, 11:44:31, Luca Ciciriello wrote:
> My mistake, I haven't specified that the starting list is ordered.

Then

map head . group

has the lower time complexity and does what you want.
Since nub has the type

nub :: Eq a => [a] -> [a]

its time complexity is O(n*d) where n is the length of the input list and d 
the number of distinct elements in the list, so generally it has quadratic 
behaviour.
map head . group has linear [O(n)] behaviour, thus scales better. But it 
produces different output in general.

If the types you need to handle are instances of Ord, not only of Eq, you 
can use that to write functions with a better time complexity.
Two very simple functions that remove duplicates and sort the input list 
are

import Data.List

ordNub1 :: Ord a => [a] -> [a]
ordNub1 = map head . group . sort

import qualified Data.Set as S

ordNub2 :: Ord a => [a] -> [a]
ordNub2 = S.toAscList . S.fromList
-- currently at least, S.toList is the same as S.toAscList,
-- but that's not guaranteed, although unlikely to change

Both have O(n*log d) time complexity, but generally ordNub2 is faster.

If you want the order in which the distinct elements appear in the input 
list to be preserved, like nub does, you can use

import qualified Data.Set as S

ordNub3 :: Ord a => [a] -> [a]
ordNub3 = go S.empty
  where
    go st (x:xs)
      | x `S.member` st = go st xs
      | otherwise = x : go (S.insert x st) xs
    go _ [] = []

which also has O(n*log d) time complexity.

> 
> Luca
> 
> On Aug 4, 2011, at 11:39 AM, Thomas Davie wrote:
> > On 4 Aug 2011, at 10:36, Lyndon Maydwell wrote:
> >> nub
> > 
> > Heh, our differing answers point out an ambiguity in the question.
> > 
> > Luca, given [5,10,10,5] what is this function expected to produce?
> > 
> > if you want [5,10] then nub is correct.
> > if you want [5,10,5] then map head . group is correct.
> > 
> > Bob