[Haskell-cafe] ordNub

Sun Jul 14 13:20:52 CEST 2013

tldr: nub is abnormally slow, we shouldn't use it, but we do.

As you might know, Data.List.nub is O(n²). (*)

As you might not know, almost *all* practical Haskell projects use it,
and that in places where an Ord instance is given, e.g. happy, Xmonad,
ghc-mod, Agda, darcs, QuickCheck, yesod, shake, Cabal, haddock, and 600
more (see https://github.com/nh2/haskell-ordnub).

I've taken the Ord-based O(n * log n) implementation from yi using a Set:

  ordNub :: (Ord a) => [a] -> [a]
  ordNub l = go empty l
    where
      go _ []     = []
      go s (x:xs) = if x `member` s then go s xs
                                    else x : go (insert x s) xs

and put benchmarks on
http://htmlpreview.github.io/?https://github.com/nh2/haskell-ordnub/blob/1f0a2c94a/report.html
(compare `nub` vs `ordNub`).

`ordNub` is not only in a different complexity class, but even seems to
perform better than nub for very small numbers of actually different
list elements (that's the numbers before the benchmark names).

(The benchmark also shows some other potential problem: Using a state
monad to keep the set instead of a function argument can be up to 20
times slower. Should that happen?)

What do you think about ordNub?

I've seen a proposal from 5 years ago about adding a *sort*Nub function
started by Neil, but it just died.

(*) The mentioned complexity is for the (very common) worst case, in
which the number of different elements in the list grows with the list
(alias you don't have an N element list with always only 5 different
things inside).