[Haskell-cafe] ordNub

Mon Jul 15 07:26:21 CEST 2013

On Sun, Jul 14, 2013 at 4:20 AM, Niklas Hambüchen <mail at nh2.me> wrote:
> tldr: nub is abnormally slow, we shouldn't use it, but we do.
>
>
> As you might know, Data.List.nub is O(n²). (*)
>
> As you might not know, almost *all* practical Haskell projects use it,
> and that in places where an Ord instance is given, e.g. happy, Xmonad,
> ghc-mod, Agda, darcs, QuickCheck, yesod, shake, Cabal, haddock, and 600
> more (see https://github.com/nh2/haskell-ordnub).
>
> I've taken the Ord-based O(n * log n) implementation from yi using a Set:
>
>   ordNub :: (Ord a) => [a] -> [a]
>   ordNub l = go empty l
>     where
>       go _ []     = []
>       go s (x:xs) = if x `member` s then go s xs
>                                     else x : go (insert x s) xs
>
>
> and put benchmarks on
> http://htmlpreview.github.io/?https://github.com/nh2/haskell-ordnub/blob/1f0a2c94a/report.html
> (compare `nub` vs `ordNub`).

Richard Bird has a book, "Pearls of Functional Algorithm Design" that
is meant to teach a form of deriving algorithms from the properties we
ask of them. In this book, he gives a possible derivation of ordNub,
simply called nub in the book, following the methodology he is
teaching. He notes in the text that this derivation feels more
complicated than it ought.

Here is his version: http://lpaste.net/87625

I just sent you a pull request to add that one and S.toList .
S.fromList that was suggested in this thread. I don't think those two
implementations are faster than the others but it's nice to have them
for completeness.

Jason