Performance horrors

Neil Mitchell ndmitchell at gmail.com
Tue Aug 26 06:26:17 EDT 2008


Hi

> I was looking at the definitions of nub (and hence nubBy) recently
> in connection with a trac ticket and realised that this is O(n^2)
> complexity! Ouch!

Complexity theory plus the Eq context makes that inevitable. You can't
have nub over Eq in anything less than O(n^2)

> I was going to say I sure hope nobody uses this in real programs,
> but I thought I'd better check first and I see that darcs seems to
> use it in a few places. Hmm..

I use it all the time, its dead handy. There are 12 instances in
Hoogle, for example. If profiling later shows it to be a problem, I'd
fix it - but I can't ever actually remember that being the case.
Premature optimisation is the root of all evil.

> Seeing as practically all Eq instances are also Ord instances, at
> the very least we could have O(n*(log n)) definitions for ..
>
> nub :: Ord a => [a] -> [a]
> nub = nubBy compare
>
> nubBy :: (a -> a -> Ordering) -> [a] -> [a]
> nubBy cmp xs ys = -- something using an AVL tree perhaps.
>
> ..or even better using the generalised trie stuff Jamie Brandon
> has been working on.

Having nubOrd is a great idea, I even proposed it previously, but
people disagreed with me. Propose it and add it by all means.

> So could nub and nubBy be deprecated and an appropriate health warning
> added to the Haddock?

No. They should say O(n^2) in the haddock documentation, but O(n^2) /= useless.

Thanks

Neil


More information about the Libraries mailing list