Performance horrors

Adrian Hey ahey at iee.org
Tue Aug 26 03:38:45 EDT 2008


Hello Folks,

I was looking at the definitions of nub (and hence nubBy) recently
in connection with a trac ticket and realised that this is O(n^2)
complexity! Ouch!

I was going to say I sure hope nobody uses this in real programs,
but I thought I'd better check first and I see that darcs seems to
use it in a few places. Hmm..

How did we ever get stuff like this in the standard libs? I can only
imagine this is relic from the days when Haskell was only used for
research or pedagogical purposes only (not for real programs).

Seeing as practically all Eq instances are also Ord instances, at
the very least we could have O(n*(log n)) definitions for ..

nub :: Ord a => [a] -> [a]
nub = nubBy compare

nubBy :: (a -> a -> Ordering) -> [a] -> [a]
nubBy cmp xs ys = -- something using an AVL tree perhaps.

..or even better using the generalised trie stuff Jamie Brandon
has been working on.

Of course I'm not actually advocating this as it's probably bad policy
to have a simple standard lib dependent on any complex non-standard lib.
But if it just isn't possible to implement some functions with
reasonable efficiency using simple definitions only then I think they
really shouldn't be there at all. Instead we should make users "roll
their own" using whatever complex non-standard lib they want.

So could nub and nubBy be deprecated and an appropriate health warning
added to the Haddock?

In the mean time I think I'll put appropriate alternative definitions
in the next AVL release and ask Jamie Brandon to consider doing the same
in his generalised tries lib (GMap).

Also, looking at a few other functions there like union(By) and
intersection(By) make me quite suspicious. Maybe we need a thorough
performance audit to get rid of implementations that are so insanely
inefficient they should *never* be used.

Regards
--
Adrian Hey






More information about the Libraries mailing list