Performance horrors

Cale Gibbard cgibbard at gmail.com
Fri Aug 29 02:56:09 EDT 2008


2008/8/26 Adrian Hey <ahey at iee.org>:
> Seeing as practically all Eq instances are also Ord instances, at
> the very least we could have O(n*(log n)) definitions for ..
>
> nub :: Ord a => [a] -> [a]
> nub = nubBy compare
>
> nubBy :: (a -> a -> Ordering) -> [a] -> [a]
> nubBy cmp xs ys = -- something using an AVL tree perhaps.

While I agree that it would be handy to have Ord-specific versions of
these, I'd just like to reiterate/expand on something which Neil
mentioned: map head . group . sort has the additional effect of: 1)
Demanding the entire input list before it can produce even a single
element, and 2) Sorting the result rather than keeping things in the
order they first occurred in the input.

A correct implementation of nub which made use of Ord would maintain
(say) a Data.Set of elements already seen, as it traversed the list
lazily, producing elements in the output as soon as new elements were
seen in the input, and no later. This of course guarantees that you
return them in the order that they're first seen as well. This is
still O(n log n), but reduces correctly to O(k log k) when only k
elements of the input are needed to get the desired number of elements
of the resulting list. Please don't make nub any stricter!

On a side note related to the request for inclusion of complexities,
since Haskell is a lazy language, we really ought to have complexities
written in terms of the demanded portion of the result. Of course,
Data.Set and Data.Map are (structure) strict, so it doesn't affect
them so much, but it would certainly be nice for the functions in
Data.List to know the answer to "if If the input is size n and I only
demand k of the elements of the result, then what is the complexity?",
especially for things like sort, where a lazy implementation can, for
instance, make the head of the list available in just O(n) time.

 - Cale


More information about the Libraries mailing list