Performance horrors

Adrian Hey ahey at iee.org
Tue Aug 26 13:32:40 EDT 2008


Neil Mitchell wrote:
> There are 12 instances in
> Hoogle, for example. If profiling later shows it to be a problem, I'd
> fix it - but I can't ever actually remember that being the case.
> Premature optimisation is the root of all evil.

I'm sure most of the community would agree with you, but I have to
say that if the consequence of this philosophy is horrors like this
in the standard libs, it should come as no surprise that Haskell has
a reputation for being "slow". What else is lurking there I wonder?
What's really bad is that the terrible performance isn't even
documented. It may be obvious, but it should still be documented.

Has anybody even the remotest clue why darcs is (apparently) so slow?
Maybe the community itself should share some of the blame for this. Like
it wasn't obvious to me that the uses of nub I saw in darcs could rely
on very short lists (<=1 element :-)

> Having nubOrd is a great idea, I even proposed it previously, but
> people disagreed with me. Propose it and add it by all means.

Like I said, I'm not proposing it, as it doesn't seem to possible to
implement it efficiently using anything else in the standard libs.
You could do nubOrd (but not nubOrdBy) using Data.Set.

But there are 2 problems..
1- This not only introduces a cyclic dependency between modules, but
also packages. I'm not sure how well various compilers and cabal would
deal with this between them, but I'm not optimistic.

2- Data.Set is not obviously the best underlying implementation (in
fact it is obviously not the best underlying implementation, this and
Data.Map etc really should be pensioned off to hackage along with the
rest of the badly documented, unreliable, inefficient and unstable
junk :-)

So I still think they should be deprecated. It seems like the least bad
option if we can agree that their use should be strongly discouraged.

>> So could nub and nubBy be deprecated and an appropriate health warning
>> added to the Haddock?
> 
> No. They should say O(n^2) in the haddock documentation, but O(n^2) /= useless.

I would say it is if there are many obvious O(n.(log n)) or better
implementations that can be used in in 99%+ of cases. I mean so
obvious that users can quite easily roll their own in 3 lines
of code or less.

Regards
--
Adrian Hey



More information about the Libraries mailing list