nub

Jan-Willem Maessen jmaessen@alum.mit.edu
Fri, 29 Nov 2002 11:43:26 -0500


I find that nub is nearly always the wrong tool for the job (unless
the job is trivial quickie coding).  I'll point out that:

> map head . group . sort

has O(n lg n) asymptotic complexity, whereas nub or (sort . nub) both
are O(n^2).  This fact seems all too frequently forgotten.  For short
lists it doesn't matter much---but I find I like to use set union a
lot, and "merge" is quite a bit better than List.union for this, so I
tend to keep even short sets sorted.  (For bigger sets, of course, one
should import a nice set implementation.)

Richard Braakman <dark@xs4all.nl> writes:
> uniq = map head . group

and later:
> I think both functions are useful.  If I understand it right, uniq
> can evaluate its argument list lazily, while nub cannot.  There's no
> real need to put uniq in the standard library, though.

Here's a little experiment for folks with big Haskell programs:

Look for uses of "group" in your code.  I suggest:
  find -name '*.hs' | xargs grep -n '[^a-zA-Z]group[^a-zA-Z]' 

Now, look at two divisions of these calls:
1) What proportion are *not* performed in conjunction with a call to
   sort?

2) What proportion are *not* preceded by "map head"?

In the case of the Eager Haskell compiler, the answer to (1) was
"none", and the answer to (2) was "several".

I suspect I'm not uniq in this finding.  Providing "uniq" and/or
"uniqSort" as prelude functions could in fact be a great service.  

Of course, if I only got to add one function to List, it would be
some variant of "merge".  

-Jan-Willem Maessen