[Haskell-cafe] ordNub

Wed Oct 16 16:46:14 UTC 2013

On Sun, Oct 13, 2013 at 6:25 PM, Niklas Hambüchen <mail at nh2.me> wrote:
> ordNub behaves has the same behaviour as nub, while (Set.toList .
> Set.fromList) doesn't.

Perhaps you mean that they produce exactly the same
output when fully evaluated. But I doubt that their behavior
is *exactly* the same in every aspect. Given the differences
in strictness between sets and lists, can you prove that
nub and nubOrd have *exactly* the same strictness properties
in every possible case?

> ...ordNub is *always* faster than nub, even for singleton lists.

This is also a claim that I doubt. You say that you tested the
case of singletons. How about this:

2 : replicate 100000 1 ++ [3]

Well, you could optimize for that by having nubOrd squeeze
runs of of equal elements in the input. But then how about
something like this:

[3,2,1] ++ take 100000 (cycle [3,2,1]) ++ [4]

Or variations on that, cycling some other fairly small number of
elements. Set containers must do extra comparisons, whereas
the of cost running down the first few hundred elements of a linked
list (as long as the compiler fuses the iteration down to a tight
loop in the output code and you remain within the cache) is near zero.
How could nubOrd be as fast as ord in these kinds of cases?

I'm not saying that it wouldn't be worthwhile to add a standard
well-optimized implementation of nubOrd somewhere.
But nub is a very useful function. The only advantage of nubOrd
is for very large lists where the complexity advantage overtakes
the excellent constants of nub to provide better speed. In
practice, the cases where that's worthwhile are unusual.
I rarely bother with nubOrd over nub.

-Yitz