[Haskell-cafe] ordNub

Mon Oct 14 02:20:51 UTC 2013

> Niklas Hambüchen <mail <at> nh2.me> writes:
> 
> > On 13/10/13 21:42, AntC wrote:
> > ...
> > If you use the Set library, that fact may be very visible!
> > Because Set re-sequences the whole list, as per its Ord instance.
> > 
> > But List.nub preserves the list sequence
> > (except for omitting duplicates).
> 
> I mean *exactly* what you say here.
> 
> ordNub behaves has the same behaviour as nub, while (Set.toList .
> Set.fromList) doesn't.
> 

That's great, thank you.

> > [BTW I am still less than convinced that overall a Set-based ordNub is 
> > significantly more efficient. I suspect it depends on how big is your 
> > list.]
> 
> What do you mean?
> 
> ordNub is clearly in a different complexity class, ...

Yes, I'm not disputing that.

> ... and the benchmarks that I provided show not only this,
> but also that ordNub is *always* faster than nub,
> even for singleton lists.

Thanks Niklas, I hadn't spotted those benchmarks back in July.

I'm surprised at that result for singletons 
(and for very small numbers of elements which are in fact each different).

Especially because List's `nub` uses `filter` ==> fold, which should be 
tail-recursive.

It seems to me that for small numbers, the Set-based approach still 
requires comparing each element to each other. Plus there's the overhead 
for building the Set and inserting each element into it -- where `insert` 
again walks the Set to find the insertion point.

Then here's a further possible optimisation, instead of making separate 
calls to `member` and `insert`:
* Make a single call to
    insert' :: (Ord a) => a -> Set a -> (Bool, Set a)
* The Bool returns True if already a member.
* Else returns an updated Set in the snd, with the element inserted.