anthony_clayden at clear.net.nz
Mon Oct 14 02:20:51 UTC 2013
> Niklas Hambüchen <mail <at> nh2.me> writes:
> > On 13/10/13 21:42, AntC wrote:
> > ...
> > If you use the Set library, that fact may be very visible!
> > Because Set re-sequences the whole list, as per its Ord instance.
> > But List.nub preserves the list sequence
> > (except for omitting duplicates).
> I mean *exactly* what you say here.
> ordNub behaves has the same behaviour as nub, while (Set.toList .
> Set.fromList) doesn't.
That's great, thank you.
> > [BTW I am still less than convinced that overall a Set-based ordNub is
> > significantly more efficient. I suspect it depends on how big is your
> > list.]
> What do you mean?
> ordNub is clearly in a different complexity class, ...
Yes, I'm not disputing that.
> ... and the benchmarks that I provided show not only this,
> but also that ordNub is *always* faster than nub,
> even for singleton lists.
Thanks Niklas, I hadn't spotted those benchmarks back in July.
I'm surprised at that result for singletons
(and for very small numbers of elements which are in fact each different).
Especially because List's `nub` uses `filter` ==> fold, which should be
It seems to me that for small numbers, the Set-based approach still
requires comparing each element to each other. Plus there's the overhead
for building the Set and inserting each element into it -- where `insert`
again walks the Set to find the insertion point.
Then here's a further possible optimisation, instead of making separate
calls to `member` and `insert`:
* Make a single call to
insert' :: (Ord a) => a -> Set a -> (Bool, Set a)
* The Bool returns True if already a member.
* Else returns an updated Set in the snd, with the element inserted.
More information about the Haskell-Cafe