[Haskell-cafe] ordNub

Richard A. O'Keefe ok at cs.otago.ac.nz
Tue Jul 16 05:35:22 CEST 2013


On 16/07/2013, at 3:21 PM, Clark Gaebel wrote:
> 
> I'm still against having an Ord version, since my intuition tells me
> that hash-based data structures are faster than ordered ones.

There are at least four different things that "an Ord version" might
mean:

 - first sort a list, then eliminate duplicates
 - sort a list eliminating duplicates stably as you go
   (think 'merge sort', using 'union' instead of 'merge')
 - build a balanced tree set as you go
 - having a list that is already sorted, use that to
   eliminated duplicates cheaply.

These things have different costs.  For example, if there are N
elements of which U are unique, the first as O(N.log N) cost,
the third has O(N.log U) cost, and the fourth has O(N) cost.

What I want is more often ordNubBy than ordNub, though.

> Someone
> else can write the patch, though!
> 
> As a tangent, can anyone think of a data structure for which you can
> write an Ord instance but Hashable/Eq is impossible (or prove
> otherwise)? How about the converse?

Since Ord has Eq as a superclass, and since 0 is a functionally
correct hash value for anything, if you can implement Ord you
can obviously implement Hashable/Eq.  Whether it is *useful* to
do so is another question.

It turns out that it _is_ possible to define good quality hash
functions on sets, but most code in the field to do so is pretty bad.
(Just a modular sum or exclusive or.)




More information about the Haskell-Cafe mailing list