Clark Gaebel cgaebel at uwaterloo.ca
Tue Jul 16 05:58:56 CEST 2013

```nubBy is a very good suggestion. Added!

Regarding good hash functions: if your data structure is algebraic,
you can derive generic and Hashable will give you a pretty good hash
function:

> data ADT a = C0 Int String | C1 [a]
>   deriving Generic
>
> instance Hashable a => Hashable (ADT a)

It's magic!

- Clark

On Mon, Jul 15, 2013 at 11:35 PM, Richard A. O'Keefe <ok at cs.otago.ac.nz> wrote:
>
> On 16/07/2013, at 3:21 PM, Clark Gaebel wrote:
>>
>> I'm still against having an Ord version, since my intuition tells me
>> that hash-based data structures are faster than ordered ones.
>
> There are at least four different things that "an Ord version" might
> mean:
>
>  - first sort a list, then eliminate duplicates
>  - sort a list eliminating duplicates stably as you go
>    (think 'merge sort', using 'union' instead of 'merge')
>  - build a balanced tree set as you go
>  - having a list that is already sorted, use that to
>    eliminated duplicates cheaply.
>
> These things have different costs.  For example, if there are N
> elements of which U are unique, the first as O(N.log N) cost,
> the third has O(N.log U) cost, and the fourth has O(N) cost.
>
> What I want is more often ordNubBy than ordNub, though.
>
>> Someone
>> else can write the patch, though!
>>
>> As a tangent, can anyone think of a data structure for which you can
>> write an Ord instance but Hashable/Eq is impossible (or prove
>> otherwise)? How about the converse?
>
> Since Ord has Eq as a superclass, and since 0 is a functionally
> correct hash value for anything, if you can implement Ord you
> can obviously implement Hashable/Eq.  Whether it is *useful* to
> do so is another question.
>
> It turns out that it _is_ possible to define good quality hash
> functions on sets, but most code in the field to do so is pretty bad.
> (Just a modular sum or exclusive or.)
>
> _______________________________________________