[Haskell-cafe] Re: (flawed?) benchmark : sort

Adrian Hey ahey at iee.org
Tue Mar 11 03:48:51 EDT 2008


Jonathan Cast wrote:
> On 10 Mar 2008, at 4:00 AM, Adrian Hey wrote:
> 
>> Neil Mitchell wrote:
>>>>  2) What does it do with duplicate elements in the list? I expect it 
>>>> deletes
>>>>  them. To avoid this, you'd need to use something like fromListWith, 
>>>> keeping
>>>>  track of how many duplicates there are, and expanding at the end.
>>> That would be wrong. Consider:
>>> data Foo = Foo Int Int
>>> instance Ord Foo where
>>>     compare (Foo a _) (Foo b _) = compare a b
>>
>> I would consider such an Ord instance to be hopelessly broken, and I
>> don't think it's the responsibility of authors of functions with Ord
>> constraints in their sigs (such as sort) to consider such possibilities
>> or specify and control the behaviour of their behaviour for such
>> instances. Trying to do this is what leads to horrors such as the "left
>> biasing" of Data.Map (for example).
> 
> Data.Map is implicitly using such an Ord instance behind the scenes, and 
> I think it has to to maintain its own invariants.  If I take the `union' 
> of two maps that take the same key to different values, I have to decide 
> which value to use, even if every Ord instance supplied by my clients is 
> 100% Adrian-compliant.

The biasing policy for Data.Map/Set is refering to (Set) elements, or
(Map) keys, not the associated values (in a Map). So during an insertion
op, if an "equal" element/key is found the Set/Map the biasing policy
tells me which of the two "equal" elements/keys in incorporated in the
resulting Set/Map.

So there's an implied acceptance of the posibility that the choice is
significant and that the two elements/keys can be both "equal" and "not
equal" at the same time. This is crazy IMO. Implementors should not
have to offer an guarantees about this and should be perfectly free
to make their own unspecified choice regarding which of two "equal"
values is used in any expression (on space efficiency grounds say).

For example, the left biasing of insertion on Data.Set forces the
implementation to burn O(log n) heap space creating a new "equal" set,
even if the set already contains an old element that is "equal" to the
new element. I would argue that in this situation it should be perfectly
correct to simply return the old set instead.

Regards
--
Adrian Hey



More information about the Haskell-Cafe mailing list