A more useful Monoid instance for Data.Map
Henning Thielemann
lemming at henning-thielemann.de
Tue Mar 12 23:29:30 CET 2013
On Tue, 12 Mar 2013, Milan Straka wrote:
> Hi all,
>
>> -----Original message-----
>> From: Ben Gamari <bgamari.foss at gmail.com>
>> Sent: 12 Mar 2013, 11:06
>>
>> <snip>
>>
>> We should take advantage of this opportunity to begin an easy cleanup
>> and accept that there are still other issues that will need to be dealt
>> with once the tools for doing so exist.
>
> IIRC, the instances we are talking about are
>
>
> Current:
> instance (Ord k) => Monoid (Map k v) where
> mempty = empty
> mappend = union
> mconcat = unions
>
>
> Proposed:
> instance (Ord k, Monoid v) => Monoid (Map k v) where
> mempty = empty
> mappend map0 map1 = unionWith mappend map0 map1
> mconcat maps = foldr (unionWith mappend) empty maps
> or even
> instance (Ord k, Semigroup v) => Monoid (Map k v) where
> mempty = empty
> mappend map0 map1 = unionWith <> map0 map1
> mconcat maps = foldr (unionWith <>) empty maps
>
>
> I am not convinced that the proposed instance is universally the best
> one. I like the current instance -- I sometimes use it on
> Map k v where v is a "primitive" type like Int or Double for which no
> Monoid instance exists. Personally, I am also not sure that
> fromList [(1, "A")] `mappend` fromList [(1, "B")
> should really be
> fromList [(1, "AB")]
>
> The price for any change is very high:
> a) silent semantic change for replacing the instance is a no go,
> consider for example for Map k String or Map k ByteString.
> b) removing and then adding the instance means we will have two major
> version bumps. That places a burden on every user of the library to
> at least upgrade upper bounds. And as Edward Kmett wrote earlier
> today, the users that want to use the Monoid instance must revert to
> some kind of conditional compilation.
>
> I am not saying we should never do any breaking changes, but in this
> case, the benefit seems rather small to me.
First of all, I do not know, why people require the current semantics of
"mappend" for Data.Map and why they do not just use "Map.union" in order
to explicitly tell that they drop duplicate elements intentionally.
The best semantics for "mappend" should be driven by the question what is
best for working with monoids (e.g. as writer type in the Writer monad).
The original proposal gave the good argument that by choosing a monoid
type for the Map element you can choose how elements for duplicate keys
are merged.
My experience with Map.union and Map.fromList is that I had bugs that were
very hard to debug because I did not kept the case of duplicate keys in
mind. Now I have added warnings to HLint for every use of Map.union and
Map.fromList. Either duplicate keys are allowed, then I should explicitely
state how to combine them, or keys must be distinct, then I should use
Map.unionWith (error "duplicate keys"). Unfortunately, my HLint checks are
bypassed if the 'Map.union' is disguised as a 'mappend'.
I think that changing 'mappend' in the proposed way has the advantage to
make programmers think about duplicate keys. The answer could be to
replace 'mappend' by 'Map.union' or choose an element Monoid for combining
the elements. Additionally Semigroup/Monoid types like First and Last are
more explicit than the current behaviour of "mappend" where you cannot
guess whether the left or the right element is kept when keys clash.
More information about the Libraries
mailing list