Proposal: Remove Semigroup and Monoid instances for Data.Map, Data.IntMap, Data.HashMap

David Feuer david.feuer at gmail.com
Tue Feb 13 19:33:45 UTC 2018


Many people have recognized for years that the Semigroup and Monoid
instances for Data.Map, Data.IntMap, and Data.HashMap are not so
great. In particular, when the same key is present in both maps, they
simply use the value from the first argument, ignoring the other one.
This somewhat counter-intuitive behavior can lead to bugs. See, for
example, the discussion in Tim Humphries's blog post[*]. I would like
do do the following:

1. Deprecate the Semigroup and Monoid instances for Data.Map.Map,
Data.IntMap.IntMap, and Data.HashMap.HashMap in the next releases of
containers and unordered-containers.

2. Remove the deprecated instances.

3. After another several years (four or five, perhaps?), make a major
release of each package in which the instances are replaced with the
following:

  instance (Ord k, Semigroup v) => Semigroup (Map k v) where
    (<>) = Data.Map.Strict.unionWith (<>)
  instance (Ord k, Semigroup v) => Monoid (Map k v) where
    mempty = Data.Map.Strict.empty

  instance Semigroup v => Semigroup (IntMap v) where
    (<>) = Data.IntMap.Strict.unionWith (<>)
  instance Semigroup v => Monoid (IntMap v) where
    mempty = Data.IntMap.Strict.empty

  instance (Eq k, Hashable k, Semigroup v) => Semigroup (HashMap k v) where
    (<>) = Data.HashMap.Strict.unionWith (<>)
  instance (Eq k, Hashable k, Semigroup v) => Monoid(HashMap k v) where
    mempty = Data.HashMap.Strict.empty

Why do I want the strict versions? That choice may seem a bit
surprising, since the data structures are lazy. But the lazy versions
really like to leak memory, making them unsuitable for most practical
purposes.

The big risk:

Someone using old code or old tutorial documentation could get subtly
wrong behavior without noticing. That is why I have specified an
extended period between removing the current instances and adding the
desired ones.

Alternatives:

1. Remove the instances but don't add the new ones. I fear this may
lead others to write their own orphan instances, which may not even
all do the same thing.

2. Write separate modules with newtype-wrapped versions of the data
structures implementing the desired instances. Unfortunately, this
would be annoying to maintain, and also annoying to use--packages
using the unwrapped and wrapped versions will use different types.
Manually wrapping and unwrapping to make the different types work with
each other will introduce lots of potential for mistakes and
confusion.

Discussion period: three weeks.

[*] http://teh.id.au/posts/2017/03/03/map-merge/index.html


More information about the Libraries mailing list