Map folds in containers: why no worker/wrapper split?
claus.reinke at talk21.com
Sat Jun 26 03:27:08 EDT 2010
>> Lots of performance problems with containers are about
>> pragmatics, not representation, eg, one knows that Maps are
>> strict in keys but available operations happen to be defined
>> in such a way that one cannot easily ensure other desirable
>> strictness properties.
> Not to put any extra burden on the messenger, but do you have a summary
> of the particular strictnesses that are often desired, or the common
> problems encountered in practice?
Can't provide an exhaustive summary, but perhaps I can illustrate
the general overview already given in a bit more detail. The
background for these problems is usually the same, namely an
application that needs a Map that is strict in its values, not just in
its keys. That is not the default usage, and it is not well supported.
Let us say, some working code is translated from a strict functional
language, and the translator just wants to make the code do the
same thing in the non-strict language Haskell (without changing
the algorithm to take advantage of non-strictness). That is sadly
typical of the "my Haskell code is slow and runs out of memory"
threads (sometimes strictness is just not thought about, though).
So we need to ensure an invariant that would be the default
in a strict language: the contents of Map entries are always
evaluated. With that in mind, look at the Data.Map API again:
1. without looking at the source, the existence of insertWith(Key)'
is the only hint that the other operations are not strict in the
values (and insertWith' is not in IntMap, only in Map)
2. you can probably guess that Maps are strict in keys and
representation, from the description as balanced trees, but
to be sure, you'll be looking at the source
3. of the operations, we are not concerned with lookups (we
need to evaluate "values" before putting them into the Map)
nor with deletion; so what about construction/modification/
combination (note that inspection of source is necessary to
make any progress)?
- singleton/insert: easy enough
- insertWith(Key): oops! we can't use this; and insertWith(Key)'
is only provided for Map, not IntMap (so we can't switch
between the two),
- insertLookupWithKey: oops
- adjust*/update* go back to udpateWithKey: looks useable
- alter: looks useable
- union(s): look ok
- union(s)With(Key): oops
- difference*: ok
- intersection: ok
- intersectionWith(Key): oops
- map(WithKey): oops
- mapAccum(R)(WithKey): oops
- mapKeys: ok
- mapKeysWith: oops
- mapKeysMonotonic: ok
Even if you do this analysis (many users won't, and I don't
guarantee I've got it right), you're left with a substantially
smaller useable API. "useable/ok" here means that there is
a way to modify the API functions to enforce the strictness
invariant we're after (all values evaluated); "oops" means
that even if we wanted to, we could not use the function
because it does not give us the means to add strictness.
Generally, higher-order operations that apply functions
but simply put the result in a non-strict field of the Map
representation are suspect here; things like adjust/alter
are more helpful because we can tie the Just/Nothing
decision to prior evaluation; things like folds suffer from
the usual issues (non-strict in accumulator, not written
in a way that would help the strictness analyser to
produce a specialised version); ..
Chances are (as various "my Haskell code is slow/overflows"
posts have illustrated over the years) that users will get it
wrong. And those who try to get it right won't be happy
with the API support for their use case (the useable API
is so reduced that it is tempting to fork the code instead).
So most of it boils down to just one common problem,
with many different symptoms. The solution has to be
designed into the API from the start, and it needs to be
consistent and documented. One starts with the two
main use cases (Maps strict or non-strict in values) and
investigates how that affects all the API operations.
Implementing everything twice would be code
duplication, but better than what we have now
(suggestions for limiting code duplication exist as well:
specialize the representation or the operations for the
two use cases, using specialization, inlining, strictness
analysis, type families, or ..). Implementers are turned
away by the code duplication of the simple solution,
and nobody has done the measurements to compare
the more elegant solutions, as far as I know.
The default use case (Maps not strict in values) works,
the other use case (Maps strict in values) is understood,
but not well supported (making sure that availability
of a key can be made dependent on the evaluation of
its value would be sufficient, but all too many operations
fall short of supporting this). Which is why I attributed
the problems to pragmatics.
Does this help?
> In particular I'm thinking in the context of the bytestring-trie package
> and how it can be improved. The initial API sought to mirror the
> containers version of Map/IntMap, because they seemed rather stable at
> the time. However, there've been a number of changes and proposals
> lately, so I want to be sure I don't miss anything.
API mirroring seems a good idea, to make it easier to
choose the representation most suited for an application.
It should work both ways: if you find the need to provide
slightly different operations to support strict mode with
your package, suggest that containers should follow.
More information about the Libraries