Map folds in containers: why no worker/wrapper split?
Milan Straka
fox at ucw.cz
Thu Jun 24 06:03:37 EDT 2010
Hi,
I agree the strictness is undocumented and inconsistent (I was bitten by
it several times).
I did not mention it much in the paper, maybe I should have.
I want to work on the patches now with the knowledge of the benchmark
results, and the strictness info is another thing I want to deal with.
BTW, the paper speaks of sets only, but mentions that it applied to maps
too. The benchmarks contains both Set/Map and IntSet/IntMap and the
initial patches were also to all there four containers.
Cheers,
Milan
>> I did quite a lot of benchmarking, strictness analysis and performance
>> improvements to the containers package recently, see the draft of my
>> paper http://fox.ucw.cz/papers/containers/containers.pdf.
>>
>> Next week I will start finalizing the patches and submit them. The 'not
>> generating a worker/wrapper' issue is one of several on my radar, so
>> this should be resolved.
>
> That is good to hear. It is great that someone is doing the work
> needed to pin down, measure, document, and perhaps even cure some of the
> performance issues in containers. If nothing else, a published summary of
> the performance issues that have popped up over the years would be a
> start.
>
> Btw, the first thing I'd look for in a paper on containers performance,
> and the one issue that tends to trip up most users, is strictness. So I
> was surprised not to see a discussion of this in your draft (at least
> not in a quick browse). Could it be that your work has focussed on
> the Sets rather than the Maps in containers?
>
> Lots of performance problems with containers are about
> pragmatics, not representation, eg, one knows that Maps are
> strict in keys but available operations happen to be defined
> in such a way that one cannot easily ensure other desirable
> strictness properties.
>
> For the worker/wrapper split, there are non-strict definitions
> of higher-order functions, which the compiler could make strict
> by using strictness information about the calling context if the
> definitions were written differently.
>
> For other operations, turning the non-strict definitions into
> strict versions (when strict versions are more appropriate;
> ideally, both variants should be well supported and documented) is often
> difficult for users of containers.
>
> The issues are also non-obvious, which is perhaps worse, because
> Map/IntMap is often used to attempt to speed up a loop accumulator, with
> the result that thunks pile up behind the scene and performance goes down
> until large problem sizes cannot be handled. At this point, users are no
> longer aware that they built in the performance problem right at the
> start, by using a supposedly efficient data structure
> in an unfortunate way - they think it is about Haskell and large
> datasets.
>
> Some examples that I happen to recall (there are more in the list
> archives of cafe and libraries, under various keywords, as the original
> poster doesn't realize that the root cause is in containers; there are
> also some related tickets in the GHC trac):
>
> - IntMap and Map expose different APIs wrt strictness
> http://www.haskell.org/pipermail/libraries/2009-March/011399.html
> http://www.mail-archive.com/haskell-cafe@haskell.org/msg48925.html
>
> - to work around the non-strict Map fold, users have to
> resort to nested folds (non-strict fold to convert to list,
> then strict foldl' over list) and other tricks:
> http://www.haskell.org/pipermail/haskell-cafe/2010-June/079098.html
>
> similarly non-trivial thinking is required to work around other
> no-control-over-strictness issues in the APIs, bypassing the
> obvious functions and using non-obvious ones based on
> knowledge about their implementation
>
> - strictness is used inconsistently:
> http://hackage.haskell.org/trac/ghc/ticket/4109
>
> - strictness is not documented: !!!
> search for "strict" in the Data.Map/IntMap documentation..
>
> this is really bad - the only way for users to become aware
> of strictness problems is by running into them, and the only way to
> fix strictness-related performance issues is by referring
> to the implementation (in theory, an implementation change
> could invalidate lots of performance fixes in production code)
>
> It seems that strictness documentation and control would make for a large
> section in a paper on containers performance.
>
> Claus
>
More information about the Libraries
mailing list