Summary of containers patches

Thu Sep 23 21:12:21 EDT 2010

If I am reading this right, the numbers for alter/delete/update are rather unsettling in your proposal.

Is there a compromise point that does not cause those to suffer so much?

Because otherwise doubling the object code size doesn't seem like such a terrible burden for the such significant speed benefits we currently see in those areas.

-Edward

On Sep 23, 2010, at 8:22 PM, Milan Straka <fox at ucw.cz> wrote:

> Hi Simon, others,
> 
> here comes the results.
> 
> I consider 4 variants of containers:
> inline: the current fox.auryn.cz/darcs/containers with many INLINE pragmas
> nothing: all INLINE pragmas deleted
> inlinable: all INLINE pragmas replaced with INLINABLE
> mix: all INLINE pragmas replaced with INLINABLE, but lookup*, member*
>   and insert* functions are kept INLINE.
> 
> At first code bloat. Writing size of the binaries in bytes, all binaries
> are stripped, I386.
> 
> * map-properties - binary running test on Data.Map, uses many functions
>  from Data.Map
>  - inline: 5266740
>  - nothing: 2325940
>  - inlinable: 2348116
>  - mix: 2571156
>  That is 2.9MB increase in size from nothing to inline.
> 
> * Map.hs - benchmark of Data.Map, using several functions only
>  - inline: 24465452
>  - nothing: 24302348
>  - inlinable: 24326924
>  - mix: 24417100
>  160kB only from nothing to inline.
> 
> Another code bloat. This time I compiled
> ghc-7.0: GHC itself with containers containing INLINE pragmas
>  20723280B
> ghc-7.0-noinlinemap: GHC itself with containers without INLINE pragmas
>  19965464B
> That is 740kB difference between ghc-7.0 binaries.
> (There is 1082kB difference between libHSghc-7.0.0.a libraries.)
> 
> Another code bloat. This time Map.hs benchmark again. All packages
> needed for the benchmark (most notably criterion) are compiled:
> - against containers package contains INLINE pragmas
>  24465452B
> - against containers package without INLINE pragmas
>  23753808B
> That is 700kB difference _on one binary_.
> 
> So the code bloat is considerable.
> 
> Now the timing results. The results are the change in runtime in %.
> [1] speedup between inline and noinline (negative numbers means slowdown)
> [2] speedup between inline and inlinable
> [3] speedup between inline and mix
> (the results are += several %, I did not have time to do it better)
> 
>                            [1]        [2]        [3]
> alter                       -50.33     -49.17     -47.51
> delete                      -46.67     -50.93     -48.79
> difference                  1.93       0.18       6.33
> insert                      -44.11     -44.25     -3.09
> insertLookupWithKey empty   -14.62     -15.70     1.22
> insertLookupWithKey update  -21.54     -19.23     6.20
> insertLookupWithKey' empty  -16.79     -23.67     -2.82
> insertLookupWithKey' update -26.62     -26.99     0.47
> insertWith empty            -44.42     -48.29     -5.88
> insertWith update           -43.28     -45.03     2.50
> insertWith' empty           -45.30     -46.66     -3.45
> insertWith' update          -38.15     -38.27     3.94
> insertWithKey empty         -42.60     -45.58     -6.99
> insertWithKey update        -40.13     -36.62     -1.71
> insertWithKey' empty        -46.54     -46.53     -4.22
> insertWithKey' update       -43.10     -42.16     0.98
> intersection                1.42       -1.75      0.37
> lookup                      -94.03     -94.62     6.14
> lookupIndex                 -86.25     -86.92     4.91
> union                       -0.04      -0.87      2.12
> update                      -50.64     -55.27     -54.17
> updateLookupWithKey         -43.90     -43.31     -49.22
> 
> So, the INLINABLE does not help at all.
> 
> I in fact expected this outcome -- it is not inlining itself that is
> beneficial, it is the _specialization_ (which happens during inlining if
> possible).
> 
> Please bear in mind that the speedup caused by specialization is
> significant only in the case when the Ord.compare function is trivial
> (if it is an instruction like <=, the result between instruction call
> and a dynamic dictionary call is very big). That means things like Map
> Int, Map Word, Map Int8 and so on. But with these types one usually uses
> IntMap. What I am saying is that the usage of Map which would benefit
> from specialization may not be very common.
> 
> At this point I suggest the following:
> - as immediate solution, remove all INLINE from the containers package,
>  just leave INLINE in lookup*, member* and insert* functions. This
>  causes small code bloat and quite big wins on these functions, which
>  I consider to be very frequent.
> 
>  I could imagine to remove also INLINE for insert* functions, or to add
>  INLINE for delete*, update* and alter*.
> 
> - in the long term, find a way to get back the speedup without the code
>  bloat. The problem is that _every_ call to a lookup has a different
>  code in the binary. We need to share the code of specialized functions
>  (like the c++ templates do). I mean the following:
>  - before linking, find out which dictionaries are passed to
>    Data.Map.lookup method. Specialize Data.Map.lookup for these types
>    (only once for each type) and change the calls to lookup with
>    specific dictionaries.
>  The effect should be as if user said SPECIALIZE, for each method
>  only for types it is non-generically being used for.
> 
> - I would not bother with INLINABLE currently.
> 
> Simon, others, do you agree with my short-term solution? If you do,
> I will push the patches (I need to do it before I fly to ICFP, which
> will be in 30 hours).
> 
> Cheers,
> Milan
> 
>>> as you probably know, there are several containers patches dealing with
>>> performance, separate testsuite and benchmarking.
>>> 
>>> Specifically, I mean tickets
>>> http://hackage.haskell.org/trac/ghc/ticket/4277
>>> http://hackage.haskell.org/trac/ghc/ticket/4279
>>> http://hackage.haskell.org/trac/ghc/ticket/4280
>>> http://hackage.haskell.org/trac/ghc/ticket/4311
>>> http://hackage.haskell.org/trac/ghc/ticket/4312
>>> http://hackage.haskell.org/trac/ghc/ticket/4333
>>> 
>>> I have all of them in the following repository
>>> http://fox.auryn.cz/darcs/containers/
>>> 
>>> There are no user-visible changes, just performance and the split
>>> testsuite.
>>> 
>>> There were no comments in some time, so I would like to push them to
>>> containers.
>>> 
>>> There is just the code-bloat issue to solve. The excessive use of INLINE
>>> pragma causes a large code bloat. I am aware of it, it happened even
>>> when I was benchmarking the containers.
>>> 
>>> Personally I vote for:
>>> - changing INLINE to INLINABLE
>>> - leave INLINE just in
>>>   - member/lookup functions
>>>   - insert* functions
>>>   as these benefit from specialization a lot and do not cause such
>>>   a bloat (measured during the internship, do not recall the numbers
>>>   now).
>>> 
>>> I suspect the performance to drop a bit, but I think it is better to
>>> have a bit slower containers without such a code bloat.
>>> 
>>> Simon, I will be flying to IFCP soon, but if you want, I can prepare the
>>> INLINE->INLINABLE patches and put them in the repo.
>> 
>> That would help a lot, thanks.  I started on this today but I'm a bit  
>> overloaded, and getting criterion installed wasn't very smooth (a couple  
>> of package breakages, but nothing serious and patches are upstream).
>> 
>> Cheers,
>>    Simon
>> 
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://www.haskell.org/mailman/listinfo/libraries