Summary of containers patches
Edward Kmett
ekmett at gmail.com
Thu Sep 23 21:12:21 EDT 2010
If I am reading this right, the numbers for alter/delete/update are rather unsettling in your proposal.
Is there a compromise point that does not cause those to suffer so much?
Because otherwise, doubling the object code size doesn't seem like such a terrible burden for the significant speed benefits we currently see in those areas.
-Edward
On Sep 23, 2010, at 8:22 PM, Milan Straka <fox at ucw.cz> wrote:
> Hi Simon, others,
>
> here comes the results.
>
> I consider 4 variants of containers:
> - inline: the current fox.auryn.cz/darcs/containers with many INLINE pragmas
> - nothing: all INLINE pragmas deleted
> - inlinable: all INLINE pragmas replaced with INLINABLE
> - mix: all INLINE pragmas replaced with INLINABLE, but the lookup*, member*
>   and insert* functions are kept INLINE.
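>
> For illustration, a simplified sketch (a toy tree, not the real Data.Map
> source) of what the variants mean on a single function:
>
>   -- Toy stand-in for Data.Map; only the pragma on 'member' differs
>   -- between the variants.
>   module PragmaSketch where
>
>   data Tree k a = Tip | Bin (Tree k a) k a (Tree k a)
>
>   -- "inline" variant: the whole body is copied (and thus specialized)
>   -- into every call site in client code.
>   -- "nothing" variant: drop the pragma; every client calls the one
>   -- generic, dictionary-passing copy compiled in the library.
>   -- "inlinable" variant: replace INLINE with INLINABLE, which only
>   -- keeps the unfolding available so GHC may specialize at use sites.
>   {-# INLINE member #-}
>   member :: Ord k => k -> Tree k a -> Bool
>   member k = go
>     where
>       go Tip            = False
>       go (Bin l kx _ r) = case compare k kx of
>         LT -> go l
>         GT -> go r
>         EQ -> True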
>
> First, code bloat. Binary sizes are given in bytes; all binaries
> are stripped, i386.
>
> * map-properties - a binary running the Data.Map tests; it uses many
>   functions from Data.Map
> - inline: 5266740
> - nothing: 2325940
> - inlinable: 2348116
> - mix: 2571156
> That is a 2.9MB increase in size from nothing to inline.
>
> * Map.hs - a benchmark of Data.Map, using only a few functions
> - inline: 24465452
> - nothing: 24302348
> - inlinable: 24326924
> - mix: 24417100
> Only a 160kB increase from nothing to inline.
>
> Another code-bloat data point. This time I compiled:
> - ghc-7.0: GHC itself, with the containers containing INLINE pragmas
>   20723280B
> - ghc-7.0-noinlinemap: GHC itself, with the containers without INLINE pragmas
>   19965464B
> That is a 740kB difference between the ghc-7.0 binaries.
> (There is a 1082kB difference between the libHSghc-7.0.0.a libraries.)
>
> Another code-bloat data point, the Map.hs benchmark again. All packages
> needed for the benchmark (most notably criterion) are compiled:
> - against a containers package with INLINE pragmas
>   24465452B
> - against a containers package without INLINE pragmas
>   23753808B
> That is a 700kB difference _on one binary_.
>
> So the code bloat is considerable.
>
> Now the timing results. The results are the change in runtime, in %:
> [1] speedup between inline and nothing (negative numbers mean a slowdown)
> [2] speedup between inline and inlinable
> [3] speedup between inline and mix
> (the results are accurate only to within several %; I did not have time
> to measure more precisely)
>
> [1] [2] [3]
> alter -50.33 -49.17 -47.51
> delete -46.67 -50.93 -48.79
> difference 1.93 0.18 6.33
> insert -44.11 -44.25 -3.09
> insertLookupWithKey empty -14.62 -15.70 1.22
> insertLookupWithKey update -21.54 -19.23 6.20
> insertLookupWithKey' empty -16.79 -23.67 -2.82
> insertLookupWithKey' update -26.62 -26.99 0.47
> insertWith empty -44.42 -48.29 -5.88
> insertWith update -43.28 -45.03 2.50
> insertWith' empty -45.30 -46.66 -3.45
> insertWith' update -38.15 -38.27 3.94
> insertWithKey empty -42.60 -45.58 -6.99
> insertWithKey update -40.13 -36.62 -1.71
> insertWithKey' empty -46.54 -46.53 -4.22
> insertWithKey' update -43.10 -42.16 0.98
> intersection 1.42 -1.75 0.37
> lookup -94.03 -94.62 6.14
> lookupIndex -86.25 -86.92 4.91
> union -0.04 -0.87 2.12
> update -50.64 -55.27 -54.17
> updateLookupWithKey -43.90 -43.31 -49.22
>
> So, INLINABLE does not help at all.
>
> I in fact expected this outcome -- it is not inlining itself that is
> beneficial, it is the _specialization_ (which happens during inlining if
> possible).
>
> Please bear in mind that the speedup caused by specialization is
> significant only when the Ord.compare function is trivial
> (if it is a single instruction like <=, the difference between that
> instruction and a dynamic dictionary call is very big). That means things
> like Map Int, Map Word, Map Int8 and so on. But with these types one
> usually uses IntMap. What I am saying is that the uses of Map which would
> benefit from specialization may not be very common.
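>
> To make that concrete, a small sketch of the two situations (assuming
> Data.Map.lookup's unfolding is available, via INLINE or INLINABLE, so
> the monomorphic call really gets specialized):
>
>   module IntKeys where
>
>   import Data.Map (Map)
>   import qualified Data.Map as M
>   import Data.IntMap (IntMap)
>   import qualified Data.IntMap as IM
>
>   -- Generic: each comparison goes through the Ord k dictionary unless
>   -- GHC manages to specialize the call.
>   lookupGeneric :: Ord k => k -> Map k a -> Maybe a
>   lookupGeneric = M.lookup
>
>   -- Monomorphic at Int: with the unfolding available, GHC can
>   -- specialize this call so the compares become single instructions...
>   lookupIntKey :: Int -> Map Int a -> Maybe a
>   lookupIntKey = M.lookup
>
>   -- ...but for Int-like keys one usually uses IntMap in the first place.
>   lookupIntMap :: Int -> IntMap a -> Maybe a
>   lookupIntMap = IM.lookup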
>
> At this point I suggest the following:
> - as an immediate solution, remove all INLINE pragmas from the containers
>   package, leaving INLINE only on the lookup*, member* and insert*
>   functions. This causes small code bloat and quite big wins on these
>   functions, which I consider to be very frequently used.
>
>   I could also imagine removing INLINE from the insert* functions, or
>   adding INLINE to delete*, update* and alter*.
>
> - in the long term, find a way to get the speedup back without the code
>   bloat. The problem is that _every_ call site of lookup gets its own
>   copy of the code in the binary. We need to share the code of the
>   specialized functions (as C++ templates do). I mean the following:
>   - before linking, find out which dictionaries are passed to the
>     Data.Map.lookup method, specialize Data.Map.lookup for those types
>     (only once per type), and redirect the calls that pass those
>     dictionaries to the specialized copies.
>   The effect should be as if the user had written SPECIALIZE for each
>   method, for exactly the types it is used at non-generically (see the
>   hand-written sketch after this list).
>
> - I would not bother with INLINABLE currently.
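>
> For reference, the hand-written equivalent of that long-term idea,
> sketched in a hypothetical client module (this assumes lookup's
> unfolding is exposed via INLINE/INLINABLE so GHC can specialize the
> imported function):
>
>   module Client (phonebookLookup) where
>
>   import Prelude hiding (lookup)
>   import Data.Map (Map, lookup)
>
>   -- One shared specialized copy for String keys, used by every call
>   -- in the program, instead of a fresh inlined copy at each call site.
>   {-# SPECIALIZE lookup :: String -> Map String a -> Maybe a #-}
>
>   phonebookLookup :: String -> Map String Int -> Maybe Int
>   phonebookLookup = lookup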
>
> Simon, others, do you agree with my short-term solution? If you do,
> I will push the patches (I need to do it before I fly to ICFP, which
> will be in 30 hours).
>
> Cheers,
> Milan
>
>>> as you probably know, there are several containers patches dealing with
>>> performance, separate testsuite and benchmarking.
>>>
>>> Specifically, I mean tickets
>>> http://hackage.haskell.org/trac/ghc/ticket/4277
>>> http://hackage.haskell.org/trac/ghc/ticket/4279
>>> http://hackage.haskell.org/trac/ghc/ticket/4280
>>> http://hackage.haskell.org/trac/ghc/ticket/4311
>>> http://hackage.haskell.org/trac/ghc/ticket/4312
>>> http://hackage.haskell.org/trac/ghc/ticket/4333
>>>
>>> I have all of them in the following repository
>>> http://fox.auryn.cz/darcs/containers/
>>>
>>> There are no user-visible changes, just performance and the split
>>> testsuite.
>>>
>>> There were no comments in some time, so I would like to push them to
>>> containers.
>>>
>>> There is just the code-bloat issue to solve. The excessive use of the
>>> INLINE pragma causes large code bloat. I am aware of it; it showed up
>>> even when I was benchmarking the containers.
>>>
>>> Personally I vote for:
>>> - changing INLINE to INLINABLE
>>> - leaving INLINE only on
>>>   - member/lookup functions
>>>   - insert* functions
>>> as these benefit a lot from specialization and do not cause such
>>> bloat (measured during the internship; I do not recall the numbers
>>> now).
>>>
>>> I expect performance to drop a bit, but I think it is better to have
>>> slightly slower containers without such code bloat.
>>>
>>> Simon, I will be flying to ICFP soon, but if you want, I can prepare the
>>> INLINE->INLINABLE patches and put them in the repo.
>>
>> That would help a lot, thanks. I started on this today but I'm a bit
>> overloaded, and getting criterion installed wasn't very smooth (a couple
>> of package breakages, but nothing serious and patches are upstream).
>>
>> Cheers,
>> Simon
>>
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://www.haskell.org/mailman/listinfo/libraries