Summary of containers patches

Thu Sep 23 21:17:37 EDT 2010

fox:
> Now the timing results. The results are the change in runtime in %.
> [1] speedup between inline and noinline (negative numbers means slowdown)
> [2] speedup between inline and inlinable
> [3] speedup between inline and mix
> (the results are += several %, I did not have time to do it better)
> 
>                             [1]        [2]        [3]
> alter                       -50.33     -49.17     -47.51
> delete                      -46.67     -50.93     -48.79
> difference                  1.93       0.18       6.33
> insert                      -44.11     -44.25     -3.09
> insertLookupWithKey empty   -14.62     -15.70     1.22
> insertLookupWithKey update  -21.54     -19.23     6.20
> insertLookupWithKey' empty  -16.79     -23.67     -2.82
> insertLookupWithKey' update -26.62     -26.99     0.47
> insertWith empty            -44.42     -48.29     -5.88
> insertWith update           -43.28     -45.03     2.50
> insertWith' empty           -45.30     -46.66     -3.45
> insertWith' update          -38.15     -38.27     3.94
> insertWithKey empty         -42.60     -45.58     -6.99
> insertWithKey update        -40.13     -36.62     -1.71
> insertWithKey' empty        -46.54     -46.53     -4.22
> insertWithKey' update       -43.10     -42.16     0.98
> intersection                1.42       -1.75      0.37
> lookup                      -94.03     -94.62     6.14
> lookupIndex                 -86.25     -86.92     4.91
> union                       -0.04      -0.87      2.12
> update                      -50.64     -55.27     -54.17
> updateLookupWithKey         -43.90     -43.31     -49.22
> 
> So, the INLINABLE does not help at all.
> 
> I in fact expected this outcome -- it is not inlining itself that is
> beneficial, it is the _specialization_ (which happens during inlining if
> possible).
> 
> Please bear in mind that the speedup caused by specialization is
> significant only in the case when the Ord.compare function is trivial
> (if it is an instruction like <=, the result between instruction call
> and a dynamic dictionary call is very big). That means things like Map
> Int, Map Word, Map Int8 and so on. But with these types one usually uses
> IntMap. What I am saying is that the usage of Map which would benefit
> from specialization may not be very common.

50%. Huh. 

Do you have enough time  to look on Hackage to see at which types Map is
used most frequently? I suspect Map Int is actually pretty common. 

> At this point I suggest the following:
> - as immediate solution, remove all INLINE from the containers package,
>   just leave INLINE in lookup*, member* and insert* functions. This
>   causes small code bloat and quite big wins on these functions, which
>   I consider to be very frequent.
> 
>   I could imagine to remove also INLINE for insert* functions, or to add
>   INLINE for delete*, update* and alter*.
> 
> - in the long term, find a way to get back the speedup without the code
>   bloat. The problem is that _every_ call to a lookup has a different
>   code in the binary. We need to share the code of specialized functions
>   (like the c++ templates do). I mean the following:
>   - before linking, find out which dictionaries are passed to
>     Data.Map.lookup method. Specialize Data.Map.lookup for these types
>     (only once for each type) and change the calls to lookup with
>     specific dictionaries.
>   The effect should be as if user said SPECIALIZE, for each method
>   only for types it is non-generically being used for.
> 
> - I would not bother with INLINABLE currently.
> 
> Simon, others, do you agree with my short-term solution? If you do,
> I will push the patches (I need to do it before I fly to ICFP, which
> will be in 30 hours).

Well, I still don't understand what penalty the code bloat is exactly.
Making things a lot slower is clearly bad, making disk size a bit bigger
-- when my harddrive is already 8x bigger than 3 years ago, doesn't seem
so bad.

-- Don