[Git][ghc/ghc][wip/T21286] 3 commits: Improve aggressive specialisation

Simon Peyton Jones (@simonpj) gitlab at gitlab.haskell.org
Fri Sep 9 16:51:54 UTC 2022



Simon Peyton Jones pushed to branch wip/T21286 at Glasgow Haskell Compiler / GHC


Commits:
8fe28b19 by Simon Peyton Jones at 2022-09-09T17:49:41+01:00
Improve aggressive specialisation

This patch fixes #21286, by not unboxing dictionaries in
worker/wrapper (ever). The main payload is tiny:

* In `GHC.Core.Opt.DmdAnal.finaliseArgBoxities`, do not unbox
  dictionaries in `get_dmd`.  See Note [Do not unbox class dictionaries]
  in that module.

* I also found that imported wrappers were being fruitlessly
  specialised, so I fixed that too, in canSpecImport.
  See Note [Specialising imported functions] point (2).
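
As a rough illustration of the first bullet (a sketch, not code from the patch; `sumSquares` and `sumSquaresInt` are hypothetical names): an overloaded function takes its class dictionary as an ordinary argument after desugaring. If worker/wrapper were to unbox that dictionary into its individual methods, the type-class specialiser would presumably no longer see a dictionary argument to specialise on; keeping it boxed leaves that opportunity open.

```haskell
import Data.List (foldl')

-- Hypothetical example, not taken from the GHC patch.
-- After desugaring, 'sumSquares' takes a 'Num a' dictionary as an
-- extra argument. INLINABLE keeps the unfolding available so that
-- the function can be specialised at its call sites.
{-# INLINABLE sumSquares #-}
sumSquares :: Num a => [a] -> a
sumSquares = foldl' (\acc x -> acc + x * x) 0

-- A monomorphic call site: here the specialiser may create an
-- Int-specific copy of 'sumSquares' with the dictionary resolved.
sumSquaresInt :: [Int] -> Int
sumSquaresInt = sumSquares
```

Whether a specialised copy is actually created depends on optimisation flags and on what GHC decides for the module in question.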

In doing due diligence in the testsuite I fixed a number of
other things:

* Improve Note [Specialising unfoldings] in GHC.Core.Unfold.Make,
  and Note [Inline specialisations] in GHC.Core.Opt.Specialise,
  and remove duplication between the two. The new Note describes
  how we specialise functions with an INLINABLE pragma.

  And simplify the defn of `spec_unf` in `GHC.Core.Opt.Specialise.specCalls`.

* Improve Note [Worker/wrapper for INLINABLE functions] in
  GHC.Core.Opt.WorkWrap.

  And (critically) make an actual change: propagate the
  user-written pragma from the original function to the wrapper; see
  `mkStrWrapperInlinePrag`.

* Write new Note [Specialising imported functions] in
  GHC.Core.Opt.Specialise

All this has a big effect on some compile times:

Metrics: compile_time/bytes allocated
--------------------------------------------------------
                LargeRecord(normal) ghc/alloc  6,084,071,354  -50.1% GOOD
           ManyConstructors(normal) ghc/alloc  3,928,349,810   +1.7%
MultiLayerModulesTH_OneShot(normal) ghc/alloc  2,523,518,560   +1.2%
                     T12545(normal) ghc/alloc  1,633,149,272   +3.1%
                     T13056(optasm) ghc/alloc    349,532,453   -8.7% GOOD
                     T13253(normal) ghc/alloc    343,592,469   -3.3% GOOD
                     T15164(normal) ghc/alloc  1,304,125,024   -3.4% GOOD
                     T16190(normal) ghc/alloc    278,584,392   -1.5%
                     T16577(normal) ghc/alloc  8,050,423,421   -2.8% GOOD
                     T17836(normal) ghc/alloc    829,913,981   +2.3%
                     T18223(normal) ghc/alloc    734,732,288  -33.3% GOOD
                     T18282(normal) ghc/alloc    150,159,957   -2.9% GOOD
                     T18478(normal) ghc/alloc    498,300,837   +1.2%
                     T19695(normal) ghc/alloc  1,444,571,802   -2.5% GOOD
                      T9630(normal) ghc/alloc  1,523,682,706  -32.8% GOOD
                      WWRec(normal) ghc/alloc    624,174,317   -9.6% GOOD
                     hie002(normal) ghc/alloc  9,020,356,301   +1.8%
-------------------------------------------------------------------------
                          geo. mean                            -1.7%
                          minimum                             -50.1%
                          maximum                              +3.1%

I diligently investigated all these big drops.

* Caused by not doing w/w for dictionaries:
    T13056, T15164, WWRec, T18223

* Caused by not fruitlessly specialising wrappers:
    LargeRecord, T9630

I also got one runtime improvement:
     T9203(normal) run/alloc     105,672,160    -10.7% GOOD
but I did not investigate.

Nofib is a wash:

+===============================++===============+===========+
|                     real/anna ||        -0.13% |      0.0% |
|                      real/fem ||        +0.13% |      0.0% |
|                   real/fulsom ||        -0.16% |      0.0% |
|                   real/gamteb ||        +0.02% |      0.0% |
|                       real/gg ||        +0.01% |      0.0% |
|                     real/lift ||        -1.55% |      0.0% |
|                  real/reptile ||        -0.11% |      0.0% |
|                      real/scs ||        -0.08% |      0.0% |
|                  real/smallpt ||        +0.51% |      0.0% |
|                   real/symalg ||        -0.01% |      0.0% |
|                  real/veritas ||        +0.05% |      0.0% |
|         shootout/binary-trees ||        +0.00% |      0.0% |
|       shootout/fannkuch-redux ||        -0.05% |      0.0% |
|         shootout/k-nucleotide ||        -0.01% |      0.0% |
|               shootout/n-body ||        -0.06% |      0.0% |
|        shootout/spectral-norm ||        +0.01% |      0.0% |
|          spectral/constraints ||        +0.20% |      0.0% |
|               spectral/dom-lt ||        +1.80% |      0.0% |
|               spectral/expert ||        +0.33% |      0.0% |

Metric Decrease:
    LargeRecord
    T13056
    T15164
    T16577
    T18223
    T9630
    WWRec
    T9203

- - - - -
e72a1eb5 by Simon Peyton Jones at 2022-09-09T17:53:17+01:00
Refactor UnfoldingSource and IfaceUnfolding

I finally got tired of the way that IfaceUnfolding reflected
a previous structure of unfoldings, not the current one. This
MR refactors UnfoldingSource and IfaceUnfolding to be simpler
and more consistent.

It's largely just a refactor, but in UnfoldingSource (which moves
to GHC.Types.Basic, since it is now used in IfaceSyn too), I
distinguish between /user-specified/ and /system-generated/ stable
unfoldings.

    data UnfoldingSource
      = VanillaSrc
      | StableUserSrc   -- From a user-specified pragma
      | StableSystemSrc -- From a system-generated unfolding
      | CompulsorySrc

This has a minor effect in CSE (see the use of isStableUserUnfolding
in GHC.Core.Opt.CSE), which I tripped over when working on
specialisation, but it seems like a Good Thing to know anyway.
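
A standalone sketch of how the user/system distinction might be queried. The data type is copied from the message above; the two helper functions are simplified stand-ins written for illustration and are not claimed to match GHC's own definitions. In particular, counting `CompulsorySrc` as stable is an assumption.

```haskell
-- Standalone illustration; the real definitions live inside GHC.
data UnfoldingSource
  = VanillaSrc
  | StableUserSrc   -- From a user-specified pragma
  | StableSystemSrc -- From a system-generated unfolding
  | CompulsorySrc
  deriving (Eq, Show)

-- Hypothetical helper: did the unfolding come from a user-written pragma?
isStableUserSource :: UnfoldingSource -> Bool
isStableUserSource StableUserSrc = True
isStableUserSource _             = False

-- Hypothetical helper: is the unfolding stable at all?
-- Assumption: everything except VanillaSrc counts as stable.
isStableSource :: UnfoldingSource -> Bool
isStableSource VanillaSrc = False
isStableSource _          = True
```

A pass that wants to treat user-written pragmas differently from system-generated unfoldings can then branch on `isStableUserSource` rather than on stability alone.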

- - - - -
8be40982 by Simon Peyton Jones at 2022-09-09T17:53:17+01:00
INLINE/INLINEABLE pragmas in Foreign.Marshal.Array

Foreign.Marshal.Array contains many small functions, all of which are
overloaded, and which are critical for performance. Yet none of them
had pragmas, so it was a fluke whether or not they got inlined.

This patch makes them all either INLINE (small ones) or
INLINEABLE and hence specialisable (larger ones).

See Note [Specialising array operations] in that module.
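
The split between the two pragmas can be sketched as follows. These functions are hypothetical and are not the ones in Foreign.Marshal.Array; they only show the pattern of marking a small overloaded helper INLINE and a larger, recursive one INLINABLE so that it can be specialised at each `Storable` instance. The code uses the spelling INLINABLE; GHC also accepts INLINEABLE.

```haskell
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable, peekElemOff, pokeElemOff)

-- Hypothetical small helper: cheap enough to inline at every call site.
{-# INLINE swapElems #-}
swapElems :: Storable a => Ptr a -> Int -> Int -> IO ()
swapElems p i j = do
  x <- peekElemOff p i
  y <- peekElemOff p j
  pokeElemOff p i y
  pokeElemOff p j x

-- Hypothetical larger function: its local loop is recursive, so it is
-- marked INLINABLE to keep it specialisable per Storable instance.
{-# INLINABLE reverseInPlace #-}
reverseInPlace :: Storable a => Ptr a -> Int -> IO ()
reverseInPlace p n = go 0 (n - 1)
  where
    go i j
      | i >= j    = pure ()
      | otherwise = swapElems p i j >> go (i + 1) (j - 1)

-- Usage: marshal a list into a temporary array, reverse it, read it back.
reverseViaPtr :: Storable a => [a] -> IO [a]
reverseViaPtr xs = withArrayLen xs $ \n p -> do
  reverseInPlace p n
  peekArray n p
```

Without any pragma, whether such functions get inlined or specialised in client modules depends on what the optimiser happens to decide, which is the "fluke" the commit message describes.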

- - - - -


30 changed files:

- compiler/GHC/Core.hs
- compiler/GHC/Core/Opt/CSE.hs
- compiler/GHC/Core/Opt/DmdAnal.hs
- compiler/GHC/Core/Opt/Simplify/Iteration.hs
- compiler/GHC/Core/Opt/Simplify/Utils.hs
- compiler/GHC/Core/Opt/Specialise.hs
- compiler/GHC/Core/Opt/WorkWrap.hs
- compiler/GHC/Core/Ppr.hs
- compiler/GHC/Core/SimpleOpt.hs
- compiler/GHC/Core/Tidy.hs
- compiler/GHC/Core/Unfold.hs
- compiler/GHC/Core/Unfold/Make.hs
- compiler/GHC/CoreToIface.hs
- compiler/GHC/HsToCore.hs
- compiler/GHC/HsToCore/Binds.hs
- compiler/GHC/HsToCore/Foreign/C.hs
- compiler/GHC/Iface/Rename.hs
- compiler/GHC/Iface/Syntax.hs
- compiler/GHC/Iface/Tidy.hs
- compiler/GHC/IfaceToCore.hs
- compiler/GHC/Tc/TyCl/Instance.hs
- compiler/GHC/Types/Basic.hs
- compiler/GHC/Types/Id/Make.hs
- libraries/base/Foreign/Marshal/Array.hs
- testsuite/tests/arityanal/should_compile/Arity05.stderr
- testsuite/tests/arityanal/should_compile/Arity11.stderr
- testsuite/tests/arityanal/should_compile/Arity14.stderr
- testsuite/tests/deSugar/should_compile/T19969.stderr
- testsuite/tests/deSugar/should_compile/T2431.stderr
- testsuite/tests/indexed-types/should_compile/T7837.stderr


The diff was not included because it is too large.


View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/cbadbc1a6815da0f2eddfb7c4291f7026a93ad3a...8be40982755fa9872bbf3de0737accdb263c0500
