[Git][ghc/ghc][wip/simplifier-tweaks] 10 commits: Include -haddock in DynFlags fingerprint

Sun Jul 30 22:29:14 UTC 2023

Simon Peyton Jones pushed to branch wip/simplifier-tweaks at Glasgow Haskell Compiler / GHC

Commits:
0bfc8908 by Finley McIlwaine at 2023-07-28T18:46:26-04:00
Include -haddock in DynFlags fingerprint

The -haddock flag determines whether or not the resulting .hi files
contain haddock documentation strings. If the existing .hi files do
not contain haddock documentation strings and the user requests them,
we should recompile.

- - - - -
40425c50 by Andreas Klebinger at 2023-07-28T18:47:02-04:00
Aarch64 NCG: Use encoded immediates for literals.

Try to generate

    instr x2, <imm>

instead of

    mov x1, lit
    instr x2, x1

When possible. This get's rid if quite a few redundant
mov instructions.

I believe this causes a metric decrease for LargeRecords as
we reduce register pressure.

-------------------------
Metric Decrease:
    LargeRecord
-------------------------

- - - - -
e9a0fa3f by Bodigrim at 2023-07-28T18:47:42-04:00
Bump filepath submodule to 1.4.100.4

Resolves #23741

Metric Decrease:
    MultiComponentModules
    MultiComponentModulesRecomp
    MultiLayerModules
    MultiLayerModulesRecomp
    T10421
    T12234
    T12425
    T13035
    T13701
    T13719
    T16875
    T18304
    T18698a
    T18698b
    T21839c
    T9198
    TcPlugin_RewritePerf
    hard_hole_fits

Metric decrease on Windows can be probably attributed to https://github.com/haskell/filepath/pull/183

- - - - -
ee93edfd by Bodigrim at 2023-07-28T18:48:21-04:00
Add since pragmas to GHC.IO.Handle.FD

- - - - -
d0369802 by Simon Peyton Jones at 2023-07-30T09:24:48+01:00
Make the occurrence analyser smarter about join points

This MR addresses #22404.  There is a big Note

   Note [Occurrence analysis for join points]

that explains it all.  Significant changes

* New field occ_join_points in OccEnv

* The NonRec case of occAnalBind splits into two cases:
  one for existing join points (which does the special magic for
  Note [Occurrence analysis for join points], and one for other
  bindings.

* mkOneOcc adds in info from occ_join_points.

* All "bring into scope" activity is centralised in the
  new function `addInScope`.

* I made a local data type LocalOcc for use inside the occurrence analyser
  It is like OccInfo, but lacks IAmDead and IAmALoopBreaker, which in turn
  makes computationns over it simpler and more efficient.

* I found quite a bit of allocation in GHC.Core.Rules.getRules
  so I optimised it a bit.

More minor changes

* I found I was using (Maybe Arity) a lot, so I defined a new data
  type JoinPointHood and used it everwhere.  This touches a lot of
  non-occ-anal files, but it makes everything more perspicuous.

* Renamed data constructor WithUsageDetails to WUD, and
  WithTailUsageDetails to WTUD

This also fixes #21128, on the way.

--------- Compiler perf -----------
I spent quite a time on performance tuning, so even though it
does more than before, the occurrence analyser runs slightly faster
on average.  Here are the compile-time allocation changes over 0.5%

      CoOpt_Read(normal) ghc/alloc    766,025,520    754,561,992  -1.5%
CoOpt_Singletons(normal) ghc/alloc    759,436,840    762,925,512  +0.5%
     LargeRecord(normal) ghc/alloc  1,814,482,440  1,799,530,456  -0.8%
       PmSeriesT(normal) ghc/alloc     68,159,272     67,519,720  -0.9%
          T10858(normal) ghc/alloc    120,805,224    118,746,968  -1.7%
          T11374(normal) ghc/alloc    164,901,104    164,070,624  -0.5%
          T11545(normal) ghc/alloc     79,851,808     78,964,704  -1.1%
          T12150(optasm) ghc/alloc     73,903,664     71,237,544  -3.6% GOOD
          T12227(normal) ghc/alloc    333,663,200    331,625,864  -0.6%
          T12234(optasm) ghc/alloc     52,583,224     52,340,344  -0.5%
          T12425(optasm) ghc/alloc     81,943,216     81,566,720  -0.5%
          T13056(optasm) ghc/alloc    294,517,928    289,642,512  -1.7%
      T13253-spj(normal) ghc/alloc    118,271,264     59,859,040 -49.4% GOOD
          T15164(normal) ghc/alloc  1,102,630,352  1,091,841,296  -1.0%
          T15304(normal) ghc/alloc  1,196,084,000  1,166,733,000  -2.5%
          T15630(normal) ghc/alloc    148,729,632    147,261,064  -1.0%
          T15703(normal) ghc/alloc    379,366,664    377,600,008  -0.5%
          T16875(normal) ghc/alloc     32,907,120     32,670,976  -0.7%
          T17516(normal) ghc/alloc  1,658,001,888  1,627,863,848  -1.8%
          T17836(normal) ghc/alloc    395,329,400    393,080,248  -0.6%
          T18140(normal) ghc/alloc     71,968,824     73,243,040  +1.8%
          T18223(normal) ghc/alloc    456,852,568    453,059,088  -0.8%
          T18282(normal) ghc/alloc    129,105,576    131,397,064  +1.8%
          T18304(normal) ghc/alloc     71,311,712     70,722,720  -0.8%
         T18698a(normal) ghc/alloc    208,795,112    210,102,904  +0.6%
         T18698b(normal) ghc/alloc    230,320,736    232,697,976  +1.0%  BAD
          T19695(normal) ghc/alloc  1,483,648,128  1,504,702,976  +1.4%
          T20049(normal) ghc/alloc     85,612,024     85,114,376  -0.6%
         T21839c(normal) ghc/alloc    415,080,992    410,906,216  -1.0% GOOD
           T4801(normal) ghc/alloc    247,590,920    250,726,272  +1.3%
           T6048(optasm) ghc/alloc     95,699,416     95,080,680  -0.6%
            T783(normal) ghc/alloc    335,323,384    332,988,120  -0.7%
           T9233(normal) ghc/alloc    709,641,224    685,947,008  -3.3% GOOD
           T9630(normal) ghc/alloc    965,635,712    948,356,120  -1.8%
           T9675(optasm) ghc/alloc    444,604,152    428,987,216  -3.5% GOOD
           T9961(normal) ghc/alloc    303,064,592    308,798,800  +1.9%  BAD
           WWRec(normal) ghc/alloc    503,728,832    498,102,272  -1.1%

               geo. mean                                          -1.0%
               minimum                                           -49.4%
               maximum                                            +1.9%

In fact these figures seem to vary between platforms; generally worse
on i386 for some reason.  The Windows numbers vary by 1% espec in
benchmarks where the total allocation is low. But the geom mean stays
solidly negative, which is good.  The "increase/decrease" list below
covers all platforms.

The big win on T13253-spj comes because it has a big nest of join
points, each occurring twice in the next one.  The new occ-anal takes
only one iteration of the simplifier to do the inlining; the old one
took four.  Moreover, we get much smaller code with the new one:

  New: Result size of Tidy Core
    = {terms: 429, types: 84, coercions: 0, joins: 14/14}

  Old: Result size of Tidy Core
    = {terms: 2,437, types: 304, coercions: 0, joins: 10/10}

--------- Runtime perf -----------
No significant changes in nofib results, except a 1% reduction in
compiler allocation.

Metric Decrease:
    CoOpt_Read
    T13253-spj
    T9233
    T9630
    T9675
    T12150
    T21839c
    LargeRecord
    MultiComponentModulesRecomp
    T10421
    T13701
    T10421
    T13701
    T12425

Metric Increase:
    T18140
    T9961
    T18282
    T18698a
    T18698b
    T19695

- - - - -
42aa7fbd by Julian Ospald at 2023-07-30T17:22:01-04:00
Improve documentation around IOException and ioe_filename

See:

* https://github.com/haskell/core-libraries-committee/issues/189
* https://github.com/haskell/unix/pull/279
* https://github.com/haskell/unix/pull/289

- - - - -
6e3b92bf by Simon Peyton Jones at 2023-07-30T22:29:34+01:00
Several improvements to the handling of coercions

* Make `mkSymCo` and `mkInstCo` smarter
  Fixes #23642

* Fix return role of `SelCo` in the coercion optimiser.
  Fixes #23617

* Make the coercion optimiser `opt_trans_rule` work better for newtypes
  Fixes #23619

- - - - -
6262e6ad by Simon Peyton Jones at 2023-07-30T22:29:34+01:00
Simplifier improvements

This MR started as: allow the simplifer to do more in one pass,
arising from places I could see the simplifier taking two iterations
where one would do.  But it turned into a larger project, because
these changes unexpectedly made inlining blow up, especially join
points in deeply-nested cases.

The net result is good: a 2% improvement in compile time.  The table
below shows changes over 1%.

The main changes are:

* The SimplEnv now has a seInlineDepth field, which says how deep
  in unfoldings we are.  See Note [Inline depth] in Simplify.Env

* Avoid repeatedly simplifying coercions.
  see Note [Avoid re-simplifying coercions] in Simplify.Iteration
  As you'll see from the Note, this makes use of the seInlineDepth.

* Allow Simplify.Utils.postInlineUnconditionally to inline variables
  that are used exactly once. See Note [Post-inline for single-use things].

* Allow Simplify.Iteration.simplAuxBind to inline used-once things.
  This is another part of Note [Post-inline for single-use things], and
  is really good for reducing simplifier iterations in situations like
      case K e of { K x -> blah }
  wher x is used once in blah.

* Make GHC.Core.SimpleOpt.exprIsConApp_maybe do some simple case
  elimination.  Note [Case elim in exprIsConApp_maybe]

* When making join points, don't do so if the join point is so small
  it will immediately be inlined.  See Note [Duplicating alternatives]

* Do not add an unfolding to a join point at birth.  This is a tricky one
  and has a long Note [Do not add unfoldings to join points at birth]
  It shows up in two places
  - In `mkDupableAlt` do not add an inlining
  - (trickier) In `simplLetUnfolding` don't add an unfolding for a
    fresh join point
  I am not fully satisifed with this, but it works and is well documented.

* Many new or rewritten Notes.  E.g. Note [Avoiding simplifying repeatedly]

I discovered that GHC.HsToCore.Pmc.Solver.Types.trvVarInfo was very
delicately balanced.  It's a small, heavily used, overloaded function
and it's important that it inlines. By a fluke it was before, but at
various times in my journey it stopped doing so.  So I added an INLINE
pragma to it.

Metrics: compile_time/bytes allocated
------------------------------------------------
           CoOpt_Singletons(normal)   -4.3% GOOD
                LargeRecord(normal)  -23.3% GOOD
                  PmSeriesS(normal)   -2.4%
                     T11195(normal)   -1.7%
                     T12227(normal)  -20.0% GOOD
                     T12545(normal)   -5.4%
                 T13253-spj(normal)  -50.7% GOOD
                     T13386(normal)   -5.1% GOOD
                     T14766(normal)   -2.4% GOOD
                     T15164(normal)   -1.7%
                     T15304(normal)   +1.0%
                     T15630(normal)   -7.7%
                    T15630a(normal)          NEW
                     T15703(normal)   -7.5% GOOD
                     T16577(normal)   -5.1% GOOD
                     T17516(normal)   -3.6%
                     T18223(normal)  -16.8% GOOD
                     T18282(normal)   -1.5%
                     T18304(normal)   +1.9%
                    T21839c(normal)   -3.5% GOOD
                      T3064(normal)   -1.5%
                      T5030(normal)  -16.2% GOOD
                   T5321Fun(normal)   -1.6%
                      T6048(optasm)   -2.1% GOOD
                      T8095(normal)   -6.1% GOOD
                      T9630(normal)   -5.1% GOOD
                      WWRec(normal)   -1.6%

                          geo. mean   -2.1%
                          minimum    -50.7%
                          maximum     +1.9%

Metric Decrease:
    CoOpt_Singletons
    LargeRecord
    T12227
    T13253-spj
    T13386
    T14766
    T15703
    T16577
    T18223
    T21839c
    T5030
    T6048
    T8095
    T9630

- - - - -
50518da5 by Simon Peyton Jones at 2023-07-30T22:29:34+01:00
Improve postInlineUnconditionally

This commit adds two things to postInlineUnconditionally:

1. Do not postInlineUnconditionally join point, ever.
   Doing so does not reduce allocation, which is the main point,
   and with join points that are used a lot it can bloat code.
   See point (1) of Note [Duplicating join points] in
   GHC.Core.Opt.Simplify.Iteration.

2. Do not postInlineUnconditionally a strict (demanded) binding.
   It will not allocate a thunk (it'll turn into a case instead)
   so again the main point of inlining it doesn't hold.  Better
   to check per-call-site.

- - - - -
0caebc5d by Simon Peyton Jones at 2023-07-30T22:29:34+01:00
Update testsuite output

- - - - -

15 changed files:

- compiler/GHC/CmmToAsm/AArch64/CodeGen.hs
- compiler/GHC/CmmToAsm/AArch64/Instr.hs
- compiler/GHC/CmmToAsm/AArch64/Ppr.hs
- compiler/GHC/CmmToAsm/AArch64/Regs.hs
- compiler/GHC/Core/Coercion.hs
- compiler/GHC/Core/Coercion/Opt.hs
- compiler/GHC/Core/Lint.hs
- compiler/GHC/Core/Opt/CSE.hs
- compiler/GHC/Core/Opt/DmdAnal.hs
- compiler/GHC/Core/Opt/Exitify.hs
- compiler/GHC/Core/Opt/FloatIn.hs
- compiler/GHC/Core/Opt/FloatOut.hs
- compiler/GHC/Core/Opt/OccurAnal.hs
- compiler/GHC/Core/Opt/SetLevels.hs
- compiler/GHC/Core/Opt/Simplify/Env.hs

The diff was not included because it is too large.

View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/0a5a2092c07a5e759993a8dd0d5ce82898a811f7...0caebc5d5f197f74ae6849ca2c4f38ef82d44412

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/0a5a2092c07a5e759993a8dd0d5ce82898a811f7...0caebc5d5f197f74ae6849ca2c4f38ef82d44412
You're receiving this email because of your account on gitlab.haskell.org.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20230730/9887d3de/attachment-0001.html>