[Git][ghc/ghc][wip/simplifier-tweaks] 18 commits: rts: Fix TSAN_ENABLED CPP guard

Simon Peyton Jones (@simonpj) gitlab at gitlab.haskell.org
Tue Apr 2 20:22:48 UTC 2024



Simon Peyton Jones pushed to branch wip/simplifier-tweaks at Glasgow Haskell Compiler / GHC


Commits:
c8a4c050 by Ben Gamari at 2024-04-02T12:50:35-04:00
rts: Fix TSAN_ENABLED CPP guard

This should be `#if defined(TSAN_ENABLED)`, not `#if TSAN_ENABLED`,
lest we suffer warnings.

- - - - -
e91dad93 by Cheng Shao at 2024-04-02T12:50:35-04:00
rts: fix errors when compiling with TSAN

This commit fixes rts compilation errors when compiling with TSAN:

- xxx_FENCE macros are redefined and trigger CPP warnings.
- Use SIZEOF_W rather than WORD_SIZE_IN_BITS: the latter is provided by
  MachDeps.h, which Cmm.h doesn't include by default.

- - - - -
a9ab9455 by Cheng Shao at 2024-04-02T12:50:35-04:00
rts: fix clang-specific errors when compiling with TSAN

This commit fixes clang-specific rts compilation errors when compiling
with TSAN:

- clang doesn't have a -Wtsan flag
- Fix the prototypes of the ghc_tsan_* helper functions
- The __tsan_atomic_* functions aren't clang built-ins, so
  sanitizer/tsan_interface_atomic.h needs to be included
- On macOS, the TSAN runtime library is
  libclang_rt.tsan_osx_dynamic.dylib, not libtsan. Passing
  -fsanitize-thread at link time takes care of linking the TSAN
  runtime library anyway, so remove tsan from the rts extra libraries

- - - - -
865bd717 by Cheng Shao at 2024-04-02T12:50:35-04:00
compiler: fix github link to __tsan_memory_order in a comment

- - - - -
07cb627c by Cheng Shao at 2024-04-02T12:50:35-04:00
ci: improve TSAN CI jobs

- Run TSAN jobs with +thread_sanitizer_cmm which enables Cmm
  instrumentation as well.
- Run TSAN jobs in deb12, which ships gcc-12, a reasonably recent gcc
  that @bgamari confirms he's using in #GHC:matrix.org. Ideally we
  should use the latest clang release for the latest improvements in
  sanitizers, but that's left as future work.
- Mark TSAN jobs as manual+allow_failure in validate pipelines. The
  purpose is to demonstrate that building in TSAN mode is at least
  fixed in CI, without blocking this patch from landing; once merged,
  other people can begin playing with TSAN using their own dev setups
  and feature branches.

- - - - -
a1c18c7b by Andrei Borzenkov at 2024-04-02T12:51:11-04:00
Merge tc_infer_hs_type and tc_hs_type into one function using ExpType philosophy (#24299, #23639)

This patch implements a refactoring that is a prerequisite to
updating kind checking of type patterns. It is a huge simplification
of the main worker that checks the kind of an HsType.

It also fixes issues caused by the previous code duplication, e.g.
that we didn't add module finalizers from splices in inference mode.
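
For readers unfamiliar with the ExpType idea, a rough sketch (simplified;
not GHC's exact types): a single worker takes an "expected kind" that is
either supplied (checking mode) or a hole to fill in (inference mode),
instead of two near-duplicate functions whose behaviour can drift apart.

    data ExpKind
      = Check Kind                    -- checking mode: the kind is given
      | Infer (IORef (Maybe Kind))    -- inference mode: a hole to fill in

    tc_hs_type :: HsType GhcRn -> ExpKind -> TcM TcType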

- - - - -
c93eecf1 by Simon Peyton Jones at 2024-04-02T21:20:44+01:00
Several improvements to the handling of coercions

* Make `mkSymCo` and `mkInstCo` smarter
  Fixes #23642

* Fix return role of `SelCo` in the coercion optimiser.
  Fixes #23617

* Make the coercion optimiser `opt_trans_rule` work better for newtypes
  Fixes #23619

- - - - -
0a2c3c2d by Simon Peyton Jones at 2024-04-02T21:20:44+01:00
FloatOut: improve floating for join points

See the new Note [Floating join point bindings].

* Completely get rid of the complicated join_ceiling nonsense, which
  I have never understood.

* Do not float join points at all, except perhaps to top level.

* Some refactoring around wantToFloat, to treat Rec and NonRec more
  uniformly.

- - - - -
6d605aa5 by Simon Peyton Jones at 2024-04-02T21:20:44+01:00
Improve eta-expansion through call stacks

See Note [Eta expanding through CallStacks] in GHC.Core.Opt.Arity

This is a one-line change that fixes an inconsistency:
-               || isCallStackPredTy ty
+               || isCallStackPredTy ty || isCallStackTy ty
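
For illustration (a hedged sketch of the intent, not the actual code):
isCallStackPredTy recognises implicit-parameter constraints such as
HasCallStack, while isCallStackTy recognises the bare CallStack type,
so both of these binders are now treated alike when deciding that
eta-expansion is cheap:

    f :: HasCallStack => Int -> Int   -- already covered, via isCallStackPredTy
    g :: CallStack -> Int -> Int      -- now also covered, via isCallStackTy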

- - - - -
447f2ac0 by Simon Peyton Jones at 2024-04-02T21:20:44+01:00
Spelling, layout, pretty-printing only

- - - - -
f36b9603 by Simon Peyton Jones at 2024-04-02T21:20:44+01:00
Improve exprIsConApp_maybe a little

Eliminate a redundant case at birth.  This sometimes reduces
Simplifier iterations.

See Note [Case elim in exprIsConApp_maybe].
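
For context, GHC's case-elimination transform is, roughly,

    case y of y' { DEFAULT -> blah }   ==>   blah[y/y']
    -- valid when y is known to be evaluated (among other side conditions)

The new Note is about applying this kind of elimination within
exprIsConApp_maybe itself, so that a redundant case does not cost an
extra Simplifier iteration.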

- - - - -
7fc8cfc3 by Simon Peyton Jones at 2024-04-02T21:20:44+01:00
Inline GHC.HsToCore.Pmc.Solver.Types.trvVarInfo

When exploring compile-time regressions after meddling with the Simplifier, I
discovered that GHC.HsToCore.Pmc.Solver.Types.trvVarInfo was very delicately
balanced.  It's a small, heavily used, overloaded function and it's important
that it inlines. By a fluke it was inlining before, but at various points in my journey it
stopped doing so.  So I just added an INLINE pragma to it; no sense in depending
on a delicately-balanced fluke.
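
Concretely, the change amounts to a pragma of this shape:

    {-# INLINE trvVarInfo #-}

An INLINE pragma keeps the function's unfolding and makes GHC keen to
inline it at saturated call sites regardless of the usual size
heuristics, rather than relying on those heuristics happening to fire.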

- - - - -
a818823d by Simon Peyton Jones at 2024-04-02T21:20:44+01:00
Slight improvement in WorkWrap

Ensure that WorkWrap preserves lambda binders, in case of join points.  Sadly I
have forgotten why I made this change (it was while I was doing a lot of
meddling in the Simplifier), but
  * it does no harm,
  * it is slightly more efficient, and
  * presumably it made something better!

Anyway, I have kept it in a separate commit.

- - - - -
c471144d by Simon Peyton Jones at 2024-04-02T21:20:44+01:00
Use named record fields for the CastIt { ... } data constructor

This is a pure refactoring.
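
A sketch of the shape of the change (field names here are illustrative,
not necessarily the ones chosen in the patch):

    -- before:  ... | CastIt OutCoercion SimplCont | ...
    -- after:   ... | CastIt { sc_co :: OutCoercion, sc_cont :: SimplCont } | ...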

- - - - -
346115e0 by Simon Peyton Jones at 2024-04-02T21:20:44+01:00
Remove a long-commented-out line

Pure refactoring

- - - - -
57f42293 by Simon Peyton Jones at 2024-04-02T21:20:49+01:00
Simplifier improvements

This MR started as: allow the simplifier to do more in one pass,
arising from places where I could see the simplifier taking two iterations
when one would do.  But it turned into a larger project, because
these changes unexpectedly made inlining blow up, especially for join
points in deeply-nested cases.

The main changes are below.  There are also many new or rewritten Notes.

Avoiding simplifying repeatedly
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
See Note [Avoiding simplifying repeatedly]

* The SimplEnv now has a seInlineDepth field, which says how deep
  in unfoldings we are.  See Note [Inline depth] in Simplify.Env.
  Currently used only for the next point: avoiding repeatedly
  simplifying coercions.

* Avoid repeatedly simplifying coercions.
  See Note [Avoid re-simplifying coercions] in Simplify.Iteration.
  As you'll see from the Note, this makes use of the seInlineDepth.

* Allow Simplify.Iteration.simplAuxBind to inline used-once things.
  This is another part of Note [Post-inline for single-use things], and
  is really good for reducing simplifier iterations in situations like
      case K e of { K x -> blah }
  where x is used once in blah; see the sketch at the end of this section.

* Make GHC.Core.SimpleOpt.exprIsConApp_maybe do some simple case
  elimination.  See Note [Case elim in exprIsConApp_maybe].

* Improve the case-merge transformation:
  - Move the main code to `GHC.Core.Utils.mergeCaseAlts`, to join `filterAlts`
    and friends.  See Note [Merge Nested Cases] in GHC.Core.Utils.
  - Add a new case for `tagToEnum#`; see wrinkle (MC3).
  - Add a new case to look through join points: see wrinkle (MC4).
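
Two small sketches of transformations mentioned above (the details live
in the Notes cited):

    -- simplAuxBind inlining a used-once binder, all in one pass:
    case K e of { K x -> blah }
      ==>  let x = e in blah    -- case-of-known-constructor
      ==>  blah[e/x]            -- x is used once, so inline it right away

    -- the classic case-merge transformation:
    case x of
      A       -> e1
      DEFAULT -> case x of
                   B       -> e2
                   DEFAULT -> e3
      ==>
    case x of
      A       -> e1
      B       -> e2
      DEFAULT -> e3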

postInlineUnconditionally
~~~~~~~~~~~~~~~~~~~~~~~~~
* Allow Simplify.Utils.postInlineUnconditionally to inline variables
  that are used exactly once. See Note [Post-inline for single-use things].

* Do not postInlineUnconditionally a join point, ever.
  Doing so does not reduce allocation, which is the main point,
  and with join points that are used a lot it can bloat code.
  See point (1) of Note [Duplicating join points] in
  GHC.Core.Opt.Simplify.Iteration, and the sketch at the end of this section.

* Do not postInlineUnconditionally a strict (demanded) binding.
  It will not allocate a thunk (it'll turn into a case instead)
  so again the main point of inlining it doesn't hold.  Better
  to check per-call-site.

* Improve occurrence analysis for bottoming function calls, to help
  postInlineUnconditionally.  See Note [Bottoming function calls]
  in GHC.Core.Opt.OccurAnal.
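
To illustrate the join-point bullet above: a join point compiles to a
label and a jump, so it never allocates a closure, and inlining it
therefore cannot save allocation.

    join j x = <big> in
    case v of
      A -> j e1
      B -> j e2
      C -> j e3
    -- Inlining j at all three call sites saves no allocation; it just
    -- duplicates <big> three times.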

Inlining generally
~~~~~~~~~~~~~~~~~~
* In GHC.Core.Opt.Simplify.Utils.interestingCallContext,
  use RhsCtxt NonRecursive (not BoringCtxt) for a plain-seq case
  (sketched at the end of this section).
  See Note [Seq is boring].  Also, per wrinkle (SB1), inline in that
  `seq` context only for INLINE functions (UnfWhen guidance).

* In GHC.Core.Opt.Simplify.Utils.interestingArg,
  - return ValueArg for OtherCon [c1,c2, ...], but
  - return NonTrivArg for OtherCon []
  This makes a function a little less likely to inline if all we
  know is that the argument is evaluated, but nothing else.

* isConLikeUnfolding is no longer true for OtherCon {}.
  This propagates to exprIsConLike.  Con-like-ness conveys /positive/
  information, which OtherCon {} does not provide.
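
For reference, a "plain-seq" case is one that evaluates its scrutinee
but never inspects the result, e.g.

    case f x of _ { DEFAULT -> blah }

Here the call  f x  is made only to be forced; this is the situation
Note [Seq is boring] and wrinkle (SB1) are about.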

Join points
~~~~~~~~~~~
* Be very careful about inlining join points.
  See these two long Notes
    Note [Duplicating join points] in GHC.Core.Opt.Simplify.Iteration
    Note [Inlining join points] in GHC.Core.Opt.Simplify.Inline

* When making join points, don't do so if the join point is so small
  it will immediately be inlined; check uncondInlineJoin.

* In GHC.Core.Opt.Simplify.Inline.tryUnfolding, improve the inlining
  heuristics for join points. In general we /do not/ want to inline
  join points /even if they are small/.  See Note [Duplicating join points]
  in GHC.Core.Opt.Simplify.Iteration.

  But sometimes we do: see Note [Inlining join points] in
  GHC.Core.Opt.Simplify.Inline; and the new `isBetterUnfoldingThan` function.

* Do not add an unfolding to a join point at birth.  This is a tricky one
  and has a long Note [Do not add unfoldings to join points at birth].
  It shows up in two places:
  - In `mkDupableAlt`, do not add an inlining
  - (trickier) In `simplLetUnfolding`, don't add an unfolding for a
    fresh join point
  I am not fully satisfied with this, but it works and is well documented.
  (A sketch of why mkDupableAlt creates join points at all appears at the
  end of this section.)

* In GHC.Core.Unfold.sizeExpr, make jumps small, so that we don't penalise
  having a non-inlined join point.
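
To see why the Simplifier creates join points in the first place, here
is a rough sketch of the case-of-case situation that mkDupableAlt
handles (binders and types elided):

    case (case v of { A -> e1; B -> e2 }) of
      C -> big1
      D -> big2
    ==>
    join j1 = big1 in
    join j2 = big2 in
    case v of
      A -> case e1 of { C -> j1; D -> j2 }
      B -> case e2 of { C -> j1; D -> j2 }

    -- Inlining j1 and j2 at their occurrences would duplicate big1 and
    -- big2 all over again, which is exactly what the join points were
    -- introduced to avoid.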

Performance changes
~~~~~~~~~~~~~~~~~~~
* Binary sizes fall by around 2.6%, according to nofib.

* Compile times improve slightly. Here are the figures that changed by more than 1%.

  I investigated the biggest difference, in T18304. It's a very small module, just
  a few hundred nodes. The large percentage difference is due to a single
  function that didn't quite inline before, and does now, making code size a
  bit bigger.  I decided the gains outweighed the losses.

    Metrics: compile_time/bytes allocated (changes over +/- 1%)
    ------------------------------------------------
           CoOpt_Singletons(normal)   -9.2% GOOD
                LargeRecord(normal)  -23.5% GOOD
MultiComponentModulesRecomp(normal)   +1.2%
MultiLayerModulesTH_OneShot(normal)   +4.1%  BAD
                  PmSeriesS(normal)   -3.8%
                  PmSeriesV(normal)   -1.5%
                     T11195(normal)   -1.3%
                     T12227(normal)  -20.4% GOOD
                     T12545(normal)   -3.2%
                     T12707(normal)   -2.1% GOOD
                     T13253(normal)   -1.2%
                 T13253-spj(normal)   +8.1%  BAD
                     T13386(normal)   -3.1% GOOD
                     T14766(normal)   -2.6% GOOD
                     T15164(normal)   -1.4%
                     T15304(normal)   +1.2%
                     T15630(normal)   -8.2%
                    T15630a(normal)          NEW
                     T15703(normal)  -14.7% GOOD
                     T16577(normal)   -2.3% GOOD
                     T17516(normal)  -39.7% GOOD
                     T18140(normal)   +1.2%
                     T18223(normal)  -17.1% GOOD
                     T18282(normal)   -5.0% GOOD
                     T18304(normal)  +10.8%  BAD
                     T18923(normal)   -2.9% GOOD
                      T1969(normal)   +1.0%
                     T19695(normal)   -1.5%
                     T20049(normal)  -12.7% GOOD
                    T21839c(normal)   -4.1% GOOD
                      T3064(normal)   -1.5%
                      T3294(normal)   +1.2%  BAD
                      T4801(normal)   +1.2%
                      T5030(normal)  -15.2% GOOD
                   T5321Fun(normal)   -2.2% GOOD
                      T6048(optasm)  -16.8% GOOD
                       T783(normal)   -1.2%
                      T8095(normal)   -6.0% GOOD
                      T9630(normal)   -4.7% GOOD
                      T9961(normal)   +1.9%  BAD
                      WWRec(normal)   -1.4%
        info_table_map_perf(normal)   -1.3%
                 parsing001(normal)   +1.5%

                          geo. mean   -2.0%
                          minimum    -39.7%
                          maximum    +10.8%

* Runtimes generally improve. In the testsuite, perf/should_run gives:
   Metrics: runtime/bytes allocated
   ------------------------------------------
             Conversions(normal)   -0.3%
                 T13536a(optasm)  -41.7% GOOD
                   T4830(normal)   -0.1%
           haddock.Cabal(normal)   -0.1%
            haddock.base(normal)   -0.1%
        haddock.compiler(normal)   -0.1%

                       geo. mean   -0.8%
                       minimum    -41.7%
                       maximum     +0.0%

* For runtime, nofib is a better test.  The news is mostly good.
  Here are the entries that changed by more than +/- 0.1%:

    # bytes allocated
    ==========================++==========
       imaginary/digits-of-e1 ||  -14.40%
       imaginary/digits-of-e2 ||   -4.41%
          imaginary/paraffins ||   -0.17%
               imaginary/rfib ||   -0.15%
       imaginary/wheel-sieve2 ||   -0.10%
                real/compress ||   -0.47%
                   real/fluid ||   -0.10%
                  real/fulsom ||   +0.14%
                  real/gamteb ||   -1.47%
                      real/gg ||   -0.20%
                   real/infer ||   +0.24%
                     real/pic ||   -0.23%
                  real/prolog ||   -0.36%
                     real/scs ||   -0.46%
                 real/smallpt ||   +4.03%
        shootout/k-nucleotide ||  -20.23%
              shootout/n-body ||   -0.42%
       shootout/spectral-norm ||   -0.13%
              spectral/boyer2 ||   -3.80%
         spectral/constraints ||   -0.27%
          spectral/hartel/ida ||   -0.82%
                spectral/mate ||  -20.34%
                spectral/para ||   +0.46%
             spectral/rewrite ||   +1.30%
              spectral/sphere ||   -0.14%
    ==========================++==========
                    geom mean ||   -0.59%

    real/smallpt has a huge nest of local definitions, and I
    could not pin down a reason for its regression.  But there are
    three big wins!

Metric Decrease:
    CoOpt_Singletons
    LargeRecord
    T12227
    T12707
    T13386
    T13536a
    T14766
    T15703
    T16577
    T17516
    T18223
    T18282
    T18923
    T21839c
    T20049
    T5321Fun
    T5030
    T6048
    T8095
    T9630
    T783
Metric Increase:
    MultiLayerModulesTH_OneShot
    T13253-spj
    T18304
    T18698a
    T9961
    T3294

- - - - -
ec584eaf by Simon Peyton Jones at 2024-04-02T21:21:23+01:00
Testsuite message changes from simplifier improvements

- - - - -
9c910ae1 by Simon Peyton Jones at 2024-04-02T21:21:23+01:00
Account for bottoming functions in OccurAnal

This fixes #24582, a small but long-standing bug.

- - - - -


15 changed files:

- .gitlab/generate-ci/gen_ci.hs
- .gitlab/jobs.yaml
- compiler/GHC/Cmm/ThreadSanitizer.hs
- compiler/GHC/Core.hs
- compiler/GHC/Core/Coercion.hs
- compiler/GHC/Core/Coercion/Opt.hs
- compiler/GHC/Core/Opt/Arity.hs
- compiler/GHC/Core/Opt/FloatOut.hs
- compiler/GHC/Core/Opt/Monad.hs
- compiler/GHC/Core/Opt/OccurAnal.hs
- compiler/GHC/Core/Opt/Pipeline.hs
- compiler/GHC/Core/Opt/SetLevels.hs
- compiler/GHC/Core/Opt/Simplify.hs
- compiler/GHC/Core/Opt/Simplify/Env.hs
- compiler/GHC/Core/Opt/Simplify/Inline.hs


The diff was not included because it is too large.


View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/848f360fe1bcbdac28e9cc0014665bf8ccbfd259...9c910ae1a616e74f5389828ad9b9126d0282764e




