[Git][ghc/ghc][wip/simplifier-tweaks] 10 commits: Include -haddock in DynFlags fingerprint
Simon Peyton Jones (@simonpj)
gitlab at gitlab.haskell.org
Sun Jul 30 22:29:14 UTC 2023
Simon Peyton Jones pushed to branch wip/simplifier-tweaks at Glasgow Haskell Compiler / GHC
Commits:
0bfc8908 by Finley McIlwaine at 2023-07-28T18:46:26-04:00
Include -haddock in DynFlags fingerprint
The -haddock flag determines whether or not the resulting .hi files
contain haddock documentation strings. If the existing .hi files do
not contain haddock documentation strings and the user requests them,
we should recompile.
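To illustrate the idea (this is a toy model, not GHC's actual fingerprinting code): if the stored fingerprint covers every flag that changes what goes into the .hi file, then toggling such a flag forces a recompile. The `Flags` record, `fingerprint`, and `needsRecompile` below are hypothetical names invented for this sketch.

```haskell
-- Toy model only: a hypothetical subset of flags that affect .hi contents.
data Flags = Flags
  { optLevel :: Int
  , haddock  :: Bool  -- whether docstrings are written into the interface file
  } deriving (Eq, Show)

-- A stand-in "fingerprint": the rendered values of the fields that matter.
-- Leaving 'haddock' out of this list would reproduce the bug described above.
fingerprint :: Flags -> [String]
fingerprint fs = [show (optLevel fs), show (haddock fs)]

-- Recompile when the fingerprint stored with the old .hi file differs
-- from the fingerprint of the flags requested now.
needsRecompile :: [String] -> Flags -> Bool
needsRecompile stored current = stored /= fingerprint current
```

With this model, `needsRecompile (fingerprint (Flags 1 False)) (Flags 1 True)` is `True`, because the user now asks for documentation that the existing interface file lacks.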
- - - - -
40425c50 by Andreas Klebinger at 2023-07-28T18:47:02-04:00
Aarch64 NCG: Use encoded immediates for literals.
Try to generate
instr x2, <imm>
instead of
mov x1, lit
instr x2, x1
when possible. This gets rid of quite a few redundant
mov instructions.
I believe this causes a metric decrease for LargeRecords as
we reduce register pressure.
-------------------------
Metric Decrease:
LargeRecord
-------------------------
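A rough sketch of the kind of check involved, assuming only the AArch64 ADD/SUB immediate form (a 12-bit unsigned value, optionally shifted left by 12). The types and the `addLit` helper are hypothetical and are not the NCG's real data structures; logical-immediate encodings are not modelled.

```haskell
import Data.Bits ((.&.), shiftR)

-- Does the literal fit the ADD/SUB immediate encoding: imm12, or imm12 LSL #12?
fitsArithImm :: Integer -> Bool
fitsArithImm n
  | n < 0      = False
  | n <= 0xFFF = True
  | otherwise  = n .&. 0xFFF == 0 && (n `shiftR` 12) <= 0xFFF

-- Hypothetical, simplified instruction representation for this sketch.
data Operand = Imm Integer | Reg String deriving (Eq, Show)
data Instr   = Mov String Integer | Add String String Operand deriving (Eq, Show)

-- Emit "add dst, src, #lit" when the literal is encodable; otherwise
-- fall back to materialising it in a scratch register first.
addLit :: String -> String -> Integer -> [Instr]
addLit dst src lit
  | fitsArithImm lit = [Add dst src (Imm lit)]
  | otherwise        = [Mov "tmp" lit, Add dst src (Reg "tmp")]
```

Here `addLit "x2" "x0" 4096` yields a single instruction, while `addLit "x2" "x0" 4097` needs the extra mov.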
- - - - -
e9a0fa3f by Bodigrim at 2023-07-28T18:47:42-04:00
Bump filepath submodule to 1.4.100.4
Resolves #23741
Metric Decrease:
MultiComponentModules
MultiComponentModulesRecomp
MultiLayerModules
MultiLayerModulesRecomp
T10421
T12234
T12425
T13035
T13701
T13719
T16875
T18304
T18698a
T18698b
T21839c
T9198
TcPlugin_RewritePerf
hard_hole_fits
Metric decrease on Windows can probably be attributed to https://github.com/haskell/filepath/pull/183
- - - - -
ee93edfd by Bodigrim at 2023-07-28T18:48:21-04:00
Add since pragmas to GHC.IO.Handle.FD
- - - - -
d0369802 by Simon Peyton Jones at 2023-07-30T09:24:48+01:00
Make the occurrence analyser smarter about join points
This MR addresses #22404. There is a big Note
Note [Occurrence analysis for join points]
that explains it all. Significant changes
* New field occ_join_points in OccEnv
* The NonRec case of occAnalBind splits into two cases:
one for existing join points (which does the special magic for
Note [Occurrence analysis for join points]), and one for other
bindings.
* mkOneOcc adds in info from occ_join_points.
* All "bring into scope" activity is centralised in the
new function `addInScope`.
* I made a local data type LocalOcc for use inside the occurrence analyser.
It is like OccInfo, but lacks IAmDead and IAmALoopBreaker, which in turn
makes computations over it simpler and more efficient.
* I found quite a bit of allocation in GHC.Core.Rules.getRules
so I optimised it a bit.
More minor changes
* I found I was using (Maybe Arity) a lot, so I defined a new data
type JoinPointHood and used it everywhere. This touches a lot of
non-occ-anal files, but it makes everything more perspicuous.
* Renamed data constructor WithUsageDetails to WUD, and
WithTailUsageDetails to WTUD
This also fixes #21128, on the way.
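For readers unfamiliar with the (Maybe Arity) idiom, here is a minimal sketch of what replacing it with a dedicated type could look like. The constructor names and helpers are assumptions made for illustration; the actual definition in the GHC source may differ.

```haskell
type JoinArity = Int

-- Sketch: a purpose-built type instead of (Maybe JoinArity).
-- Constructor names here are assumed, not taken from the commit text.
data JoinPointHood
  = JoinPoint !JoinArity   -- a join point expecting this many arguments
  | NotJoinPoint
  deriving (Eq, Show)

isJoinPoint :: JoinPointHood -> Bool
isJoinPoint (JoinPoint _) = True
isJoinPoint NotJoinPoint  = False

-- Conversions to and from the old representation, for comparison.
fromMaybeArity :: Maybe JoinArity -> JoinPointHood
fromMaybeArity (Just n) = JoinPoint n
fromMaybeArity Nothing  = NotJoinPoint

toMaybeArity :: JoinPointHood -> Maybe JoinArity
toMaybeArity (JoinPoint n) = Just n
toMaybeArity NotJoinPoint  = Nothing
```

The point of such a type is readability: a pattern match on `JoinPoint n` says what the number means, where `Just n` does not.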
--------- Compiler perf -----------
I spent quite a time on performance tuning, so even though it
does more than before, the occurrence analyser runs slightly faster
on average. Here are the compile-time allocation changes over 0.5%
Test                      Metric            Before          After  Change
CoOpt_Read(normal)        ghc/alloc    766,025,520    754,561,992   -1.5%
CoOpt_Singletons(normal)  ghc/alloc    759,436,840    762,925,512   +0.5%
LargeRecord(normal)       ghc/alloc  1,814,482,440  1,799,530,456   -0.8%
PmSeriesT(normal)         ghc/alloc     68,159,272     67,519,720   -0.9%
T10858(normal)            ghc/alloc    120,805,224    118,746,968   -1.7%
T11374(normal)            ghc/alloc    164,901,104    164,070,624   -0.5%
T11545(normal)            ghc/alloc     79,851,808     78,964,704   -1.1%
T12150(optasm)            ghc/alloc     73,903,664     71,237,544   -3.6%  GOOD
T12227(normal)            ghc/alloc    333,663,200    331,625,864   -0.6%
T12234(optasm)            ghc/alloc     52,583,224     52,340,344   -0.5%
T12425(optasm)            ghc/alloc     81,943,216     81,566,720   -0.5%
T13056(optasm)            ghc/alloc    294,517,928    289,642,512   -1.7%
T13253-spj(normal)        ghc/alloc    118,271,264     59,859,040  -49.4%  GOOD
T15164(normal)            ghc/alloc  1,102,630,352  1,091,841,296   -1.0%
T15304(normal)            ghc/alloc  1,196,084,000  1,166,733,000   -2.5%
T15630(normal)            ghc/alloc    148,729,632    147,261,064   -1.0%
T15703(normal)            ghc/alloc    379,366,664    377,600,008   -0.5%
T16875(normal)            ghc/alloc     32,907,120     32,670,976   -0.7%
T17516(normal)            ghc/alloc  1,658,001,888  1,627,863,848   -1.8%
T17836(normal)            ghc/alloc    395,329,400    393,080,248   -0.6%
T18140(normal)            ghc/alloc     71,968,824     73,243,040   +1.8%
T18223(normal)            ghc/alloc    456,852,568    453,059,088   -0.8%
T18282(normal)            ghc/alloc    129,105,576    131,397,064   +1.8%
T18304(normal)            ghc/alloc     71,311,712     70,722,720   -0.8%
T18698a(normal)           ghc/alloc    208,795,112    210,102,904   +0.6%
T18698b(normal)           ghc/alloc    230,320,736    232,697,976   +1.0%  BAD
T19695(normal)            ghc/alloc  1,483,648,128  1,504,702,976   +1.4%
T20049(normal)            ghc/alloc     85,612,024     85,114,376   -0.6%
T21839c(normal)           ghc/alloc    415,080,992    410,906,216   -1.0%  GOOD
T4801(normal)             ghc/alloc    247,590,920    250,726,272   +1.3%
T6048(optasm)             ghc/alloc     95,699,416     95,080,680   -0.6%
T783(normal)              ghc/alloc    335,323,384    332,988,120   -0.7%
T9233(normal)             ghc/alloc    709,641,224    685,947,008   -3.3%  GOOD
T9630(normal)             ghc/alloc    965,635,712    948,356,120   -1.8%
T9675(optasm)             ghc/alloc    444,604,152    428,987,216   -3.5%  GOOD
T9961(normal)             ghc/alloc    303,064,592    308,798,800   +1.9%  BAD
WWRec(normal)             ghc/alloc    503,728,832    498,102,272   -1.1%
geo. mean                                                           -1.0%
minimum                                                            -49.4%
maximum                                                             +1.9%
In fact these figures seem to vary between platforms; generally worse
on i386 for some reason. The Windows numbers vary by 1%, especially in
benchmarks where the total allocation is low. But the geometric mean stays
solidly negative, which is good. The "increase/decrease" list below
covers all platforms.
The big win on T13253-spj comes because it has a big nest of join
points, each occurring twice in the next one. The new occ-anal takes
only one iteration of the simplifier to do the inlining; the old one
took four. Moreover, we get much smaller code with the new one:
New: Result size of Tidy Core
= {terms: 429, types: 84, coercions: 0, joins: 14/14}
Old: Result size of Tidy Core
= {terms: 2,437, types: 304, coercions: 0, joins: 10/10}
--------- Runtime perf -----------
No significant changes in nofib results, except a 1% reduction in
compiler allocation.
Metric Decrease:
CoOpt_Read
T13253-spj
T9233
T9630
T9675
T12150
T21839c
LargeRecord
MultiComponentModulesRecomp
T10421
T13701
T12425
Metric Increase:
T18140
T9961
T18282
T18698a
T18698b
T19695
- - - - -
42aa7fbd by Julian Ospald at 2023-07-30T17:22:01-04:00
Improve documentation around IOException and ioe_filename
See:
* https://github.com/haskell/core-libraries-committee/issues/189
* https://github.com/haskell/unix/pull/279
* https://github.com/haskell/unix/pull/289
- - - - -
6e3b92bf by Simon Peyton Jones at 2023-07-30T22:29:34+01:00
Several improvements to the handling of coercions
* Make `mkSymCo` and `mkInstCo` smarter
Fixes #23642
* Fix return role of `SelCo` in the coercion optimiser.
Fixes #23617
* Make the coercion optimiser `opt_trans_rule` work better for newtypes
Fixes #23619
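As background on what a "smarter" smart constructor means here, the following toy sketch applies two standard identities when building a symmetric coercion: sym of a reflexive coercion is itself, and sym of sym cancels. The `Co` type and `mkSym` are hypothetical stand-ins; which rules the real `mkSymCo` gained in this commit is not stated above.

```haskell
-- Toy coercion language (hypothetical; far simpler than GHC's Coercion type).
data Co
  = Refl String    -- reflexivity at a named type
  | Axiom String   -- some opaque axiom, e.g. a newtype axiom
  | Sym Co         -- symmetry
  deriving (Eq, Show)

-- Smart constructor: simplify while building, instead of leaving
-- the work to a later optimisation pass.
mkSym :: Co -> Co
mkSym (Refl t) = Refl t   -- sym <t> = <t>
mkSym (Sym co) = co       -- sym (sym co) = co
mkSym co       = Sym co
```

For example, `mkSym (mkSym (Axiom "N"))` gives back `Axiom "N"` rather than a doubly wrapped term.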
- - - - -
6262e6ad by Simon Peyton Jones at 2023-07-30T22:29:34+01:00
Simplifier improvements
This MR started as: allow the simplifier to do more in one pass,
arising from places I could see the simplifier taking two iterations
where one would do. But it turned into a larger project, because
these changes unexpectedly made inlining blow up, especially join
points in deeply-nested cases.
The net result is good: a 2% improvement in compile time. The table
below shows changes over 1%.
The main changes are:
* The SimplEnv now has a seInlineDepth field, which says how deep
in unfoldings we are. See Note [Inline depth] in Simplify.Env
* Avoid repeatedly simplifying coercions.
see Note [Avoid re-simplifying coercions] in Simplify.Iteration
As you'll see from the Note, this makes use of the seInlineDepth.
* Allow Simplify.Utils.postInlineUnconditionally to inline variables
that are used exactly once. See Note [Post-inline for single-use things].
* Allow Simplify.Iteration.simplAuxBind to inline used-once things.
This is another part of Note [Post-inline for single-use things], and
is really good for reducing simplifier iterations in situations like
case K e of { K x -> blah }
where x is used once in blah.
* Make GHC.Core.SimpleOpt.exprIsConApp_maybe do some simple case
elimination. Note [Case elim in exprIsConApp_maybe]
* When making join points, don't do so if the join point is so small
it will immediately be inlined. See Note [Duplicating alternatives]
* Do not add an unfolding to a join point at birth. This is a tricky one
and has a long Note [Do not add unfoldings to join points at birth]
It shows up in two places
- In `mkDupableAlt` do not add an inlining
- (trickier) In `simplLetUnfolding` don't add an unfolding for a
fresh join point
I am not fully satisfied with this, but it works and is well documented.
* Many new or rewritten Notes. E.g. Note [Avoiding simplifying repeatedly]
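The `case K e of { K x -> blah }` situation mentioned in the list can be sketched on a toy expression language. Everything below (the `Expr` type, `occurs`, `subst`, `simplCase`) is invented for illustration and ignores much of what the real simplifier must handle, notably name capture and occurrences under lambdas.

```haskell
-- Toy expression language, hypothetical and much smaller than Core.
data Expr
  = Var String
  | Lit Int
  | Add Expr Expr
  | Con String [Expr]          -- saturated constructor application
  | Let String Expr Expr
  | Case Expr [Alt]
  deriving (Eq, Show)

type Alt = (String, [String], Expr)  -- constructor, binders, right-hand side

-- Count free occurrences of a variable.
occurs :: String -> Expr -> Int
occurs x expr = case expr of
  Var y       -> if x == y then 1 else 0
  Lit _       -> 0
  Add a b     -> occurs x a + occurs x b
  Con _ es    -> sum (map (occurs x) es)
  Let y rhs b -> occurs x rhs + (if x == y then 0 else occurs x b)
  Case s alts -> occurs x s + sum [ occurs x b | (_, bs, b) <- alts, x `notElem` bs ]

-- Naive substitution; assumes binder names are unique (no capture avoidance).
subst :: String -> Expr -> Expr -> Expr
subst x new expr = case expr of
  Var y       -> if x == y then new else expr
  Lit _       -> expr
  Add a b     -> Add (go a) (go b)
  Con k es    -> Con k (map go es)
  Let y rhs b -> Let y (go rhs) (if x == y then b else go b)
  Case s alts -> Case (go s) [ (k, bs, if x `elem` bs then b else go b) | (k, bs, b) <- alts ]
  where go = subst x new

-- Inline a binding that is used exactly once; otherwise keep a let.
bind :: (String, Expr) -> Expr -> Expr
bind (x, rhs) body
  | occurs x body == 1 = subst x rhs body
  | otherwise          = Let x rhs body

-- Case of known constructor: pick the matching alternative in one step.
simplCase :: Expr -> Expr
simplCase (Case (Con k args) alts)
  | ((_, bs, body) : _) <- [ alt | alt@(k', bs', _) <- alts, k' == k, length bs' == length args ]
  = foldr bind body (zip bs args)
simplCase e = e
```

When `x` is used once, the scrutinee's argument lands directly in the body in a single step, with no intermediate let left for a later iteration to clean up. When it is used more than once, the sketch keeps the let.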
I discovered that GHC.HsToCore.Pmc.Solver.Types.trvVarInfo was very
delicately balanced. It's a small, heavily used, overloaded function
and it's important that it inlines. By a fluke it was before, but at
various times in my journey it stopped doing so. So I added an INLINE
pragma to it.
Metrics: compile_time/bytes allocated
------------------------------------------------
CoOpt_Singletons(normal)   -4.3%  GOOD
LargeRecord(normal)       -23.3%  GOOD
PmSeriesS(normal)          -2.4%
T11195(normal)             -1.7%
T12227(normal)            -20.0%  GOOD
T12545(normal)             -5.4%
T13253-spj(normal)        -50.7%  GOOD
T13386(normal)             -5.1%  GOOD
T14766(normal)             -2.4%  GOOD
T15164(normal)             -1.7%
T15304(normal)             +1.0%
T15630(normal)             -7.7%
T15630a(normal)              NEW
T15703(normal)             -7.5%  GOOD
T16577(normal)             -5.1%  GOOD
T17516(normal)             -3.6%
T18223(normal)            -16.8%  GOOD
T18282(normal)             -1.5%
T18304(normal)             +1.9%
T21839c(normal)            -3.5%  GOOD
T3064(normal)              -1.5%
T5030(normal)             -16.2%  GOOD
T5321Fun(normal)           -1.6%
T6048(optasm)              -2.1%  GOOD
T8095(normal)              -6.1%  GOOD
T9630(normal)              -5.1%  GOOD
WWRec(normal)              -1.6%
geo. mean                  -2.1%
minimum                   -50.7%
maximum                    +1.9%
Metric Decrease:
CoOpt_Singletons
LargeRecord
T12227
T13253-spj
T13386
T14766
T15703
T16577
T18223
T21839c
T5030
T6048
T8095
T9630
- - - - -
50518da5 by Simon Peyton Jones at 2023-07-30T22:29:34+01:00
Improve postInlineUnconditionally
This commit adds two things to postInlineUnconditionally:
1. Do not postInlineUnconditionally a join point, ever.
Doing so does not reduce allocation, which is the main point,
and with join points that are used a lot it can bloat code.
See point (1) of Note [Duplicating join points] in
GHC.Core.Opt.Simplify.Iteration.
2. Do not postInlineUnconditionally a strict (demanded) binding.
It will not allocate a thunk (it'll turn into a case instead)
so again the main point of inlining it doesn't hold. Better
to check per-call-site.
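The two rules can be read as early exits in a decision function. The sketch below is a simplified model: the `Binder` record and `postInline` are hypothetical, and the final `usedOnce` test merely stands in for the other conditions the real postInlineUnconditionally checks.

```haskell
-- Hypothetical summary of what is known about a binding.
data Binder = Binder
  { isJoinPoint :: Bool
  , isStrict    :: Bool  -- the binding is demanded, so it becomes a case, not a thunk
  , usedOnce    :: Bool
  } deriving (Eq, Show)

-- Sketch of the decision, with the two new rules applied first.
postInline :: Binder -> Bool
postInline b
  | isJoinPoint b = False       -- rule 1: never for join points
  | isStrict b    = False       -- rule 2: no thunk allocation to save
  | otherwise     = usedOnce b  -- stand-in for the remaining checks
```

Both early exits share one rationale from the text: inlining unconditionally is worthwhile mainly when it avoids an allocation, and neither a join point nor a strict binding allocates.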
- - - - -
0caebc5d by Simon Peyton Jones at 2023-07-30T22:29:34+01:00
Update testsuite output
- - - - -
15 changed files:
- compiler/GHC/CmmToAsm/AArch64/CodeGen.hs
- compiler/GHC/CmmToAsm/AArch64/Instr.hs
- compiler/GHC/CmmToAsm/AArch64/Ppr.hs
- compiler/GHC/CmmToAsm/AArch64/Regs.hs
- compiler/GHC/Core/Coercion.hs
- compiler/GHC/Core/Coercion/Opt.hs
- compiler/GHC/Core/Lint.hs
- compiler/GHC/Core/Opt/CSE.hs
- compiler/GHC/Core/Opt/DmdAnal.hs
- compiler/GHC/Core/Opt/Exitify.hs
- compiler/GHC/Core/Opt/FloatIn.hs
- compiler/GHC/Core/Opt/FloatOut.hs
- compiler/GHC/Core/Opt/OccurAnal.hs
- compiler/GHC/Core/Opt/SetLevels.hs
- compiler/GHC/Core/Opt/Simplify/Env.hs
The diff was not included because it is too large.
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/0a5a2092c07a5e759993a8dd0d5ce82898a811f7...0caebc5d5f197f74ae6849ca2c4f38ef82d44412