[Git][ghc/ghc][wip/T21851-rule-win] 8 commits: Apply some tricks to speed up core lint.
Simon Peyton Jones (@simonpj)
gitlab at gitlab.haskell.org
Wed Sep 28 22:32:02 UTC 2022
Simon Peyton Jones pushed to branch wip/T21851-rule-win at Glasgow Haskell Compiler / GHC
Commits:
c2d73cb4 by Andreas Klebinger at 2022-09-28T15:07:30-04:00
Apply some tricks to speed up core lint.
Below are the noteworthy changes and if given their impact on compiler
allocations for a type heavy module:
* Use the oneShot trick on LintM
* Use a unboxed tuple for the result of LintM: ~6% reduction
* Avoid a thunk for the result of typeKind in lintType: ~5% reduction
* lint_app: Don't allocate the error msg in the hot code path: ~4%
reduction
* lint_app: Eagerly force the in scope set: ~4%
* nonDetCmpType: Try to short cut using reallyUnsafePtrEquality#: ~2%
* lintM: Use a unboxed maybe for the `a` result: ~12%
* lint_app: make go_app tail recursive to avoid allocating the go function
as heap closure: ~7%
* expandSynTyCon_maybe: Use a specialized data type
For a less type heavy module like nofib/spectral/simple compiled with
-O -dcore-lint allocations went down by ~24% and compile time by ~9%.
-------------------------
Metric Decrease:
T1969
-------------------------
- - - - -
b74b6191 by sheaf at 2022-09-28T15:08:10-04:00
matchLocalInst: do domination analysis
When multiple Given quantified constraints match a Wanted, and there is
a quantified constraint that dominates all others, we now pick it
to solve the Wanted.
See Note [Use only the best matching quantified constraint].
For example:
[G] d1: forall a b. ( Eq a, Num b, C a b ) => D a b
[G] d2: forall a . C a Int => D a Int
[W] {w}: D a Int
When solving the Wanted, we find that both Givens match, but we pick
the second, because it has a weaker precondition, C a Int, compared
to (Eq a, Num Int, C a Int). We thus say that d2 dominates d1;
see Note [When does a quantified instance dominate another?].
This domination test is done purely in terms of superclass expansion,
in the function GHC.Tc.Solver.Interact.impliedBySCs. We don't attempt
to do a full round of constraint solving; this simple check suffices
for now.
Fixes #22216 and #22223
- - - - -
2a53ac18 by Simon Peyton Jones at 2022-09-28T17:49:09-04:00
Improve aggressive specialisation
This patch fixes #21286, by not unboxing dictionaries in
worker/wrapper (ever). The main payload is tiny:
* In `GHC.Core.Opt.DmdAnal.finaliseArgBoxities`, do not unbox
dictionaries in `get_dmd`. See Note [Do not unbox class dictionaries]
in that module
* I also found that imported wrappers were being fruitlessly
specialised, so I fixed that too, in canSpecImport.
See Note [Specialising imported functions] point (2).
In doing due diligence in the testsuite I fixed a number of
other things:
* Improve Note [Specialising unfoldings] in GHC.Core.Unfold.Make,
and Note [Inline specialisations] in GHC.Core.Opt.Specialise,
and remove duplication between the two. The new Note describes
how we specialise functions with an INLINABLE pragma.
And simplify the defn of `spec_unf` in `GHC.Core.Opt.Specialise.specCalls`.
* Improve Note [Worker/wrapper for INLINABLE functions] in
GHC.Core.Opt.WorkWrap.
And (critially) make an actual change which is to propagate the
user-written pragma from the original function to the wrapper; see
`mkStrWrapperInlinePrag`.
* Write new Note [Specialising imported functions] in
GHC.Core.Opt.Specialise
All this has a big effect on some compile times. This is
compiler/perf, showing only changes over 1%:
Metrics: compile_time/bytes allocated
-------------------------------------
LargeRecord(normal) -50.2% GOOD
ManyConstructors(normal) +1.0%
MultiLayerModulesTH_OneShot(normal) +2.6%
PmSeriesG(normal) -1.1%
T10547(normal) -1.2%
T11195(normal) -1.2%
T11276(normal) -1.0%
T11303b(normal) -1.6%
T11545(normal) -1.4%
T11822(normal) -1.3%
T12150(optasm) -1.0%
T12234(optasm) -1.2%
T13056(optasm) -9.3% GOOD
T13253(normal) -3.8% GOOD
T15164(normal) -3.6% GOOD
T16190(normal) -2.1%
T16577(normal) -2.8% GOOD
T16875(normal) -1.6%
T17836(normal) +2.2%
T17977b(normal) -1.0%
T18223(normal) -33.3% GOOD
T18282(normal) -3.4% GOOD
T18304(normal) -1.4%
T18698a(normal) -1.4% GOOD
T18698b(normal) -1.3% GOOD
T19695(normal) -2.5% GOOD
T5837(normal) -2.3%
T9630(normal) -33.0% GOOD
WWRec(normal) -9.7% GOOD
hard_hole_fits(normal) -2.1% GOOD
hie002(normal) +1.6%
geo. mean -2.2%
minimum -50.2%
maximum +2.6%
I diligently investigated some of the big drops.
* Caused by not doing w/w for dictionaries:
T13056, T15164, WWRec, T18223
* Caused by not fruitlessly specialising wrappers
LargeRecord, T9630
For runtimes, here is perf/should+_run:
Metrics: runtime/bytes allocated
--------------------------------
T12990(normal) -3.8%
T5205(normal) -1.3%
T9203(normal) -10.7% GOOD
haddock.Cabal(normal) +0.1%
haddock.base(normal) -1.1%
haddock.compiler(normal) -0.3%
lazy-bs-alloc(normal) -0.2%
------------------------------------------
geo. mean -0.3%
minimum -10.7%
maximum +0.1%
I did not investigate exactly what happens in T9203.
Nofib is a wash:
+-------------------------------++--+-----------+-----------+
| || | tsv (rel) | std. err. |
+===============================++==+===========+===========+
| real/anna || | -0.13% | 0.0% |
| real/fem || | +0.13% | 0.0% |
| real/fulsom || | -0.16% | 0.0% |
| real/lift || | -1.55% | 0.0% |
| real/reptile || | -0.11% | 0.0% |
| real/smallpt || | +0.51% | 0.0% |
| spectral/constraints || | +0.20% | 0.0% |
| spectral/dom-lt || | +1.80% | 0.0% |
| spectral/expert || | +0.33% | 0.0% |
+===============================++==+===========+===========+
| geom mean || | | |
+-------------------------------++--+-----------+-----------+
I spent quite some time investigating dom-lt, but it's pretty
complicated. See my note on !7847. Conclusion: it's just a delicate
inlining interaction, and we have plenty of those.
Metric Decrease:
LargeRecord
T13056
T13253
T15164
T16577
T18223
T18282
T18698a
T18698b
T19695
T9630
WWRec
hard_hole_fits
T9203
- - - - -
addeefc0 by Simon Peyton Jones at 2022-09-28T17:49:09-04:00
Refactor UnfoldingSource and IfaceUnfolding
I finally got tired of the way that IfaceUnfolding reflected
a previous structure of unfoldings, not the current one. This
MR refactors UnfoldingSource and IfaceUnfolding to be simpler
and more consistent.
It's largely just a refactor, but in UnfoldingSource (which moves
to GHC.Types.Basic, since it is now used in IfaceSyn too), I
distinguish between /user-specified/ and /system-generated/ stable
unfoldings.
data UnfoldingSource
= VanillaSrc
| StableUserSrc -- From a user-specified pragma
| StableSystemSrc -- From a system-generated unfolding
| CompulsorySrc
This has a minor effect in CSE (see the use of isisStableUserUnfolding
in GHC.Core.Opt.CSE), which I tripped over when working on
specialisation, but it seems like a Good Thing to know anyway.
- - - - -
7be6f9a4 by Simon Peyton Jones at 2022-09-28T17:49:09-04:00
INLINE/INLINEABLE pragmas in Foreign.Marshal.Array
Foreign.Marshal.Array contains many small functions, all of which are
overloaded, and which are critical for performance. Yet none of them
had pragmas, so it was a fluke whether or not they got inlined.
This patch makes them all either INLINE (small ones) or
INLINEABLE and hence specialisable (larger ones).
See Note [Specialising array operations] in that module.
- - - - -
b0c89dfa by Jade Lovelace at 2022-09-28T17:49:49-04:00
Export OnOff from GHC.Driver.Session
I was working on fixing an issue where HLS was trying to pass its
DynFlags to HLint, but didn't pass any of the disabled language
extensions, which HLint would then assume are on because of their
default values.
Currently it's not possible to get any of the "No" flags because the
`DynFlags.extensions` field can't really be used since it is [OnOff
Extension] and OnOff is not exported.
So let's export it.
- - - - -
2f050687 by Bodigrim at 2022-09-28T17:50:28-04:00
Avoid Data.List.group; prefer Data.List.NonEmpty.group
This allows to avoid further partiality, e. g., map head . group is
replaced by map NE.head . NE.group, and there are less panic calls.
- - - - -
8651488d by Simon Peyton Jones at 2022-09-28T23:31:50+01:00
Make rewrite rules "win" over inlining
If a rewrite rule and a rewrite rule compete in the simplifier, this
patch makes sure that the rewrite rule "win". That is, in general
a bit fragile, but it's a huge help when making specialisation work
reliably, as #21851 and #22097 showed.
The change is fairly straightforwad, and documented in
Note [Rewrite rules and inlining]
in GHC.Core.Opt.Simplify.Iteration.
Compile-times change quite a bit, but the trend is slightly
down
Metrics: compile_time/bytes allocated
-------------------------------------
Baseline
Test Metric value New value Change
-------------------------------------------------------------------------
LargeRecord(normal) ghc/alloc 6,084,071,354 4,566,082,536 -25.0% GOOD
T10421(normal) ghc/alloc 111,994,304 116,148,544 +3.7% BAD
T10421a(normal) ghc/alloc 79,150,034 83,423,664 +5.4%
T10547(normal) ghc/alloc 28,211,501 28,177,960 -0.1%
T12545(normal) ghc/alloc 1,633,149,272 1,614,005,512 -1.2%
T13253(normal) ghc/alloc 343,592,469 347,635,808 +1.2%
T14052(ghci) ghc/alloc 3,681,055,512 3,749,063,688 +1.8%
T15304(normal) ghc/alloc 1,295,512,578 1,277,067,608 -1.4%
T16577(normal) ghc/alloc 8,050,423,421 8,291,093,504 +3.0% BAD
T17516(normal) ghc/alloc 1,800,051,592 1,840,575,008 +2.3%
T17836(normal) ghc/alloc 829,913,981 812,805,232 -2.1%
T17836b(normal) ghc/alloc 45,437,680 45,057,120 -0.8%
T18223(normal) ghc/alloc 734,732,288 646,329,352 -12.0% GOOD
T3064(normal) ghc/alloc 180,023,717 182,061,608 +1.1%
T9630(normal) ghc/alloc 1,523,682,706 1,492,863,632 -2.0% GOOD
T9961(normal) ghc/alloc 358,760,821 368,223,992 +2.6% BAD
geo. mean -0.3%
minimum -25.0%
maximum +5.4%
Metric Decrease:
LargeRecord
T18223
T9630
Metric Increase:
T10421
T16577
T9961
- - - - -
30 changed files:
- compiler/GHC/Cmm/Switch.hs
- compiler/GHC/CmmToAsm.hs
- compiler/GHC/CmmToAsm/Reg/Liveness.hs
- compiler/GHC/CmmToLlvm/Base.hs
- compiler/GHC/Core.hs
- compiler/GHC/Core/Coercion.hs
- compiler/GHC/Core/FamInstEnv.hs
- compiler/GHC/Core/Lint.hs
- compiler/GHC/Core/Opt/CSE.hs
- compiler/GHC/Core/Opt/DmdAnal.hs
- compiler/GHC/Core/Opt/Simplify/Iteration.hs
- compiler/GHC/Core/Opt/Simplify/Utils.hs
- compiler/GHC/Core/Opt/Specialise.hs
- compiler/GHC/Core/Opt/Stats.hs
- compiler/GHC/Core/Opt/WorkWrap.hs
- compiler/GHC/Core/Ppr.hs
- compiler/GHC/Core/SimpleOpt.hs
- compiler/GHC/Core/Tidy.hs
- compiler/GHC/Core/TyCon.hs
- compiler/GHC/Core/Type.hs
- compiler/GHC/Core/Unfold.hs
- compiler/GHC/Core/Unfold/Make.hs
- compiler/GHC/CoreToIface.hs
- + compiler/GHC/Data/Unboxed.hs
- compiler/GHC/Driver/Session.hs
- compiler/GHC/HsToCore.hs
- compiler/GHC/HsToCore/Binds.hs
- compiler/GHC/HsToCore/Foreign/C.hs
- compiler/GHC/HsToCore/Match/Constructor.hs
- compiler/GHC/Iface/Rename.hs
The diff was not included because it is too large.
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/c20396faef0cc3c18e57c977ac439a00d19e16ed...8651488dd9e0159844772334a2d9edb6b3e01a55
--
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/c20396faef0cc3c18e57c977ac439a00d19e16ed...8651488dd9e0159844772334a2d9edb6b3e01a55
You're receiving this email because of your account on gitlab.haskell.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20220928/8fd290f6/attachment-0001.html>
More information about the ghc-commits
mailing list