[commit: ghc] wip/spj-early-inline2: Simon's early-inline patch, take 2 (2caa001)

git at git.haskell.org git at git.haskell.org
Tue Feb 21 23:27:27 UTC 2017


Repository : ssh://git@git.haskell.org/ghc

On branch  : wip/spj-early-inline2
Link       : http://ghc.haskell.org/trac/ghc/changeset/2caa001813d03168266935a475aff34c55091e07/ghc

>---------------------------------------------------------------

commit 2caa001813d03168266935a475aff34c55091e07
Author: David Feuer <David.Feuer at gmail.com>
Date:   Tue Feb 21 00:00:54 2017 -0500

    Simon's early-inline patch, take 2
    
    Summary:
    Refactor floating of bindings (fiBind)
    
    This is just a local refactoring.
    
    I originally planned to try floating top-level bindings inwards,
    but I backed off from that leaving only this (harmless) refactoring,
    which has no behavioural effect.
    
    I also make FloatIn into a ModGuts -> ModGuts function; again not
    necessary now, but no harm either.
    
    My attempt also used the new function CoreFVs.freeVarsBind; but
    that too is a plausible refactorig of freeVars, so I left it in too.
    
    Add -fspec-constr-keen
    
    I dicovered that the dramatic imprvoement in perf/should_run/T9339
    with the introduction of join points was really rather a fluke, and
    very fragile.
    
    The real problem (see Note [Making SpecConstr keener]) is that
    SpecConstr wasn't specialising a function even though it was applied
    to a freshly-allocated constructor.  The paper describes plausible
    reasons for this, but I think it may well be better to be a bit more
    aggressive.
    
    So this patch add -fspec-constr-keen, which makes SpecConstr a bit
    keener to specialise, by ignoring whether or not the argument
    corresponding to a call pattern is scrutinised in the function body.
    Now the gains in T9339 should be robust; and it might even be a
    better default.
    
    I'd be interested in what happens if we switched on -fspec-constr-keen
    with -O2.
    
    Occurrence-analyse the result of rule firings
    
    When studying simplCore/should_compile/T7785 I found that a long
    chain of maps
      map f (map f (map f (map f (...))))
    took an unreasonably long time to simplify.  The problem got
    worse when I started inlining in the InitialPhase, which is how
    I stumbled on it.
    
    The solution turned  out to be rather simple.  It's described in
       Note [Occurence-analyse after rule firing]
    in Simplify.hs
    
    Stop uniques ending up in SPEC rule names
    
    Make Specialise work with casts
    
    With my upcoming early-inlining patch it turned out that Specialise
    was getting stuck on casts.  This patch fixes it; see Specialise
    [Account for casts in binding].
    
    Inline data constructor wrappers in phase 2 only
    
    This patch prepares for my upcoming early-inlining patch. It arranges
    that data constructor wrappers are not inlined until Phase 2 (the
    first of the "normal" phases.)  That gives rules a chance to fire
    in the InitialPhase (aka "gentle").
    
    This has been a bit of a problem for a while, so it's nice to have
    a fix.  It should make no difference immediately, becuase currently
    nothing is inlined in the InitialPhase anyway.
    
    Combine identical case alterantives in CSE
    
    See Note [Combine case alternatives] in CSE.  This opportunity
    surfaced when I was was studying early inlining.  It's easy (and
    cheap) to exploit, and sometimes makes a worthwhile saving.
    
    Extend CSE to handle recursive bindings
    
    I came across a program with two identical recursive bindings, so
    I wondered if they could be CSE'd.  It turned out to be pretty easy
    so I did it.
    
    See Note [CSE for recursive bindings] in CSE
    
    Fix SetLevels for makeStaticPtr
    
    This too is prepartory for my early-inlining patch.  It turned
    out that early inlining exposed a bug in the way that static
    pointers were being floated.
    
    Small changes to expression sizing in CoreUnfold
    
    The only significant change here is that
    
       case e of {}
    
    should be treated like 'e', rather than like a case expression.
    We don't push a return address, for example, since 'e' is sure to
    diverge.
    
    I forget why I did this; but it will make these empty-case expressions
    (which are only there to satisfy the type checker) cost-free.
    
    The Early Inline Patch
    
    This very small patch switches on sm_inline even in the InitialPhase
    (aka "gentle" phase).   There is no reason not to... and the results
    are astonishing.
    
    I think the peformance of GHC itself improves by about 5%; and some
    programs get much smaller, quicker.  Result: across the board irmprovements in
    compile time performance.  Here are the changes in perf/compiler;
    the numbers are decreases in compiler bytes-allocated:
    
      3%   T5837
      7%   parsing001
      9%   T12234
      35%  T9020
      9%   T3064
      13%  T9961
      20%  T13056
      5%   T9872d
      5%   T9872c
      5%   T9872b
      7%   T9872a
      5%   T783
      35%  T12227
      20%  T1969
    
    Plus in perf/should_run
    
      5%   lazy-bs-alloc
    
    It wasn't as easy as it sounds: I did a raft of preparatory work in
    earlier patches.  But it's great!
    
    Add VarSet.anyDVarSet, allDVarSet
    
    I need these in a later commit.
    
    Also rename
      varSetAny  -->  anyVarSet
      varSetAll  -->  allVarSet
    for consistency with other functions; eg filterVarSet
    
    Improve pretty-printing of types
    
    When doing debug-printing it's really important that the free vars
    of a type are printed with their uniques.  The IfaceTcTyVar thing
    was a stab in that direction, but it only worked for TcTyVars, not
    TyVars.
    
    This patch does it properly, by keeping track of the free vars of the
    type when translating Type -> IfaceType, and passing that down through
    toIfaceTypeX.  Then when we find a variable, look in that set, and
    translate it to IfaceFreeTyVar if so.  (I renamed IfaceTcTyVar to
    IfaceFreeTyVar.)
    
    Fiddly but not difficult.
    
    Change -ddump-tc-trace output in TcErrors, slightly
    
    Only affects debugging
    
    Improve SetLevels for join points
    
    C.f. Trac #13286, #13236
    
    * Never destroy a join point unless it goes to top level
      See Note [Floating join point bindings]
    
    * Never float a MFE if it has a free join variable
      Note [Free join points]
    
    * Stop treating nullary join points specially
    
    * Enforce the invariant that le_join_ceil >= le_ctxt_lvl
      (Needs more thought...)
    
    Error message wibbles accumulated from the preceding patches
    
    I suppose it would be better to spread them with their individual
    changes
    
    Error message wibbles
    
    More test suite wibbles
    
    Also update the Makefile to make it easier to see what's wrong
    with one test.
    
    Reviewers: austin, goldfire, bgamari, simonmar
    
    Subscribers: thomie
    
    Differential Revision: https://phabricator.haskell.org/D3167


>---------------------------------------------------------------

2caa001813d03168266935a475aff34c55091e07
 testsuite/tests/rts/all.T                             | 2 ++
 testsuite/tests/simplCore/should_compile/Makefile     | 2 +-
 testsuite/tests/simplCore/should_compile/T8848.hs     | 0
 testsuite/tests/simplCore/should_compile/T8848.stdout | 2 ++
 testsuite/tests/th/TH_Roles2.stderr                   | 4 ++--
 5 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/testsuite/tests/rts/all.T b/testsuite/tests/rts/all.T
index 2fae73c..1cf0e8e 100644
--- a/testsuite/tests/rts/all.T
+++ b/testsuite/tests/rts/all.T
@@ -162,6 +162,8 @@ def checkDynAsm(actual_file, normaliser):
     actual_raw = read_no_crs(actual_file)
     actual_str = normaliser(actual_raw)
     actual = actual_str.split()
+    # Note that the gold linker seems to reverse ctors1 and ctors2.
+    # See #13283.
     if actual == ['initArray1', 'initArray2', 'ctors1', 'ctors2', 'success']:
         return 1
     elif actual == ['ctors1', 'ctors2', 'initArray1', 'initArray2', 'success']:
diff --git a/testsuite/tests/simplCore/should_compile/Makefile b/testsuite/tests/simplCore/should_compile/Makefile
index 590ba43..08664d7 100644
--- a/testsuite/tests/simplCore/should_compile/Makefile
+++ b/testsuite/tests/simplCore/should_compile/Makefile
@@ -16,7 +16,7 @@ T9509:
 	$(RM) -f T9509*.o T9509*.hi
 	'$(TEST_HC)' $(TEST_HC_OPTS) -O -c T9509a.hs
 	'$(TEST_HC)' $(TEST_HC_OPTS) -O -c T9509.hs  \
-              -ddump-rule-rewrites | grep SPEC
+              -ddump-rule-rewrites | { grep SPEC || true; }
         # Grep output should show a SPEC rule firing, twice
 
 T8832:
diff --git a/testsuite/tests/simplCore/should_compile/T8848.stdout b/testsuite/tests/simplCore/should_compile/T8848.stdout
new file mode 100644
index 0000000..de0d424
--- /dev/null
+++ b/testsuite/tests/simplCore/should_compile/T8848.stdout
@@ -0,0 +1,2 @@
+Rule fired: SPEC map2
+Rule fired: SPEC map2
diff --git a/testsuite/tests/th/TH_Roles2.stderr b/testsuite/tests/th/TH_Roles2.stderr
index 3027911..d2c6316 100644
--- a/testsuite/tests/th/TH_Roles2.stderr
+++ b/testsuite/tests/th/TH_Roles2.stderr
@@ -16,8 +16,8 @@ TH_Roles2.$tcT
       TH_Roles2.$trModule
       (GHC.Types.TrNameS "T"#)
       1
-      krep_a4im
-krep_a4im [InlPrag=[~]]
+      krep_a40L
+krep_a40L [InlPrag=[~]]
   = GHC.Types.KindRepFun
       (GHC.Types.KindRepVar 0)
       (GHC.Types.KindRepTYPE GHC.Types.LiftedRep)



More information about the ghc-commits mailing list