[commit: ghc] master: Revert "Simplify callSiteInline a little" (9dbf66d)

git at git.haskell.org git at git.haskell.org
Tue May 15 17:05:02 UTC 2018


Repository : ssh://git@git.haskell.org/ghc

On branch  : master
Link       : http://ghc.haskell.org/trac/ghc/changeset/9dbf66d74e65309d02c9d700094e363f59c94096/ghc

>---------------------------------------------------------------

commit 9dbf66d74e65309d02c9d700094e363f59c94096
Author: Ben Gamari <ben at smart-cactus.org>
Date:   Mon May 14 09:25:32 2018 -0400

    Revert "Simplify callSiteInline a little"
    
    This lead to some rather significant performance regressions.
    Specifically,
    
        bytes allocated value is too high:
            Expected    T5631(normal) bytes allocated: 1106015512 +/-5%
            Lower bound T5631(normal) bytes allocated: 1050714736
            Upper bound T5631(normal) bytes allocated: 1161316288
            Actual      T5631(normal) bytes allocated: 1164953208
            Deviation   T5631(normal) bytes allocated:        5.3 %
        *** unexpected stat test failure for T5631(normal)
        max_bytes_used value is too high:
            Expected    T9630(normal) max_bytes_used: 35324712 +/-15%
            Lower bound T9630(normal) max_bytes_used: 30026005
            Upper bound T9630(normal) max_bytes_used: 40623419
            Actual      T9630(normal) max_bytes_used: 43490984
            Deviation   T9630(normal) max_bytes_used:     23.1 %
        *** unexpected stat test failure for T9630(normal)
    
    This reverts commit 7271db46c332f15c302b9a55f4ed005cdd0fb898.
    This reverts commit b750dcc5d9e1df8505788a41d0cf0d92acd17f0d.
    This reverts commit 33de71fa06d03e6da396a7c0a314fea3b492ab91.


>---------------------------------------------------------------

9dbf66d74e65309d02c9d700094e363f59c94096
 compiler/coreSyn/CoreUnfold.hs      | 33 +++++++++++++++++----------------
 testsuite/tests/perf/compiler/all.T | 10 ++++------
 2 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/compiler/coreSyn/CoreUnfold.hs b/compiler/coreSyn/CoreUnfold.hs
index b4f080b..c1f7892 100644
--- a/compiler/coreSyn/CoreUnfold.hs
+++ b/compiler/coreSyn/CoreUnfold.hs
@@ -1151,11 +1151,11 @@ callSiteInline dflags id active_unfolding lone_variable arg_infos cont_info
       -- idUnfolding checks for loop-breakers, returning NoUnfolding
       -- Things with an INLINE pragma may have an unfolding *and*
       -- be a loop breaker  (maybe the knot is not yet untied)
-        CoreUnfolding { uf_tmpl = unf_template
+        CoreUnfolding { uf_tmpl = unf_template, uf_is_top = is_top
                       , uf_is_work_free = is_wf
                       , uf_guidance = guidance, uf_expandable = is_exp }
           | active_unfolding -> tryUnfolding dflags id lone_variable
-                                    arg_infos cont_info unf_template
+                                    arg_infos cont_info unf_template is_top
                                     is_wf is_exp guidance
           | otherwise -> traceInline dflags id "Inactive unfolding:" (ppr id) Nothing
         NoUnfolding      -> Nothing
@@ -1175,10 +1175,10 @@ traceInline dflags inline_id str doc result
  = result
 
 tryUnfolding :: DynFlags -> Id -> Bool -> [ArgSummary] -> CallCtxt
-             -> CoreExpr -> Bool -> Bool -> UnfoldingGuidance
+             -> CoreExpr -> Bool -> Bool -> Bool -> UnfoldingGuidance
              -> Maybe CoreExpr
 tryUnfolding dflags id lone_variable
-             arg_infos cont_info unf_template
+             arg_infos cont_info unf_template is_top
              is_wf is_exp guidance
  = case guidance of
      UnfNever -> traceInline dflags id str (text "UnfNever") Nothing
@@ -1250,10 +1250,10 @@ tryUnfolding dflags id lone_variable
               CaseCtxt   -> not (lone_variable && is_exp)  -- Note [Lone variables]
               ValAppCtxt -> True                           -- Note [Cast then apply]
               RuleArgCtxt -> uf_arity > 0  -- See Note [Unfold info lazy contexts]
-              DiscArgCtxt -> uf_arity > 0  -- Note [Inlining in ArgCtxt]
+              DiscArgCtxt -> uf_arity > 0  --
               RhsCtxt     -> uf_arity > 0  --
-              _other      -> False         -- See Note [Nested functions]
-
+              _           -> not is_top && uf_arity > 0   -- Note [Nested functions]
+                                                      -- Note [Inlining in ArgCtxt]
 
 {-
 Note [Unfold into lazy contexts], Note [RHS of lets]
@@ -1323,17 +1323,18 @@ However for worker/wrapper it may be worth inlining even if the
 arity is not satisfied (as we do in the CoreUnfolding case) so we don't
 require saturation.
 
+
 Note [Nested functions]
 ~~~~~~~~~~~~~~~~~~~~~~~
-At one time we treated a call of a non-top-level function as
-"interesting" (regardless of how boring the context) in the hope
-that inlining it would eliminate the binding, and its allocation.
-Specifically, in the default case of interesting_call we had
-   _other -> not is_top && uf_arity > 0
-
-But actually postInlineUnconditionally does some of this and overall
-it makes virtually no difference to nofib.  So I simplified away this
-special case
+If a function has a nested defn we also record some-benefit, on the
+grounds that we are often able to eliminate the binding, and hence the
+allocation, for the function altogether; this is good for join points.
+But this only makes sense for *functions*; inlining a constructor
+doesn't help allocation unless the result is scrutinised.  UNLESS the
+constructor occurs just once, albeit possibly in multiple case
+branches.  Then inlining it doesn't increase allocation, but it does
+increase the chance that the constructor won't be allocated at all in
+the branches that don't use it.
 
 Note [Cast then apply]
 ~~~~~~~~~~~~~~~~~~~~~~
diff --git a/testsuite/tests/perf/compiler/all.T b/testsuite/tests/perf/compiler/all.T
index fbca75d..7718551 100644
--- a/testsuite/tests/perf/compiler/all.T
+++ b/testsuite/tests/perf/compiler/all.T
@@ -449,7 +449,7 @@ test('T5631',
         # 2014-04-04:     346389856 (x86 Windows, 64 bit machine)
         # 2014-12-01:     390199244 (Windows laptop)
         # 2016-04-06:     570137436 (amd64/Linux) many reasons
-           (wordsize(64), 1164944688, 5)]),
+           (wordsize(64), 1106015512, 5)]),
         # expected value: 774595008 (amd64/Linux):
         # expected value: 735486328 (amd64/Linux) 2012/12/12:
         # expected value: 690742040 (amd64/Linux) Call Arity improvements
@@ -464,9 +464,8 @@ test('T5631',
         # 2017-02-17:     1517484488 (amd64/Linux) Type-indexed Typeable
         # 2017-03-03:     1065147968 (amd64/Linux) Share Typeable KindReps
         # 2017-03-31:     1037482512 (amd64/Linux) Fix memory leak in simplifier
-	    # 2017-07-27:     1106015512 (Mac) Regresssion from tracking visibility in
-	    #                            TypeEqOrigin should be fixed by #14037
-        # 2018-05-05:     1164944688 (amd64/Linux) Simplify callSiteInline a little
+	# 2017-07-27:     1106015512 (Mac) Regresssion from tracking visibility in TypeEqOrigin
+	#                                  should be fixed by #14037
        only_ways(['normal'])
       ],
      compile,
@@ -1271,11 +1270,10 @@ test ('T9630',
       [ compiler_stats_num_field('max_bytes_used', # Note [residency]
           [(platform('x86_64-unknown-mingw32'),   39867088, 15),
           # 2017-12-24:                     34171816 (x64/Windows)
-          (wordsize(64), 42664296, 15)
+          (wordsize(64), 35324712, 15)
           # initial:    56955240
           # 2017-06-07: 41568168     Stop the specialiser generating loopy code
           # 2018-02-25: 35324712     It's not entirely clear
-          # 2018-05-05: 42664296     Don't inline nested function bindings as aggressively
           ]),
       extra_clean(['T9630a.hi', 'T9630a.o'])
       ],



More information about the ghc-commits mailing list