[commit: ghc] master: Revert "Simplify callSiteInline a little" (9dbf66d)
git at git.haskell.org
git at git.haskell.org
Tue May 15 17:05:02 UTC 2018
Repository : ssh://git@git.haskell.org/ghc
On branch : master
Link : http://ghc.haskell.org/trac/ghc/changeset/9dbf66d74e65309d02c9d700094e363f59c94096/ghc
>---------------------------------------------------------------
commit 9dbf66d74e65309d02c9d700094e363f59c94096
Author: Ben Gamari <ben at smart-cactus.org>
Date: Mon May 14 09:25:32 2018 -0400
Revert "Simplify callSiteInline a little"
This lead to some rather significant performance regressions.
Specifically,
bytes allocated value is too high:
Expected T5631(normal) bytes allocated: 1106015512 +/-5%
Lower bound T5631(normal) bytes allocated: 1050714736
Upper bound T5631(normal) bytes allocated: 1161316288
Actual T5631(normal) bytes allocated: 1164953208
Deviation T5631(normal) bytes allocated: 5.3 %
*** unexpected stat test failure for T5631(normal)
max_bytes_used value is too high:
Expected T9630(normal) max_bytes_used: 35324712 +/-15%
Lower bound T9630(normal) max_bytes_used: 30026005
Upper bound T9630(normal) max_bytes_used: 40623419
Actual T9630(normal) max_bytes_used: 43490984
Deviation T9630(normal) max_bytes_used: 23.1 %
*** unexpected stat test failure for T9630(normal)
This reverts commit 7271db46c332f15c302b9a55f4ed005cdd0fb898.
This reverts commit b750dcc5d9e1df8505788a41d0cf0d92acd17f0d.
This reverts commit 33de71fa06d03e6da396a7c0a314fea3b492ab91.
>---------------------------------------------------------------
9dbf66d74e65309d02c9d700094e363f59c94096
compiler/coreSyn/CoreUnfold.hs | 33 +++++++++++++++++----------------
testsuite/tests/perf/compiler/all.T | 10 ++++------
2 files changed, 21 insertions(+), 22 deletions(-)
diff --git a/compiler/coreSyn/CoreUnfold.hs b/compiler/coreSyn/CoreUnfold.hs
index b4f080b..c1f7892 100644
--- a/compiler/coreSyn/CoreUnfold.hs
+++ b/compiler/coreSyn/CoreUnfold.hs
@@ -1151,11 +1151,11 @@ callSiteInline dflags id active_unfolding lone_variable arg_infos cont_info
-- idUnfolding checks for loop-breakers, returning NoUnfolding
-- Things with an INLINE pragma may have an unfolding *and*
-- be a loop breaker (maybe the knot is not yet untied)
- CoreUnfolding { uf_tmpl = unf_template
+ CoreUnfolding { uf_tmpl = unf_template, uf_is_top = is_top
, uf_is_work_free = is_wf
, uf_guidance = guidance, uf_expandable = is_exp }
| active_unfolding -> tryUnfolding dflags id lone_variable
- arg_infos cont_info unf_template
+ arg_infos cont_info unf_template is_top
is_wf is_exp guidance
| otherwise -> traceInline dflags id "Inactive unfolding:" (ppr id) Nothing
NoUnfolding -> Nothing
@@ -1175,10 +1175,10 @@ traceInline dflags inline_id str doc result
= result
tryUnfolding :: DynFlags -> Id -> Bool -> [ArgSummary] -> CallCtxt
- -> CoreExpr -> Bool -> Bool -> UnfoldingGuidance
+ -> CoreExpr -> Bool -> Bool -> Bool -> UnfoldingGuidance
-> Maybe CoreExpr
tryUnfolding dflags id lone_variable
- arg_infos cont_info unf_template
+ arg_infos cont_info unf_template is_top
is_wf is_exp guidance
= case guidance of
UnfNever -> traceInline dflags id str (text "UnfNever") Nothing
@@ -1250,10 +1250,10 @@ tryUnfolding dflags id lone_variable
CaseCtxt -> not (lone_variable && is_exp) -- Note [Lone variables]
ValAppCtxt -> True -- Note [Cast then apply]
RuleArgCtxt -> uf_arity > 0 -- See Note [Unfold info lazy contexts]
- DiscArgCtxt -> uf_arity > 0 -- Note [Inlining in ArgCtxt]
+ DiscArgCtxt -> uf_arity > 0 --
RhsCtxt -> uf_arity > 0 --
- _other -> False -- See Note [Nested functions]
-
+ _ -> not is_top && uf_arity > 0 -- Note [Nested functions]
+ -- Note [Inlining in ArgCtxt]
{-
Note [Unfold into lazy contexts], Note [RHS of lets]
@@ -1323,17 +1323,18 @@ However for worker/wrapper it may be worth inlining even if the
arity is not satisfied (as we do in the CoreUnfolding case) so we don't
require saturation.
+
Note [Nested functions]
~~~~~~~~~~~~~~~~~~~~~~~
-At one time we treated a call of a non-top-level function as
-"interesting" (regardless of how boring the context) in the hope
-that inlining it would eliminate the binding, and its allocation.
-Specifically, in the default case of interesting_call we had
- _other -> not is_top && uf_arity > 0
-
-But actually postInlineUnconditionally does some of this and overall
-it makes virtually no difference to nofib. So I simplified away this
-special case
+If a function has a nested defn we also record some-benefit, on the
+grounds that we are often able to eliminate the binding, and hence the
+allocation, for the function altogether; this is good for join points.
+But this only makes sense for *functions*; inlining a constructor
+doesn't help allocation unless the result is scrutinised. UNLESS the
+constructor occurs just once, albeit possibly in multiple case
+branches. Then inlining it doesn't increase allocation, but it does
+increase the chance that the constructor won't be allocated at all in
+the branches that don't use it.
Note [Cast then apply]
~~~~~~~~~~~~~~~~~~~~~~
diff --git a/testsuite/tests/perf/compiler/all.T b/testsuite/tests/perf/compiler/all.T
index fbca75d..7718551 100644
--- a/testsuite/tests/perf/compiler/all.T
+++ b/testsuite/tests/perf/compiler/all.T
@@ -449,7 +449,7 @@ test('T5631',
# 2014-04-04: 346389856 (x86 Windows, 64 bit machine)
# 2014-12-01: 390199244 (Windows laptop)
# 2016-04-06: 570137436 (amd64/Linux) many reasons
- (wordsize(64), 1164944688, 5)]),
+ (wordsize(64), 1106015512, 5)]),
# expected value: 774595008 (amd64/Linux):
# expected value: 735486328 (amd64/Linux) 2012/12/12:
# expected value: 690742040 (amd64/Linux) Call Arity improvements
@@ -464,9 +464,8 @@ test('T5631',
# 2017-02-17: 1517484488 (amd64/Linux) Type-indexed Typeable
# 2017-03-03: 1065147968 (amd64/Linux) Share Typeable KindReps
# 2017-03-31: 1037482512 (amd64/Linux) Fix memory leak in simplifier
- # 2017-07-27: 1106015512 (Mac) Regresssion from tracking visibility in
- # TypeEqOrigin should be fixed by #14037
- # 2018-05-05: 1164944688 (amd64/Linux) Simplify callSiteInline a little
+ # 2017-07-27: 1106015512 (Mac) Regresssion from tracking visibility in TypeEqOrigin
+ # should be fixed by #14037
only_ways(['normal'])
],
compile,
@@ -1271,11 +1270,10 @@ test ('T9630',
[ compiler_stats_num_field('max_bytes_used', # Note [residency]
[(platform('x86_64-unknown-mingw32'), 39867088, 15),
# 2017-12-24: 34171816 (x64/Windows)
- (wordsize(64), 42664296, 15)
+ (wordsize(64), 35324712, 15)
# initial: 56955240
# 2017-06-07: 41568168 Stop the specialiser generating loopy code
# 2018-02-25: 35324712 It's not entirely clear
- # 2018-05-05: 42664296 Don't inline nested function bindings as aggressively
]),
extra_clean(['T9630a.hi', 'T9630a.o'])
],
More information about the ghc-commits
mailing list