[commit: ghc] master: Don't eta-expand PAPs (fixes Trac #9020) (79e46ae)
git at git.haskell.org
git at git.haskell.org
Thu Apr 24 07:44:00 UTC 2014
Repository : ssh://git@git.haskell.org/ghc
On branch : master
Link : http://ghc.haskell.org/trac/ghc/changeset/79e46aea1643b4dfdc7c846bbefe06b83b535efd/ghc
>---------------------------------------------------------------
commit 79e46aea1643b4dfdc7c846bbefe06b83b535efd
Author: Simon Peyton Jones <simonpj at microsoft.com>
Date: Thu Apr 24 08:43:07 2014 +0100
Don't eta-expand PAPs (fixes Trac #9020)
See Note [Do not eta-expand PAPs] in SimplUtils. This has a tremendously
good effect on compile times for some simple benchmarks.
The test is now where it belongs, in perf/compiler/T9020 (instead of simpl015).
I did a nofib run and got essentially zero change except for cacheprof which
gets 4% more allocation. I investigated. Turns out that we have
instance PP Reg where
pp ppm ST_0 = "%st"
pp ppm ST_1 = "%st(1)"
pp ppm ST_2 = "%st(2)"
pp ppm ST_3 = "%st(3)"
pp ppm ST_4 = "%st(4)"
pp ppm ST_5 = "%st(5)"
pp ppm ST_6 = "%st(6)"
pp ppm ST_7 = "%st(7)"
pp ppm r = "%" ++ map toLower (show r)
That (map toLower (show r) does a lot of map/toLowers. But if we inline show
we get something like
pp ppm ST_0 = "%st"
pp ppm ST_1 = "%st(1)"
pp ppm ST_2 = "%st(2)"
pp ppm ST_3 = "%st(3)"
pp ppm ST_4 = "%st(4)"
pp ppm ST_5 = "%st(5)"
pp ppm ST_6 = "%st(6)"
pp ppm ST_7 = "%st(7)"
pp ppm EAX = map toLower (show EAX)
pp ppm EBX = map toLower (show EBX)
...etc...
and all those map/toLower calls can now be floated to top level.
This gives a 4% decrease in allocation. But it depends on inlining
a pretty big 'show' function.
With this new patch we get slightly better eta-expansion, which makes
a function look slightly bigger, which just stops it being inlined.
The previous behaviour was luck, so I'm not going to worry about
losing it.
I've added some notes to nofib/Simon-nofib-notes
>---------------------------------------------------------------
79e46aea1643b4dfdc7c846bbefe06b83b535efd
compiler/coreSyn/CoreArity.lhs | 3 +-
compiler/simplCore/SimplUtils.lhs | 44 +++++++++++++++-----
.../simpl015.hs => perf/compiler/T9020.hs} | 0
testsuite/tests/perf/compiler/all.T | 8 ++++
.../should_compile => perf/compiler}/simpl015.hs | 0
testsuite/tests/simplCore/should_compile/all.T | 1 -
6 files changed, 43 insertions(+), 13 deletions(-)
diff --git a/compiler/coreSyn/CoreArity.lhs b/compiler/coreSyn/CoreArity.lhs
index 12d4274..ca7216f 100644
--- a/compiler/coreSyn/CoreArity.lhs
+++ b/compiler/coreSyn/CoreArity.lhs
@@ -73,7 +73,8 @@ should have arity 3, regardless of f's arity.
\begin{code}
manifestArity :: CoreExpr -> Arity
--- ^ manifestArity sees how many leading value lambdas there are
+-- ^ manifestArity sees how many leading value lambdas there are,
+-- after looking through casts
manifestArity (Lam v e) | isId v = 1 + manifestArity e
| otherwise = manifestArity e
manifestArity (Tick t e) | not (tickishIsCode t) = manifestArity e
diff --git a/compiler/simplCore/SimplUtils.lhs b/compiler/simplCore/SimplUtils.lhs
index bde7b6b..a3042a7 100644
--- a/compiler/simplCore/SimplUtils.lhs
+++ b/compiler/simplCore/SimplUtils.lhs
@@ -1190,15 +1190,14 @@ because the latter is not well-kinded.
\begin{code}
tryEtaExpandRhs :: SimplEnv -> OutId -> OutExpr -> SimplM (Arity, OutExpr)
-- See Note [Eta-expanding at let bindings]
--- and Note [Eta expansion to manifest arity]
tryEtaExpandRhs env bndr rhs
= do { dflags <- getDynFlags
; (new_arity, new_rhs) <- try_expand dflags
- ; WARN( new_arity < old_arity,
- (ptext (sLit "Arity decrease:") <+> (ppr bndr <+> ppr old_arity
- <+> ppr new_arity) $$ ppr new_rhs) )
- -- Note [Arity decrease]
+ ; WARN( new_arity < old_id_arity,
+ (ptext (sLit "Arity decrease:") <+> (ppr bndr <+> ppr old_id_arity
+ <+> ppr old_arity <+> ppr new_arity) $$ ppr new_rhs) )
+ -- Note [Arity decrease] in Simplify
return (new_arity, new_rhs) }
where
try_expand dflags
@@ -1209,14 +1208,14 @@ tryEtaExpandRhs env bndr rhs
, let new_arity1 = findRhsArity dflags bndr rhs old_arity
new_arity2 = idCallArity bndr
new_arity = max new_arity1 new_arity2
- , new_arity > manifest_arity -- And the curent manifest arity isn't enough
+ , new_arity > old_arity -- And the curent manifest arity isn't enough
= do { tick (EtaExpansion bndr)
; return (new_arity, etaExpand new_arity rhs) }
| otherwise
- = return (manifest_arity, rhs)
+ = return (old_arity, rhs)
- manifest_arity = manifestArity rhs
- old_arity = idArity bndr
+ old_arity = exprArity rhs -- See Note [Do not expand eta-expand PAPs]
+ old_id_arity = idArity bndr
\end{code}
Note [Eta-expanding at let bindings]
@@ -1225,7 +1224,7 @@ We now eta expand at let-bindings, which is where the payoff comes.
The most significant thing is that we can do a simple arity analysis
(in CoreArity.findRhsArity), which we can't do for free-floating lambdas
-One useful consequence is this example:
+One useful consequence of not eta-expanding lambdas is this example:
genMap :: C a => ...
{-# INLINE genMap #-}
genMap f xs = ...
@@ -1235,7 +1234,7 @@ One useful consequence is this example:
myMap = genMap
Notice that 'genMap' should only inline if applied to two arguments.
-In the InlineRule for myMap we'll have the unfolding
+In the stable unfolding for myMap we'll have the unfolding
(\d -> genMap Int (..d..))
We do not want to eta-expand to
(\d f xs -> genMap Int (..d..) f xs)
@@ -1243,6 +1242,29 @@ because then 'genMap' will inline, and it really shouldn't: at least
as far as the programmer is concerned, it's not applied to two
arguments!
+Note [Do not eta-expand PAPs]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We used to have old_arity = manifestArity rhs, which meant that we
+would eta-expand even PAPs. But this gives no particular advantage,
+and can lead to a massive blow-up in code size, exhibited by Trac #9020.
+Suppose we have a PAP
+ foo :: IO ()
+ foo = returnIO ()
+Then we can eta-expand do
+ foo = (\eta. (returnIO () |> sym g) eta) |> g
+where
+ g :: IO () ~ State# RealWorld -> (# State# RealWorld, () #)
+
+But there is really no point in doing this, and it generates masses of
+coercions and whatnot that eventually disappear again. For T9020, GHC
+allocated 6.6G beore, and 0.8G afterwards; and residency dropped from
+1.8G to 45M.
+
+But note that this won't eta-expand, say
+ f = \g -> map g
+Does it matter not eta-expanding such functions? I'm not sure. Perhaps
+strictness analysis will have less to bite on?
+
%************************************************************************
%* *
diff --git a/testsuite/tests/simplCore/should_compile/simpl015.hs b/testsuite/tests/perf/compiler/T9020.hs
similarity index 100%
copy from testsuite/tests/simplCore/should_compile/simpl015.hs
copy to testsuite/tests/perf/compiler/T9020.hs
diff --git a/testsuite/tests/perf/compiler/all.T b/testsuite/tests/perf/compiler/all.T
index 2f4151f..2bff1c7 100644
--- a/testsuite/tests/perf/compiler/all.T
+++ b/testsuite/tests/perf/compiler/all.T
@@ -407,3 +407,11 @@ test('T6048',
# 05/03/2014 110646312 amd64/Linux Call Arity became more elaborate
],
compile,[''])
+
+test('T9020',
+ [ only_ways(['optasm']),
+ compiler_stats_num_field('bytes allocated',
+ [(wordsize(32), 40000000, 10),
+ (wordsize(64), 795469104, 10)])
+ ],
+ compile,[''])
diff --git a/testsuite/tests/simplCore/should_compile/simpl015.hs b/testsuite/tests/perf/compiler/simpl015.hs
similarity index 100%
rename from testsuite/tests/simplCore/should_compile/simpl015.hs
rename to testsuite/tests/perf/compiler/simpl015.hs
diff --git a/testsuite/tests/simplCore/should_compile/all.T b/testsuite/tests/simplCore/should_compile/all.T
index 7239ffc..616b6cc 100644
--- a/testsuite/tests/simplCore/should_compile/all.T
+++ b/testsuite/tests/simplCore/should_compile/all.T
@@ -15,7 +15,6 @@ test('simpl011', normal, compile, [''])
test('simpl012', normal, compile, [''])
test('simpl013', normal, compile, [''])
test('simpl014', normal, compile, [''])
-test('simpl015', only_ways(['optasm']), compile, [''])
test('simpl016', normal, compile, ['-dsuppress-uniques'])
test('simpl017', normal, compile_fail, [''])
test('simpl018', normal, compile, [''])
More information about the ghc-commits
mailing list