[Git][ghc/ghc][wip/set-levels-hnfs] Do not float HNFs out of lambdas
Simon Peyton Jones (@simonpj)
gitlab at gitlab.haskell.org
Thu Apr 18 13:58:17 UTC 2024
Simon Peyton Jones pushed to branch wip/set-levels-hnfs at Glasgow Haskell Compiler / GHC
Commits:
1b04603e by Simon Peyton Jones at 2024-04-18T14:56:38+01:00
Do not float HNFs out of lambdas
This MR adjusts SetLevels so that it is less eager to float a
HNF (lambda or constructor application) out of a lambda, unless
it gets to top level.
Data suggests that this change is a small net win:
* nofib bytes-allocated falls by -0.09% (but a couple go up)
* perf/should_compile bytes-allocated falls by -0.5%
* perf/should_run bytes-allocated falls by -0.1%
See !12410 for more detail.
When fiddling elsewhere, I also found that this patch had a huge
positive effect on the (very delicate) test
perf/should_run/T21839r
But that improvement doesn't show up in this MR by itself.
Metric Decrease:
MultiLayerModulesRecomp
T15703
parsing001
- - - - -
3 changed files:
- compiler/GHC/Core/Opt/SetLevels.hs
- testsuite/tests/simplCore/should_compile/T18013.stderr
- testsuite/tests/simplCore/should_run/simplrun009.hs
Changes:
=====================================
compiler/GHC/Core/Opt/SetLevels.hs
=====================================
@@ -700,15 +700,13 @@ lvlMFE env strict_ctxt ann_expr
-- that if we'll escape a value lambda, or will go to the top level.
float_me = saves_work || saves_alloc || is_mk_static
- -- We can save work if we can move a redex outside a value lambda
- -- But if float_is_new_lam is True, then the redex is wrapped in a
- -- a new lambda, so no work is saved
- saves_work = escapes_value_lam && not float_is_new_lam
-
+ -- See Note [Saving work]
+ saves_work = escapes_value_lam -- (a)
+ && not (exprIsHNF expr) -- (b)
+ && not float_is_new_lam -- (c)
escapes_value_lam = dest_lvl `ltMajLvl` (le_ctxt_lvl env)
- -- See Note [Escaping a value lambda]
- -- See Note [Floating to the top]
+ -- See Note [Saving allocation] and Note [Floating to the top]
saves_alloc = isTopLvl dest_lvl
&& floatConsts env
&& ( not strict_ctxt -- (a)
@@ -723,30 +721,105 @@ hasFreeJoin :: LevelEnv -> DVarSet -> Bool
hasFreeJoin env fvs
= not (maxFvLevel isJoinId env fvs == tOP_LEVEL)
-{- Note [Floating to the top]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Suppose saves_work is False, i.e.
- - 'e' does not escape a value lambda (escapes_value_lam), or
- - 'e' would have added value lambdas if floated (float_is_new_lam)
-Then we may still be keen to float a sub-expression 'e' to the top level,
-for two reasons:
-
- (i) Doing so makes the function smaller, by floating out
- bottoming expressions, or integer or string literals. That in
- turn makes it easier to inline, with less duplication.
- This only matters if the floated sub-expression is inside a
- value-lambda, which in turn may be easier to inline.
-
- (ii) (Minor) Doing so may turn a dynamic allocation (done by machine
- instructions) into a static one. Minor because we are assuming
- we are not escaping a value lambda.
-
-But only do so if (saves_alloc):
- (a) the context is lazy (so we get allocation), or
- (b) the expression is a HNF (so we get allocation), or
- (c) the expression is bottoming and (i) applies
- (NB: if the expression is a lambda, (b) will apply;
- so this case only catches bottoming thunks)
+{- Note [Saving work]
+~~~~~~~~~~~~~~~~~~~~~
+The key idea in let-floating is to
+ * float a redex out of a (value) lambda
+Doing so can save an unbounded amount of work.
+But see also Note [Saving allocation].
+
+So we definitely float an expression out if
+(a) It will escape a value lambda (escapes_value_lam)
+(b) The expression is not a head-normal form (exprIsHNF); see (SW1, SW2).
+(c) Floating does not require wrapping it in value lambdas (float_is_new_lam).
+ See (SW3) below
+
+Wrinkles:
+
+(SW1) Concerning (b) I experimented with using `exprIsCheap` rather than
+ `exprIsHNF` but the latter seems better, according to nofib
+ (`spectral/mate` got 10% worse with exprIsCheap). It's really a bit of a
+ heuristic.
+
+(SW2) What about omitting (b), and hence floating HNFs as well? The danger of
+ doing so is that we end up floating out a HNF from a cold path (where it
+ might never get allocated at all) and allocating it all the time
+ regardless. Example
+ f xs = case xs of
+ [x] | x>3 -> (y,y)
+ | otherwise -> (x,y)
+ (x:xs) -> ...f xs...
+ We can float (y,y) out, but in a particular call to `f` that path might
+ not be taken, so allocating it before the definition of `f` is a waste.
+
+ See !12410 for some data comparing the effect of omitting (b) altogether,
+ This doesn't apply, though, if we float the thing to the top level; see
+ Note [Floating to the top]. Bottom line (data from !12410): adding the
+ not.exprIsHNF test to `saves_work`:
+ - Decreases compiler allocations by 0.5%
+ - Occasionally decreases runtime allocation (T12996 -2.5%)
+ - Slightly mixed effect on nofib: (puzzle -10%, mate -5%, cichelli +5%)
+ but geometric mean is -0.09%.
+ Overall, a win.
+
+(SW3) Concerning (c), if we are wrapping the thing in extra value lambdas (in
+ abs_vars), then nothing is saved. E.g.
+ f = \xyz. ...(e1[y],e2)....
+ If we float
+ lvl = \y. (e1[y],e2)
+ f = \xyz. ...(lvl y)...
+ we have saved nothing: one pair will still be allocated for each
+ call of `f`. Hence the (not float_is_new_lam) in saves_work.
+
+Note [Saving allocation]
+~~~~~~~~~~~~~~~~~~~~~~~~
+Even if `saves_work` is false, we we may want to float even cheap/HNF
+expressions out of value lambdas, for several reasons:
+
+* Doing so may save allocation. Consider
+ f = \x. .. (\y.e) ...
+ Then we'd like to avoid allocating the (\y.e) every time we call f,
+ (assuming e does not mention x). An example where this really makes a
+ difference is simplrun009.
+
+* It may allow SpecContr to fire on functions. Consider
+ f = \x. ....(f (\y.e))....
+ After floating we get
+ lvl = \y.e
+ f = \x. ....(f lvl)...
+ Now it's easier for SpecConstr to generate a robust specialisation for f.
+
+* It makes the function smaller, and hence more likely to inline. This can make
+ a big difference for string literals and bottoming expressions: see Note
+ [Floating to the top]
+
+Data suggests, however, that it is better /only/ to float HNFS, /if/ they can go
+to top level. See (SW2) of Note [Saving work]. If the expression goes to top
+level we don't pay the cost of allocating cold-path thunks described in (SW2).
+
+Hence `isTopLvl dest_lvl` in `saves_alloc`.
+
+Note [Floating to the top]
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+Even though Note [Saving allocation] suggests that we should not, in
+general, float HNFs, the balance change if it goes to the top:
+
+* We don't pay an allocation cost for the floated expression; it
+ just becomes static data.
+
+* Floating string literal is valuable -- no point in duplicating the
+ at each call site!
+
+* Floating bottoming expressions is valuable: they are always cold
+ paths; we don't want to duplicate them at each call site; and they
+ can be quite big, inhibiting inlining. See Note [Bottoming floats]
+
+So we float an expression to the top if:
+ (a) the context is lazy (so we get allocation), or
+ (b) the expression is a HNF (so we get allocation), or
+ (c) the expression is bottoming and floating would escape a
+ value lambda (NB: if the expression itself is a lambda, (b)
+ will apply; so this case only catches bottoming thunks)
Examples:
@@ -1127,33 +1200,6 @@ But *coercion* arguments aren’t (see Note [Coercion tokens] in
"GHC.Core.Unfold"), so we still want to float out variables applied only to
coercion arguments.
-Note [Escaping a value lambda]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-We want to float even cheap expressions out of value lambdas,
-because that saves allocation. Consider
- f = \x. .. (\y.e) ...
-Then we'd like to avoid allocating the (\y.e) every time we call f,
-(assuming e does not mention x). An example where this really makes a
-difference is simplrun009.
-
-Another reason it's good is because it makes SpecContr fire on functions.
-Consider
- f = \x. ....(f (\y.e))....
-After floating we get
- lvl = \y.e
- f = \x. ....(f lvl)...
-and that is much easier for SpecConstr to generate a robust
-specialisation for.
-
-However, if we are wrapping the thing in extra value lambdas (in
-abs_vars), then nothing is saved. E.g.
- f = \xyz. ...(e1[y],e2)....
-If we float
- lvl = \y. (e1[y],e2)
- f = \xyz. ...(lvl y)...
-we have saved nothing: one pair will still be allocated for each
-call of 'f'. Hence the (not float_is_lam) in float_me.
-
************************************************************************
* *
=====================================
testsuite/tests/simplCore/should_compile/T18013.stderr
=====================================
@@ -131,9 +131,9 @@ Rule fired: Class op fmap (BUILTIN)
==================== Tidy Core ====================
Result size of Tidy Core
- = {terms: 38, types: 90, coercions: 17, joins: 0/1}
+ = {terms: 36, types: 86, coercions: 17, joins: 0/0}
--- RHS size: {terms: 37, types: 78, coercions: 17, joins: 0/1}
+-- RHS size: {terms: 35, types: 74, coercions: 17, joins: 0/0}
mapMaybeRule [InlPrag=[2]]
:: forall a b. Rule IO a b -> Rule IO (Maybe a) (Maybe b)
[GblId,
@@ -182,12 +182,6 @@ mapMaybeRule [InlPrag=[2]]
mapMaybeRule
= \ (@a) (@b) (f :: Rule IO a b) ->
case f of { Rule @s ww ww1 ->
- let {
- lvl :: Result s (Maybe b)
- [LclId, Unf=OtherCon []]
- lvl
- = T18013a.Result
- @s @(Maybe b) ww (GHC.Internal.Maybe.Nothing @b) } in
T18013a.Rule
@IO
@(Maybe a)
@@ -198,7 +192,9 @@ mapMaybeRule
(a1 :: Maybe a)
(s1 :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case a1 of {
- Nothing -> (# s1, lvl #);
+ Nothing ->
+ (# s1,
+ T18013a.Result @s @(Maybe b) ww (GHC.Internal.Maybe.Nothing @b) #);
Just x ->
case ((ww1 s2 x)
`cast` <Co:4> :: IO (Result s b)
=====================================
testsuite/tests/simplCore/should_run/simplrun009.hs
=====================================
@@ -6,7 +6,7 @@
-- It produces a nested unfold that should look something
-- like the code below. Note the 'lvl1_shW'. It is BAD
-- if this is a lambda instead; you get a lot more allocation
--- See Note [Escaping a value lambda] in GHC.Core.Opt.SetLevels
+-- See Note [Saving allocation] in GHC.Core.Opt.SetLevels
{-
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/1b04603e9d6a374a9828369ca2e1139e1c40ac68
--
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/1b04603e9d6a374a9828369ca2e1139e1c40ac68
You're receiving this email because of your account on gitlab.haskell.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240418/18312b36/attachment-0001.html>
More information about the ghc-commits
mailing list