[Git][ghc/ghc][wip/set-levels-hnfs] Do not float HNFs out of lambdas

Fri Apr 12 08:15:44 UTC 2024


Simon Peyton Jones pushed to branch wip/set-levels-hnfs at Glasgow Haskell Compiler / GHC


Commits:
123faa9b by Simon Peyton Jones at 2024-04-12T09:15:33+01:00
Do not float HNFs out of lambdas

This MR adjusts SetLevels so that it is less eager to float a
HNF (lambda or constructor application) out of a lambda, unless
it gets to top level.

Data suggests that this change is a small net win:
 * nofib bytes-allocated falls by -0.09% (but a couple go up)
 * perf/should_compile bytes-allocated falls by -0.5%
 * perf/should_run bytes-allocated falls by -0.1%
See !12410 for more detail.

I also found a big change in the (very delicate) test
  perf/should_run/T21839r
when I was fiddling with something else; this MR made a big
difference.

- - - - -


1 changed file:

- compiler/GHC/Core/Opt/SetLevels.hs


Changes:

=====================================
compiler/GHC/Core/Opt/SetLevels.hs
=====================================
@@ -700,15 +700,13 @@ lvlMFE env strict_ctxt ann_expr
         -- that if we'll escape a value lambda, or will go to the top level.
     float_me = saves_work || saves_alloc || is_mk_static
 
-    -- We can save work if we can move a redex outside a value lambda
-    -- But if float_is_new_lam is True, then the redex is wrapped in a
-    -- a new lambda, so no work is saved
-    saves_work = escapes_value_lam && not float_is_new_lam
-
+    -- See Note [Saving work]
+    saves_work = escapes_value_lam        -- (a)
+                 && not (exprIsHNF expr)  -- (b)
+                 && not float_is_new_lam  -- (c)
     escapes_value_lam = dest_lvl `ltMajLvl` (le_ctxt_lvl env)
-                  -- See Note [Escaping a value lambda]
 
-    -- See Note [Floating to the top]
+    -- See Note [Saving allocation] and Note [Floating to the top]
     saves_alloc =  isTopLvl dest_lvl
                 && floatConsts env
                 && (   not strict_ctxt                     -- (a)
@@ -723,30 +721,98 @@ hasFreeJoin :: LevelEnv -> DVarSet -> Bool
 hasFreeJoin env fvs
   = not (maxFvLevel isJoinId env fvs == tOP_LEVEL)
 
-{- Note [Floating to the top]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Suppose saves_work is False, i.e.
- - 'e' does not escape a value lambda (escapes_value_lam), or
- - 'e' would have added value lambdas if floated (float_is_new_lam)
-Then we may still be keen to float a sub-expression 'e' to the top level,
-for two reasons:
-
- (i) Doing so makes the function smaller, by floating out
-     bottoming expressions, or integer or string literals.  That in
-     turn makes it easier to inline, with less duplication.
-     This only matters if the floated sub-expression is inside a
-     value-lambda, which in turn may be easier to inline.
-
- (ii) (Minor) Doing so may turn a dynamic allocation (done by machine
-      instructions) into a static one. Minor because we are assuming
-      we are not escaping a value lambda.
-
-But only do so if (saves_alloc):
-     (a) the context is lazy (so we get allocation), or
-     (b) the expression is a HNF (so we get allocation), or
-     (c) the expression is bottoming and (i) applies
-         (NB: if the expression is a lambda, (b) will apply;
-              so this case only catches bottoming thunks)
+{- Note [Saving work]
+~~~~~~~~~~~~~~~~~~~~~
+The key idea in let-floating is to
+  * float a redex out of a (value) lambda
+Doing so can save an unbounded amount of work.
+But see also Note [Saving allocation].
+
+So we definitely float an expression out if
+(a) It will escape a value lambda (escapes_value_lam)
+(b) The expression is not a head-normal form (exprIsHNF); see (SW1).
+(c) Floating does not require wrapping it in value lambdas (float_is_new_lam).
+    See (SW3) below
+
+Wrinkles:
+
+(SW1) Concerning (b) I experimented with using `exprIsCheap` rather than
+      `exprIsHNF` but the latter seems better, according to nofib
+      (`spectral/mate` got 10% worse with exprIsCheap).  It's really a bit of a
+      heuristic.
+
+(SW2) What about omitting (b), and hence floating HNFs as well?  The danger of doing
+      so is that we end up floating out a HNF from a cold path (where it might never
+      get allocated at all) and allocating it all the time regardless.  Example
+          f xs = case xs of
+                   [x] | x>3       -> (y,y)
+                       | otherwise -> (x,y)
+                   (x:xs) -> ...
+      We can float (y,y) out, but in a particular call to `f` that path might
+      not be taken, so allocating it before the definition of `f` is a waste.
+
+      See !12410 for some data comparing the effect of omitting (b) altogether,
+      This doesn't apply, though, if we float the thing to the top level; see
+      Note [Floating to the top]
+
+(SW3) Concerning (c), if we are wrapping the thing in extra value lambdas (in
+      abs_vars), then nothing is saved.  E.g.
+        f = \xyz. ...(e1[y],e2)....
+      If we float
+        lvl = \y. (e1[y],e2)
+        f = \xyz. ...(lvl y)...
+      we have saved nothing: one pair will still be allocated for each
+      call of `f`.  Hence the (not float_is_new_lam) in saves_work.
+
+Note [Saving allocation: saves_alloc]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Even if `saves_work` is false, we we may want to float even cheap/HNF
+expressions out of value lambdas, for several reasons:
+
+* Doing so may save allocation. Consider
+        f = \x.  .. (\y.e) ...
+  Then we'd like to avoid allocating the (\y.e) every time we call f,
+  (assuming e does not mention x). An example where this really makes a
+  difference is simplrun009.
+
+* It may allow SpecContr to fire on functions. Consider
+        f = \x. ....(f (\y.e))....
+  After floating we get
+        lvl = \y.e
+        f = \x. ....(f lvl)...
+  Now it's easier for SpecConstr to generate a robust specialisation for f.
+
+* It makes the function smaller, and hence more likely to inline.  This can make
+  a big difference for string literals and bottoming expressions: see Note
+  [Floating to the top]
+
+Data suggests, however, that it is better /only/ to float HNFS, /if/ they can go
+to top level. See (SW2) of Note [Saving work].  If the expression goes to top
+level we don't pay the cost of allocating cold-path thunks described in (SW2).
+
+Hence `isTopLvl dest_lvl` in `saves_alloc`.
+
+Note [Floating to the top]
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+Even though Note [Saving allocation] suggests that we should not, in
+general, float HNFs, the balance change if it goes to the top:
+
+* We don't pay an allocation cost for the floated expression; it
+  just becomes static data.
+
+* Floating string literal is valuable -- no point in duplicating the
+  at each call site!
+
+* Floating bottoming expressions is valuable: they are always cold
+  paths; we don't want to duplicate them at each call site; and they
+  can be quite big, inhibiting inlining. See Note [Bottoming floats]
+
+So we float an expression to the top if:
+  (a) the context is lazy (so we get allocation), or
+  (b) the expression is a HNF (so we get allocation), or
+  (c) the expression is bottoming and floating would escape a
+      value lambda (NB: if the expression itself is a lambda, (b)
+      will apply; so this case only catches bottoming thunks)
 
 Examples:
 
@@ -1127,33 +1193,6 @@ But *coercion* arguments aren’t (see Note [Coercion tokens] in
 "GHC.Core.Unfold"), so we still want to float out variables applied only to
 coercion arguments.
 
-Note [Escaping a value lambda]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-We want to float even cheap expressions out of value lambdas,
-because that saves allocation.  Consider
-        f = \x.  .. (\y.e) ...
-Then we'd like to avoid allocating the (\y.e) every time we call f,
-(assuming e does not mention x). An example where this really makes a
-difference is simplrun009.
-
-Another reason it's good is because it makes SpecContr fire on functions.
-Consider
-        f = \x. ....(f (\y.e))....
-After floating we get
-        lvl = \y.e
-        f = \x. ....(f lvl)...
-and that is much easier for SpecConstr to generate a robust
-specialisation for.
-
-However, if we are wrapping the thing in extra value lambdas (in
-abs_vars), then nothing is saved.  E.g.
-        f = \xyz. ...(e1[y],e2)....
-If we float
-        lvl = \y. (e1[y],e2)
-        f = \xyz. ...(lvl y)...
-we have saved nothing: one pair will still be allocated for each
-call of 'f'.  Hence the (not float_is_lam) in float_me.
-
 
 ************************************************************************
 *                                                                      *



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/123faa9b7ffda0d8aabc8e401d74d63c783db3f3

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/123faa9b7ffda0d8aabc8e401d74d63c783db3f3
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240412/6005b8ec/attachment-0001.html>