[Git][ghc/ghc][wip/T24623] Comments

Fri Jun 14 10:33:21 UTC 2024

Simon Peyton Jones pushed to branch wip/T24623 at Glasgow Haskell Compiler / GHC


Commits:
4bfc49f9 by Simon Peyton Jones at 2024-06-14T11:33:01+01:00
Comments

- - - - -


4 changed files:

- compiler/GHC/Core/Opt/DmdAnal.hs
- compiler/GHC/Core/Opt/WorkWrap.hs
- compiler/GHC/Core/Opt/WorkWrap/Utils.hs
- compiler/GHC/Types/Demand.hs


Changes:

=====================================
compiler/GHC/Core/Opt/DmdAnal.hs
=====================================
@@ -1086,6 +1086,7 @@ dmdAnalRhsSig top_lvl rec_flag env let_subdmd id rhs
     (final_env, weak_fvs, final_id, final_rhs)
   where
     ww_arity = workWrapArity id rhs
+      -- See Note [Worker/wrapper arity and join points], point (1)
 
     body_subdmd | isJoinId id = let_subdmd
                 | otherwise   = topSubDmd
@@ -1235,47 +1236,97 @@ Consider
                    B -> j 4
                    C -> (p,7))
 
-If j was a vanilla function definition, we'd analyse its body with
-evalDmd, and think that it was lazy in p.  But for join points we can
-do better!  We know that j's body will (if called at all) be evaluated
-with the demand that consumes the entire join-binding, in this case
-the argument demand from g.  Whizzo!  g evaluates both components of
-its argument pair, so p will certainly be evaluated if j is called.
+If j was a vanilla function definition, we'd analyse its body with evalDmd, and
+think that it was lazy in p.  But for join points we can do better!  We know
+that j's body will (if called at all) be evaluated with the demand that consumes
+the entire join-binding, in this case the argument demand from g.  Whizzo!  g
+evaluates both components of its argument pair, so p will certainly be evaluated
+if j is called.
 
-For f to be strict in p, we need /all/ paths to evaluate p; in this
-case the C branch does so too, so we are fine.  So, as usual, we need
-to transport demands on free variables to the call site(s).  Compare
-Note [Lazy and unleashable free variables].
+For f to be strict in p, we need /all/ paths to evaluate p; in this case the C
+branch does so too, so we are fine.  So, as usual, we need to transport demands
+on free variables to the call site(s).  Compare Note [Lazy and unleashable free
+variables].
 
-The implementation is easy.  When analysing a join point, we can
-analyse its body with the demand from the entire join-binding (written
-let_dmd here).
+The implementation is easy: see `body_subdmd` in `dmdAnalRhsSig`.  When analysing
+a join point, we can analyse its body (after stripping off the join binders,
+here just 'y') with the demand from the entire join-binding (written `let_subdmd`
+here).
 
 Another win for join points!  #13543.
 
-However, note that the strictness signature for a join point can
-look a little puzzling.  E.g.
+BUT see Note [Worker/wrapper arity and join points].
 
+Note we may analyse the rhs of a join point with a demand whose call depth is
+either bigger or smaller than the number of lambdas syntactically visible.
+* More lambdas than call demands:
+       join j x = \p q r -> blah in ...
+  in a context with demand Top.
+
+* More call demands than lambdas:
+       (join j x = h in ..(j 2)..(j 3)) a b c
+
+Note [Worker/wrapper arity and join points]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider
     (join j x = \y. error "urk")
     (in case v of              )
     (     A -> j 3             )  x
     (     B -> j 4             )
     (     C -> \y. blah        )
 
-The entire thing is in a C(1,L) context, so j's strictness signature
-will be    [A]b
-meaning one absent argument, returns bottom.  That seems odd because
-there's a \y inside.  But it's right because when consumed in a C(1,L)
-context the RHS of the join point is indeed bottom.
+The entire thing is in a C(1,L) context, so we will analyse j's body, namely
+   \y. error "urk"
+with demand C(1,C(1,L)).  See `rhs_subdmd` in `dmdAnalRhsSig`.  That will produce
+a demand signature of <A><A>b: and indeed `j` diverges when given two arguments.
+
+BUT we do /not/ want to worker/wrapper `j` with two arguments.  Suppose we have
+     join j2 :: Int -> Int -> blah
+          j2 x = rhs
+     in ...(j2 3)...(j2 4)...
+
+where j2's join-arity is 1, so calls to `j2` will all have /one/ argument.
+Suppose the entire expression is in a called context (like `j` above) and `j2`
+gets the demand signature <P(L)><P(L)>, that is, strict in both arguments.
+
+If we worker/wrapper'd `j2` with two args we'd get
+     join $wj2 x# y# = let x = I# x#; y = I# y# in rhs
+          j2 x = \y. case x of I# x# -> case y of I# y# -> $wj2 x# y#
+     in ...(j2 3)...(j2 4)...
+But now `$wj2` is no longer a join point. Boo.
+
+Instead if we w/w at all, we want to do so only with /one/ argument:
+     join $wj2 x# = let x = I# x# in rhs
+          j2 x = case x of I# x# -> $wj2 x#
+     in ...(j2 3)...(j2 4)...
+Now all is fine.  BUT in `finaliseArgBoxities` we should trim y's boxity,
+to reflect the fact that we aren't going to unbox `y` at all.
 
-Note [Demand signatures are computed for a threshold arity based on idArity]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Given a binding { f = rhs }, we compute a "threshold arity", and do demand
-analysis based on a call with that many value arguments.
+Conclusion:
 
-The threshold we use is
+(1) The "worker/wrapper arity" of an Id is
+    * For non-join-points: idArity
+    * For join points: the join arity (Id part only of course)
+    This is the number of args we will use in worker/wrapper.
+    See `ww_arity` in `dmdAnalRhsSig`, and the function workWrapArity.
 
-* Ordinary bindings: idArity f.
+(2) A join point's demand-signature arity may exceed the Id's worker/wrapper
+    arity.  See the `arity_ok` assertion in `mkWwBodies`.
+
+(3) In `finaliseArgBoxities`, do trimBoxity on any argument demands beyond
+    the worker/wrapper arity.
+
+(4) In WorkWrap.splitFun, make sure we split based on the worker/wrapper
+    arity (re)-computed by workWrapArity.
+
+Note [The demand for the RHS of a binding]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Given a binding { f = rhs }, in `dmdAnalRhsSig` we compute a `rhs_subdmd` in
+which to analyse `rhs`.
+
+The demand we use is:
+
+* Ordinary bindings: a call-demand of depth (idArity f).
   Why idArity arguments? Because that's a conservative estimate of how many
   arguments we must feed a function before it does anything interesting with
   them.  Also it elegantly subsumes the trivial RHS and PAP case.  E.g. for
@@ -1285,22 +1336,17 @@ The threshold we use is
   idArity is /at least/ the number of manifest lambdas, but might be higher for
   PAPs and trivial RHS (see Note [Demand analysis for trivial right-hand sides]).
 
-* Join points: the value-binder subset of the JoinArity.  This can
-  be less than the number of visible lambdas; e.g.
-     join j x = \y. blah
-     in ...(jump j 2)....(jump j 3)....
-  We know that j will never be applied to more than 1 arg (its join
-  arity, and we don't eta-expand join points, so here a threshold
-  of 1 is the best we can do.
+* Join points: a call-demand of depth (value-binder subset of JoinArity),
+  wrapped around the incoming demand for the entire expression; see
+  Note [Demand analysis for join points]
 
 Note that the idArity of a function varies independently of its cardinality
 properties (cf. Note [idArity varies independently of dmdTypeDepth]), so we
-implicitly encode the arity for when a demand signature is sound to unleash
-in its 'dmdTypeDepth', not in its idArity (cf. Note [Understanding DmdType
-and DmdSig] in GHC.Types.Demand). It is unsound to unleash a demand
-signature when the incoming number of arguments is less than that. See
-GHC.Types.Demand Note [What are demand signatures?]  for more details on
-soundness.
+implicitly encode the arity for when a demand signature is sound to unleash in
+its 'dmdTypeDepth', not in its idArity (cf. Note [Understanding DmdType and
+DmdSig] in GHC.Types.Demand). It is unsound to unleash a demand signature when
+the incoming number of arguments is less than that. See GHC.Types.Demand
+Note [DmdSig: demand signatures, and demand-sig arity].
 
 Note that there might, in principle, be functions for which we might want to
 analyse for more incoming arguments than idArity. Example:
@@ -1929,7 +1975,7 @@ finaliseArgBoxities :: AnalEnv -> Id -> Arity
 -- Then:
 --     dmds' is the same as dmds (including length), except for boxity info
 --     rhs'  is the same as rhs, except for dmd info on lambda binders

--- NB: length dmds might be greater than ww_arity
+-- NB: For join points, length dmds might be greater than ww_arity
 finaliseArgBoxities env fn ww_arity arg_dmds div rhs
 
   -- Check for an OPAQUE function: see Note [OPAQUE pragma]
@@ -1952,8 +1998,7 @@ finaliseArgBoxities env fn ww_arity arg_dmds div rhs
   = (arg_dmds, rhs)
 
   -- The normal case
-  | otherwise -- NB: ww_arity might be less than
-              -- manifest arity for join points
+  | otherwise
   = -- pprTrace "finaliseArgBoxities" (
     --   vcat [text "function:" <+> ppr fn
     --        , text "max" <+> ppr max_wkr_args
@@ -1979,6 +2024,7 @@ finaliseArgBoxities env fn ww_arity arg_dmds div rhs
     arg_dmds' = ww_arg_dmds ++ map trimBoxity (drop ww_arity arg_dmds)
                 -- If ww_arity < length arg_dmds, the leftover ones
                 -- will not be w/w'd, so trimBoxity them
+                -- See Note [Worker/wrapper arity and join points] point (3)
 
     -- This is the key line, which uses almost-circular programming
     -- The remaining budget from one layer becomes the initial
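
Point (3) of Note [Worker/wrapper arity and join points] can be illustrated with a small standalone sketch. It is a toy model, not GHC's code: `Boxity`, `ToyDmd`, `toyTrimBoxity` and `trimBeyond` are hypothetical names, and the real `finaliseArgBoxities` also finalises the first `ww_arity` demands rather than leaving them untouched. The sketch only shows the shape of `ww_arg_dmds ++ map trimBoxity (drop ww_arity arg_dmds)`.

```haskell
-- Toy model only: these types are hypothetical stand-ins, not GHC's.
data Boxity = Boxed | Unboxed
  deriving (Eq, Show)

-- | A toy argument demand: a printed demand plus a boxity flag.
data ToyDmd = ToyDmd String Boxity
  deriving (Eq, Show)

-- | Forget any unboxing decision, in the spirit of trimBoxity.
toyTrimBoxity :: ToyDmd -> ToyDmd
toyTrimBoxity (ToyDmd d _) = ToyDmd d Boxed

-- | Keep the first 'wwArity' demands as they are and trim the boxity of
-- the leftover ones, which worker/wrapper will not split on.
-- The length of the list is preserved.
trimBeyond :: Int -> [ToyDmd] -> [ToyDmd]
trimBeyond wwArity dmds = kept ++ map toyTrimBoxity leftover
  where
    (kept, leftover) = splitAt wwArity dmds
```

For the `j2` example, with worker/wrapper arity 1 and two unboxable argument demands, `trimBeyond 1` leaves the demand on `x` alone and marks the demand on `y` as boxed.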


=====================================
compiler/GHC/Core/Opt/WorkWrap.hs
=====================================
@@ -797,6 +797,7 @@ splitFun ww_opts fn_id rhs
     uf_opts  = so_uf_opts (wo_simple_opts ww_opts)
     fn_info  = idInfo fn_id
     ww_arity = workWrapArity fn_id rhs
+      -- workWrapArity: see (4) in Note [Worker/wrapper arity and join points] in DmdAnal
 
     (wrap_dmds, div) = splitDmdSig (dmdSigInfo fn_info)
 


=====================================
compiler/GHC/Core/Opt/WorkWrap/Utils.hs
=====================================
@@ -294,7 +294,7 @@ isWorkerSmallEnough max_worker_args old_n_args vars
     -- it takes <= 82 arguments afterwards.
 
 workWrapArity :: Id -> CoreExpr -> Arity
--- See Note [Demand signatures are computed for a threshold arity based on idArity]
+-- See Note [Worker/wrapper arity and join points] in DmdAnal
 workWrapArity fn rhs
   = case idJoinPointHood fn of
       JoinPoint join_arity -> count isId $ fst $ collectNBinders join_arity rhs
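
As a rough illustration of point (1) of Note [Worker/wrapper arity and join points], here is a self-contained toy version of the arity computation. All names (`Binder`, `Hood`, `ToyId`, `toyWorkWrapArity`) are hypothetical and the RHS is reduced to its list of leading binders; the non-join-point case follows the Note's statement that the worker/wrapper arity is then idArity.

```haskell
-- Toy model only: these types are hypothetical stand-ins, not GHC's.
data Binder = TyBinder String | ValBinder String
  deriving (Eq, Show)

-- | Join arity counts both type and value binders.
data Hood = NotJoinPoint | JoinPoint Int
  deriving (Eq, Show)

data ToyId = ToyId
  { toyArity :: Int   -- plays the role of idArity
  , toyHood  :: Hood
  } deriving (Eq, Show)

isValBinder :: Binder -> Bool
isValBinder (ValBinder _) = True
isValBinder (TyBinder _)  = False

-- | Number of value arguments that worker/wrapper would split on.
-- The second argument lists the leading lambda binders of the RHS.
toyWorkWrapArity :: ToyId -> [Binder] -> Int
toyWorkWrapArity fn lams =
  case toyHood fn of
    -- Look only at the first 'joinArity' binders, and count the value ones.
    JoinPoint joinArity -> length (filter isValBinder (take joinArity lams))
    NotJoinPoint        -> toyArity fn
```

For `join j2 x = \y. rhs` with join arity 1, the binder list is `[ValBinder "x", ValBinder "y"]` and the result is 1, even though two lambdas are visible.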


=====================================
compiler/GHC/Types/Demand.hs
=====================================
@@ -2084,6 +2084,11 @@ body of the function.
 *                                                                      *
 ************************************************************************
 
+Note [DmdSig: demand signatures, and demand-sig arity]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+See also
+  * Note [Demand signatures semantically]
+  * Note [Understanding DmdType and DmdSig]
 In a let-bound Id we record its demand signature.
 In principle, this demand signature is a demand transformer, mapping
 a demand on the Id into a DmdType, which gives
@@ -2094,20 +2099,22 @@ a demand on the Id into a DmdType, which gives
 
 However, in fact we store in the Id an extremely emasculated demand
 transformer, namely
-
-                a single DmdType
+        a single DmdType
 (Nevertheless we dignify DmdSig as a distinct type.)
 
-This DmdType gives the demands unleashed by the Id when it is applied
-to as many arguments as are given in by the arg demands in the DmdType.
+The DmdSig for an Id is a semantic thing.  Suppose a function `f` has a DmdSig of
+  DmdSig (DmdType (fv_dmds,res) [d1..dn])
+Here `n` is called the "demand-sig arity" of the DmdSig.  The signature means:
+  * If you apply `f` to n arguments (the demand-sig-arity)
+  * then you can unleash demands d1..dn on the arguments
+  * and demands fv_dmds on the free variables.
 Also see Note [Demand type Divergence] for the meaning of a Divergence in a
-strictness signature.
+demand signature.
 
-If an Id is applied to less arguments than its arity, it means that
-the demand on the function at a call site is weaker than the vanilla
-call demand, used for signature inference. Therefore we place a top
-demand on all arguments. Otherwise, the demand is specified by Id's
-signature.
+If `f` is applied to fewer value arguments than its demand-sig arity, it means
+that the demand on the function at a call site is weaker than the vanilla call
+demand, used for signature inference. Therefore we place a top demand on all
+arguments.
 
 For example, the demand transformer described by the demand signature
         DmdSig (DmdType {x -> <1L>} <A><1P(L,L)>)
@@ -2118,6 +2125,61 @@ and 1P(L,L) on the second.
 If this same function is applied to one arg, all we can say is that it
 uses x with 1L, and its arg with demand 1P(L,L).
 
+Note [Demand signatures semantically]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Demand analysis interprets expressions in the abstract domain of demand
+transformers. Given a (sub-)demand that denotes the evaluation context, the
+abstract transformer of an expression gives us back a demand type denoting
+how other things (like arguments and free vars) were used when the expression
+was evaluated. Here's an example:
+
+  f x y =
+    if x + expensive
+      then \z -> z + y * ...
+      else \z -> z * ...
+
+The abstract transformer (let's call it F_e) of the if expression (let's
+call it e) would transform an incoming (undersaturated!) head demand 1A into
+a demand type like {x-><1L>,y-><L>}<L>. In pictures:
+
+     Demand ---F_e---> DmdType
+     <1A>              {x-><1L>,y-><L>}<L>
+
+Let's assume that the demand transformers we compute for an expression are
+correct wrt. some concrete semantics for Core. How do demand signatures fit
+in? They are strange beasts, given that they come with strict rules for when
+it's sound to unleash them.
+
+Fortunately, we can formalise the rules with Galois connections. Consider
+f's strictness signature, {}<1L><L>. It's a single-point approximation of
+the actual abstract transformer of f's RHS for arity 2. So, what happens is that
+we abstract *once more* from the abstract domain we already are in, replacing
+the incoming Demand by a simple lattice with two elements denoting incoming
+arity: A_2 = {<2, >=2} (where '<2' is the top element and >=2 the bottom
+element). Here's the diagram:
+
+     A_2 -----f_f----> DmdType
+      ^                   |
+      | α               γ |
+      |                   v
+  SubDemand --F_f----> DmdType
+
+With
+  α(C(1,C(1,_))) = >=2
+  α(_)         =  <2
+  γ(ty)        =  ty
+and F_f being the abstract transformer of f's RHS and f_f being the abstracted
+abstract transformer computable from our demand signature simply by
+
+  f_f(>=2) = {}<1L><L>
+  f_f(<2)  = multDmdType C_0N {}<1L><L>
+
+where multDmdType makes a proper top element out of the given demand type.
+
+In practice, the A_n domain is not just a simple Bool, but a Card, which is
+exactly the Card with which we have to multDmdType. The Card for arity n
+is computed by calling @peelManyCalls n@, which corresponds to α above.
+
 Note [Understanding DmdType and DmdSig]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Demand types are sound approximations of an expression's semantics relative to
@@ -2130,9 +2192,9 @@ Here is a table with demand types resulting from different incoming demands we
 put that expression under. Note the monotonicity; a stronger incoming demand
 yields a more precise demand type:
 
-    incoming demand   |  demand type
+    incoming demand       |  demand type
     --------------------------------
-    1A                  |  <L><L>{}
+    1A                    |  <L><L>{}
     C(1,C(1,L))           |  <1P(L)><L>{}
     C(1,C(1,1P(1P(L),A))) |  <1P(A)><A>{}
 
@@ -2154,11 +2216,11 @@ being a newtype wrapper around DmdType, it actually encodes two things:
   * A demand type that is sound to unleash when the minimum arity requirement is
     met.
 
-Here comes the subtle part: The threshold is encoded in the wrapped demand
-type's depth! So in mkDmdSigForArity we make sure to trim the list of
-argument demands to the given threshold arity. Call sites will make sure that
-this corresponds to the arity of the call demand that elicited the wrapped
-demand type. See also Note [What are demand signatures?].
+Here comes the subtle part: The threshold is encoded in the demand-sig arity!
+So in mkDmdSigForArity we make sure to trim the list of argument demands to the
+given threshold arity. Call sites will make sure that this corresponds to the
+arity of the call demand that elicited the wrapped demand type. See also Note
+[Demand signatures semantically].
 -}
 
 -- | The depth of the wrapped 'DmdType' encodes the arity at which it is safe
@@ -2369,61 +2431,6 @@ dmdTransformDictSelSig (DmdSig (DmdType _ [_ :* prod])) call_sd
 dmdTransformDictSelSig sig sd = pprPanic "dmdTransformDictSelSig: no args" (ppr sig $$ ppr sd)
 
 {-
-Note [What are demand signatures?]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Demand analysis interprets expressions in the abstract domain of demand
-transformers. Given a (sub-)demand that denotes the evaluation context, the
-abstract transformer of an expression gives us back a demand type denoting
-how other things (like arguments and free vars) were used when the expression
-was evaluated. Here's an example:
-
-  f x y =
-    if x + expensive
-      then \z -> z + y * ...
-      else \z -> z * ...
-
-The abstract transformer (let's call it F_e) of the if expression (let's
-call it e) would transform an incoming (undersaturated!) head demand 1A into
-a demand type like {x-><1L>,y-><L>}<L>. In pictures:
-
-     Demand ---F_e---> DmdType
-     <1A>              {x-><1L>,y-><L>}<L>
-
-Let's assume that the demand transformers we compute for an expression are
-correct wrt. to some concrete semantics for Core. How do demand signatures fit
-in? They are strange beasts, given that they come with strict rules when to
-it's sound to unleash them.
-
-Fortunately, we can formalise the rules with Galois connections. Consider
-f's strictness signature, {}<1L><L>. It's a single-point approximation of
-the actual abstract transformer of f's RHS for arity 2. So, what happens is that
-we abstract *once more* from the abstract domain we already are in, replacing
-the incoming Demand by a simple lattice with two elements denoting incoming
-arity: A_2 = {<2, >=2} (where '<2' is the top element and >=2 the bottom
-element). Here's the diagram:
-
-     A_2 -----f_f----> DmdType
-      ^                   |
-      | α               γ |
-      |                   v
-  SubDemand --F_f----> DmdType
-
-With
-  α(C(1,C(1,_))) = >=2
-  α(_)         =  <2
-  γ(ty)        =  ty
-and F_f being the abstract transformer of f's RHS and f_f being the abstracted
-abstract transformer computable from our demand signature simply by
-
-  f_f(>=2) = {}<1L><L>
-  f_f(<2)  = multDmdType C_0N {}<1L><L>
-
-where multDmdType makes a proper top element out of the given demand type.
-
-In practice, the A_n domain is not just a simple Bool, but a Card, which is
-exactly the Card with which we have to multDmdType. The Card for arity n
-is computed by calling @peelManyCalls n@, which corresponds to α above.
-
 Note [Demand transformer for a dictionary selector]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Suppose we have a superclass selector 'sc_sel' and a class method
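
The unleashing rule from Note [DmdSig: demand signatures, and demand-sig arity] can be sketched as a toy function. This is only an illustration under simplifying assumptions: demands are plain strings, free-variable demands and divergence are ignored, and `ToySig`, `demandSigArity` and `unleashArgs` are hypothetical names. It mirrors `f_f(>=2)` and `f_f(<2)` from Note [Demand signatures semantically], with the top demand written `L`.

```haskell
-- Toy model only: demands are plain strings in GHC's printed notation.
type ToyDmd = String

-- | A toy demand signature: one demand per argument.
-- The length of the list plays the role of the demand-sig arity.
newtype ToySig = ToySig [ToyDmd]

demandSigArity :: ToySig -> Int
demandSigArity (ToySig ds) = length ds

-- | Argument demands that a call with 'nArgs' value arguments may assume.
-- With enough arguments the signature is unleashed as is; otherwise every
-- supplied argument gets the top demand.
unleashArgs :: ToySig -> Int -> [ToyDmd]
unleashArgs sig@(ToySig ds) nArgs
  | nArgs >= demandSigArity sig = ds
  | otherwise                   = replicate nArgs topDmd
  where
    topDmd = "L"
```

With the signature `<1L><L>`, a call with two arguments gets `["1L","L"]`, while a call with one argument gets only `["L"]`.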



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/4bfc49f93cd7760e4375549375742d37d956c0e1
