[Git][ghc/ghc][wip/T24623] Comments
Simon Peyton Jones (@simonpj)
gitlab at gitlab.haskell.org
Fri Jun 14 10:33:21 UTC 2024
Simon Peyton Jones pushed to branch wip/T24623 at Glasgow Haskell Compiler / GHC
Commits:
4bfc49f9 by Simon Peyton Jones at 2024-06-14T11:33:01+01:00
Comments
- - - - -
4 changed files:
- compiler/GHC/Core/Opt/DmdAnal.hs
- compiler/GHC/Core/Opt/WorkWrap.hs
- compiler/GHC/Core/Opt/WorkWrap/Utils.hs
- compiler/GHC/Types/Demand.hs
Changes:
=====================================
compiler/GHC/Core/Opt/DmdAnal.hs
=====================================
@@ -1086,6 +1086,7 @@ dmdAnalRhsSig top_lvl rec_flag env let_subdmd id rhs
(final_env, weak_fvs, final_id, final_rhs)
where
ww_arity = workWrapArity id rhs
+ -- See Note [Worker/wrapper arity and join points], point (1)
body_subdmd | isJoinId id = let_subdmd
| otherwise = topSubDmd
@@ -1235,47 +1236,97 @@ Consider
B -> j 4
C -> (p,7))
-If j was a vanilla function definition, we'd analyse its body with
-evalDmd, and think that it was lazy in p. But for join points we can
-do better! We know that j's body will (if called at all) be evaluated
-with the demand that consumes the entire join-binding, in this case
-the argument demand from g. Whizzo! g evaluates both components of
-its argument pair, so p will certainly be evaluated if j is called.
+If j was a vanilla function definition, we'd analyse its body with evalDmd, and
+think that it was lazy in p. But for join points we can do better! We know
+that j's body will (if called at all) be evaluated with the demand that consumes
+the entire join-binding, in this case the argument demand from g. Whizzo! g
+evaluates both components of its argument pair, so p will certainly be evaluated
+if j is called.
-For f to be strict in p, we need /all/ paths to evaluate p; in this
-case the C branch does so too, so we are fine. So, as usual, we need
-to transport demands on free variables to the call site(s). Compare
-Note [Lazy and unleashable free variables].
+For f to be strict in p, we need /all/ paths to evaluate p; in this case the C
+branch does so too, so we are fine. So, as usual, we need to transport demands
+on free variables to the call site(s). Compare Note [Lazy and unleashable free
+variables].
-The implementation is easy. When analysing a join point, we can
-analyse its body with the demand from the entire join-binding (written
-let_dmd here).
+The implementation is easy: see `body_subdmd` in `dmdAnalRhsSig`. When analysing
+a join point, we can analyse its body (after stripping off the join binders,
+here just 'y') with the demand from the entire join-binding (written `let_subdmd`
+here).
Another win for join points! #13543.
-However, note that the strictness signature for a join point can
-look a little puzzling. E.g.
+BUT see Note [Worker/wrapper arity and join points].
+Note we may analyse the rhs of a join point with a demand whose call depth is
+either bigger than, or smaller than, the number of lambdas syntactically visible.
+* More lambdas than call demands:
+ join j x = \p q r -> blah in ...
+ in a context with demand Top.
+
+* More call demands than lambdas:
+ (join j x = h in ..(j 2)..(j 3)) a b c
+
+Note [Worker/wrapper arity and join points]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider
(join j x = \y. error "urk")
(in case v of )
( A -> j 3 ) x
( B -> j 4 )
( C -> \y. blah )
-The entire thing is in a C(1,L) context, so j's strictness signature
-will be [A]b
-meaning one absent argument, returns bottom. That seems odd because
-there's a \y inside. But it's right because when consumed in a C(1,L)
-context the RHS of the join point is indeed bottom.
+The entire thing is in a C(1,L) context, so we will analyse j's body, namely
+ \y. error "urk"
+with demand C(C(1,L)). See `rhs_subdmd` in `dmdAnalRhsSig`. That will produce
+a demand signature of <A><A>b: and indeed `j` diverges when given two arguments.
+
+BUT we do /not/ want to worker/wrapper `j` with two arguments. Suppose we have
+ join j2 :: Int -> Int -> blah
+ j2 x = rhs
+ in ...(j2 3)...(j2 4)...
+
+where j2's join-arity is 1, so calls to `j2` will all have /one/ argument.
+Suppose the entire expression is in a called context (like `j` above) and `j2`
+gets the demand signature <P(L)><P(L)>, that is, strict in both arguments.
+
+If we worker/wrapper'd `j2` with two args we'd get
+ join $wj2 x# y# = let x = I# x#; y = I# y# in rhs
+ j2 x = \y. case x of I# x# -> case y of I# y# -> $wj2 x# y#
+ in ...(j2 3)...(j2 4)...
+But now `$wj2` is no longer a join point. Boo.
+
+Instead if we w/w at all, we want to do so only with /one/ argument:
+ join $wj2 x# = let x = I# x# in rhs
+ j2 x = case x of I# x# -> $wj2 x#
+ in ...(j2 3)...(j2 4)...
+Now all is fine. BUT in `finaliseArgBoxities` we should trim y's boxity,
+to reflect the fact that we aren't going to unbox `y` at all.
-Note [Demand signatures are computed for a threshold arity based on idArity]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Given a binding { f = rhs }, we compute a "threshold arity", and do demand
-analysis based on a call with that many value arguments.
+Conclusion:
-The threshold we use is
+(1) The "worker/wrapper arity" of an Id is
+ * For non-join-points: idArity
+ * For join points: the join arity (Id part only of course)
+ This is the number of args we will use in worker/wrapper.
+ See `ww_arity` in `dmdAnalRhsSig`, and the function workWrapArity.
-* Ordinary bindings: idArity f.
+(2) A join point's demand-signature arity may exceed the Id's worker/wrapper
+ arity. See the `arity_ok` assertion in `mkWwBodies`.
+
+(3) In `finaliseArgBoxities`, do trimBoxity on any argument demands beyond
+ the worker/wrapper arity.
+
+(4) In WorkWrap.splitFun, make sure we split based on the worker/wrapper
+ arity (re)-computed by workWrapArity.
+
+Note [The demand for the RHS of a binding]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Given a binding { f = rhs }, in `dmdAnalRhsSig` we compute a `rhs_subdmd` in
+which to analyse `rhs`.
+
+The demand we use is:
+
+* Ordinary bindings: a call-demand of depth (idArity f).
Why idArity arguments? Because that's a conservative estimate of how many
arguments we must feed a function before it does anything interesting with
them. Also it elegantly subsumes the trivial RHS and PAP case. E.g. for
@@ -1285,22 +1336,17 @@ The threshold we use is
idArity is /at least/ the number of manifest lambdas, but might be higher for
PAPs and trivial RHS (see Note [Demand analysis for trivial right-hand sides]).
-* Join points: the value-binder subset of the JoinArity. This can
- be less than the number of visible lambdas; e.g.
- join j x = \y. blah
- in ...(jump j 2)....(jump j 3)....
- We know that j will never be applied to more than 1 arg (its join
- arity, and we don't eta-expand join points, so here a threshold
- of 1 is the best we can do.
+* Join points: a call-demand of depth (value-binder subset of JoinArity),
+ wrapped around the incoming demand for the entire expression; see
+ Note [Demand analysis for join points]
Note that the idArity of a function varies independently of its cardinality
properties (cf. Note [idArity varies independently of dmdTypeDepth]), so we
-implicitly encode the arity for when a demand signature is sound to unleash
-in its 'dmdTypeDepth', not in its idArity (cf. Note [Understanding DmdType
-and DmdSig] in GHC.Types.Demand). It is unsound to unleash a demand
-signature when the incoming number of arguments is less than that. See
-GHC.Types.Demand Note [What are demand signatures?] for more details on
-soundness.
+implicitly encode the arity for when a demand signature is sound to unleash in
+its 'dmdTypeDepth', not in its idArity (cf. Note [Understanding DmdType and
+DmdSig] in GHC.Types.Demand). It is unsound to unleash a demand signature when
+the incoming number of arguments is less than that. See GHC.Types.Demand
+Note [DmdSig: demand signatures, and demand-sig arity].
Note that there might, in principle, be functions for which we might want to
analyse for more incoming arguments than idArity. Example:
@@ -1929,7 +1975,7 @@ finaliseArgBoxities :: AnalEnv -> Id -> Arity
-- Then:
-- dmds' is the same as dmds (including length), except for boxity info
-- rhs' is the same as rhs, except for dmd info on lambda binders
--- NB: length dmds might be greater than ww_arity
+-- NB: For join points, length dmds might be greater than ww_arity
finaliseArgBoxities env fn ww_arity arg_dmds div rhs
-- Check for an OPAQUE function: see Note [OPAQUE pragma]
@@ -1952,8 +1998,7 @@ finaliseArgBoxities env fn ww_arity arg_dmds div rhs
= (arg_dmds, rhs)
-- The normal case
- | otherwise -- NB: ww_arity might be less than
- -- manifest arity for join points
+ | otherwise
= -- pprTrace "finaliseArgBoxities" (
-- vcat [text "function:" <+> ppr fn
-- , text "max" <+> ppr max_wkr_args
@@ -1979,6 +2024,7 @@ finaliseArgBoxities env fn ww_arity arg_dmds div rhs
arg_dmds' = ww_arg_dmds ++ map trimBoxity (drop ww_arity arg_dmds)
-- If ww_arity < length arg_dmds, the leftover ones
-- will not be w/w'd, so trimBoxity them
+ -- See Note [Worker/wrapper arity and join points] point (3)
-- This is the key line, which uses almost-circular programming
-- The remaining budget from one layer becomes the initial
=====================================
compiler/GHC/Core/Opt/WorkWrap.hs
=====================================
@@ -797,6 +797,7 @@ splitFun ww_opts fn_id rhs
uf_opts = so_uf_opts (wo_simple_opts ww_opts)
fn_info = idInfo fn_id
ww_arity = workWrapArity fn_id rhs
+ -- workWrapArity: see (4) in Note [Worker/wrapper arity and join points] in DmdAnal
(wrap_dmds, div) = splitDmdSig (dmdSigInfo fn_info)
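
Points (1) and (3) of Note [Worker/wrapper arity and join points] can be sketched with a toy model. This is only an illustration: the types `Boxity`, `ArgDmd`, `Binder` and `Fun`, and the functions `toyWorkWrapArity`, `toyTrimBoxity` and `toyFinalise`, are hypothetical stand-ins and not GHC's real `Demand`, `Id`, `workWrapArity`, `trimBoxity` or `finaliseArgBoxities`. In particular, the real `finaliseArgBoxities` also applies a worker-argument budget, which the sketch ignores.

```haskell
-- Toy model only: these types are NOT GHC's real Demand/Id representations.
data Boxity = Boxed | Unboxed deriving (Eq, Show)

-- A toy argument demand: a printed demand shape plus a boxity flag.
data ArgDmd = ArgDmd { dmdShape :: String, dmdBoxity :: Boxity }
  deriving (Eq, Show)

-- Toy lambda binder: type binders do not count towards worker/wrapper arity.
data Binder = TyBinder String | IdBinder String deriving (Eq, Show)

-- Toy description of a let-bound function.
data Fun = Fun
  { funJoinArity :: Maybe Int   -- Just n for a join point with join arity n
  , funIdArity   :: Int         -- stands in for idArity
  , funBinders   :: [Binder]    -- leading lambda binders of the RHS
  }

-- Point (1): idArity for ordinary functions; for join points, the number of
-- value (Id) binders among the first join-arity binders.
toyWorkWrapArity :: Fun -> Int
toyWorkWrapArity fn = case funJoinArity fn of
  Just joinArity -> length [ () | IdBinder _ <- take joinArity (funBinders fn) ]
  Nothing        -> funIdArity fn

-- Point (3): a demand beyond the worker/wrapper arity keeps its shape but
-- loses any Unboxed flag, because that argument will not be unboxed.
toyTrimBoxity :: ArgDmd -> ArgDmd
toyTrimBoxity d = d { dmdBoxity = Boxed }

-- Keep the first wwArity demands as they are; trim the boxity of the rest.
toyFinalise :: Int -> [ArgDmd] -> [ArgDmd]
toyFinalise wwArity dmds =
  take wwArity dmds ++ map toyTrimBoxity (drop wwArity dmds)
```

For the `j2` example (join arity 1, two visible lambdas, signature <P(L)><P(L)>), `toyWorkWrapArity` gives 1, and `toyFinalise 1` leaves the demand on `x` unboxed while marking the demand on `y` as boxed.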
=====================================
compiler/GHC/Core/Opt/WorkWrap/Utils.hs
=====================================
@@ -294,7 +294,7 @@ isWorkerSmallEnough max_worker_args old_n_args vars
-- it takes <= 82 arguments afterwards.
workWrapArity :: Id -> CoreExpr -> Arity
--- See Note [Demand signatures are computed for a threshold arity based on idArity]
+-- See Note [Worker/wrapper arity and join points] in DmdAnal
workWrapArity fn rhs
= case idJoinPointHood fn of
JoinPoint join_arity -> count isId $ fst $ collectNBinders join_arity rhs
=====================================
compiler/GHC/Types/Demand.hs
=====================================
@@ -2084,6 +2084,11 @@ body of the function.
* *
************************************************************************
+Note [DmdSig: demand signatures, and demand-sig arity]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+See also
+ * Note [Demand signatures semantically]
+ * Note [Understanding DmdType and DmdSig]
In a let-bound Id we record its demand signature.
In principle, this demand signature is a demand transformer, mapping
a demand on the Id into a DmdType, which gives
@@ -2094,20 +2099,22 @@ a demand on the Id into a DmdType, which gives
However, in fact we store in the Id an extremely emasculated demand
transformer, namely
-
- a single DmdType
+ a single DmdType
(Nevertheless we dignify DmdSig as a distinct type.)
-This DmdType gives the demands unleashed by the Id when it is applied
-to as many arguments as are given in by the arg demands in the DmdType.
+The DmdSig for an Id is a semantic thing. Suppose a function `f` has a DmdSig of
+ DmdSig (DmdType (fv_dmds,res) [d1..dn])
+Here `n` is called the "demand-sig arity" of the DmdSig. The signature means:
+ * If you apply `f` to n arguments (the demand-sig-arity)
+ * then you can unleash demands d1..dn on the arguments
+ * and demands fv_dmds on the free variables.
Also see Note [Demand type Divergence] for the meaning of a Divergence in a
-strictness signature.
+demand signature.
-If an Id is applied to less arguments than its arity, it means that
-the demand on the function at a call site is weaker than the vanilla
-call demand, used for signature inference. Therefore we place a top
-demand on all arguments. Otherwise, the demand is specified by Id's
-signature.
+If `f` is applied to fewer value arguments than its demand-sig arity, it means
+that the demand on the function at a call site is weaker than the vanilla call
+demand, used for signature inference. Therefore we place a top demand on all
+arguments.
For example, the demand transformer described by the demand signature
DmdSig (DmdType {x -> <1L>} <A><1P(L,L)>)
@@ -2118,6 +2125,61 @@ and 1P(L,L) on the second.
If this same function is applied to one arg, all we can say is that it
uses x with 1L, and its arg with demand 1P(L,L).
+Note [Demand signatures semantically]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Demand analysis interprets expressions in the abstract domain of demand
+transformers. Given a (sub-)demand that denotes the evaluation context, the
+abstract transformer of an expression gives us back a demand type denoting
+how other things (like arguments and free vars) were used when the expression
+was evaluated. Here's an example:
+
+ f x y =
+ if x + expensive
+ then \z -> z + y * ...
+ else \z -> z * ...
+
+The abstract transformer (let's call it F_e) of the if expression (let's
+call it e) would transform an incoming (undersaturated!) head demand 1A into
+a demand type like {x-><1L>,y-><L>}<L>. In pictures:
+
+ Demand ---F_e---> DmdType
+ <1A> {x-><1L>,y-><L>}<L>
+
+Let's assume that the demand transformers we compute for an expression are
+correct wrt. some concrete semantics for Core. How do demand signatures fit
+in? They are strange beasts, given that they come with strict rules for when
+it's sound to unleash them.
+
+Fortunately, we can formalise the rules with Galois connections. Consider
+f's strictness signature, {}<1L><L>. It's a single-point approximation of
+the actual abstract transformer of f's RHS for arity 2. So, what happens is that
+we abstract *once more* from the abstract domain we already are in, replacing
+the incoming Demand by a simple lattice with two elements denoting incoming
+arity: A_2 = {<2, >=2} (where '<2' is the top element and >=2 the bottom
+element). Here's the diagram:
+
+ A_2 -----f_f----> DmdType
+ ^ |
+ | α γ |
+ | v
+ SubDemand --F_f----> DmdType
+
+With
+ α(C(1,C(1,_))) = >=2
+ α(_) = <2
+ γ(ty) = ty
+and F_f being the abstract transformer of f's RHS and f_f being the abstracted
+abstract transformer computable from our demand signature simply by
+
+ f_f(>=2) = {}<1L><L>
+ f_f(<2) = multDmdType C_0N {}<1L><L>
+
+where multDmdType makes a proper top element out of the given demand type.
+
+In practice, the A_n domain is not just a simple Bool, but a Card, which is
+exactly the Card with which we have to multDmdType. The Card for arity n
+is computed by calling @peelManyCalls n@, which corresponds to α above.
+
Note [Understanding DmdType and DmdSig]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Demand types are sound approximations of an expression's semantics relative to
@@ -2130,9 +2192,9 @@ Here is a table with demand types resulting from different incoming demands we
put that expression under. Note the monotonicity; a stronger incoming demand
yields a more precise demand type:
- incoming demand | demand type
+ incoming demand | demand type
--------------------------------
- 1A | <L><L>{}
+ 1A | <L><L>{}
C(1,C(1,L)) | <1P(L)><L>{}
C(1,C(1,1P(1P(L),A))) | <1P(A)><A>{}
@@ -2154,11 +2216,11 @@ being a newtype wrapper around DmdType, it actually encodes two things:
* A demand type that is sound to unleash when the minimum arity requirement is
met.
-Here comes the subtle part: The threshold is encoded in the wrapped demand
-type's depth! So in mkDmdSigForArity we make sure to trim the list of
-argument demands to the given threshold arity. Call sites will make sure that
-this corresponds to the arity of the call demand that elicited the wrapped
-demand type. See also Note [What are demand signatures?].
+Here comes the subtle part: The threshold is encoded in the demand-sig arity!
+So in mkDmdSigForArity we make sure to trim the list of argument demands to the
+given threshold arity. Call sites will make sure that this corresponds to the
+arity of the call demand that elicited the wrapped demand type. See also Note
+[Demand signatures semantically].
-}
-- | The depth of the wrapped 'DmdType' encodes the arity at which it is safe
@@ -2369,61 +2431,6 @@ dmdTransformDictSelSig (DmdSig (DmdType _ [_ :* prod])) call_sd
dmdTransformDictSelSig sig sd = pprPanic "dmdTransformDictSelSig: no args" (ppr sig $$ ppr sd)
{-
-Note [What are demand signatures?]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Demand analysis interprets expressions in the abstract domain of demand
-transformers. Given a (sub-)demand that denotes the evaluation context, the
-abstract transformer of an expression gives us back a demand type denoting
-how other things (like arguments and free vars) were used when the expression
-was evaluated. Here's an example:
-
- f x y =
- if x + expensive
- then \z -> z + y * ...
- else \z -> z * ...
-
-The abstract transformer (let's call it F_e) of the if expression (let's
-call it e) would transform an incoming (undersaturated!) head demand 1A into
-a demand type like {x-><1L>,y-><L>}<L>. In pictures:
-
- Demand ---F_e---> DmdType
- <1A> {x-><1L>,y-><L>}<L>
-
-Let's assume that the demand transformers we compute for an expression are
-correct wrt. to some concrete semantics for Core. How do demand signatures fit
-in? They are strange beasts, given that they come with strict rules when to
-it's sound to unleash them.
-
-Fortunately, we can formalise the rules with Galois connections. Consider
-f's strictness signature, {}<1L><L>. It's a single-point approximation of
-the actual abstract transformer of f's RHS for arity 2. So, what happens is that
-we abstract *once more* from the abstract domain we already are in, replacing
-the incoming Demand by a simple lattice with two elements denoting incoming
-arity: A_2 = {<2, >=2} (where '<2' is the top element and >=2 the bottom
-element). Here's the diagram:
-
- A_2 -----f_f----> DmdType
- ^ |
- | α γ |
- | v
- SubDemand --F_f----> DmdType
-
-With
- α(C(1,C(1,_))) = >=2
- α(_) = <2
- γ(ty) = ty
-and F_f being the abstract transformer of f's RHS and f_f being the abstracted
-abstract transformer computable from our demand signature simply by
-
- f_f(>=2) = {}<1L><L>
- f_f(<2) = multDmdType C_0N {}<1L><L>
-
-where multDmdType makes a proper top element out of the given demand type.
-
-In practice, the A_n domain is not just a simple Bool, but a Card, which is
-exactly the Card with which we have to multDmdType. The Card for arity n
-is computed by calling @peelManyCalls n@, which corresponds to α above.
-
Note [Demand transformer for a dictionary selector]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Suppose we have a superclass selector 'sc_sel' and a class method
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/4bfc49f93cd7760e4375549375742d37d956c0e1
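
The "demand-sig arity" rule from Note [DmdSig: demand signatures, and demand-sig arity] can be illustrated with a small toy model. Everything here is an assumption made for illustration: `ToySig`, `demandSigArity`, `topDmd` and `unleashArgs` are hypothetical names, demands are plain strings, and free-variable demands and divergence are left out entirely. The real logic lives in GHC.Types.Demand and is more refined (it multiplies the demand type by a cardinality rather than replacing demands wholesale).

```haskell
-- Toy model only: not GHC's DmdSig/DmdType.
type Dmd = String

-- A toy signature holding just the argument demands d1..dn.
newtype ToySig = ToySig { sigArgDmds :: [Dmd] }
  deriving (Eq, Show)

-- The demand-sig arity is the number of argument demands in the signature.
demandSigArity :: ToySig -> Int
demandSigArity = length . sigArgDmds

-- The top (least informative) demand, written L in the Notes.
topDmd :: Dmd
topDmd = "L"

-- Demands to place on the arguments of a call with nArgs value arguments.
--   * Enough arguments: unleash d1..dn; in this toy, any extra arguments
--     simply get the top demand.
--   * Too few arguments: the signature is not sound to unleash, so every
--     argument gets the top demand.
unleashArgs :: ToySig -> Int -> [Dmd]
unleashArgs sig nArgs
  | nArgs >= n = sigArgDmds sig ++ replicate (nArgs - n) topDmd
  | otherwise  = replicate nArgs topDmd
  where
    n = demandSigArity sig
```

With the argument demands <A><1P(L,L)> from the example in the Note, `unleashArgs` on a two-argument call returns both demands unchanged, while a one-argument call gets only the top demand on its argument.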