[Git][ghc/ghc][wip/marge_bot_batch_merge_job] 3 commits: WorkWrap: Unboxing unboxed tuples is not always useful (#22388)
Marge Bot (@marge-bot)
gitlab at gitlab.haskell.org
Thu Nov 10 17:44:53 UTC 2022
Marge Bot pushed to branch wip/marge_bot_batch_merge_job at Glasgow Haskell Compiler / GHC
Commits:
b85a8080 by Sebastian Graf at 2022-11-10T12:44:36-05:00
WorkWrap: Unboxing unboxed tuples is not always useful (#22388)
See Note [Unboxing through unboxed tuples].
Fixes #22388.
- - - - -
b69660e3 by Sebastian Graf at 2022-11-10T12:44:36-05:00
Boxity: Handle argument budget of unboxed tuples correctly (#21737)
Now Budget roughly tracks the combined width of all arguments after unarisation.
See the changes to `Note [Worker argument budgets]`.
Fixes #21737.
- - - - -
7027c04d by Simon Peyton Jones at 2022-11-10T12:44:37-05:00
Add a fast path for data constructor workers
See Note [Fast path for data constructors] in
GHC.Core.Opt.Simplify.Iteration
This bypasses lots of expensive logic, in the special case of
applications of data constructors. It is a surprisingly worthwhile
improvement, as you can see in the figures below.
Metrics: compile_time/bytes allocated
------------------------------------------------
CoOpt_Read(normal) -2.0%
CoOpt_Singletons(normal) -2.0%
ManyConstructors(normal) -1.3%
T10421(normal) -1.9% GOOD
T10421a(normal) -1.5%
T10858(normal) -1.6%
T11545(normal) -1.7%
T12234(optasm) -1.3%
T12425(optasm) -1.9% GOOD
T13035(normal) -1.0% GOOD
T13056(optasm) -1.8%
T13253(normal) -3.3% GOOD
T15164(normal) -1.7%
T15304(normal) -3.4%
T15630(normal) -2.8%
T16577(normal) -4.3% GOOD
T17096(normal) -1.1%
T17516(normal) -3.1%
T18282(normal) -1.9%
T18304(normal) -1.2%
T18698a(normal) -1.2% GOOD
T18698b(normal) -1.5% GOOD
T18923(normal) -1.3%
T1969(normal) -1.3% GOOD
T19695(normal) -4.4% GOOD
T21839c(normal) -2.7% GOOD
T21839r(normal) -2.7% GOOD
T4801(normal) -3.8% GOOD
T5642(normal) -3.1% GOOD
T6048(optasm) -2.5% GOOD
T9020(optasm) -2.7% GOOD
T9630(normal) -2.1% GOOD
T9961(normal) -11.7% GOOD
WWRec(normal) -1.0%
geo. mean -1.1%
minimum -11.7%
maximum +0.1%
Metric Decrease:
T10421
T12425
T13035
T13253
T16577
T18698a
T18698b
T1969
T19695
T21839c
T21839r
T4801
T5642
T6048
T9020
T9630
T9961
- - - - -
12 changed files:
- compiler/GHC/Core/Opt/DmdAnal.hs
- compiler/GHC/Core/Opt/Simplify/Iteration.hs
- compiler/GHC/Core/Opt/Simplify/Utils.hs
- compiler/GHC/Core/Opt/WorkWrap/Utils.hs
- compiler/GHC/Core/Rules.hs
- compiler/GHC/Types/Id/Make.hs
- + testsuite/tests/stranal/should_compile/T22388.hs
- + testsuite/tests/stranal/should_compile/T22388.stderr
- testsuite/tests/stranal/should_compile/all.T
- + testsuite/tests/stranal/sigs/T21737.hs
- + testsuite/tests/stranal/sigs/T21737.stderr
- testsuite/tests/stranal/sigs/all.T
Changes:
=====================================
compiler/GHC/Core/Opt/DmdAnal.hs
=====================================
@@ -45,6 +45,7 @@ import GHC.Builtin.PrimOps
import GHC.Builtin.Types.Prim ( realWorldStatePrimTy )
import GHC.Types.Unique.Set
import GHC.Types.Unique.MemoFun
+import GHC.Types.RepType
{-
@@ -1762,7 +1763,7 @@ Note [Worker argument budget]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In 'finaliseArgBoxities' we don't want to generate workers with zillions of
argument when, say given a strict record with zillions of fields. So we
-limit the maximum number of worker args to the maximum of
+limit the maximum number of worker args ('max_wkr_args') to the maximum of
- -fmax-worker-args=N
- The number of args in the original function; if it already has has
zillions of arguments we don't want to seek /fewer/ args in the worker.
@@ -1771,10 +1772,91 @@ limit the maximum number of worker args to the maximum of
We pursue a "layered" strategy for unboxing: we unbox the top level of the
argument(s), subject to budget; if there are any arguments left we unbox the
next layer, using that depleted budget.
+Unboxing an argument *increases* the budget for the inner layer roughly
+according to how many registers that argument takes (unboxed tuples take
+multiple registers, see below), as determined by 'unariseArity'.
+Budget is spent when we have to pass a non-absent field as a parameter.
To achieve this, we use the classic almost-circular programming technique in
which we we write one pass that takes a lazy list of the Budgets for every
-layer.
+layer. The effect is that of a breadth-first search (over argument type and
+demand structure) to compute Budgets followed by a depth-first search to
+construct the product demands, but laziness allows us to do it all in one
+pass and without intermediate data structures.
+
+Suppose we have -fmax-worker-args=4 for the remainder of this Note.
+Then consider this example function:
+
+ boxed :: (Int, Int) -> (Int, (Int, Int, Int)) -> Int
+ boxed (a,b) (c, (d,e,f)) = a + b + c + d + e + f
+
+With a budget of 4 args to spend (number of args is only 2), we'd be served well
+to unbox both pairs, but not the triple. Indeed, that is what the algorithm
+computes, and the following pictogram shows how the budget layers are computed.
+Each layer is started with `n ~>`, where `n` is the budget at the start of the
+layer. We write -n~> when we spend budget (and n is the remaining budget) and
++n~> when we earn budget. We separate unboxed args with ][ and indicate
+inner budget threads becoming negative in braces {{}}, so that we see which
+unboxing decision we do *not* commit to. Without further ado:
+
+ 4 ~> ][ (a,b) -3~> ][ (c, ...) -2~>
+ ][ | | ][ | |
+ ][ | +-------------+ ][ | +-----------------+
+ ][ | | ][ | |
+ ][ v v ][ v v
+ 2 ~> ][ +3~> a -2~> ][ b -1~> ][ +2~> c -1~> ][ (d, e, f) -0~>
+ ][ | ][ | ][ | ][ {{ | | | }}
+ ][ | ][ | ][ | ][ {{ | | +----------------+ }}
+ ][ v ][ v ][ v ][ {{ v +------v v }}
+ 0 ~> ][ +1~> I# -0~> ][ +1~> I# -0~> ][ +1~> I# -0~> ][ {{ +1~> d -0~> ][ e -(-1)~> ][ f -(-2)~> }}
+
+Unboxing increments the budget we have on the next layer (because we don't need
+to retain the boxed arg), but in turn the inner layer must afford to retain all
+non-absent fields, each decrementing the budget. Note how the budget becomes
+negative when trying to unbox the triple and the unboxing decision is "rolled
+back". This is done by the 'positiveTopBudget' guard.
+
+There's a bit of complication as a result of handling unboxed tuples correctly;
+specifically, handling nested unboxed tuples. Consider (#21737)
+
+ unboxed :: (Int, Int) -> (# Int, (# Int, Int, Int #) #) -> Int
+ unboxed (a,b) (# c, (# d, e, f #) #) = a + b + c + d + e + f
+
+Recall that unboxed tuples will be flattened to individual arguments during
+unarisation. Here, `unboxed` will have 5 arguments at runtime because of the
+nested unboxed tuple, which will be flattened to 4 args. So it's best to leave
+`(a,b)` boxed (because we already are above our arg threshold), but unbox `c`
+through `f` because that doesn't increase the number of args post unarisation.
+
+Note that the challenge is that syntactically, `(# d, e, f #)` occurs in a
+deeper layer than `(a, b)`. Treating unboxed tuples as a regular data type, we'd
+make the same unboxing decisions as for `boxed` above; although our starting
+budget is 5 (Here, the number of args is greater than -fmax-worker-args), it's
+not enough to unbox the triple (we'd finish with budget -1). So we'd unbox `a`
+through `c`, but not `d` through `f`, which is silly, because then we'd end up
+having 6 arguments at runtime, of which `d` through `f` weren't unboxed.
+
+Hence we pretend that the fields of unboxed tuples appear in the same budget
+layer as the tuple itself. For example at the top-level, `(# x,y #)` is to be
+treated just like two arguments `x` and `y`.
+Of course, for that to work, our budget calculations must initialise
+'max_wkr_args' to 5, based on the 'unariseArity' of each Core arg: That would be
+1 for the pair and 4 for the unboxed pair. Then when we decide whether to unbox
+the unboxed pair, we *directly* recurse into the fields, spending our budget
+on retaining `c` and (after recursing once more) `d` through `f` as arguments,
+depleting our budget completely in the first layer. Pictorially:
+
+ 5 ~> ][ (a,b) -4~> ][ (# c, ... #)
+ ][ {{ | | }} ][ c -3~> ][ (# d, e, f #)
+ ][ {{ | +-------+ }} ][ | ][ d -2~> ][ e -1~> ][ f -0~>
+ ][ {{ | | }} ][ | ][ | ][ | ][ |
+ ][ {{ v v }} ][ v ][ v ][ v ][ v
+ 0 ~> ][ {{ +1~> a -0~> ][ b -(-1)~> }} ][ +1~> I# -0~> ][ +1~> I# -0~> ][ +1~> I# -0~> ][ +1~> I# -0~>
+
+As you can see, we have no budget left to justify unboxing `(a,b)` on the second
+layer, which is good, because it would increase the number of args. Also note
+that we can still unbox `c` through `f` in this layer, because doing so has a
+net zero effect on budget.
Note [The OPAQUE pragma and avoiding the reboxing of arguments]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1795,10 +1877,17 @@ W/W-transformation code that boxed arguments of 'f' must definitely be passed
along in boxed form and as such dissuade the creation of reboxing workers.
-}
-data Budgets = MkB Arity Budgets -- An infinite list of arity budgets
+-- | How many registers does this type take after unarisation?
+unariseArity :: Type -> Arity
+unariseArity ty = length (typePrimRep ty)
-incTopBudget :: Budgets -> Budgets
-incTopBudget (MkB n bg) = MkB (n+1) bg
+data Budgets = MkB !Arity Budgets -- An infinite list of arity budgets
+
+earnTopBudget :: Budgets -> Budgets
+earnTopBudget (MkB n bg) = MkB (n+1) bg
+
+spendTopBudget :: Arity -> Budgets -> Budgets
+spendTopBudget m (MkB n bg) = MkB (n-m) bg
positiveTopBudget :: Budgets -> Bool
positiveTopBudget (MkB n _) = n >= 0
@@ -1811,7 +1900,8 @@ finaliseArgBoxities env fn arity rhs div
-- Then there are no binders; we don't worker/wrapper; and we
-- simply want to give f the same demand signature as g
- | otherwise
+ | otherwise -- NB: arity is the threshold_arity, which might be less than
+ -- manifest arity for join points
= -- pprTrace "finaliseArgBoxities" (
-- vcat [text "function:" <+> ppr fn
-- , text "dmds before:" <+> ppr (map idDemandInfo (filter isId bndrs))
@@ -1823,8 +1913,10 @@ finaliseArgBoxities env fn arity rhs div
where
opts = ae_opts env
(bndrs, _body) = collectBinders rhs
- max_wkr_args = dmd_max_worker_args opts `max` arity
- -- See Note [Worker argument budget]
+ unarise_arity = sum [ unariseArity (idType b) | b <- bndrs, isId b ]
+ max_wkr_args = dmd_max_worker_args opts `max` unarise_arity
+ -- This is the budget initialisation step of
+ -- Note [Worker argument budget]
-- This is the key line, which uses almost-circular programming
-- The remaining budget from one layer becomes the initial
@@ -1868,22 +1960,49 @@ finaliseArgBoxities env fn arity rhs div
= case wantToUnboxArg env ty str_mark dmd of
DropAbsent -> (bg, dmd)
- DontUnbox | is_bot_fn, isTyVarTy ty -> (decremented_bg, dmd)
- | otherwise -> (decremented_bg, trimBoxity dmd)
+ DontUnbox | is_bot_fn, isTyVarTy ty -> (retain_budget, dmd)
+ | otherwise -> (retain_budget, trimBoxity dmd)
-- If bot: Keep deep boxity even though WW won't unbox
-- See Note [Boxity for bottoming functions] case (A)
-- trimBoxity: see Note [No lazy, Unboxed demands in demand signature]
-
- DoUnbox triples -> (MkB (bg_top-1) final_bg_inner, final_dmd)
where
- (bg_inner', dmds') = go_args (incTopBudget bg_inner) triples
- -- incTopBudget: give one back for the arg we are unboxing
+ retain_budget = spendTopBudget (unariseArity ty) bg
+ -- spendTopBudget: spend from our budget the cost of the
+ -- retaining the arg
+ -- The unboxed case does happen here, for example
+ -- app g x = g x :: (# Int, Int #)
+ -- here, `x` is used `L`azy and thus Boxed
+
+ DoUnbox triples
+ | isUnboxedTupleType ty
+ , (bg', dmds') <- go_args bg triples
+ -> (bg', n :* (mkProd Unboxed $! dmds'))
+ -- See Note [Worker argument budget]
+ -- unboxed tuples are always unboxed, deeply
+ -- NB: Recurse with bg, *not* bg_inner! The unboxed fields
+ -- are at the same budget layer.
+
+ | isUnboxedSumType ty
+ -> pprPanic "Unboxing through unboxed sum" (ppr fn <+> ppr ty)
+ -- We currently don't return DoUnbox for unboxed sums.
+ -- But hopefully we will at some point. When that happens,
+ -- it would still be impossible to predict the effect
+ -- of dropping absent fields and unboxing others on the
+ -- unariseArity of the sum without losing sanity.
+ -- We could overwrite bg_top with the one from
+ -- retain_budget while still unboxing inside the alts as in
+ -- the tuple case for a conservative solution, though.
+
+ | otherwise
+ -> (spendTopBudget 1 (MkB bg_top final_bg_inner), final_dmd)
+ where
+ (bg_inner', dmds') = go_args (earnTopBudget bg_inner) triples
+ -- earnTopBudget: give back the cost of retaining the
+ -- arg we are insted unboxing.
dmd' = n :* (mkProd Unboxed $! dmds')
- (final_bg_inner, final_dmd)
+ ~(final_bg_inner, final_dmd) -- "~": This match *must* be lazy!
| positiveTopBudget bg_inner' = (bg_inner', dmd')
| otherwise = (bg_inner, trimBoxity dmd)
- where
- decremented_bg = MkB (bg_top-1) bg_inner
add_demands :: [Demand] -> CoreExpr -> CoreExpr
-- Attach the demands to the outer lambdas of this expression
=====================================
compiler/GHC/Core/Opt/Simplify/Iteration.hs
=====================================
@@ -1497,9 +1497,10 @@ rebuild env expr cont
ApplyToTy { sc_arg_ty = ty, sc_cont = cont}
-> rebuild env (App expr (Type ty)) cont
- ApplyToVal { sc_arg = arg, sc_env = se, sc_dup = dup_flag, sc_cont = cont}
+ ApplyToVal { sc_arg = arg, sc_env = se, sc_dup = dup_flag
+ , sc_cont = cont, sc_hole_ty = fun_ty }
-- See Note [Avoid redundant simplification]
- -> do { (_, _, arg') <- simplArg env dup_flag se arg
+ -> do { (_, _, arg') <- simplArg env dup_flag fun_ty se arg
; rebuild env (App expr arg') cont }
completeBindX :: SimplEnv
@@ -1598,7 +1599,8 @@ simplCast env body co0 cont0
-- co1 :: t1 ~ s1
-- co2 :: s2 ~ t2
addCoerce co cont@(ApplyToVal { sc_arg = arg, sc_env = arg_se
- , sc_dup = dup, sc_cont = tail })
+ , sc_dup = dup, sc_cont = tail
+ , sc_hole_ty = fun_ty })
| Just (m_co1, m_co2) <- pushCoValArg co
, fixed_rep m_co1
= {-#SCC "addCoerce-pushCoValArg" #-}
@@ -1610,7 +1612,7 @@ simplCast env body co0 cont0
-- See Note [Avoiding exponential behaviour]
MCo co1 ->
- do { (dup', arg_se', arg') <- simplArg env dup arg_se arg
+ do { (dup', arg_se', arg') <- simplArg env dup fun_ty arg_se arg
-- When we build the ApplyTo we can't mix the OutCoercion
-- 'co' with the InExpr 'arg', so we simplify
-- to make it all consistent. It's a bit messy.
@@ -1636,14 +1638,16 @@ simplCast env body co0 cont0
-- See Note [Representation polymorphism invariants] in GHC.Core
-- test: typecheck/should_run/EtaExpandLevPoly
-simplArg :: SimplEnv -> DupFlag -> StaticEnv -> CoreExpr
+simplArg :: SimplEnv -> DupFlag
+ -> OutType -- Type of the function applied to this arg
+ -> StaticEnv -> CoreExpr -- Expression with its static envt
-> SimplM (DupFlag, StaticEnv, OutExpr)
-simplArg env dup_flag arg_env arg
+simplArg env dup_flag fun_ty arg_env arg
| isSimplified dup_flag
= return (dup_flag, arg_env, arg)
| otherwise
= do { let arg_env' = arg_env `setInScopeFromE` env
- ; arg' <- simplExpr arg_env' arg
+ ; arg' <- simplExprC arg_env' arg (mkBoringStop (funArgTy fun_ty))
; return (Simplified, zapSubstEnv arg_env', arg') }
-- Return a StaticEnv that includes the in-scope set from 'env',
-- because arg' may well mention those variables (#20639)
@@ -2029,6 +2033,21 @@ zap the SubstEnv. This is VITAL. Consider
We'll clone the inner \x, adding x->x' in the id_subst Then when we
inline y, we must *not* replace x by x' in the inlined copy!!
+
+Note [Fast path for data constructors]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+For applications of a data constructor worker, the full glory of
+rebuildCall is a waste of effort;
+* They never inline, obviously
+* They have no rewrite rules
+* They are not strict (see Note [Data-con worker strictness]
+ in GHC.Core.DataCon)
+So it's fine to zoom straight to `rebuild` which just rebuilds the
+call in a very straightforward way.
+
+Some programs have a /lot/ of data constructors in the source program
+(compiler/perf/T9961 is an example), so this fast path can be very
+valuable.
-}
simplVar :: SimplEnv -> InVar -> SimplM OutExpr
@@ -2046,6 +2065,9 @@ simplVar env var
simplIdF :: SimplEnv -> InId -> SimplCont -> SimplM (SimplFloats, OutExpr)
simplIdF env var cont
+ | isDataConWorkId var -- See Note [Fast path for data constructors]
+ = rebuild env (Var var) cont
+ | otherwise
= case substId env var of
ContEx tvs cvs ids e -> simplExprF env' e cont
-- Don't trimJoinCont; haven't already simplified e,
@@ -2315,6 +2337,8 @@ field of the ArgInfo record is the state of a little state-machine:
If we inline `f` before simplifying `BIG` well use preInlineUnconditionally,
and we'll simplify BIG once, at x's occurrence, rather than twice.
+* GHC.Core.Opt.Simplify.Utils. mkRewriteCall: if there are no rules, and no
+ unfolding, we can skip both TryRules and TryInlining, which saves work.
Note [Avoid redundant simplification]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -3645,7 +3669,7 @@ mkDupableContWithDmds env dmds
do { let (dmd:cont_dmds) = dmds -- Never fails
; (floats1, cont') <- mkDupableContWithDmds env cont_dmds cont
; let env' = env `setInScopeFromF` floats1
- ; (_, se', arg') <- simplArg env' dup se arg
+ ; (_, se', arg') <- simplArg env' dup hole_ty se arg
; (let_floats2, arg'') <- makeTrivial env NotTopLevel dmd (fsLit "karg") arg'
; let all_floats = floats1 `addLetFloats` let_floats2
; return ( all_floats
=====================================
compiler/GHC/Core/Opt/Simplify/Utils.hs
=====================================
@@ -425,12 +425,22 @@ decArgCount :: RewriteCall -> RewriteCall
decArgCount (TryRules n rules) = TryRules (n-1) rules
decArgCount rew = rew
-mkTryRules :: [CoreRule] -> RewriteCall
+mkRewriteCall :: Id -> RuleEnv -> RewriteCall
-- See Note [Rewrite rules and inlining] in GHC.Core.Opt.Simplify.Iteration
-mkTryRules [] = TryInlining
-mkTryRules rs = TryRules n_required rs
+-- We try to skip any unnecessary stages:
+-- No rules => skip TryRules
+-- No unfolding => skip TryInlining
+-- This skipping is "just" for efficiency. But rebuildCall is
+-- quite a heavy hammer, so skipping stages is a good plan.
+-- And it's extremely simple to do.
+mkRewriteCall fun rule_env
+ | not (null rules) = TryRules n_required rules
+ | canUnfold unf = TryInlining
+ | otherwise = TryNothing
where
- n_required = maximum (map ruleArity rs)
+ n_required = maximum (map ruleArity rules)
+ rules = getRules rule_env fun
+ unf = idUnfolding fun
{-
************************************************************************
@@ -604,21 +614,23 @@ mkArgInfo :: SimplEnv -> RuleEnv -> Id -> SimplCont -> ArgInfo
mkArgInfo env rule_base fun cont
| n_val_args < idArity fun -- Note [Unsaturated functions]
= ArgInfo { ai_fun = fun, ai_args = []
- , ai_rewrite = fun_rules
+ , ai_rewrite = fun_rewrite
, ai_encl = False
, ai_dmds = vanilla_dmds
, ai_discs = vanilla_discounts }
| otherwise
= ArgInfo { ai_fun = fun
, ai_args = []
- , ai_rewrite = fun_rules
- , ai_encl = notNull rules || contHasRules cont
+ , ai_rewrite = fun_rewrite
+ , ai_encl = fun_has_rules || contHasRules cont
, ai_dmds = add_type_strictness (idType fun) arg_dmds
, ai_discs = arg_discounts }
where
- rules = getRules rule_base fun
- fun_rules = mkTryRules rules
- n_val_args = countValArgs cont
+ n_val_args = countValArgs cont
+ fun_rewrite = mkRewriteCall fun rule_base
+ fun_has_rules = case fun_rewrite of
+ TryRules {} -> True
+ _ -> False
vanilla_discounts, arg_discounts :: [Int]
vanilla_discounts = repeat 0
=====================================
compiler/GHC/Core/Opt/WorkWrap/Utils.hs
=====================================
@@ -15,7 +15,7 @@ module GHC.Core.Opt.WorkWrap.Utils
, findTypeShape, IsRecDataConResult(..), isRecDataCon
, mkAbsentFiller
, isWorkerSmallEnough, dubiousDataConInstArgTys
- , isGoodWorker, badWorker , goodWorker
+ , boringSplit , usefulSplit
)
where
@@ -571,23 +571,24 @@ data UnboxingDecision unboxing_info
-- returned product was constructed, so unbox it.
| DropAbsent -- ^ The argument/field was absent. Drop it.
--- Do we want to create workers just for unlifting?
-wwForUnlifting :: WwOpts -> Bool
-wwForUnlifting !opts
+-- | Do we want to create workers just for unlifting?
+wwUseForUnlifting :: WwOpts -> WwUse
+wwUseForUnlifting !opts
-- Always unlift if possible
- | wo_unlift_strict opts = goodWorker
+ | wo_unlift_strict opts = usefulSplit
-- Don't unlift it would cause additional W/W splits.
- | otherwise = badWorker
+ | otherwise = boringSplit
-badWorker :: Bool
-badWorker = False
+-- | Is the worker/wrapper split profitable?
+type WwUse = Bool
-goodWorker :: Bool
-goodWorker = True
-
-isGoodWorker :: Bool -> Bool
-isGoodWorker = id
+-- | WW split not profitable
+boringSplit :: WwUse
+boringSplit = False
+-- | WW split profitable
+usefulSplit :: WwUse
+usefulSplit = True
-- | Unwraps the 'Boxity' decision encoded in the given 'SubDemand' and returns
-- a 'DataConPatContext' as well the nested demands on fields of the 'DataCon'
@@ -826,7 +827,7 @@ Is this a win? Not always:
So there is a flag, `-fworker-wrapper-cbv`, to control whether we do
w/w on strict arguments (internally `Opt_WorkerWrapperUnlift`). The
flag is off by default. The choice is made in
-GHC.Core.Opt.WorkWrape.Utils.wwForUnlifting
+GHC.Core.Opt.WorkWrape.Utils.wwUseForUnlifting
See also `Note [WW for calling convention]` in GHC.Core.Opt.WorkWrap.Utils
-}
@@ -843,7 +844,7 @@ mkWWstr :: WwOpts
-> [Var] -- Wrapper args; have their demand info on them
-- *Includes type variables*
-> [StrictnessMark] -- Strictness-mark info for arguments
- -> UniqSM (Bool, -- Will this result in a useful worker
+ -> UniqSM (WwUse, -- Will this result in a useful worker
[(Var,StrictnessMark)], -- Worker args/their call-by-value semantics.
CoreExpr -> CoreExpr, -- Wrapper body, lacking the worker call
-- and without its lambdas
@@ -855,7 +856,7 @@ mkWWstr opts args str_marks
= -- pprTrace "mkWWstr" (ppr args) $
go args str_marks
where
- go [] _ = return (badWorker, [], nop_fn, [])
+ go [] _ = return (boringSplit, [], nop_fn, [])
go (arg : args) (str:strs)
= do { (useful1, args1, wrap_fn1, wrap_arg) <- mkWWstr_one opts arg str
; (useful2, args2, wrap_fn2, wrap_args) <- go args strs
@@ -875,7 +876,7 @@ mkWWstr opts args str_marks
mkWWstr_one :: WwOpts
-> Var
-> StrictnessMark
- -> UniqSM (Bool, [(Var,StrictnessMark)], CoreExpr -> CoreExpr, CoreExpr)
+ -> UniqSM (WwUse, [(Var,StrictnessMark)], CoreExpr -> CoreExpr, CoreExpr)
mkWWstr_one opts arg str_mark =
-- pprTrace "mkWWstr_one" (ppr arg <+> (if isId arg then ppr arg_ty $$ ppr arg_dmd else text "type arg")) $
case canUnboxArg fam_envs arg_ty arg_dmd of
@@ -887,7 +888,7 @@ mkWWstr_one opts arg str_mark =
-- We can't always handle absence for arbitrary
-- unlifted types, so we need to choose just the cases we can
-- (that's what mkAbsentFiller does)
- -> return (goodWorker, [], nop_fn, absent_filler)
+ -> return (usefulSplit, [], nop_fn, absent_filler)
| otherwise -> do_nothing
DoUnbox dcpc -> -- pprTrace "mkWWstr_one:1" (ppr (dcpc_dc dcpc) <+> ppr (dcpc_tc_args dcpc) $$ ppr (dcpc_args dcpc)) $
@@ -895,12 +896,12 @@ mkWWstr_one opts arg str_mark =
DontUnbox
| isStrictDmd arg_dmd || isMarkedStrict str_mark
- , wwForUnlifting opts -- See Note [CBV Function Ids]
+ , wwUseForUnlifting opts -- See Note [CBV Function Ids]
, not (isFunTy arg_ty)
, not (isUnliftedType arg_ty) -- Already unlifted!
-- NB: function arguments have a fixed RuntimeRep,
-- so it's OK to call isUnliftedType here
- -> return (goodWorker, [(arg, MarkedStrict)], nop_fn, varToCoreExpr arg )
+ -> return (usefulSplit, [(arg, MarkedStrict)], nop_fn, varToCoreExpr arg )
| otherwise -> do_nothing
@@ -910,11 +911,11 @@ mkWWstr_one opts arg str_mark =
arg_dmd = idDemandInfo arg
arg_str | isTyVar arg = NotMarkedStrict -- Type args don't get strictness marks
| otherwise = str_mark
- do_nothing = return (badWorker, [(arg,arg_str)], nop_fn, varToCoreExpr arg)
+ do_nothing = return (boringSplit, [(arg,arg_str)], nop_fn, varToCoreExpr arg)
unbox_one_arg :: WwOpts
- -> Var-> DataConPatContext Demand
- -> UniqSM (Bool, [(Var,StrictnessMark)], CoreExpr -> CoreExpr, CoreExpr)
+ -> Var -> DataConPatContext Demand
+ -> UniqSM (WwUse, [(Var,StrictnessMark)], CoreExpr -> CoreExpr, CoreExpr)
unbox_one_arg opts arg_var
DataConPatContext { dcpc_dc = dc, dcpc_tc_args = tc_args
, dcpc_co = co, dcpc_args = ds }
@@ -939,13 +940,14 @@ unbox_one_arg opts arg_var
-- See Note [Call-by-value for worker args]
all_str_marks = (map (const NotMarkedStrict) ex_tvs') ++ con_str_marks
- ; (_sub_args_quality, worker_args, wrap_fn, wrap_args)
+ ; (nested_useful, worker_args, wrap_fn, wrap_args)
<- mkWWstr opts (ex_tvs' ++ arg_ids') all_str_marks
; let wrap_arg = mkConApp dc (map Type tc_args ++ wrap_args) `mkCast` mkSymCo co
-
- ; return (goodWorker, worker_args, unbox_fn . wrap_fn, wrap_arg) }
- -- Don't pass the arg, rebox instead
+ -- See Note [Unboxing through unboxed tuples]
+ ; return $ if isUnboxedTupleDataCon dc && not nested_useful
+ then (boringSplit, [(arg_var,NotMarkedStrict)], nop_fn, varToCoreExpr arg_var)
+ else (usefulSplit, worker_args, unbox_fn . wrap_fn, wrap_arg) }
-- | Tries to find a suitable absent filler to bind the given absent identifier
-- to. See Note [Absent fillers].
@@ -1195,6 +1197,26 @@ fragile
because `MkT` is strict in its Int# argument, so we get an absentError
exception when we shouldn't. Very annoying!
+Note [Unboxing through unboxed tuples]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We should not to a worker/wrapper split just for unboxing the components of
+an unboxed tuple (in the result *or* argument, #22388). Consider
+ boring_res x y = (# y, x #)
+It's entirely pointless to split for the constructed unboxed pair to
+ $wboring_res x y = (# y, x #)
+ boring_res = case $wboring_res x y of (# a, b #) -> (# a, b #)
+`boring_res` will immediately simplify to an alias for `$wboring_res`!
+
+Similarly, the unboxed tuple might occur in argument position
+ boring_arg (# x, y, z #) = (# z, x, y #)
+It's entirely pointless to "unbox" the triple
+ $wboring_arg x y z = (# z, x, y #)
+ boring_arg (# x, y, z #) = $wboring_arg x y z
+because after unarisation, `boring_arg` is just an alias for `$wboring_arg`.
+
+Conclusion: Only consider unboxing an unboxed tuple useful when we will
+also unbox its components. That is governed by the `usefulSplit` mechanism.
+
************************************************************************
* *
Type scrutiny that is specific to demand analysis
@@ -1376,12 +1398,12 @@ mkWWcpr_entry
:: WwOpts
-> Type -- function body
-> Cpr -- CPR analysis results
- -> UniqSM (Bool, -- Is w/w'ing useful?
+ -> UniqSM (WwUse, -- Is w/w'ing useful?
CoreExpr -> CoreExpr, -- New wrapper. 'nop_fn' if not useful
CoreExpr -> CoreExpr) -- New worker. 'nop_fn' if not useful
-- ^ Entrypoint to CPR W/W. See Note [Worker/wrapper for CPR] for an overview.
mkWWcpr_entry opts body_ty body_cpr
- | not (wo_cpr_anal opts) = return (badWorker, nop_fn, nop_fn)
+ | not (wo_cpr_anal opts) = return (boringSplit, nop_fn, nop_fn)
| otherwise = do
-- Part (1)
res_bndr <- mk_res_bndr body_ty
@@ -1398,8 +1420,8 @@ mkWWcpr_entry opts body_ty body_cpr
let wrap_fn = unbox_transit_tup rebuilt_result -- 3 2
work_fn body = bind_res_bndr body (work_unpack_res transit_tup) -- 1 2 3
return $ if not useful
- then (badWorker, nop_fn, nop_fn)
- else (goodWorker, wrap_fn, work_fn)
+ then (boringSplit, nop_fn, nop_fn)
+ else (usefulSplit, wrap_fn, work_fn)
-- | Part (1) of Note [Worker/wrapper for CPR].
mk_res_bndr :: Type -> UniqSM Id
@@ -1411,18 +1433,18 @@ mk_res_bndr body_ty = do
-- | What part (2) of Note [Worker/wrapper for CPR] collects.
--
--- 1. A Bool capturing whether the transformation did anything useful.
+-- 1. A 'WwUse' capturing whether the split does anything useful.
-- 2. The list of transit variables (see the Note).
-- 3. The result builder expression for the wrapper. The original case binder if not useful.
-- 4. The result unpacking expression for the worker. 'nop_fn' if not useful.
-type CprWwResultOne = (Bool, OrdList Var, CoreExpr , CoreExpr -> CoreExpr)
-type CprWwResultMany = (Bool, OrdList Var, [CoreExpr], CoreExpr -> CoreExpr)
+type CprWwResultOne = (WwUse, OrdList Var, CoreExpr , CoreExpr -> CoreExpr)
+type CprWwResultMany = (WwUse, OrdList Var, [CoreExpr], CoreExpr -> CoreExpr)
mkWWcpr :: WwOpts -> [Id] -> [Cpr] -> UniqSM CprWwResultMany
mkWWcpr _opts vars [] =
-- special case: No CPRs means all top (for example from FlatConCpr),
-- hence stop WW.
- return (badWorker, toOL vars, map varToCoreExpr vars, nop_fn)
+ return (boringSplit, toOL vars, map varToCoreExpr vars, nop_fn)
mkWWcpr opts vars cprs = do
-- No existentials in 'vars'. 'canUnboxResult' should have checked that.
massertPpr (not (any isTyVar vars)) (ppr vars $$ ppr cprs)
@@ -1441,7 +1463,7 @@ mkWWcpr_one opts res_bndr cpr
, DoUnbox dcpc <- canUnboxResult (wo_fam_envs opts) (idType res_bndr) cpr
= unbox_one_result opts res_bndr dcpc
| otherwise
- = return (badWorker, unitOL res_bndr, varToCoreExpr res_bndr, nop_fn)
+ = return (boringSplit, unitOL res_bndr, varToCoreExpr res_bndr, nop_fn)
unbox_one_result
:: WwOpts -> Id -> DataConPatContext Cpr -> UniqSM CprWwResultOne
@@ -1467,11 +1489,10 @@ unbox_one_result opts res_bndr
-- this_work_unbox_res alt = (case res_bndr |> co of C a b -> <alt>[a,b])
this_work_unbox_res = mkUnpackCase (Var res_bndr) co cprCaseBndrMult dc arg_ids
- -- Don't try to WW an unboxed tuple return type when there's nothing inside
- -- to unbox further.
+ -- See Note [Unboxing through unboxed tuples]
return $ if isUnboxedTupleDataCon dc && not nested_useful
- then ( badWorker, unitOL res_bndr, Var res_bndr, nop_fn )
- else ( goodWorker
+ then ( boringSplit, unitOL res_bndr, Var res_bndr, nop_fn )
+ else ( usefulSplit
, transit_vars
, rebuilt_result
, this_work_unbox_res . work_unbox_res
=====================================
compiler/GHC/Core/Rules.hs
=====================================
@@ -9,7 +9,7 @@
-- The 'CoreRule' datatype itself is declared elsewhere.
module GHC.Core.Rules (
-- ** Looking up rules
- lookupRule,
+ RuleEnv, lookupRule,
-- ** RuleBase, RuleEnv
emptyRuleBase, mkRuleBase, extendRuleBaseList,
=====================================
compiler/GHC/Types/Id/Make.hs
=====================================
@@ -585,6 +585,7 @@ mkDataConWorkId wkr_name data_con
`setInlinePragInfo` wkr_inline_prag
`setUnfoldingInfo` evaldUnfolding -- Record that it's evaluated,
-- even if arity = 0
+ -- No strictness: see Note [Data-con worker strictness] in GHC.Core.DataCon
wkr_inline_prag = defaultInlinePragma { inl_rule = ConLike }
wkr_arity = dataConRepArity data_con
=====================================
testsuite/tests/stranal/should_compile/T22388.hs
=====================================
@@ -0,0 +1,14 @@
+{-# LANGUAGE MagicHash, UnboxedTuples #-}
+
+-- See Note [Unboxing through unboxed tuples]
+module T22388 where
+
+-- Don't split, because neither the result not arg cancels away a box.
+boring :: (# Int, Int, Int #) -> (# Int, Int, Int #)
+boring (# x, y, z #) = (# y, z, x #)
+{-# NOINLINE boring #-}
+
+-- Do split, because we get to drop z and pass x and y unboxed
+interesting :: (# Int, Int, Int #) -> (# Int #)
+interesting (# x, y, z #) = let !t = x + y in (# t #)
+{-# NOINLINE interesting #-}
=====================================
testsuite/tests/stranal/should_compile/T22388.stderr
=====================================
@@ -0,0 +1,92 @@
+
+==================== Tidy Core ====================
+Result size of Tidy Core
+ = {terms: 48, types: 81, coercions: 0, joins: 0/0}
+
+-- RHS size: {terms: 8, types: 23, coercions: 0, joins: 0/0}
+boring [InlPrag=NOINLINE]
+ :: (# Int, Int, Int #) -> (# Int, Int, Int #)
+[GblId, Arity=1, Str=<1!P(L,L,L)>, Cpr=1, Unf=OtherCon []]
+boring
+ = \ (ds :: (# Int, Int, Int #)) ->
+ case ds of { (# x, y, z #) -> (# y, z, x #) }
+
+-- RHS size: {terms: 5, types: 2, coercions: 0, joins: 0/0}
+T22388.$winteresting [InlPrag=NOINLINE]
+ :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int#
+[GblId, Arity=2, Str=<L><L>, Unf=OtherCon []]
+T22388.$winteresting
+ = \ (ww :: GHC.Prim.Int#) (ww1 :: GHC.Prim.Int#) ->
+ GHC.Prim.+# ww ww1
+
+-- RHS size: {terms: 18, types: 24, coercions: 0, joins: 0/0}
+interesting [InlPrag=NOINLINE[final]]
+ :: (# Int, Int, Int #) -> (# Int #)
+[GblId,
+ Arity=1,
+ Str=<1!P(1!P(L),1!P(L),A)>,
+ Cpr=1(1),
+ Unf=Unf{Src=StableSystem, TopLvl=True, Value=True, ConLike=True,
+ WorkFree=True, Expandable=True,
+ Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
+ Tmpl= \ (ds [Occ=Once1!] :: (# Int, Int, Int #)) ->
+ case ds of
+ { (# ww [Occ=Once1!], ww1 [Occ=Once1!], _ [Occ=Dead] #) ->
+ case ww of { GHC.Types.I# ww3 [Occ=Once1] ->
+ case ww1 of { GHC.Types.I# ww4 [Occ=Once1] ->
+ case T22388.$winteresting ww3 ww4 of ww5 [Occ=Once1] { __DEFAULT ->
+ (# GHC.Types.I# ww5 #)
+ }
+ }
+ }
+ }}]
+interesting
+ = \ (ds :: (# Int, Int, Int #)) ->
+ case ds of { (# ww, ww1, ww2 #) ->
+ case ww of { GHC.Types.I# ww3 ->
+ case ww1 of { GHC.Types.I# ww4 ->
+ case T22388.$winteresting ww3 ww4 of ww5 { __DEFAULT ->
+ (# GHC.Types.I# ww5 #)
+ }
+ }
+ }
+ }
+
+-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
+T22388.$trModule4 :: GHC.Prim.Addr#
+[GblId,
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
+ WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 20 0}]
+T22388.$trModule4 = "main"#
+
+-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
+T22388.$trModule3 :: GHC.Types.TrName
+[GblId,
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
+ WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
+T22388.$trModule3 = GHC.Types.TrNameS T22388.$trModule4
+
+-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
+T22388.$trModule2 :: GHC.Prim.Addr#
+[GblId,
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
+ WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 30 0}]
+T22388.$trModule2 = "T22388"#
+
+-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
+T22388.$trModule1 :: GHC.Types.TrName
+[GblId,
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
+ WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
+T22388.$trModule1 = GHC.Types.TrNameS T22388.$trModule2
+
+-- RHS size: {terms: 3, types: 0, coercions: 0, joins: 0/0}
+T22388.$trModule :: GHC.Types.Module
+[GblId,
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
+ WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
+T22388.$trModule
+ = GHC.Types.Module T22388.$trModule3 T22388.$trModule1
+
+
+
=====================================
testsuite/tests/stranal/should_compile/all.T
=====================================
@@ -86,3 +86,5 @@ test('T21128', [ grep_errmsg(r'let { y = I\#') ], multimod_compile, ['T21128', '
test('T21265', normal, compile, [''])
test('EtaExpansion', normal, compile, [''])
test('T22039', normal, compile, [''])
+# T22388: Should see $winteresting but not $wboring
+test('T22388', [ grep_errmsg(r'^\S+\$w\S+') ], compile, ['-dsuppress-uniques -ddump-simpl'])
=====================================
testsuite/tests/stranal/sigs/T21737.hs
=====================================
@@ -0,0 +1,47 @@
+{-# OPTIONS_GHC -fmax-worker-args=4 #-}
+
+{-# LANGUAGE MagicHash, UnboxedTuples #-}
+
+-- See Note [Worker argument budget]
+module T21737 where
+
+data T = MkT (# Int, Int, Int, Int #)
+
+-- NB: -fmax-worker-args=4 at the top of this file!
+-- We should unbox through the unboxed pair but not T
+{-# NOINLINE f #-}
+f :: Int -> (# Int, Int #) -> T -> Int
+f x (# y, z #) (MkT (# x1, x2, x3, x4 #)) = x + y + z + x1 + x2 + x3 + x4
+
+-- NB: -fmax-worker-args=4 at the top of this file!
+-- Do split the triple *even if* that gets us to 6 args,
+-- because the triple will take 3 registers anyway (not 1)
+-- and we get to unbox a b c.
+yes :: (# Int, Int, Int #) -> Int -> Int -> Int -> Int
+yes (# a, b, c #) d e f = a + b + c + d + e + f
+{-# NOINLINE yes #-}
+
+data U = MkU (# Int, Int, Int, Int, Int, Int #)
+
+-- NB: -fmax-worker-args=4 at the top of this file!
+-- Don't unbox U, because then we'll pass an unboxed 6-tuple, all in registers.
+no :: U -> Int
+no (MkU (# a, b, c, d, e, f #)) = a + b + c + d + e + f
+{-# NOINLINE no #-}
+
+-- NB: -fmax-worker-args=4 at the top of this file!
+-- Hence do not unbox the nested triple.
+boxed :: (Int, Int) -> (Int, (Int, Int, Int)) -> Int
+boxed (a,b) (c, (d,e,f)) = a + b + c + d + e + f
+{-# NOINLINE boxed #-}
+
+-- NB: -fmax-worker-args=4 at the top of this file!
+-- Do split the inner unboxed triple *even if* that gets us to 5 args, because
+-- the function will take 5 args anyway. But don't split the pair!
+unboxed :: (Int, Int) -> (# Int, (# Int, Int, Int #) #) -> Int
+unboxed (a,b) (# c, (# d, e, f #) #) = a + b + c + d + e + f
+{-# NOINLINE unboxed #-}
+
+-- Point: Demand on `x` is lazy and thus Unboxed
+app :: ((# Int, Int #) -> (# Int, Int #)) -> (# Int, Int #) -> (# Int, Int #)
+app g x = g x
=====================================
testsuite/tests/stranal/sigs/T21737.stderr
=====================================
@@ -0,0 +1,30 @@
+
+==================== Strictness signatures ====================
+T21737.app: <1C(1,L)><L>
+T21737.boxed: <1!P(1!P(L),1!P(L))><1!P(1!P(L),1P(1L,1L,1L))>
+T21737.f: <1!P(L)><1!P(1!P(L),1!P(L))><1P(1P(1L,1L,1L,1L))>
+T21737.no: <1P(1P(1L,1L,1L,1L,1L,1L))>
+T21737.unboxed: <1P(1L,1L)><1!P(1!P(L),1!P(1!P(L),1!P(L),1!P(L)))>
+T21737.yes: <1!P(1!P(L),1!P(L),1!P(L))><1!P(L)><1!P(L)><1!P(L)>
+
+
+
+==================== Cpr signatures ====================
+T21737.app:
+T21737.boxed: 1
+T21737.f: 1
+T21737.no: 1
+T21737.unboxed: 1
+T21737.yes: 1
+
+
+
+==================== Strictness signatures ====================
+T21737.app: <1C(1,L)><L>
+T21737.boxed: <1!P(1!P(L),1!P(L))><1!P(1!P(L),1P(1L,1L,1L))>
+T21737.f: <1!P(L)><1!P(1!P(L),1!P(L))><1P(1P(1L,1L,1L,1L))>
+T21737.no: <1P(1P(1L,1L,1L,1L,1L,1L))>
+T21737.unboxed: <1P(1L,1L)><1!P(1!P(L),1!P(1!P(L),1!P(L),1!P(L)))>
+T21737.yes: <1!P(1!P(L),1!P(L),1!P(L))><1!P(L)><1!P(L)><1!P(L)>
+
+
=====================================
testsuite/tests/stranal/sigs/all.T
=====================================
@@ -38,3 +38,4 @@ test('T21754', normal, compile, [''])
test('T21888', normal, compile, [''])
test('T21888a', normal, compile, [''])
test('T22241', normal, compile, [''])
+test('T21737', normal, compile, [''])
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/cd1d2d1a49bf96ffff9772bf60a820e1d0bbbc7b...7027c04d6f373607bb15ce3ad403b14feb3302ab
--
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/cd1d2d1a49bf96ffff9772bf60a820e1d0bbbc7b...7027c04d6f373607bb15ce3ad403b14feb3302ab
You're receiving this email because of your account on gitlab.haskell.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20221110/991b4d1f/attachment-0001.html>
More information about the ghc-commits
mailing list