[Git][ghc/ghc][master] 2 commits: WorkWrap: Unboxing unboxed tuples is not always useful (#22388)

Marge Bot (@marge-bot) gitlab at gitlab.haskell.org
Fri Nov 11 02:16:26 UTC 2022



Marge Bot pushed to branch master at Glasgow Haskell Compiler / GHC


Commits:
dac0682a by Sebastian Graf at 2022-11-10T21:16:01-05:00
WorkWrap: Unboxing unboxed tuples is not always useful (#22388)

See Note [Unboxing through unboxed tuples].

Fixes #22388.

- - - - -
1230c268 by Sebastian Graf at 2022-11-10T21:16:01-05:00
Boxity: Handle argument budget of unboxed tuples correctly (#21737)

Now Budget roughly tracks the combined width of all arguments after unarisation.
See the changes to `Note [Worker argument budgets]`.

Fixes #21737.

- - - - -


8 changed files:

- compiler/GHC/Core/Opt/DmdAnal.hs
- compiler/GHC/Core/Opt/WorkWrap/Utils.hs
- + testsuite/tests/stranal/should_compile/T22388.hs
- + testsuite/tests/stranal/should_compile/T22388.stderr
- testsuite/tests/stranal/should_compile/all.T
- + testsuite/tests/stranal/sigs/T21737.hs
- + testsuite/tests/stranal/sigs/T21737.stderr
- testsuite/tests/stranal/sigs/all.T


Changes:

=====================================
compiler/GHC/Core/Opt/DmdAnal.hs
=====================================
@@ -45,6 +45,7 @@ import GHC.Builtin.PrimOps
 import GHC.Builtin.Types.Prim ( realWorldStatePrimTy )
 import GHC.Types.Unique.Set
 import GHC.Types.Unique.MemoFun
+import GHC.Types.RepType
 
 
 {-
@@ -1765,7 +1766,7 @@ Note [Worker argument budget]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 In 'finaliseArgBoxities' we don't want to generate workers with zillions of
 argument when, say given a strict record with zillions of fields.  So we
-limit the maximum number of worker args to the maximum of
+limit the maximum number of worker args ('max_wkr_args') to the maximum of
   - -fmax-worker-args=N
   - The number of args in the original function; if it already has has
     zillions of arguments we don't want to seek /fewer/ args in the worker.
@@ -1774,10 +1775,91 @@ limit the maximum number of worker args to the maximum of
 We pursue a "layered" strategy for unboxing: we unbox the top level of the
 argument(s), subject to budget; if there are any arguments left we unbox the
 next layer, using that depleted budget.
+Unboxing an argument *increases* the budget for the inner layer roughly
+according to how many registers that argument takes (unboxed tuples take
+multiple registers, see below), as determined by 'unariseArity'.
+Budget is spent when we have to pass a non-absent field as a parameter.
 
 To achieve this, we use the classic almost-circular programming technique in
 which we we write one pass that takes a lazy list of the Budgets for every
-layer.
+layer. The effect is that of a breadth-first search (over argument type and
+demand structure) to compute Budgets followed by a depth-first search to
+construct the product demands, but laziness allows us to do it all in one
+pass and without intermediate data structures.
+
+Suppose we have -fmax-worker-args=4 for the remainder of this Note.
+Then consider this example function:
+
+  boxed :: (Int, Int) -> (Int, (Int, Int, Int)) -> Int
+  boxed (a,b) (c, (d,e,f)) = a + b + c + d + e + f
+
+With a budget of 4 args to spend (number of args is only 2), we'd be served well
+to unbox both pairs, but not the triple. Indeed, that is what the algorithm
+computes, and the following pictogram shows how the budget layers are computed.
+Each layer is started with `n ~>`, where `n` is the budget at the start of the
+layer. We write -n~> when we spend budget (and n is the remaining budget) and
++n~> when we earn budget. We separate unboxed args with ][ and indicate
+inner budget threads becoming negative in braces {{}}, so that we see which
+unboxing decision we do *not* commit to. Without further ado:
+
+  4 ~> ][     (a,b) -3~>               ][     (c, ...) -2~>
+       ][      | |                     ][      |   |
+       ][      | +-------------+       ][      |   +-----------------+
+       ][      |               |       ][      |                     |
+       ][      v               v       ][      v                     v
+  2 ~> ][ +3~> a  -2~> ][      b  -1~> ][ +2~> c  -1~> ][        (d, e, f) -0~>
+       ][      |       ][      |       ][      |       ][ {{      |  |  |                          }}
+       ][      |       ][      |       ][      |       ][ {{      |  |  +----------------+         }}
+       ][      v       ][      v       ][      v       ][ {{      v  +------v            v         }}
+  0 ~> ][ +1~> I# -0~> ][ +1~> I# -0~> ][ +1~> I# -0~> ][ {{ +1~> d -0~> ][ e -(-1)~> ][ f -(-2)~> }}
+
+Unboxing increments the budget we have on the next layer (because we don't need
+to retain the boxed arg), but in turn the inner layer must afford to retain all
+non-absent fields, each decrementing the budget. Note how the budget becomes
+negative when trying to unbox the triple and the unboxing decision is "rolled
+back". This is done by the 'positiveTopBudget' guard.
+
+There's a bit of complication as a result of handling unboxed tuples correctly;
+specifically, handling nested unboxed tuples. Consider (#21737)
+
+  unboxed :: (Int, Int) -> (# Int, (# Int, Int, Int #) #) -> Int
+  unboxed (a,b) (# c, (# d, e, f #) #) = a + b + c + d + e + f
+
+Recall that unboxed tuples will be flattened to individual arguments during
+unarisation. Here, `unboxed` will have 5 arguments at runtime because of the
+nested unboxed tuple, which will be flattened to 4 args. So it's best to leave
+`(a,b)` boxed (because we already are above our arg threshold), but unbox `c`
+through `f` because that doesn't increase the number of args post unarisation.
+
+Note that the challenge is that syntactically, `(# d, e, f #)` occurs in a
+deeper layer than `(a, b)`. Treating unboxed tuples as a regular data type, we'd
+make the same unboxing decisions as for `boxed` above; although our starting
+budget is 5 (Here, the number of args is greater than -fmax-worker-args), it's
+not enough to unbox the triple (we'd finish with budget -1). So we'd unbox `a`
+through `c`, but not `d` through `f`, which is silly, because then we'd end up
+having 6 arguments at runtime, of which `d` through `f` weren't unboxed.
+
+Hence we pretend that the fields of unboxed tuples appear in the same budget
+layer as the tuple itself. For example at the top-level, `(# x,y #)` is to be
+treated just like two arguments `x` and `y`.
+Of course, for that to work, our budget calculations must initialise
+'max_wkr_args' to 5, based on the 'unariseArity' of each Core arg: That would be
+1 for the pair and 4 for the unboxed pair. Then when we decide whether to unbox
+the unboxed pair, we *directly* recurse into the fields, spending our budget
+on retaining `c` and (after recursing once more) `d` through `f` as arguments,
+depleting our budget completely in the first layer. Pictorially:
+
+  5 ~> ][         (a,b) -4~>             ][         (# c, ... #)
+       ][ {{      | |                 }} ][      c  -3~> ][ (# d, e, f #)
+       ][ {{      | +-------+         }} ][      |       ][      d  -2~> ][      e  -1~> ][      f  -0~>
+       ][ {{      |         |         }} ][      |       ][      |       ][      |       ][      |
+       ][ {{      v         v         }} ][      v       ][      v       ][      v       ][      v
+  0 ~> ][ {{ +1~> a -0~> ][ b -(-1)~> }} ][ +1~> I# -0~> ][ +1~> I# -0~> ][ +1~> I# -0~> ][ +1~> I# -0~>
+
+As you can see, we have no budget left to justify unboxing `(a,b)` on the second
+layer, which is good, because it would increase the number of args. Also note
+that we can still unbox `c` through `f` in this layer, because doing so has a
+net zero effect on budget.
 
 Note [The OPAQUE pragma and avoiding the reboxing of arguments]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1798,10 +1880,17 @@ W/W-transformation code that boxed arguments of 'f' must definitely be passed
 along in boxed form and as such dissuade the creation of reboxing workers.
 -}
 
-data Budgets = MkB Arity Budgets   -- An infinite list of arity budgets
+-- | How many registers does this type take after unarisation?
+unariseArity :: Type -> Arity
+unariseArity ty = length (typePrimRep ty)
 
-incTopBudget :: Budgets -> Budgets
-incTopBudget (MkB n bg) = MkB (n+1) bg
+data Budgets = MkB !Arity Budgets   -- An infinite list of arity budgets
+
+earnTopBudget :: Budgets -> Budgets
+earnTopBudget (MkB n bg) = MkB (n+1) bg
+
+spendTopBudget :: Arity -> Budgets -> Budgets
+spendTopBudget m (MkB n bg) = MkB (n-m) bg
 
 positiveTopBudget :: Budgets -> Bool
 positiveTopBudget (MkB n _) = n >= 0
@@ -1814,7 +1903,8 @@ finaliseArgBoxities env fn arity rhs div
              -- Then there are no binders; we don't worker/wrapper; and we
              -- simply want to give f the same demand signature as g
 
-  | otherwise
+  | otherwise -- NB: arity is the threshold_arity, which might be less than
+              -- manifest arity for join points
   = -- pprTrace "finaliseArgBoxities" (
     --   vcat [text "function:" <+> ppr fn
     --        , text "dmds before:" <+> ppr (map idDemandInfo (filter isId bndrs))
@@ -1826,8 +1916,10 @@ finaliseArgBoxities env fn arity rhs div
   where
     opts            = ae_opts env
     (bndrs, _body)  = collectBinders rhs
-    max_wkr_args    = dmd_max_worker_args opts `max` arity
-                      -- See Note [Worker argument budget]
+    unarise_arity   = sum [ unariseArity (idType b) | b <- bndrs, isId b ]
+    max_wkr_args    = dmd_max_worker_args opts `max` unarise_arity
+                      -- This is the budget initialisation step of
+                      -- Note [Worker argument budget]
 
     -- This is the key line, which uses almost-circular programming
     -- The remaining budget from one layer becomes the initial
@@ -1871,22 +1963,49 @@ finaliseArgBoxities env fn arity rhs div
       = case wantToUnboxArg env ty str_mark dmd of
           DropAbsent -> (bg, dmd)
 
-          DontUnbox | is_bot_fn, isTyVarTy ty -> (decremented_bg, dmd)
-                    | otherwise               -> (decremented_bg, trimBoxity dmd)
+          DontUnbox | is_bot_fn, isTyVarTy ty -> (retain_budget, dmd)
+                    | otherwise               -> (retain_budget, trimBoxity dmd)
             -- If bot: Keep deep boxity even though WW won't unbox
             -- See Note [Boxity for bottoming functions] case (A)
             -- trimBoxity: see Note [No lazy, Unboxed demands in demand signature]
-
-          DoUnbox triples -> (MkB (bg_top-1) final_bg_inner, final_dmd)
             where
-              (bg_inner', dmds') = go_args (incTopBudget bg_inner) triples
-                     -- incTopBudget: give one back for the arg we are unboxing
+              retain_budget = spendTopBudget (unariseArity ty) bg
+                -- spendTopBudget: spend from our budget the cost of the
+                -- retaining the arg
+                -- The unboxed case does happen here, for example
+                --   app g x = g x :: (# Int, Int #)
+                -- here, `x` is used `L`azy and thus Boxed
+
+          DoUnbox triples
+            | isUnboxedTupleType ty
+            , (bg', dmds') <- go_args bg triples
+            -> (bg', n :* (mkProd Unboxed $! dmds'))
+                     -- See Note [Worker argument budget]
+                     -- unboxed tuples are always unboxed, deeply
+                     -- NB: Recurse with bg, *not* bg_inner! The unboxed fields
+                     -- are at the same budget layer.
+
+            | isUnboxedSumType ty
+            -> pprPanic "Unboxing through unboxed sum" (ppr fn <+> ppr ty)
+                     -- We currently don't return DoUnbox for unboxed sums.
+                     -- But hopefully we will at some point. When that happens,
+                     -- it would still be impossible to predict the effect
+                     -- of dropping absent fields and unboxing others on the
+                     -- unariseArity of the sum without losing sanity.
+                     -- We could overwrite bg_top with the one from
+                     -- retain_budget while still unboxing inside the alts as in
+                     -- the tuple case for a conservative solution, though.
+
+            | otherwise
+            -> (spendTopBudget 1 (MkB bg_top final_bg_inner), final_dmd)
+            where
+              (bg_inner', dmds') = go_args (earnTopBudget bg_inner) triples
+                     -- earnTopBudget: give back the cost of retaining the
+                     -- arg we are insted unboxing.
               dmd' = n :* (mkProd Unboxed $! dmds')
-              (final_bg_inner, final_dmd)
+              ~(final_bg_inner, final_dmd) -- "~": This match *must* be lazy!
                  | positiveTopBudget bg_inner' = (bg_inner', dmd')
                  | otherwise                   = (bg_inner,  trimBoxity dmd)
-      where
-        decremented_bg = MkB (bg_top-1) bg_inner
 
     add_demands :: [Demand] -> CoreExpr -> CoreExpr
     -- Attach the demands to the outer lambdas of this expression


=====================================
compiler/GHC/Core/Opt/WorkWrap/Utils.hs
=====================================
@@ -15,7 +15,7 @@ module GHC.Core.Opt.WorkWrap.Utils
    , findTypeShape, IsRecDataConResult(..), isRecDataCon
    , mkAbsentFiller
    , isWorkerSmallEnough, dubiousDataConInstArgTys
-   , isGoodWorker, badWorker , goodWorker
+   , boringSplit , usefulSplit
    )
 where
 
@@ -571,23 +571,24 @@ data UnboxingDecision unboxing_info
                             -- returned product was constructed, so unbox it.
   | DropAbsent              -- ^ The argument/field was absent. Drop it.
 
--- Do we want to create workers just for unlifting?
-wwForUnlifting :: WwOpts -> Bool
-wwForUnlifting !opts
+-- | Do we want to create workers just for unlifting?
+wwUseForUnlifting :: WwOpts -> WwUse
+wwUseForUnlifting !opts
     -- Always unlift if possible
-    | wo_unlift_strict opts = goodWorker
+    | wo_unlift_strict opts = usefulSplit
     -- Don't unlift  it would cause additional W/W splits.
-    | otherwise = badWorker
+    | otherwise             = boringSplit
 
-badWorker :: Bool
-badWorker = False
+-- | Is the worker/wrapper split profitable?
+type WwUse = Bool
 
-goodWorker :: Bool
-goodWorker = True
-
-isGoodWorker :: Bool -> Bool
-isGoodWorker = id
+-- | WW split not profitable
+boringSplit :: WwUse
+boringSplit = False
 
+-- | WW split profitable
+usefulSplit :: WwUse
+usefulSplit = True
 
 -- | Unwraps the 'Boxity' decision encoded in the given 'SubDemand' and returns
 -- a 'DataConPatContext' as well the nested demands on fields of the 'DataCon'
@@ -826,7 +827,7 @@ Is this a win?  Not always:
 So there is a flag, `-fworker-wrapper-cbv`, to control whether we do
 w/w on strict arguments (internally `Opt_WorkerWrapperUnlift`).  The
 flag is off by default.  The choice is made in
-GHC.Core.Opt.WorkWrape.Utils.wwForUnlifting
+GHC.Core.Opt.WorkWrape.Utils.wwUseForUnlifting
 
 See also `Note [WW for calling convention]` in GHC.Core.Opt.WorkWrap.Utils
 -}
@@ -843,7 +844,7 @@ mkWWstr :: WwOpts
         -> [Var]                         -- Wrapper args; have their demand info on them
                                          --  *Includes type variables*
         -> [StrictnessMark]              -- Strictness-mark info for arguments
-        -> UniqSM (Bool,                 -- Will this result in a useful worker
+        -> UniqSM (WwUse,                -- Will this result in a useful worker
                    [(Var,StrictnessMark)],      -- Worker args/their call-by-value semantics.
                    CoreExpr -> CoreExpr, -- Wrapper body, lacking the worker call
                                          -- and without its lambdas
@@ -855,7 +856,7 @@ mkWWstr opts args str_marks
   = -- pprTrace "mkWWstr" (ppr args) $
     go args str_marks
   where
-    go [] _ = return (badWorker, [], nop_fn, [])
+    go [] _ = return (boringSplit, [], nop_fn, [])
     go (arg : args) (str:strs)
       = do { (useful1, args1, wrap_fn1, wrap_arg)  <- mkWWstr_one opts arg str
            ; (useful2, args2, wrap_fn2, wrap_args) <- go args strs
@@ -875,7 +876,7 @@ mkWWstr opts args str_marks
 mkWWstr_one :: WwOpts
             -> Var
             -> StrictnessMark
-            -> UniqSM (Bool, [(Var,StrictnessMark)], CoreExpr -> CoreExpr, CoreExpr)
+            -> UniqSM (WwUse, [(Var,StrictnessMark)], CoreExpr -> CoreExpr, CoreExpr)
 mkWWstr_one opts arg str_mark =
   -- pprTrace "mkWWstr_one" (ppr arg <+> (if isId arg then ppr arg_ty  $$ ppr arg_dmd else text "type arg")) $
   case canUnboxArg fam_envs arg_ty arg_dmd of
@@ -887,7 +888,7 @@ mkWWstr_one opts arg str_mark =
          -- We can't always handle absence for arbitrary
          -- unlifted types, so we need to choose just the cases we can
          -- (that's what mkAbsentFiller does)
-      -> return (goodWorker, [], nop_fn, absent_filler)
+      -> return (usefulSplit, [], nop_fn, absent_filler)
       | otherwise -> do_nothing
 
     DoUnbox dcpc -> -- pprTrace "mkWWstr_one:1" (ppr (dcpc_dc dcpc) <+> ppr (dcpc_tc_args dcpc) $$ ppr (dcpc_args dcpc)) $
@@ -895,12 +896,12 @@ mkWWstr_one opts arg str_mark =
 
     DontUnbox
       | isStrictDmd arg_dmd || isMarkedStrict str_mark
-      , wwForUnlifting opts  -- See Note [CBV Function Ids]
+      , wwUseForUnlifting opts  -- See Note [CBV Function Ids]
       , not (isFunTy arg_ty)
       , not (isUnliftedType arg_ty) -- Already unlifted!
         -- NB: function arguments have a fixed RuntimeRep,
         -- so it's OK to call isUnliftedType here
-      -> return  (goodWorker, [(arg, MarkedStrict)], nop_fn, varToCoreExpr arg )
+      -> return  (usefulSplit, [(arg, MarkedStrict)], nop_fn, varToCoreExpr arg )
 
       | otherwise -> do_nothing
 
@@ -910,11 +911,11 @@ mkWWstr_one opts arg str_mark =
     arg_dmd    = idDemandInfo arg
     arg_str    | isTyVar arg = NotMarkedStrict -- Type args don't get strictness marks
                | otherwise   = str_mark
-    do_nothing = return (badWorker, [(arg,arg_str)], nop_fn, varToCoreExpr arg)
+    do_nothing = return (boringSplit, [(arg,arg_str)], nop_fn, varToCoreExpr arg)
 
 unbox_one_arg :: WwOpts
-              -> Var-> DataConPatContext Demand
-              -> UniqSM (Bool, [(Var,StrictnessMark)], CoreExpr -> CoreExpr, CoreExpr)
+              -> Var -> DataConPatContext Demand
+              -> UniqSM (WwUse, [(Var,StrictnessMark)], CoreExpr -> CoreExpr, CoreExpr)
 unbox_one_arg opts arg_var
               DataConPatContext { dcpc_dc = dc, dcpc_tc_args = tc_args
                                 , dcpc_co = co, dcpc_args = ds }
@@ -939,13 +940,14 @@ unbox_one_arg opts arg_var
              -- See Note [Call-by-value for worker args]
              all_str_marks = (map (const NotMarkedStrict) ex_tvs') ++ con_str_marks
 
-       ; (_sub_args_quality, worker_args, wrap_fn, wrap_args)
+       ; (nested_useful, worker_args, wrap_fn, wrap_args)
              <- mkWWstr opts (ex_tvs' ++ arg_ids') all_str_marks
 
        ; let wrap_arg = mkConApp dc (map Type tc_args ++ wrap_args) `mkCast` mkSymCo co
-
-       ; return (goodWorker, worker_args, unbox_fn . wrap_fn, wrap_arg) }
-                          -- Don't pass the arg, rebox instead
+       -- See Note [Unboxing through unboxed tuples]
+       ; return $ if isUnboxedTupleDataCon dc && not nested_useful
+                     then (boringSplit, [(arg_var,NotMarkedStrict)], nop_fn, varToCoreExpr arg_var)
+                     else (usefulSplit, worker_args, unbox_fn . wrap_fn, wrap_arg) }
 
 -- | Tries to find a suitable absent filler to bind the given absent identifier
 -- to. See Note [Absent fillers].
@@ -1195,6 +1197,26 @@ fragile
    because `MkT` is strict in its Int# argument, so we get an absentError
    exception when we shouldn't.  Very annoying!
 
+Note [Unboxing through unboxed tuples]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We should not to a worker/wrapper split just for unboxing the components of
+an unboxed tuple (in the result *or* argument, #22388). Consider
+  boring_res x y = (# y, x #)
+It's entirely pointless to split for the constructed unboxed pair to
+  $wboring_res x y = (# y, x #)
+  boring_res = case $wboring_res x y of (# a, b #) -> (# a, b #)
+`boring_res` will immediately simplify to an alias for `$wboring_res`!
+
+Similarly, the unboxed tuple might occur in argument position
+  boring_arg (# x, y, z #) = (# z, x, y #)
+It's entirely pointless to "unbox" the triple
+  $wboring_arg x y z = (# z, x, y #)
+  boring_arg (# x, y, z #) = $wboring_arg x y z
+because after unarisation, `boring_arg` is just an alias for `$wboring_arg`.
+
+Conclusion: Only consider unboxing an unboxed tuple useful when we will
+also unbox its components. That is governed by the `usefulSplit` mechanism.
+
 ************************************************************************
 *                                                                      *
          Type scrutiny that is specific to demand analysis
@@ -1376,12 +1398,12 @@ mkWWcpr_entry
   :: WwOpts
   -> Type                              -- function body
   -> Cpr                               -- CPR analysis results
-  -> UniqSM (Bool,            -- Is w/w'ing useful?
+  -> UniqSM (WwUse,                    -- Is w/w'ing useful?
              CoreExpr -> CoreExpr,     -- New wrapper. 'nop_fn' if not useful
              CoreExpr -> CoreExpr)     -- New worker.  'nop_fn' if not useful
 -- ^ Entrypoint to CPR W/W. See Note [Worker/wrapper for CPR] for an overview.
 mkWWcpr_entry opts body_ty body_cpr
-  | not (wo_cpr_anal opts) = return (badWorker, nop_fn, nop_fn)
+  | not (wo_cpr_anal opts) = return (boringSplit, nop_fn, nop_fn)
   | otherwise = do
     -- Part (1)
     res_bndr <- mk_res_bndr body_ty
@@ -1398,8 +1420,8 @@ mkWWcpr_entry opts body_ty body_cpr
     let wrap_fn      = unbox_transit_tup rebuilt_result                 -- 3 2
         work_fn body = bind_res_bndr body (work_unpack_res transit_tup) -- 1 2 3
     return $ if not useful
-                then (badWorker, nop_fn, nop_fn)
-                else (goodWorker, wrap_fn, work_fn)
+                then (boringSplit, nop_fn, nop_fn)
+                else (usefulSplit, wrap_fn, work_fn)
 
 -- | Part (1) of Note [Worker/wrapper for CPR].
 mk_res_bndr :: Type -> UniqSM Id
@@ -1411,18 +1433,18 @@ mk_res_bndr body_ty = do
 
 -- | What part (2) of Note [Worker/wrapper for CPR] collects.
 --
---   1. A Bool capturing whether the transformation did anything useful.
+--   1. A 'WwUse' capturing whether the split does anything useful.
 --   2. The list of transit variables (see the Note).
 --   3. The result builder expression for the wrapper.  The original case binder if not useful.
 --   4. The result unpacking expression for the worker. 'nop_fn' if not useful.
-type CprWwResultOne  = (Bool, OrdList Var,  CoreExpr , CoreExpr -> CoreExpr)
-type CprWwResultMany = (Bool, OrdList Var, [CoreExpr], CoreExpr -> CoreExpr)
+type CprWwResultOne  = (WwUse, OrdList Var,  CoreExpr , CoreExpr -> CoreExpr)
+type CprWwResultMany = (WwUse, OrdList Var, [CoreExpr], CoreExpr -> CoreExpr)
 
 mkWWcpr :: WwOpts -> [Id] -> [Cpr] -> UniqSM CprWwResultMany
 mkWWcpr _opts vars []   =
   -- special case: No CPRs means all top (for example from FlatConCpr),
   -- hence stop WW.
-  return (badWorker, toOL vars, map varToCoreExpr vars, nop_fn)
+  return (boringSplit, toOL vars, map varToCoreExpr vars, nop_fn)
 mkWWcpr opts  vars cprs = do
   -- No existentials in 'vars'. 'canUnboxResult' should have checked that.
   massertPpr (not (any isTyVar vars)) (ppr vars $$ ppr cprs)
@@ -1441,7 +1463,7 @@ mkWWcpr_one opts res_bndr cpr
   , DoUnbox dcpc <- canUnboxResult (wo_fam_envs opts) (idType res_bndr) cpr
   = unbox_one_result opts res_bndr dcpc
   | otherwise
-  = return (badWorker, unitOL res_bndr, varToCoreExpr res_bndr, nop_fn)
+  = return (boringSplit, unitOL res_bndr, varToCoreExpr res_bndr, nop_fn)
 
 unbox_one_result
   :: WwOpts -> Id -> DataConPatContext Cpr -> UniqSM CprWwResultOne
@@ -1467,11 +1489,10 @@ unbox_one_result opts res_bndr
       -- this_work_unbox_res alt = (case res_bndr |> co of C a b -> <alt>[a,b])
       this_work_unbox_res = mkUnpackCase (Var res_bndr) co cprCaseBndrMult dc arg_ids
 
-  -- Don't try to WW an unboxed tuple return type when there's nothing inside
-  -- to unbox further.
+  -- See Note [Unboxing through unboxed tuples]
   return $ if isUnboxedTupleDataCon dc && not nested_useful
-              then ( badWorker, unitOL res_bndr, Var res_bndr, nop_fn )
-              else ( goodWorker
+              then ( boringSplit, unitOL res_bndr, Var res_bndr, nop_fn )
+              else ( usefulSplit
                    , transit_vars
                    , rebuilt_result
                    , this_work_unbox_res . work_unbox_res


=====================================
testsuite/tests/stranal/should_compile/T22388.hs
=====================================
@@ -0,0 +1,14 @@
+{-# LANGUAGE MagicHash, UnboxedTuples #-}
+
+-- See Note [Unboxing through unboxed tuples]
+module T22388 where
+
+-- Don't split, because neither the result not arg cancels away a box.
+boring :: (# Int, Int, Int #) -> (# Int, Int, Int #)
+boring (# x, y, z #) = (# y, z, x #)
+{-# NOINLINE boring #-}
+
+-- Do split, because we get to drop z and pass x and y unboxed
+interesting :: (# Int, Int, Int #) -> (# Int #)
+interesting (# x, y, z #) = let !t = x + y in (# t #)
+{-# NOINLINE interesting #-}


=====================================
testsuite/tests/stranal/should_compile/T22388.stderr
=====================================
@@ -0,0 +1,92 @@
+
+==================== Tidy Core ====================
+Result size of Tidy Core
+  = {terms: 48, types: 81, coercions: 0, joins: 0/0}
+
+-- RHS size: {terms: 8, types: 23, coercions: 0, joins: 0/0}
+boring [InlPrag=NOINLINE]
+  :: (# Int, Int, Int #) -> (# Int, Int, Int #)
+[GblId, Arity=1, Str=<1!P(L,L,L)>, Cpr=1, Unf=OtherCon []]
+boring
+  = \ (ds :: (# Int, Int, Int #)) ->
+      case ds of { (# x, y, z #) -> (# y, z, x #) }
+
+-- RHS size: {terms: 5, types: 2, coercions: 0, joins: 0/0}
+T22388.$winteresting [InlPrag=NOINLINE]
+  :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int#
+[GblId, Arity=2, Str=<L><L>, Unf=OtherCon []]
+T22388.$winteresting
+  = \ (ww :: GHC.Prim.Int#) (ww1 :: GHC.Prim.Int#) ->
+      GHC.Prim.+# ww ww1
+
+-- RHS size: {terms: 18, types: 24, coercions: 0, joins: 0/0}
+interesting [InlPrag=NOINLINE[final]]
+  :: (# Int, Int, Int #) -> (# Int #)
+[GblId,
+ Arity=1,
+ Str=<1!P(1!P(L),1!P(L),A)>,
+ Cpr=1(1),
+ Unf=Unf{Src=StableSystem, TopLvl=True, Value=True, ConLike=True,
+         WorkFree=True, Expandable=True,
+         Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
+         Tmpl= \ (ds [Occ=Once1!] :: (# Int, Int, Int #)) ->
+                 case ds of
+                 { (# ww [Occ=Once1!], ww1 [Occ=Once1!], _ [Occ=Dead] #) ->
+                 case ww of { GHC.Types.I# ww3 [Occ=Once1] ->
+                 case ww1 of { GHC.Types.I# ww4 [Occ=Once1] ->
+                 case T22388.$winteresting ww3 ww4 of ww5 [Occ=Once1] { __DEFAULT ->
+                 (# GHC.Types.I# ww5 #)
+                 }
+                 }
+                 }
+                 }}]
+interesting
+  = \ (ds :: (# Int, Int, Int #)) ->
+      case ds of { (# ww, ww1, ww2 #) ->
+      case ww of { GHC.Types.I# ww3 ->
+      case ww1 of { GHC.Types.I# ww4 ->
+      case T22388.$winteresting ww3 ww4 of ww5 { __DEFAULT ->
+      (# GHC.Types.I# ww5 #)
+      }
+      }
+      }
+      }
+
+-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
+T22388.$trModule4 :: GHC.Prim.Addr#
+[GblId,
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
+         WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 20 0}]
+T22388.$trModule4 = "main"#
+
+-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
+T22388.$trModule3 :: GHC.Types.TrName
+[GblId,
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
+         WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
+T22388.$trModule3 = GHC.Types.TrNameS T22388.$trModule4
+
+-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
+T22388.$trModule2 :: GHC.Prim.Addr#
+[GblId,
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
+         WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 30 0}]
+T22388.$trModule2 = "T22388"#
+
+-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
+T22388.$trModule1 :: GHC.Types.TrName
+[GblId,
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
+         WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
+T22388.$trModule1 = GHC.Types.TrNameS T22388.$trModule2
+
+-- RHS size: {terms: 3, types: 0, coercions: 0, joins: 0/0}
+T22388.$trModule :: GHC.Types.Module
+[GblId,
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
+         WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
+T22388.$trModule
+  = GHC.Types.Module T22388.$trModule3 T22388.$trModule1
+
+
+


=====================================
testsuite/tests/stranal/should_compile/all.T
=====================================
@@ -86,3 +86,5 @@ test('T21128', [ grep_errmsg(r'let { y = I\#') ], multimod_compile, ['T21128', '
 test('T21265', normal, compile, [''])
 test('EtaExpansion', normal, compile, [''])
 test('T22039', normal, compile, [''])
+# T22388: Should see $winteresting but not $wboring
+test('T22388', [ grep_errmsg(r'^\S+\$w\S+') ], compile, ['-dsuppress-uniques -ddump-simpl'])


=====================================
testsuite/tests/stranal/sigs/T21737.hs
=====================================
@@ -0,0 +1,47 @@
+{-# OPTIONS_GHC -fmax-worker-args=4 #-}
+
+{-# LANGUAGE MagicHash, UnboxedTuples #-}
+
+-- See Note [Worker argument budget]
+module T21737 where
+
+data T = MkT (# Int, Int, Int, Int #)
+
+-- NB: -fmax-worker-args=4 at the top of this file!
+-- We should unbox through the unboxed pair but not T
+{-# NOINLINE f #-}
+f :: Int -> (# Int, Int #) -> T -> Int
+f x (# y, z #) (MkT (# x1, x2, x3, x4 #)) = x + y + z + x1 + x2 + x3 + x4
+
+-- NB: -fmax-worker-args=4 at the top of this file!
+-- Do split the triple *even if* that gets us to 6 args,
+-- because the triple will take 3 registers anyway (not 1)
+-- and we get to unbox a b c.
+yes :: (# Int, Int, Int #) -> Int -> Int -> Int -> Int
+yes (# a, b, c #) d e f = a + b + c + d + e + f
+{-# NOINLINE yes #-}
+
+data U = MkU (# Int, Int, Int, Int, Int, Int #)
+
+-- NB: -fmax-worker-args=4 at the top of this file!
+-- Don't unbox U, because then we'll pass an unboxed 6-tuple, all in registers.
+no :: U -> Int
+no (MkU (# a, b, c, d, e, f #)) = a + b + c + d + e + f
+{-# NOINLINE no #-}
+
+-- NB: -fmax-worker-args=4 at the top of this file!
+-- Hence do not unbox the nested triple.
+boxed :: (Int, Int) -> (Int, (Int, Int, Int)) -> Int
+boxed (a,b) (c, (d,e,f)) = a + b + c + d + e + f
+{-# NOINLINE boxed #-}
+
+-- NB: -fmax-worker-args=4 at the top of this file!
+-- Do split the inner unboxed triple *even if* that gets us to 5 args, because
+-- the function will take 5 args anyway. But don't split the pair!
+unboxed :: (Int, Int) -> (# Int, (# Int, Int, Int #) #) -> Int
+unboxed (a,b) (# c, (# d, e, f #) #) = a + b + c + d + e + f
+{-# NOINLINE unboxed #-}
+
+-- Point: Demand on `x` is lazy and thus Unboxed
+app :: ((# Int, Int #) -> (# Int, Int #)) -> (# Int, Int #) -> (# Int, Int #)
+app g x = g x


=====================================
testsuite/tests/stranal/sigs/T21737.stderr
=====================================
@@ -0,0 +1,30 @@
+
+==================== Strictness signatures ====================
+T21737.app: <1C(1,L)><L>
+T21737.boxed: <1!P(1!P(L),1!P(L))><1!P(1!P(L),1P(1L,1L,1L))>
+T21737.f: <1!P(L)><1!P(1!P(L),1!P(L))><1P(1P(1L,1L,1L,1L))>
+T21737.no: <1P(1P(1L,1L,1L,1L,1L,1L))>
+T21737.unboxed: <1P(1L,1L)><1!P(1!P(L),1!P(1!P(L),1!P(L),1!P(L)))>
+T21737.yes: <1!P(1!P(L),1!P(L),1!P(L))><1!P(L)><1!P(L)><1!P(L)>
+
+
+
+==================== Cpr signatures ====================
+T21737.app:
+T21737.boxed: 1
+T21737.f: 1
+T21737.no: 1
+T21737.unboxed: 1
+T21737.yes: 1
+
+
+
+==================== Strictness signatures ====================
+T21737.app: <1C(1,L)><L>
+T21737.boxed: <1!P(1!P(L),1!P(L))><1!P(1!P(L),1P(1L,1L,1L))>
+T21737.f: <1!P(L)><1!P(1!P(L),1!P(L))><1P(1P(1L,1L,1L,1L))>
+T21737.no: <1P(1P(1L,1L,1L,1L,1L,1L))>
+T21737.unboxed: <1P(1L,1L)><1!P(1!P(L),1!P(1!P(L),1!P(L),1!P(L)))>
+T21737.yes: <1!P(1!P(L),1!P(L),1!P(L))><1!P(L)><1!P(L)><1!P(L)>
+
+


=====================================
testsuite/tests/stranal/sigs/all.T
=====================================
@@ -38,3 +38,4 @@ test('T21754', normal, compile, [''])
 test('T21888', normal, compile, [''])
 test('T21888a', normal, compile, [''])
 test('T22241', normal, compile, [''])
+test('T21737', normal, compile, [''])



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/399e921b05493d79f04e77806c1562806f118d4a...1230c2689db2e510b5a9b280c1a4eca832c23ccc

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/399e921b05493d79f04e77806c1562806f118d4a...1230c2689db2e510b5a9b280c1a4eca832c23ccc
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20221110/55fd83a1/attachment-0001.html>


More information about the ghc-commits mailing list