[Git][ghc/ghc][wip/marge_bot_batch_merge_job] 6 commits: Improve SpecConstr for evals
Marge Bot (@marge-bot)
gitlab at gitlab.haskell.org
Sat Aug 27 02:19:21 UTC 2022
Marge Bot pushed to branch wip/marge_bot_batch_merge_job at Glasgow Haskell Compiler / GHC
Commits:
4451ba28 by Simon Peyton Jones at 2022-08-26T22:18:54-04:00
Improve SpecConstr for evals
As #21763 showed, we were over-specialising in some cases, when
the function involved was doing a simple 'eval', but not taking
the value apart, or branching on it.
This MR fixes the problem. See Note [Do not specialise evals].
Nofib barely budges, except that spectral/cichelli allocates about
3% less.
Compiler bytes-allocated improves a bit
geo. mean -0.1%
minimum -0.5%
maximum +0.0%
The -0.5% is on T11303b, for what it's worth.
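For illustration only, here is a minimal source-level sketch of the distinction the message draws. The names `countdown` and `pick` are hypothetical and do not appear in GHC or its testsuite; the sketch makes no claim about the Core GHC produces for them.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Eval only: the bang forces x, but its contents are never inspected and
-- no branch depends on it. Specialising on a known (I# n) argument would
-- remove neither a branch nor an allocation.
countdown :: Int -> Int -> Int
countdown !x y = if y > 1 then countdown x (y - 1) else x

-- Branching: the scrutinee selects between alternatives, so a call with a
-- known constructor lets a specialised copy discard one of them.
pick :: Bool -> Int -> Int
pick b n = case b of
  True  -> n + 1
  False -> n - 1
```

The first function corresponds to the "simple eval" case that is no longer treated as a reason to specialise; the second remains worth specialising.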
- - - - -
0ea4d0c6 by Matthew Pickering at 2022-08-26T22:18:57-04:00
Revert "Revert "Refactor SpecConstr to use treat bindings uniformly""
This reverts commit 851d8dd89a7955864b66a3da8b25f1dd88a503f8.
This commit was originally reverted due to an increase in space usage.
The increase was diagnosed as follows: the SCE had grown in size and was
being retained by another leak. See #22102
- - - - -
bad5e5c2 by Matthew Pickering at 2022-08-26T22:18:57-04:00
Avoid retaining bindings via ModGuts held on the stack
It's better to overwrite the bindings fields of the ModGuts before
starting an iteration as then all the old bindings can be collected as
soon as the simplifier has processed them. Otherwise we end up with the
old bindings being alive until right at the end of the simplifier pass
as the mg_binds field is only modified right at the end.
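The idea can be sketched on a toy record. This is a simplified model under stated assumptions, not the GHC code: `Guts`, `step` and `simplify` are hypothetical stand-ins for ModGuts, one simplifier iteration and the iteration loop.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical stand-in for ModGuts: one small field, one large one.
data Guts = Guts { gutsName :: String, gutsBinds :: [Int] }

-- Hypothetical stand-in for one simplifier iteration.
step :: [Int] -> [Int]
step = map (+ 1)

simplify :: Int -> Guts -> Guts
simplify iterations guts = go iterations (gutsBinds guts)
  where
    -- The bang forces the record update before the loop starts, so the
    -- loop refers to gutsNoBinds rather than to the original guts and
    -- its old bindings.
    !gutsNoBinds = guts { gutsBinds = [] }

    go :: Int -> [Int] -> Guts
    go n binds
      | n <= 0    = gutsNoBinds { gutsBinds = binds }
      | otherwise = go (n - 1) (step binds)
```

Without the bang, `gutsNoBinds` would be a thunk that still points at `guts`, which is the retention the commit message describes.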
- - - - -
0025aee1 by Matthew Pickering at 2022-08-26T22:18:57-04:00
Force imposs_deflt_cons in filterAlts
This fixes a pretty serious space leak as the unforced thunk would retain
`Alt b` values which would then contain references to a lot of old
bindings and other simplifier gunk.
The OtherCon unfolding was not forced on subsequent simplifier runs so
more and more old stuff would be retained until the end of
simplification.
Fixing this has a drastic effect on maximum residency for the mmark
package which goes from
```
45,005,401,056 bytes allocated in the heap
17,227,721,856 bytes copied during GC
818,281,720 bytes maximum residency (33 sample(s))
9,659,144 bytes maximum slop
2245 MiB total memory in use (0 MB lost due to fragmentation)
```
to
```
45,039,453,304 bytes allocated in the heap
13,128,181,400 bytes copied during GC
331,546,608 bytes maximum residency (40 sample(s))
7,471,120 bytes maximum slop
916 MiB total memory in use (0 MB lost due to fragmentation)
```
See #21993 for some more discussion.
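A rough sketch of the general technique, assuming a toy `Alt` type: `Alt`, `forceList` and `impossibleCons` are hypothetical names and this is not the code in filterAlts. A lazily built list keeps its inputs alive until it is forced, so forcing it before returning lets the inputs be collected.

```haskell
-- Hypothetical stand-in for a case alternative carrying a large payload.
data Alt = Alt { altCon :: String, altBody :: [Int] }

-- Force the spine of a list and each element to weak head normal form.
forceList :: [a] -> ()
forceList = foldr seq ()

-- Constructors from the alternatives that are already known to be
-- impossible. Left lazy, the result would be a thunk holding on to every
-- Alt (payload included); once forced, it holds only the names.
impossibleCons :: [String] -> [Alt] -> [String]
impossibleCons seen alts =
  let cons = [ c | Alt c _ <- alts, c `elem` seen ]
  in forceList cons `seq` cons
```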
- - - - -
69f816e6 by Matthew Pickering at 2022-08-26T22:18:57-04:00
Use Solo to avoid retaining the SCE but to avoid performing the substitution
The use of Solo here allows us to force the selection into the SCE to obtain
the Subst but without forcing the substitution to be applied. The resulting thunk
is placed into a lazy field which is rarely forced, so forcing it regresses
performance.
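The pattern can be sketched in isolation. The sketch defines a local `Solo` so that it does not depend on which constructor name the installed base exports from Data.Tuple; `Env`, `substLazy` and `use` are hypothetical names.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Local stand-in for the one-element tuple type.
data Solo a = Solo a

-- Hypothetical environment: a substitution plus other, possibly large, data.
data Env = Env { envSubst :: Int -> Int, envOther :: [Int] }

substLazy :: Env -> Int -> Solo Int
substLazy env x =
  let !subst = envSubst env   -- force the field selection now
  in Solo (subst x)           -- the application stays a thunk over subst only

use :: Env -> Int -> (String, Int)
use env x =
  let !(Solo y) = substLazy env x   -- unpacking Solo runs the selection
  in ("tag", y)                     -- y itself is still unevaluated
```

Forcing the `Solo` constructor is cheap and drops the reference to `env`; the substitution is applied only if the lazy field is eventually demanded.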
- - - - -
acb617f2 by Simon Peyton Jones at 2022-08-26T22:18:58-04:00
Fix a nasty loop in Tidy
As the remarkably-simple #22112 showed, we were making a black hole
in the unfolding of a self-recursive binding. Boo!
It's a bit tricky. Documented in GHC.Iface.Tidy,
Note [tidyTopUnfolding: avoiding black holes]
- - - - -
15 changed files:
- compiler/GHC/Core/Lint.hs
- compiler/GHC/Core/Opt/Simplify.hs
- compiler/GHC/Core/Opt/Simplify/Utils.hs
- compiler/GHC/Core/Opt/SpecConstr.hs
- compiler/GHC/Core/Tidy.hs
- compiler/GHC/Core/Utils.hs
- compiler/GHC/Iface/Tidy.hs
- compiler/GHC/Types/Id/Info.hs
- + testsuite/tests/simplCore/should_compile/T21763.hs
- + testsuite/tests/simplCore/should_compile/T21763.stderr
- + testsuite/tests/simplCore/should_compile/T21763a.hs
- + testsuite/tests/simplCore/should_compile/T21763a.stderr
- + testsuite/tests/simplCore/should_compile/T22112.hs
- + testsuite/tests/simplCore/should_compile/T22112.stderr
- testsuite/tests/simplCore/should_compile/all.T
Changes:
=====================================
compiler/GHC/Core/Lint.hs
=====================================
@@ -3437,24 +3437,26 @@ lintAnnots pname pass guts = {-# SCC "lintAnnots" #-} do
logger <- getLogger
when (gopt Opt_DoAnnotationLinting dflags) $
liftIO $ Err.showPass logger "Annotation linting - first run"
- nguts <- pass guts
-- If appropriate re-run it without debug annotations to make sure
-- that they made no difference.
- when (gopt Opt_DoAnnotationLinting dflags) $ do
- liftIO $ Err.showPass logger "Annotation linting - second run"
- nguts' <- withoutAnnots pass guts
- -- Finally compare the resulting bindings
- liftIO $ Err.showPass logger "Annotation linting - comparison"
- let binds = flattenBinds $ mg_binds nguts
- binds' = flattenBinds $ mg_binds nguts'
- (diffs,_) = diffBinds True (mkRnEnv2 emptyInScopeSet) binds binds'
- when (not (null diffs)) $ GHC.Core.Opt.Monad.putMsg $ vcat
- [ lint_banner "warning" pname
- , text "Core changes with annotations:"
- , withPprStyle defaultDumpStyle $ nest 2 $ vcat diffs
- ]
- -- Return actual new guts
- return nguts
+ if gopt Opt_DoAnnotationLinting dflags
+ then do
+ nguts <- pass guts
+ liftIO $ Err.showPass logger "Annotation linting - second run"
+ nguts' <- withoutAnnots pass guts
+ -- Finally compare the resulting bindings
+ liftIO $ Err.showPass logger "Annotation linting - comparison"
+ let binds = flattenBinds $ mg_binds nguts
+ binds' = flattenBinds $ mg_binds nguts'
+ (diffs,_) = diffBinds True (mkRnEnv2 emptyInScopeSet) binds binds'
+ when (not (null diffs)) $ GHC.Core.Opt.Monad.putMsg $ vcat
+ [ lint_banner "warning" pname
+ , text "Core changes with annotations:"
+ , withPprStyle defaultDumpStyle $ nest 2 $ vcat diffs
+ ]
+ return nguts
+ else
+ pass guts
-- | Run the given pass without annotations. This means that we both
-- set the debugLevel setting to 0 in the environment as well as all
=====================================
compiler/GHC/Core/Opt/Simplify.hs
=====================================
@@ -153,7 +153,7 @@ simplifyPgm logger unit_env opts
, mg_binds = binds, mg_rules = rules
, mg_fam_inst_env = fam_inst_env })
= do { (termination_msg, it_count, counts_out, guts')
- <- do_iteration 1 [] binds rules
+ <- do_iteration 1 [] binds rules
; when (logHasDumpFlag logger Opt_D_verbose_core2core
&& logHasDumpFlag logger Opt_D_dump_simpl_stats) $
@@ -175,6 +175,9 @@ simplifyPgm logger unit_env opts
print_unqual = mkPrintUnqualified unit_env rdr_env
active_rule = activeRule mode
active_unf = activeUnfolding mode
+ -- Note the bang in !guts_no_binds. If you don't force `guts_no_binds`
+ -- the old bindings are retained until the end of all simplifier iterations
+ !guts_no_binds = guts { mg_binds = [], mg_rules = [] }
do_iteration :: Int -- Counts iterations
-> [SimplCount] -- Counts from earlier iterations, reversed
@@ -198,7 +201,7 @@ simplifyPgm logger unit_env opts
-- number of iterations we actually completed
return ( "Simplifier baled out", iteration_no - 1
, totalise counts_so_far
- , guts { mg_binds = binds, mg_rules = rules } )
+ , guts_no_binds { mg_binds = binds, mg_rules = rules } )
-- Try and force thunks off the binds; significantly reduces
-- space usage, especially with -O. JRS, 000620.
@@ -253,7 +256,7 @@ simplifyPgm logger unit_env opts
if isZeroSimplCount counts1 then
return ( "Simplifier reached fixed point", iteration_no
, totalise (counts1 : counts_so_far) -- Include "free" ticks
- , guts { mg_binds = binds1, mg_rules = rules1 } )
+ , guts_no_binds { mg_binds = binds1, mg_rules = rules1 } )
else do {
-- Short out indirections
-- We do this *after* at least one run of the simplifier
=====================================
compiler/GHC/Core/Opt/Simplify/Utils.hs
=====================================
@@ -2257,7 +2257,7 @@ prepareAlts tries these things:
case e of x { (a,b) -> rhs }
where the type is a single constructor type. This gives better code
when rhs also scrutinises x or e.
- See CoreUtils Note [Refine DEFAULT case alternatives]
+ See GHC.Core.Utils Note [Refine DEFAULT case alternatives]
3. combineIdenticalAlts: combine identical alternatives into a DEFAULT.
See CoreUtils Note [Combine identical alternatives], which also
=====================================
compiler/GHC/Core/Opt/SpecConstr.hs
=====================================
@@ -77,7 +77,9 @@ import GHC.Serialized ( deserializeWithData )
import Control.Monad ( zipWithM )
import Data.List (nubBy, sortBy, partition, dropWhileEnd, mapAccumL )
+import Data.Maybe( mapMaybe )
import Data.Ord( comparing )
+import Data.Tuple
{-
-----------------------------------------------------
@@ -374,11 +376,14 @@ The recursive call ends up looking like
So we want to spot the constructor application inside the cast.
That's why we have the Cast case in argToPat
-Note [Local recursive groups]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-For a *local* recursive group, we can see all the calls to the
-function, so we seed the specialisation loop from the calls in the
-body, not from the calls in the RHS. Consider:
+Note [Seeding recursive groups]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+For a recursive group that is either
+ * nested, or
+ * top-level, but with no exported Ids
+we can see all the calls to the function, so we seed the specialisation
+loop from the calls in the body, and /not/ from the calls in the RHS.
+Consider:
bar m n = foo n (n,n) (n,n) (n,n) (n,n)
where
@@ -401,52 +406,42 @@ a local function.
In a case like the above we end up never calling the original un-specialised
function. (Although we still leave its code around just in case.)
-However, if we find any boring calls in the body, including *unsaturated*
-ones, such as
+Wrinkles
+
+* Boring calls. If we find any boring calls in the body, including
+ *unsaturated* ones, such as
letrec foo x y = ....foo...
in map foo xs
-then we will end up calling the un-specialised function, so then we *should*
-use the calls in the un-specialised RHS as seeds. We call these
-"boring call patterns", and callsToPats reports if it finds any of these.
+ then we will end up calling the un-specialised function, so then we
+ *should* use the calls in the un-specialised RHS as seeds. We call
+ these "boring call patterns", and callsToNewPats reports if it finds
+ any of these. Then 'specialise' unleashes the usage info from the
+ un-specialised RHS.
-Note [Seeding top-level recursive groups]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-This seeding is done in the binding for seed_calls in specRec.
-
-1. If all the bindings in a top-level recursive group are local (not
- exported), then all the calls are in the rest of the top-level
- bindings. This means we can specialise with those call patterns
- ONLY, and NOT with the RHSs of the recursive group (exactly like
- Note [Local recursive groups])
-
-2. But if any of the bindings are exported, the function may be called
- with any old arguments, so (for lack of anything better) we specialise
- based on
- (a) the call patterns in the RHS
- (b) the call patterns in the rest of the top-level bindings
- NB: before Apr 15 we used (a) only, but Dimitrios had an example
- where (b) was crucial, so I added that.
- Adding (b) also improved nofib allocation results:
- multiplier: 4% better
- minimax: 2.8% better
-
-Actually in case (2), instead of using the calls from the RHS, it
-would be better to specialise in the importing module. We'd need to
-add an INLINABLE pragma to the function, and then it can be
-specialised in the importing scope, just as is done for type classes
-in GHC.Core.Opt.Specialise.specImports. This remains to be done (#10346).
-
-Note [Top-level recursive groups]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-To get the call usage information from "the rest of the top level
-bindings" (c.f. Note [Seeding top-level recursive groups]), we work
-backwards through the top-level bindings so we see the usage before we
-get to the binding of the function. Before we can collect the usage
-though, we go through all the bindings and add them to the
-environment. This is necessary because usage is only tracked for
-functions in the environment. These two passes are called
- 'go' and 'goEnv'
-in specConstrProgram. (Looks a bit revolting to me.)
+* Exported Ids. `specialise` /also/ unleashes `si_mb_unspec`
+ for exported Ids. That way we are sure to generate usage info from
+ the /un-specialised/ RHS of an exported function.
+
+More precisely:
+
+* Always start from the calls in the body of the let or (for top level)
+ calls in the rest of the module. See the body_calls in the call to
+ `specialise` in `specNonRec`, and to `go` in `specRec`.
+
+* si_mb_unspec holds the usage from the unspecialised RHS.
+ See `initSpecInfo`.
+
+* `specialise` will unleash si_mb_unspec, if
+ - `callsToNewPats` reports "boring calls found", or
+ - this is a top-level exported Id.
+
+Historical note. At an earlier point, if a top-level Id was exported,
+we used only seeds from the RHS, and /not/ from the body. But Dimitrios
+had an example where using call patterns from the body (the other defns
+in the module) was crucial. And doing so improved nofib allocation results:
+ multiplier: 4% better
+ minimax: 2.8% better
+In any case, it is easier to do!
Note [Do not specialise diverging functions]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -671,14 +666,16 @@ But regardless, SpecConstr can and should! It's easy:
well as constructor applications.
Wrinkles:
+
* This should all work perfectly fine for newtype classes. Mind you,
currently newtype classes are inlined fairly agressively, but we
may change that. And it would take extra code to exclude them, as
well as being unnecessary.
-* We (mis-) use LambdaVal for this purpose, because ConVal
- requires us to list the data constructor and fields, and that
- is (a) inconvenient and (b) unnecessary for class methods.
+* In isValue, we (mis-) use LambdaVal for this ($fblah d1 .. dn)
+ because ConVal requires us to list the data constructor and
+ fields, and that is (a) inconvenient and (b) unnecessary for
+ class methods.
-----------------------------------------------------
Stuff not yet handled
@@ -764,35 +761,18 @@ unbox the strict fields, because T is polymorphic!)
specConstrProgram :: ModGuts -> CoreM ModGuts
specConstrProgram guts
- = do
- dflags <- getDynFlags
- us <- getUniqueSupplyM
- (_, annos) <- getFirstAnnotations deserializeWithData guts
- this_mod <- getModule
- -- pprTraceM "specConstrInput" (ppr $ mg_binds guts)
- let binds' = reverse $ fst $ initUs us $ do
- -- Note [Top-level recursive groups]
- (env, binds) <- goEnv (initScEnv (initScOpts dflags this_mod) annos)
- (mg_binds guts)
- -- binds is identical to (mg_binds guts), except that the
- -- binders on the LHS have been replaced by extendBndr
- -- (SPJ this seems like overkill; I don't think the binders
- -- will change at all; and we don't substitute in the RHSs anyway!!)
- go env nullUsage (reverse binds)
-
- return (guts { mg_binds = binds' })
- where
- -- See Note [Top-level recursive groups]
- goEnv env [] = return (env, [])
- goEnv env (bind:binds) = do (env', bind') <- scTopBindEnv env bind
- (env'', binds') <- goEnv env' binds
- return (env'', bind' : binds')
-
- -- Arg list of bindings is in reverse order
- go _ _ [] = return []
- go env usg (bind:binds) = do (usg', bind') <- scTopBind env usg bind
- binds' <- go env usg' binds
- return (bind' : binds')
+ = do { env0 <- initScEnv guts
+ ; us <- getUniqueSupplyM
+ ; let (_usg, binds') = initUs_ us $
+ scTopBinds env0 (mg_binds guts)
+
+ ; return (guts { mg_binds = binds' }) }
+
+scTopBinds :: ScEnv -> [InBind] -> UniqSM (ScUsage, [OutBind])
+scTopBinds _env [] = return (nullUsage, [])
+scTopBinds env (b:bs) = do { (usg, b', bs') <- scBind TopLevel env b $
+ (\env -> scTopBinds env bs)
+ ; return (usg, b' ++ bs') }
{-
************************************************************************
@@ -956,14 +936,24 @@ initScOpts dflags this_mod = SpecConstrOpts
sc_keen = gopt Opt_SpecConstrKeen dflags
}
-initScEnv :: SpecConstrOpts -> UniqFM Name SpecConstrAnnotation -> ScEnv
-initScEnv opts anns
- = SCE { sc_opts = opts,
- sc_force = False,
- sc_subst = emptySubst,
- sc_how_bound = emptyVarEnv,
- sc_vals = emptyVarEnv,
- sc_annotations = anns }
+initScEnv :: ModGuts -> CoreM ScEnv
+initScEnv guts
+ = do { dflags <- getDynFlags
+ ; (_, anns) <- getFirstAnnotations deserializeWithData guts
+ ; this_mod <- getModule
+ ; return (SCE { sc_opts = initScOpts dflags this_mod,
+ sc_force = False,
+ sc_subst = init_subst,
+ sc_how_bound = emptyVarEnv,
+ sc_vals = emptyVarEnv,
+ sc_annotations = anns }) }
+ where
+ init_subst = mkEmptySubst $ mkInScopeSet $ mkVarSet $
+ bindersOfBinds (mg_binds guts)
+ -- Account for top-level bindings that are not in dependency order;
+ -- see Note [Glomming] in GHC.Core.Opt.OccurAnal
+ -- Easiest thing is to bring all the top level binders into scope at once,
+ -- as if all the top-level decls were mutually recursive.
data HowBound = RecFun -- These are the recursive functions for which
-- we seek interesting call patterns
@@ -984,8 +974,18 @@ lookupHowBound env id = lookupVarEnv (sc_how_bound env) id
scSubstId :: ScEnv -> InId -> OutExpr
scSubstId env v = lookupIdSubst (sc_subst env) v
-scSubstTy :: ScEnv -> InType -> OutType
-scSubstTy env ty = substTyUnchecked (sc_subst env) ty
+-- The !subst ensures that we force the selection `(sc_subst env)`, which avoids
+-- retaining all of `env` when we only need `subst`. The `Solo` means that the
+-- substitution itself is lazy, because that type is often discarded.
+-- The callers of `scSubstTy` always force the result (to unpack the `Solo`)
+-- so we get the desired effect: we leave a thunk, but retain only the subst,
+-- not the whole env.
+--
+-- Fully forcing the result of `scSubstTy` regresses performance (#22102)
+scSubstTy :: ScEnv -> InType -> Solo OutType
+scSubstTy env ty =
+ let !subst = sc_subst env
+ in Solo (substTyUnchecked subst ty)
scSubstCo :: ScEnv -> Coercion -> Coercion
scSubstCo env co = substCo (sc_subst env) co
@@ -1187,8 +1187,8 @@ data ScUsage
scu_occs :: !(IdEnv ArgOcc) -- Information on argument occurrences
} -- The domain is OutIds
-type CallEnv = IdEnv [Call]
-data Call = Call Id [CoreArg] ValueEnv
+type CallEnv = IdEnv [Call] -- Domain is OutIds
+data Call = Call OutId [CoreArg] ValueEnv
-- The arguments of the call, together with the
-- env giving the constructor bindings at the call site
-- We keep the function mainly for debug output
@@ -1210,6 +1210,9 @@ nullUsage = SCU { scu_calls = emptyVarEnv, scu_occs = emptyVarEnv }
combineCalls :: CallEnv -> CallEnv -> CallEnv
combineCalls = plusVarEnv_C (++)
+delCallsFor :: ScUsage -> [Var] -> ScUsage
+delCallsFor env bndrs = env { scu_calls = scu_calls env `delVarEnvList` bndrs }
+
combineUsage :: ScUsage -> ScUsage -> ScUsage
combineUsage u1 u2 = SCU { scu_calls = combineCalls (scu_calls u1) (scu_calls u2),
scu_occs = plusVarEnv_C combineOcc (scu_occs u1) (scu_occs u2) }
@@ -1227,7 +1230,20 @@ data ArgOcc = NoOcc -- Doesn't occur at all; or a type argument
| UnkOcc -- Used in some unknown way
| ScrutOcc -- See Note [ScrutOcc]
- (DataConEnv [ArgOcc]) -- How the sub-components are used
+ (DataConEnv [ArgOcc])
+ -- [ArgOcc]: how the sub-components are used
+
+deadArgOcc :: ArgOcc -> Bool
+deadArgOcc (ScrutOcc {}) = False
+deadArgOcc UnkOcc = False
+deadArgOcc NoOcc = True
+
+specialisableArgOcc :: ArgOcc -> Bool
+-- | Does this occurrence represent one worth specializing for?
+specialisableArgOcc UnkOcc = False
+specialisableArgOcc NoOcc = False
+specialisableArgOcc (ScrutOcc {}) = True
+
{- Note [ScrutOcc]
~~~~~~~~~~~~~~~~~~
@@ -1253,6 +1269,9 @@ instance Outputable ArgOcc where
ppr NoOcc = text "no-occ"
evalScrutOcc :: ArgOcc
+-- We use evalScrutOcc for
+-- - mkVarUsage: applied functions
+-- - scApp: dicts that are the argument of a classop
evalScrutOcc = ScrutOcc emptyUFM
-- Experimentally, this version of combineOcc makes ScrutOcc "win", so
@@ -1292,6 +1311,121 @@ The main recursive function gathers up usage information, and
creates specialised versions of functions.
-}
+scBind :: TopLevelFlag -> ScEnv -> InBind
+ -> (ScEnv -> UniqSM (ScUsage, a)) -- Specialise the scope of the binding
+ -> UniqSM (ScUsage, [OutBind], a)
+scBind top_lvl env (NonRec bndr rhs) do_body
+ | isTyVar bndr -- Type-lets may be created by doBeta
+ = do { (final_usage, body') <- do_body (extendScSubst env bndr rhs)
+ ; return (final_usage, [], body') }
+
+ | not (isTopLevel top_lvl) -- Nested non-recursive value binding
+ -- See Note [Specialising local let bindings]
+ = do { let (body_env, bndr') = extendBndr env bndr
+ -- Not necessary at top level; but here we are nested
+
+ ; rhs_info <- scRecRhs env (bndr',rhs)
+
+ ; let body_env2 = extendHowBound body_env [bndr'] RecFun
+ rhs' = ri_new_rhs rhs_info
+ body_env3 = extendValEnv body_env2 bndr' (isValue (sc_vals env) rhs')
+
+ ; (body_usg, body') <- do_body body_env3
+
+ -- Now make specialised copies of the binding,
+ -- based on calls in body_usg
+ ; (spec_usg, specs) <- specNonRec env (scu_calls body_usg) rhs_info
+ -- NB: For non-recursive bindings we inherit sc_force flag from
+ -- the parent function (see Note [Forcing specialisation])
+
+ -- Specialized + original binding
+ ; let spec_bnds = [NonRec b r | (b,r) <- ruleInfoBinds rhs_info specs]
+ bind_usage = (body_usg `delCallsFor` [bndr'])
+ `combineUsage` spec_usg -- Note [spec_usg includes rhs_usg]
+
+ ; return (bind_usage, spec_bnds, body')
+ }
+
+ | otherwise -- Top-level, non-recursive value binding
+ -- At top level we do not specialise non-recursive bindings; that
+ -- is, we do not call specNonRec, passing the calls from the body.
+ -- The original paper only specialised /recursive/ bindings, but
+ -- we later started specialising nested non-recursive bindings:
+ -- see Note [Specialising local let bindings]
+ --
+ -- I tried always specialising non-recursive top-level bindings too,
+ -- but found some regressions (see !8135). So I backed off.
+ = do { (rhs_usage, rhs') <- scExpr env rhs
+
+ -- At top level, we've already put all binders into scope; see initScEnv
+ -- Hence no need to call `extendBndr`. But we still want to
+ -- extend the `ValueEnv` to record the value of this binder.
+ ; let body_env = extendValEnv env bndr (isValue (sc_vals env) rhs')
+ ; (body_usage, body') <- do_body body_env
+
+ ; return (rhs_usage `combineUsage` body_usage, [NonRec bndr rhs'], body') }
+
+scBind top_lvl env (Rec prs) do_body
+ | isTopLevel top_lvl
+ , Just threshold <- sc_size (sc_opts env)
+ , not force_spec
+ , not (all (couldBeSmallEnoughToInline (sc_uf_opts (sc_opts env)) threshold) rhss)
+ = -- Do no specialisation if the RHSs are too big
+ -- ToDo: I'm honestly not sure of the rationale of this size-testing, nor
+ -- why it only applies at top level. But that's the way it has been
+ -- for a while. See #21456.
+ do { (body_usg, body') <- do_body rhs_env2
+ ; (rhs_usgs, rhss') <- mapAndUnzipM (scExpr env) rhss
+ ; let all_usg = (combineUsages rhs_usgs `combineUsage` body_usg)
+ `delCallsFor` bndrs'
+ bind' = Rec (bndrs' `zip` rhss')
+ ; return (all_usg, [bind'], body') }
+
+ | otherwise
+ = do { rhs_infos <- mapM (scRecRhs rhs_env2) (bndrs' `zip` rhss)
+ ; (body_usg, body') <- do_body rhs_env2
+
+ ; (spec_usg, specs) <- specRec (scForce rhs_env2 force_spec)
+ (scu_calls body_usg) rhs_infos
+ -- Do not unconditionally generate specialisations from rhs_usgs
+ -- Instead use them only if we find an unspecialised call
+ -- See Note [Seeding recursive groups]
+
+ ; let all_usg = (spec_usg `combineUsage` body_usg) -- Note [spec_usg includes rhs_usg]
+ `delCallsFor` bndrs'
+ bind' = Rec (concat (zipWithEqual "scExpr'" ruleInfoBinds rhs_infos specs))
+ -- zipWithEqual: length of returned [SpecInfo]
+ -- should be the same as incoming [RhsInfo]
+
+ ; return (all_usg, [bind'], body') }
+ where
+ (bndrs,rhss) = unzip prs
+ force_spec = any (forceSpecBndr env) bndrs -- Note [Forcing specialisation]
+
+ (rhs_env1,bndrs') | isTopLevel top_lvl = (env, bndrs)
+ | otherwise = extendRecBndrs env bndrs
+ -- At top level, we've already put all binders into scope; see initScEnv
+
+ rhs_env2 = extendHowBound rhs_env1 bndrs' RecFun
+
+{- Note [Specialising local let bindings]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+It is not uncommon to find this
+
+ let $j = \x. <blah> in ...$j True...$j True...
+
+Here $j is an arbitrary let-bound function, but it often comes up for
+join points. We might like to specialise $j for its call patterns.
+Notice the difference from a letrec, where we look for call patterns
+in the *RHS* of the function. Here we look for call patterns in the
+*body* of the let.
+
+At one point I predicated this on the RHS mentioning the outer
+recursive function, but that's not essential and might even be
+harmful. I'm not sure.
+-}
+
+------------------------
scExpr, scExpr' :: ScEnv -> CoreExpr -> UniqSM (ScUsage, CoreExpr)
-- The unique supply is needed when we invent
-- a new name for the specialised function and its args
@@ -1302,7 +1436,9 @@ scExpr' env (Var v) = case scSubstId env v of
Var v' -> return (mkVarUsage env v' [], Var v')
e' -> scExpr (zapScSubst env) e'
-scExpr' env (Type t) = return (nullUsage, Type (scSubstTy env t))
+scExpr' env (Type t) =
+ let !(Solo ty') = scSubstTy env t
+ in return (nullUsage, Type ty')
scExpr' env (Coercion c) = return (nullUsage, Coercion (scSubstCo env c))
scExpr' _ e@(Lit {}) = return (nullUsage, e)
scExpr' env (Tick t e) = do (usg, e') <- scExpr env e
@@ -1316,6 +1452,11 @@ scExpr' env (Lam b e) = do let (env', b') = extendBndr env b
(usg, e') <- scExpr env' e
return (usg, Lam b' e')
+scExpr' env (Let bind body)
+ = do { (final_usage, binds', body') <- scBind NotTopLevel env bind $
+ (\env -> scExpr env body)
+ ; return (final_usage, mkLets binds' body') }
+
scExpr' env (Case scrut b ty alts)
= do { (scrut_usg, scrut') <- scExpr env scrut
; case isValue (sc_vals env) scrut' of
@@ -1333,17 +1474,19 @@ scExpr' env (Case scrut b ty alts)
= do { let (alt_env,b') = extendBndrWith RecArg env b
-- Record RecArg for the components
- ; (alt_usgs, alt_occs, alts')
- <- mapAndUnzip3M (sc_alt alt_env scrut' b') alts
+ ; (alt_usgs, alt_occs, alts') <- mapAndUnzip3M (sc_alt alt_env scrut' b') alts
; let scrut_occ = foldr combineOcc NoOcc alt_occs
scrut_usg' = setScrutOcc env scrut_usg scrut' scrut_occ
-- The combined usage of the scrutinee is given
- -- by scrut_occ, which is passed to scScrut, which
+ -- by scrut_occ, which is passed to setScrutOcc, which
-- in turn treats a bare-variable scrutinee specially
+ ; let !(Solo ty') = scSubstTy env ty
; return (foldr combineUsage scrut_usg' alt_usgs,
- Case scrut' b' (scSubstTy env ty) alts') }
+ Case scrut' b' ty' alts') }
+
+ single_alt = isSingleton alts
sc_alt env scrut' b' (Alt con bs rhs)
= do { let (env1, bs1) = extendBndrsWith RecArg env bs
@@ -1351,82 +1494,52 @@ scExpr' env (Case scrut b ty alts)
; (usg, rhs') <- scExpr env2 rhs
; let (usg', b_occ:arg_occs) = lookupOccs usg (b':bs2)
scrut_occ = case con of
- DataAlt dc -> ScrutOcc (unitUFM dc arg_occs)
- _ -> evalScrutOcc
+ DataAlt dc -- See Note [Do not specialise evals]
+ | not (single_alt && all deadArgOcc arg_occs)
+ -> ScrutOcc (unitUFM dc arg_occs)
+ _ -> UnkOcc
; return (usg', b_occ `combineOcc` scrut_occ, Alt con bs2 rhs') }
-scExpr' env (Let (NonRec bndr rhs) body)
- | isTyVar bndr -- Type-lets may be created by doBeta
- = scExpr' (extendScSubst env bndr rhs) body
-
- | otherwise
- = do { let (body_env, bndr') = extendBndr env bndr
- ; rhs_info <- scRecRhs env (bndr',rhs)
-
- ; let body_env2 = extendHowBound body_env [bndr'] RecFun
- -- See Note [Local let bindings]
- rhs' = ri_new_rhs rhs_info
- body_env3 = extendValEnv body_env2 bndr' (isValue (sc_vals env) rhs')
-
- ; (body_usg, body') <- scExpr body_env3 body
-
- -- NB: For non-recursive bindings we inherit sc_force flag from
- -- the parent function (see Note [Forcing specialisation])
- ; (spec_usg, specs) <- specNonRec env body_usg rhs_info
- -- Specialized + original binding
- ; let spec_bnds = mkLets [NonRec b r | (b,r) <- ruleInfoBinds rhs_info specs] body'
- -- ; pprTraceM "spec_bnds" $ (ppr spec_bnds)
-
- ; return (body_usg { scu_calls = scu_calls body_usg `delVarEnv` bndr' }
- `combineUsage` spec_usg, -- Note [spec_usg includes rhs_usg]
- spec_bnds
- )
- }
-
-
--- A *local* recursive group: see Note [Local recursive groups]
-scExpr' env (Let (Rec prs) body)
- = do { let (bndrs,rhss) = unzip prs
- (rhs_env1,bndrs') = extendRecBndrs env bndrs
- rhs_env2 = extendHowBound rhs_env1 bndrs' RecFun
- force_spec = any (forceSpecBndr env) bndrs'
- -- Note [Forcing specialisation]
-
- ; rhs_infos <- mapM (scRecRhs rhs_env2) (bndrs' `zip` rhss)
- ; (body_usg, body') <- scExpr rhs_env2 body
-
- -- NB: start specLoop from body_usg
- ; (spec_usg, specs) <- specRec NotTopLevel (scForce rhs_env2 force_spec)
- body_usg rhs_infos
- -- Do not unconditionally generate specialisations from rhs_usgs
- -- Instead use them only if we find an unspecialised call
- -- See Note [Local recursive groups]
- ; let all_usg = spec_usg `combineUsage` body_usg -- Note [spec_usg includes rhs_usg]
- bind' = Rec (concat (zipWithEqual "scExpr'" ruleInfoBinds rhs_infos specs))
- -- zipWithEqual: length of returned [SpecInfo]
- -- should be the same as incoming [RhsInfo]
-
- ; return (all_usg { scu_calls = scu_calls all_usg `delVarEnvList` bndrs' },
- Let bind' body') }
-
-{-
-Note [Local let bindings]
-~~~~~~~~~~~~~~~~~~~~~~~~~
-It is not uncommon to find this
-
- let $j = \x. <blah> in ...$j True...$j True...
-
-Here $j is an arbitrary let-bound function, but it often comes up for
-join points. We might like to specialise $j for its call patterns.
-Notice the difference from a letrec, where we look for call patterns
-in the *RHS* of the function. Here we look for call patterns in the
-*body* of the let.
-
-At one point I predicated this on the RHS mentioning the outer
-recursive function, but that's not essential and might even be
-harmful. I'm not sure.
+{- Note [Do not specialise evals]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider
+ f x y = case x of I# _ ->
+ if y>1 then f x (y-1) else x
+
+Here `x` is scrutinised by a case, but only in an eval-like way; the
+/component/ of the I# is unused. We don't want to specialise this
+function, even if we find a call (f (I# z)), because nothing is gained
+ * No case branches are discarded
+ * No allocation is removed
+The specialised version would take an unboxed Int#, pass it along,
+and rebox it at the end.
+
+In fact this can cause significant regression. In #21763 we had
+something like
+ f = ... case x of x' { I# n ->
+ join j y = rhs
+ in ...jump j x'...
+
+Now if we specialise `j` for the argument `I# n`, we'll end up reboxing
+it in `j`, without even removing an allocation from the call site.
+
+Reboxing is always a worry. But here we can ameliorate the problem as
+follows.
+
+* In scExpr (Case ...), for a /single-alternative/ case expression, in
+ which the pattern binders are all unused, we build a UnkOcc for
+ the scrutinee, not one that maps the data constructor; we don't treat
+ this occurrence as a reason for specialisation.
+
+* Conveniently, SpecConstr is doing its own occurrence analysis, so
+ the "unused" bit is just looking for NoOcc
+
+* Note that if we have
+ f x = case x of { True -> e1; False -> e2 }
+ then even though the pattern binders are unused (there are none), it is
+ still worth specialising on x. Hence the /single-alternative/ guard.
-}
scApp :: ScEnv -> (InExpr, [InExpr]) -> UniqSM (ScUsage, CoreExpr)
@@ -1478,55 +1591,9 @@ mkVarUsage env fn args
, scu_occs = unitVarEnv fn arg_occ }
Nothing -> nullUsage
where
- -- I rather think we could use UnkOcc all the time
arg_occ | null args = UnkOcc
| otherwise = evalScrutOcc
-----------------------
-scTopBindEnv :: ScEnv -> CoreBind -> UniqSM (ScEnv, CoreBind)
-scTopBindEnv env (Rec prs)
- = do { let (rhs_env1,bndrs') = extendRecBndrs env bndrs
- rhs_env2 = extendHowBound rhs_env1 bndrs RecFun
-
- prs' = zip bndrs' rhss
- ; return (rhs_env2, Rec prs') }
- where
- (bndrs,rhss) = unzip prs
-
-scTopBindEnv env (NonRec bndr rhs)
- = do { let (env1, bndr') = extendBndr env bndr
- env2 = extendValEnv env1 bndr' (isValue (sc_vals env) rhs)
- ; return (env2, NonRec bndr' rhs) }
-
-----------------------
-scTopBind :: ScEnv -> ScUsage -> CoreBind -> UniqSM (ScUsage, CoreBind)
-
-scTopBind env body_usage (Rec prs)
- | Just threshold <- sc_size $ sc_opts env
- , not force_spec
- , not (all (couldBeSmallEnoughToInline (sc_uf_opts $ sc_opts env) threshold) rhss)
- -- No specialisation
- = -- pprTrace "scTopBind: nospec" (ppr bndrs) $
- do { (rhs_usgs, rhss') <- mapAndUnzipM (scExpr env) rhss
- ; return (body_usage `combineUsage` combineUsages rhs_usgs, Rec (bndrs `zip` rhss')) }
-
- | otherwise -- Do specialisation
- = do { rhs_infos <- mapM (scRecRhs env) prs
-
- ; (spec_usage, specs) <- specRec TopLevel (scForce env force_spec)
- body_usage rhs_infos
-
- ; return (body_usage `combineUsage` spec_usage,
- Rec (concat (zipWith ruleInfoBinds rhs_infos specs))) }
- where
- (bndrs,rhss) = unzip prs
- force_spec = any (forceSpecBndr env) bndrs
- -- Note [Forcing specialisation]
-
-scTopBind env usage (NonRec bndr rhs) -- Oddly, we don't seem to specialise top-level non-rec functions
- = do { (rhs_usg', rhs') <- scExpr env rhs
- ; return (usage `combineUsage` rhs_usg', NonRec bndr rhs') }
-
----------------------
scRecRhs :: ScEnv -> (OutId, InExpr) -> UniqSM RhsInfo
scRecRhs env (bndr,rhs)
@@ -1574,7 +1641,8 @@ data RhsInfo
}
data SpecInfo -- Info about specialisations for a particular Id
- = SI { si_specs :: [OneSpec] -- The specialisations we have generated
+ = SI { si_specs :: [OneSpec] -- The specialisations we have
+ -- generated for this function
, si_n_specs :: Int -- Length of si_specs; used for numbering them
@@ -1585,7 +1653,7 @@ data SpecInfo -- Info about specialisations for a particular Id
-- RHS usage (which has not yet been
-- unleashed)
-- Nothing => we have
- -- See Note [Local recursive groups]
+ -- See Note [Seeding recursive groups]
-- See Note [spec_usg includes rhs_usg]
-- One specialisation: Rule plus definition
@@ -1595,57 +1663,62 @@ data OneSpec =
, os_id :: OutId -- Spec id
, os_rhs :: OutExpr } -- Spec rhs
-noSpecInfo :: SpecInfo
-noSpecInfo = SI { si_specs = [], si_n_specs = 0, si_mb_unspec = Nothing }
+initSpecInfo :: RhsInfo -> SpecInfo
+initSpecInfo (RI { ri_rhs_usg = rhs_usg })
+ = SI { si_specs = [], si_n_specs = 0, si_mb_unspec = Just rhs_usg }
+ -- si_mb_unspec: add in rhs_usg if there are any boring calls,
+ -- or if the bndr is exported
----------------------
specNonRec :: ScEnv
- -> ScUsage -- Body usage
+ -> CallEnv -- Calls in body
-> RhsInfo -- Structure info and usage info for un-specialised RHS
-> UniqSM (ScUsage, SpecInfo) -- Usage from RHSs (specialised and not)
-- plus details of specialisations
-specNonRec env body_usg rhs_info
- = specialise env (scu_calls body_usg) rhs_info
- (noSpecInfo { si_mb_unspec = Just (ri_rhs_usg rhs_info) })
+specNonRec env body_calls rhs_info
+ = specialise env body_calls rhs_info (initSpecInfo rhs_info)
----------------------
-specRec :: TopLevelFlag -> ScEnv
- -> ScUsage -- Body usage
+specRec :: ScEnv
+ -> CallEnv -- Calls in body
-> [RhsInfo] -- Structure info and usage info for un-specialised RHSs
-> UniqSM (ScUsage, [SpecInfo]) -- Usage from all RHSs (specialised and not)
-- plus details of specialisations
-specRec top_lvl env body_usg rhs_infos
- = go 1 seed_calls nullUsage init_spec_infos
+specRec env body_calls rhs_infos
+ = go 1 body_calls nullUsage (map initSpecInfo rhs_infos)
+ -- body_calls: see Note [Seeding recursive groups]
+ -- NB: 'go' always calls 'specialise' once, which in turn unleashes
+ -- si_mb_unspec if there are any boring calls in body_calls,
+ -- or if any of the Id(s) are exported
where
opts = sc_opts env
- (seed_calls, init_spec_infos) -- Note [Seeding top-level recursive groups]
- | isTopLevel top_lvl
- , any (isExportedId . ri_fn) rhs_infos -- Seed from body and RHSs
- = (all_calls, [noSpecInfo | _ <- rhs_infos])
- | otherwise -- Seed from body only
- = (calls_in_body, [noSpecInfo { si_mb_unspec = Just (ri_rhs_usg ri) }
- | ri <- rhs_infos])
-
- calls_in_body = scu_calls body_usg
- calls_in_rhss = foldr (combineCalls . scu_calls . ri_rhs_usg) emptyVarEnv rhs_infos
- all_calls = calls_in_rhss `combineCalls` calls_in_body
-- Loop, specialising, until you get no new specialisations
- go :: Int -- Which iteration of the "until no new specialisations"
- -- loop we are on; first iteration is 1
- -> CallEnv -- Seed calls
- -- Two accumulating parameters:
- -> ScUsage -- Usage from earlier specialisations
- -> [SpecInfo] -- Details of specialisations so far
- -> UniqSM (ScUsage, [SpecInfo])
+ go, go_again :: Int -- Which iteration of the "until no new specialisations"
+ -- loop we are on; first iteration is 1
+ -> CallEnv -- Seed calls
+ -- Two accumulating parameters:
+ -> ScUsage -- Usage from earlier specialisations
+ -> [SpecInfo] -- Details of specialisations so far
+ -> UniqSM (ScUsage, [SpecInfo])
go n_iter seed_calls usg_so_far spec_infos
+ = -- pprTrace "specRec3" (vcat [ text "bndrs" <+> ppr (map ri_fn rhs_infos)
+ -- , text "iteration" <+> int n_iter
+ -- , text "spec_infos" <+> ppr (map (map os_pat . si_specs) spec_infos)
+ -- ]) $
+ do { specs_w_usg <- zipWithM (specialise env seed_calls) rhs_infos spec_infos
+ ; let (extra_usg_s, all_spec_infos) = unzip specs_w_usg
+ extra_usg = combineUsages extra_usg_s
+ all_usg = usg_so_far `combineUsage` extra_usg
+ new_calls = scu_calls extra_usg
+ ; go_again n_iter new_calls all_usg all_spec_infos }
+
+ -- go_again deals with termination
+ go_again n_iter seed_calls usg_so_far spec_infos
| isEmptyVarEnv seed_calls
- = -- pprTrace "specRec1" (vcat [ ppr (map ri_fn rhs_infos)
- -- , ppr seed_calls
- -- , ppr body_usg ]) $
- return (usg_so_far, spec_infos)
+ = return (usg_so_far, spec_infos)
-- Limit recursive specialisation
-- See Note [Limit recursive specialisation]
@@ -1654,26 +1727,20 @@ specRec top_lvl env body_usg rhs_infos
-- If both of these are false, the sc_count
-- threshold will prevent non-termination
, any ((> the_limit) . si_n_specs) spec_infos
- = -- pprTrace "specRec2" (ppr (map (map os_pat . si_specs) spec_infos)) $
- return (usg_so_far, spec_infos)
+ = -- Give up on specialisation, but don't forget to include the rhs_usg
+ -- for the unspecialised function, since it may now be called
+ -- pprTrace "specRec2" (ppr (map (map os_pat . si_specs) spec_infos)) $
+ let rhs_usgs = combineUsages (mapMaybe si_mb_unspec spec_infos)
+ in return (usg_so_far `combineUsage` rhs_usgs, spec_infos)
| otherwise
- = -- pprTrace "specRec3" (vcat [ text "bndrs" <+> ppr (map ri_fn rhs_infos)
- -- , text "iteration" <+> int n_iter
- -- , text "spec_infos" <+> ppr (map (map os_pat . si_specs) spec_infos)
- -- ]) $
- do { specs_w_usg <- zipWithM (specialise env seed_calls) rhs_infos spec_infos
- ; let (extra_usg_s, new_spec_infos) = unzip specs_w_usg
- extra_usg = combineUsages extra_usg_s
- all_usg = usg_so_far `combineUsage` extra_usg
- ; go (n_iter + 1) (scu_calls extra_usg) all_usg new_spec_infos }
+ = go (n_iter + 1) seed_calls usg_so_far spec_infos
-- See Note [Limit recursive specialisation]
the_limit = case sc_count opts of
Nothing -> 10 -- Ugh!
Just max -> max
-
----------------------
specialise
:: ScEnv
@@ -1696,14 +1763,12 @@ specialise env bind_calls (RI { ri_fn = fn, ri_lam_bndrs = arg_bndrs
spec_info@(SI { si_specs = specs, si_n_specs = spec_count
, si_mb_unspec = mb_unspec })
| isDeadEndId fn -- Note [Do not specialise diverging functions]
- -- and do not generate specialisation seeds from its RHS
+ -- /and/ do not generate specialisation seeds from its RHS
= -- pprTrace "specialise bot" (ppr fn) $
return (nullUsage, spec_info)
| not (isNeverActive (idInlineActivation fn))
-- See Note [Transfer activation]
- --
- --
-- Don't specialise OPAQUE things, see Note [OPAQUE pragma].
-- Since OPAQUE things are always never-active (see
-- GHC.Parser.PostProcess.mkOpaquePragma) this guard never fires for
@@ -1729,14 +1794,16 @@ specialise env bind_calls (RI { ri_fn = fn, ri_lam_bndrs = arg_bndrs
; let spec_usg = combineUsages spec_usgs
+ unspec_rhs_needed = boring_call || isExportedId fn
+
-- If there were any boring calls among the seeds (= all_calls), then those
-- calls will call the un-specialised function. So we should use the seeds
-- from the _unspecialised_ function's RHS, which are in mb_unspec, by returning
-- them in new_usg.
- (new_usg, mb_unspec')
- = case mb_unspec of
- Just rhs_usg | boring_call -> (spec_usg `combineUsage` rhs_usg, Nothing)
- _ -> (spec_usg, mb_unspec)
+ (new_usg, mb_unspec') = case mb_unspec of
+ Just rhs_usg | unspec_rhs_needed
+ -> (spec_usg `combineUsage` rhs_usg, Nothing)
+ _ -> (spec_usg, mb_unspec)
-- ; pprTrace "specialise return }"
-- (vcat [ ppr fn
@@ -1744,8 +1811,8 @@ specialise env bind_calls (RI { ri_fn = fn, ri_lam_bndrs = arg_bndrs
-- , text "new calls:" <+> ppr (scu_calls new_usg)]) $
-- return ()
- ; return (new_usg, SI { si_specs = new_specs ++ specs
- , si_n_specs = spec_count + n_pats
+ ; return (new_usg, SI { si_specs = new_specs ++ specs
+ , si_n_specs = spec_count + n_pats
, si_mb_unspec = mb_unspec' }) }
| otherwise -- No calls, inactive, or not a function
@@ -2027,7 +2094,8 @@ calcSpecInfo fn (CP { cp_qvars = qvars, cp_args = pats }) extra_bndrs
Note [spec_usg includes rhs_usg]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In calls to 'specialise', the returned ScUsage must include the rhs_usg in
-the passed-in SpecInfo, unless there are no calls at all to the function.
+the passed-in SpecInfo in si_mb_unspec, unless there are no calls at all to
+the function.
The caller can, indeed must, assume this. They should not combine in rhs_usg
themselves, or they'll get rhs_usg twice -- and that can lead to an exponential
@@ -2245,9 +2313,11 @@ callsToNewPats :: ScEnv -> Id
-> SpecInfo
-> [ArgOcc] -> [Call]
-> UniqSM (Bool, [CallPat])
- -- Result has no duplicate patterns,
- -- nor ones mentioned in done_pats
- -- Bool indicates that there was at least one boring pattern
+-- Result has no duplicate patterns,
+-- nor ones mentioned in si_specs (hence "new" patterns)
+-- Bool indicates that there was at least one boring pattern
+-- The "New" in the name means "patterns that are not already covered
+-- by an existing specialisation"
callsToNewPats env fn spec_info@(SI { si_specs = done_specs }) bndr_occs calls
= do { mb_pats <- mapM (callToPats env bndr_occs) calls
@@ -2558,10 +2628,8 @@ argToPat1 env in_scope val_env arg arg_occ _arg_str
-- (b) we know what its value is
-- In that case it counts as "interesting"
argToPat1 env in_scope val_env (Var v) arg_occ arg_str
- | sc_force env || case arg_occ of { ScrutOcc {} -> True
- ; UnkOcc -> False
- ; NoOcc -> False } -- (a)
- , is_value -- (b)
+ | sc_force env || specialisableArgOcc arg_occ -- (a)
+ , is_value -- (b)
-- Ignoring sc_keen here to avoid gratuitously incurring Note [Reboxing]
-- So sc_keen focuses just on f (I# x), where we have a freshly-allocated
-- box that we can eliminate in the caller
=====================================
compiler/GHC/Core/Tidy.hs
=====================================
@@ -10,7 +10,7 @@ The code for *top-level* bindings is in GHC.Iface.Tidy.
{-# OPTIONS_GHC -Wno-incomplete-record-updates #-}
module GHC.Core.Tidy (
- tidyExpr, tidyRules, tidyUnfolding, tidyCbvInfoTop
+ tidyExpr, tidyRules, tidyCbvInfoTop, tidyBndrs
) where
import GHC.Prelude
@@ -360,33 +360,36 @@ tidyLetBndr rec_tidy_env env@(tidy_env, var_env) id
`setUnfoldingInfo` new_unf
old_unf = realUnfoldingInfo old_info
- new_unf | isStableUnfolding old_unf = tidyUnfolding rec_tidy_env old_unf old_unf
- | otherwise = trimUnfolding old_unf
- -- See Note [Preserve evaluatedness]
+ new_unf = tidyNestedUnfolding rec_tidy_env old_unf
in
((tidy_env', var_env'), id') }
------------ Unfolding --------------
-tidyUnfolding :: TidyEnv -> Unfolding -> Unfolding -> Unfolding
-tidyUnfolding tidy_env df@(DFunUnfolding { df_bndrs = bndrs, df_args = args }) _
+tidyNestedUnfolding :: TidyEnv -> Unfolding -> Unfolding
+tidyNestedUnfolding _ NoUnfolding = NoUnfolding
+tidyNestedUnfolding _ BootUnfolding = BootUnfolding
+tidyNestedUnfolding _ (OtherCon {}) = evaldUnfolding
+
+tidyNestedUnfolding tidy_env df@(DFunUnfolding { df_bndrs = bndrs, df_args = args })
= df { df_bndrs = bndrs', df_args = map (tidyExpr tidy_env') args }
where
(tidy_env', bndrs') = tidyBndrs tidy_env bndrs
-tidyUnfolding tidy_env
- unf@(CoreUnfolding { uf_tmpl = unf_rhs, uf_src = src })
- unf_from_rhs
+tidyNestedUnfolding tidy_env
+ unf@(CoreUnfolding { uf_tmpl = unf_rhs, uf_src = src, uf_is_value = is_value })
| isStableSource src
= seqIt $ unf { uf_tmpl = tidyExpr tidy_env unf_rhs } -- Preserves OccInfo
- -- This seqIt avoids a space leak: otherwise the uf_is_value,
- -- uf_is_conlike, ... fields may retain a reference to the
- -- pre-tidied expression forever (GHC.CoreToIface doesn't look at them)
-
- | otherwise
- = unf_from_rhs
- where seqIt unf = seqUnfolding unf `seq` unf
-tidyUnfolding _ unf _ = unf -- NoUnfolding or OtherCon
+ -- This seqIt avoids a space leak: otherwise the uf_is_value,
+ -- uf_is_conlike, ... fields may retain a reference to the
+ -- pre-tidied expression forever (GHC.CoreToIface doesn't look at them)
+
+ -- Discard unstable unfoldings, but see Note [Preserve evaluatedness]
+ | is_value = evaldUnfolding
+ | otherwise = noUnfolding
+
+ where
+ seqIt unf = seqUnfolding unf `seq` unf
{-
Note [Tidy IdInfo]
=====================================
compiler/GHC/Core/Utils.hs
=====================================
@@ -706,7 +706,11 @@ filterAlts :: TyCon -- ^ Type constructor of scrutinee's type (us
-- in a "case" statement then they will need to manually add a dummy case branch that just
-- calls "error" or similar.
filterAlts _tycon inst_tys imposs_cons alts
- = (imposs_deflt_cons, addDefault trimmed_alts maybe_deflt)
+ = imposs_deflt_cons `seqList`
+ (imposs_deflt_cons, addDefault trimmed_alts maybe_deflt)
+ -- Very important to force `imposs_deflt_cons` as that forces `alt_cons`, which
+ -- is essential as retaining `alts_wo_default` or any `Alt b` for that matter
+ -- leads to a huge space leak (see #22102 and !8896)
where
(alts_wo_default, maybe_deflt) = findDefault alts
alt_cons = [con | Alt con _ _ <- alts_wo_default]
=====================================
compiler/GHC/Iface/Tidy.hs
=====================================
@@ -24,16 +24,17 @@ import GHC.Tc.Utils.Env
import GHC.Core
import GHC.Core.Unfold
-import GHC.Core.Unfold.Make
+-- import GHC.Core.Unfold.Make
import GHC.Core.FVs
import GHC.Core.Tidy
-import GHC.Core.Seq (seqBinds)
+import GHC.Core.Seq ( seqBinds )
import GHC.Core.Opt.Arity ( exprArity, typeArity, exprBotStrictness_maybe )
import GHC.Core.InstEnv
import GHC.Core.Type ( Type, tidyTopType )
import GHC.Core.DataCon
import GHC.Core.TyCon
import GHC.Core.Class
+import GHC.Core.Opt.OccurAnal ( occurAnalyseExpr )
import GHC.Iface.Tidy.StaticPtrTable
import GHC.Iface.Env
@@ -383,8 +384,7 @@ tidyProgram opts (ModGuts { mg_module = mod
(unfold_env, tidy_occ_env) <- chooseExternalIds opts mod binds implicit_binds imp_rules
let (trimmed_binds, trimmed_rules) = findExternalRules opts binds imp_rules unfold_env
- let uf_opts = opt_unfolding_opts opts
- (tidy_env, tidy_binds) <- tidyTopBinds uf_opts unfold_env boot_exports tidy_occ_env trimmed_binds
+ (tidy_env, tidy_binds) <- tidyTopBinds unfold_env boot_exports tidy_occ_env trimmed_binds
-- See Note [Grand plan for static forms] in GHC.Iface.Tidy.StaticPtrTable.
(spt_entries, mcstub, tidy_binds') <- case opt_static_ptr_opts opts of
@@ -1152,60 +1152,49 @@ tidyTopName mod name_cache maybe_ref occ_env id
--
-- * subst_env: A Var->Var mapping that substitutes the new Var for the old
-tidyTopBinds :: UnfoldingOpts
- -> UnfoldEnv
+tidyTopBinds :: UnfoldEnv
-> NameSet
-> TidyOccEnv
-> CoreProgram
-> IO (TidyEnv, CoreProgram)
-tidyTopBinds uf_opts unfold_env boot_exports init_occ_env binds
+tidyTopBinds unfold_env boot_exports init_occ_env binds
= do let result = tidy init_env binds
seqBinds (snd result) `seq` return result
-- This seqBinds avoids a spike in space usage (see #13564)
where
init_env = (init_occ_env, emptyVarEnv)
- tidy = mapAccumL (tidyTopBind uf_opts unfold_env boot_exports)
+ tidy = mapAccumL (tidyTopBind unfold_env boot_exports)
------------------------
-tidyTopBind :: UnfoldingOpts
- -> UnfoldEnv
+tidyTopBind :: UnfoldEnv
-> NameSet
-> TidyEnv
-> CoreBind
-> (TidyEnv, CoreBind)
-tidyTopBind uf_opts unfold_env boot_exports
+tidyTopBind unfold_env boot_exports
(occ_env,subst1) (NonRec bndr rhs)
= (tidy_env2, NonRec bndr' rhs')
where
- Just (name',show_unfold) = lookupVarEnv unfold_env bndr
- (bndr', rhs') = tidyTopPair uf_opts show_unfold boot_exports tidy_env2 name' (bndr, rhs)
+ (bndr', rhs') = tidyTopPair unfold_env boot_exports tidy_env2 (bndr, rhs)
subst2 = extendVarEnv subst1 bndr bndr'
tidy_env2 = (occ_env, subst2)
-tidyTopBind uf_opts unfold_env boot_exports (occ_env, subst1) (Rec prs)
+tidyTopBind unfold_env boot_exports (occ_env, subst1) (Rec prs)
= (tidy_env2, Rec prs')
where
- prs' = [ tidyTopPair uf_opts show_unfold boot_exports tidy_env2 name' (id,rhs)
- | (id,rhs) <- prs,
- let (name',show_unfold) =
- expectJust "tidyTopBind" $ lookupVarEnv unfold_env id
- ]
-
- subst2 = extendVarEnvList subst1 (bndrs `zip` map fst prs')
+ prs' = map (tidyTopPair unfold_env boot_exports tidy_env2) prs
+ subst2 = extendVarEnvList subst1 (map fst prs `zip` map fst prs')
tidy_env2 = (occ_env, subst2)
-
- bndrs = map fst prs
+ -- This is where we "tie the knot": tidy_env2 is fed into tidyTopPair
-----------------------------------------------------------
-tidyTopPair :: UnfoldingOpts
- -> Bool -- show unfolding
+tidyTopPair :: UnfoldEnv
-> NameSet
-> TidyEnv -- The TidyEnv is used to tidy the IdInfo
-- It is knot-tied: don't look at it!
- -> Name -- New name
-> (Id, CoreExpr) -- Binder and RHS before tidying
-> (Id, CoreExpr)
-- This function is the heart of Step 2
@@ -1214,17 +1203,18 @@ tidyTopPair :: UnfoldingOpts
-- group, a variable late in the group might be mentioned
-- in the IdInfo of one early in the group
-tidyTopPair uf_opts show_unfold boot_exports rhs_tidy_env name' (bndr, rhs)
+tidyTopPair unfold_env boot_exports rhs_tidy_env (bndr, rhs)
= -- pprTrace "tidyTop" (ppr name' <+> ppr details <+> ppr rhs) $
(bndr1, rhs1)
where
+ Just (name',show_unfold) = lookupVarEnv unfold_env bndr
!cbv_bndr = tidyCbvInfoTop boot_exports bndr rhs
bndr1 = mkGlobalId details name' ty' idinfo'
details = idDetails cbv_bndr -- Preserve the IdDetails
ty' = tidyTopType (idType cbv_bndr)
rhs1 = tidyExpr rhs_tidy_env rhs
- idinfo' = tidyTopIdInfo uf_opts rhs_tidy_env name' ty'
+ idinfo' = tidyTopIdInfo rhs_tidy_env name' ty'
rhs rhs1 (idInfo cbv_bndr) show_unfold
-- tidyTopIdInfo creates the final IdInfo for top-level
@@ -1234,9 +1224,9 @@ tidyTopPair uf_opts show_unfold boot_exports rhs_tidy_env name' (bndr, rhs)
-- Indeed, CorePrep must eta expand where necessary to make
-- the manifest arity equal to the claimed arity.
--
-tidyTopIdInfo :: UnfoldingOpts -> TidyEnv -> Name -> Type
+tidyTopIdInfo :: TidyEnv -> Name -> Type
-> CoreExpr -> CoreExpr -> IdInfo -> Bool -> IdInfo
-tidyTopIdInfo uf_opts rhs_tidy_env name rhs_ty orig_rhs tidy_rhs idinfo show_unfold
+tidyTopIdInfo rhs_tidy_env name rhs_ty orig_rhs tidy_rhs idinfo show_unfold
| not is_external -- For internal Ids (not externally visible)
= vanillaIdInfo -- we only need enough info for code generation
-- Arity and strictness info are enough;
@@ -1292,31 +1282,20 @@ tidyTopIdInfo uf_opts rhs_tidy_env name rhs_ty orig_rhs tidy_rhs idinfo show_unf
Just (arity, _, _) -> not (isDeadEndAppSig id_sig arity)
--------- Unfolding ------------
+ -- Force unfold_info (hence bangs), otherwise the old unfolding
+ -- is retained during code generation. See #22071
+
unf_info = realUnfoldingInfo idinfo
- -- Force this, otherwise the old unfolding is retained over code generation
- -- See #22071
- !unfold_info
- | isCompulsoryUnfolding unf_info || show_unfold
- = tidyUnfolding rhs_tidy_env unf_info unf_from_rhs
- | otherwise
- = minimal_unfold_info
!minimal_unfold_info = trimUnfolding unf_info
- unf_from_rhs = mkFinalUnfolding uf_opts InlineRhs final_sig tidy_rhs
- -- NB: do *not* expose the worker if show_unfold is off,
- -- because that means this thing is a loop breaker or
- -- marked NOINLINE or something like that
- -- This is important: if you expose the worker for a loop-breaker
- -- then you can make the simplifier go into an infinite loop, because
- -- in effect the unfolding is exposed. See #1709
- --
- -- You might think that if show_unfold is False, then the thing should
- -- not be w/w'd in the first place. But a legitimate reason is this:
- -- the function returns bottom
- -- In this case, show_unfold will be false (we don't expose unfoldings
- -- for bottoming functions), but we might still have a worker/wrapper
- -- split (see Note [Worker/wrapper for bottoming functions] in
- -- GHC.Core.Opt.WorkWrap)
+ !unfold_info | isCompulsoryUnfolding unf_info || show_unfold
+ = tidyTopUnfolding rhs_tidy_env tidy_rhs unf_info
+ | otherwise
+ = minimal_unfold_info
+-- unf_from_rhs = mkFinalUnfolding uf_opts InlineRhs final_sig orig_rhs
+ -- NB: use `orig_rhs` not `tidy_rhs` in this call to mkFinalUnfolding
+ -- else you get a black hole (#22112). Reason: mkFinalUnfolding
+ -- looks at IdInfo, and that is knot-tied in tidyTopBind (the Rec case)
--------- Arity ------------
-- Usually the Id will have an accurate arity on it, because
@@ -1328,10 +1307,59 @@ tidyTopIdInfo uf_opts rhs_tidy_env name rhs_ty orig_rhs tidy_rhs idinfo show_unf
arity = exprArity orig_rhs `min` typeArity rhs_ty
-- orig_rhs: using tidy_rhs would make a black hole, since
-- exprArity uses the arities of Ids inside the rhs
+ --
-- typeArity: see Note [Arity invariants for bindings]
-- in GHC.Core.Opt.Arity
-{-
+------------ Unfolding --------------
+tidyTopUnfolding :: TidyEnv -> CoreExpr -> Unfolding -> Unfolding
+tidyTopUnfolding _ _ NoUnfolding = NoUnfolding
+tidyTopUnfolding _ _ BootUnfolding = BootUnfolding
+tidyTopUnfolding _ _ (OtherCon {}) = evaldUnfolding
+
+tidyTopUnfolding tidy_env _ df@(DFunUnfolding { df_bndrs = bndrs, df_args = args })
+ = df { df_bndrs = bndrs', df_args = map (tidyExpr tidy_env') args }
+ where
+ (tidy_env', bndrs') = tidyBndrs tidy_env bndrs
+
+tidyTopUnfolding tidy_env tidy_rhs
+ unf@(CoreUnfolding { uf_tmpl = unf_rhs, uf_src = src })
+ = -- See Note [tidyTopUnfolding: avoiding black holes]
+ unf { uf_tmpl = tidy_unf_rhs }
+ where
+ tidy_unf_rhs | isStableSource src
+ = tidyExpr tidy_env unf_rhs -- Preserves OccInfo in unf_rhs
+ | otherwise
+ = occurAnalyseExpr tidy_rhs -- Do occ-anal
+
+{- Note [tidyTopUnfolding: avoiding black holes]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When we are exposing all unfoldings we don't want to tidy the unfolding
+twice -- we just want to use the tidied RHS. That tidied RHS itself
+contains fully-tidied Ids -- it is knot-tied. So the uf_tmpl for the
+unfolding contains stuff we can't look at. Now consider (#22112)
+ foo = foo
+If we freshly compute the uf_is_value field for foo's unfolding,
+we'll call `exprIsValue`, which will look at foo's unfolding!
+Whether or not the RHS is a value depends on whether foo is a value...
+black hole.
+
+In the Simplifier we deal with this by not giving `foo` an unfolding
+in its own RHS. And we could do that here. But it's quite nice
+to common everything up to a single Id for foo, used everywhere.
+
+And it's not too hard: simply leave the unfolding undisturbed, except
+tidy the uf_tmpl field. Hence tidyTopUnfolding does
+ unf { uf_tmpl = tidy_unf_rhs }
+
+Don't mess with uf_is_value, or guidance; in particular don't recompute
+them from tidy_unf_rhs.
+
+And (unlike tidyNestedUnfolding) don't deep-seq the new unfolding,
+because that'll cause a black hole (I /think/ because occurAnalyseExpr
+looks in IdInfo).
+
+
************************************************************************
* *
Old, dead, type-trimming code
=====================================
compiler/GHC/Types/Id/Info.hs
=====================================
@@ -816,7 +816,7 @@ zapFragileUnfolding unf
trimUnfolding :: Unfolding -> Unfolding
-- Squash all unfolding info, preserving only evaluated-ness
trimUnfolding unf | isEvaldUnfolding unf = evaldUnfolding
- | otherwise = noUnfolding
+ | otherwise = noUnfolding
zapTailCallInfo :: IdInfo -> Maybe IdInfo
zapTailCallInfo info
=====================================
testsuite/tests/simplCore/should_compile/T21763.hs
=====================================
@@ -0,0 +1,24 @@
+{-# LANGUAGE MagicHash #-}
+module T21763 where
+
+import GHC.Exts
+
+-- We should get ONE SpecConstr-generated rule, for f2,
+-- not one for f1 and one for f2
+
+f1 :: Int -> [Int] -> (Int, [Int])
+-- This one only seq's x, so SpecConstr should not specialise it
+f1 x [] = (x, x `seq` [])
+f1 x (_:ys) = f1 x ys
+
+
+f2 :: Int -> [Int] -> (Int, [Int])
+-- This one takes x apart, so SpecConstr should specialise it
+f2 x [] = (x+1, x `seq` [])
+f2 x (_:ys) = f2 x ys
+
+foo1 :: [Int] -> (Int, [Int])
+foo1 ys = f1 9 ys
+
+foo2 :: [Int] -> (Int, [Int])
+foo2 ys = f2 9 ys
=====================================
testsuite/tests/simplCore/should_compile/T21763.stderr
=====================================
@@ -0,0 +1,5 @@
+
+==================== Tidy Core rules ====================
+"SC:$wf20" [2] forall (sc :: Int#). $wf2 (I# sc) = f2_$s$wf2 sc
+
+
=====================================
testsuite/tests/simplCore/should_compile/T21763a.hs
=====================================
@@ -0,0 +1,12 @@
+module T21763a where
+
+{-# NOINLINE g_imp #-}
+g_imp !x = not x
+
+f3 :: (Bool -> Bool) -> Bool -> [Bool] -> (Bool, [Bool])
+-- We want to specialize for `g` to turn it into a known call.
+f3 g x [] = (g x, [])
+f3 g x (_:ys) = f3 g x ys
+
+foo3 :: [Bool] -> (Bool, [Bool])
+foo3 ys = f3 g_imp True ys
=====================================
testsuite/tests/simplCore/should_compile/T21763a.stderr
=====================================
@@ -0,0 +1,5 @@
+
+==================== Tidy Core rules ====================
+"SC:$wf30" [2] forall. $wf3 g_imp = f3_$s$wf3
+
+
=====================================
testsuite/tests/simplCore/should_compile/T22112.hs
=====================================
@@ -0,0 +1,7 @@
+{-# LANGUAGE NoImplicitPrelude #-}
+module Rec where
+
+-- This one created a black hole in Tidy,
+-- when creating the tidied unfolding for foo
+foo :: () -> ()
+foo = foo
=====================================
testsuite/tests/simplCore/should_compile/T22112.stderr
=====================================
@@ -0,0 +1,14 @@
+
+==================== Tidy Core ====================
+Result size of Tidy Core
+ = {terms: 2, types: 2, coercions: 0, joins: 0/0}
+
+Rec {
+-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
+foo [Occ=LoopBreaker] :: () -> ()
+[GblId, Str=b, Cpr=b]
+foo = foo
+end Rec }
+
+
+
=====================================
testsuite/tests/simplCore/should_compile/all.T
=====================================
@@ -415,6 +415,7 @@ test('T17966', [ grep_errmsg(r'SPEC') ], compile, ['-O -ddump-spec'])
# We expect to see a SPEC rule for $cm
test('T19644', [ grep_errmsg(r'SPEC') ], compile, ['-O -ddump-spec'])
test('T21391', normal, compile, ['-O -dcore-lint'])
+test('T22112', normal, compile, ['-O -dsuppress-uniques -dno-typeable-binds -ddump-simpl'])
test('T21391a', normal, compile, ['-O -dcore-lint'])
# We don't want to see a thunk allocation for the insertBy expression after CorePrep.
test('T21392', [ grep_errmsg(r'sat.* :: \[\(.*Unique, .*Int\)\]'), expect_broken(21392) ], compile, ['-O -ddump-prep -dno-typeable-binds -dsuppress-uniques'])
@@ -424,3 +425,5 @@ test('T21848', [grep_errmsg(r'SPEC wombat') ], compile, ['-O -ddump-spec'])
test('T21694b', [grep_errmsg(r'Arity=4') ], compile, ['-O -ddump-simpl'])
test('T21960', [grep_errmsg(r'^ Arity=5') ], compile, ['-O2 -ddump-simpl'])
test('T21948', [grep_errmsg(r'^ Arity=5') ], compile, ['-O -ddump-simpl'])
+test('T21763', only_ways(['optasm']), compile, ['-O2 -ddump-rules'])
+test('T21763a', only_ways(['optasm']), compile, ['-O2 -ddump-rules'])
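
As a rough illustration of Note [Do not specialise evals] in the diff above (this is not part of the commits), the sketch below writes out by hand what a specialisation of the eval-only function `f` would look like. The name `fSpec` is hypothetical and only stands in for the kind of function SpecConstr could generate for a call (f (I# z)); the code and names GHC actually produces will differ.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int (I#), Int#)

-- Eval-only function from the Note: the case forces x, but the
-- Int# inside the box is never used and no branch depends on it.
f :: Int -> Int -> Int
f x y = case x of
  I# _ -> if y > 1 then f x (y - 1) else x

-- Hypothetical hand-written specialisation for calls of the form
-- (f (I# z)). It passes the unboxed value along unchanged and has
-- to rebox it with I# at the end, so nothing is gained.
fSpec :: Int# -> Int -> Int
fSpec z y = if y > 1 then fSpec z (y - 1) else I# z

main :: IO ()
main = do
  print (f 7 3)
  print (case (7 :: Int) of I# z -> fSpec z 3)
```

Both calls print 7. The hand-written version discards no case branch and removes no allocation, yet it must allocate a fresh box on return, which is the cost the Note warns about.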
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/9f6c96ea203d889d3f399d1d8d01fd6023dd7512...acb617f23bac29376a86318539fbbb1c264b988b