[Git][ghc/ghc][wip/simplifier-tweaks] 6 commits: Several improvements to the handling of coercions
Simon Peyton Jones (@simonpj)
gitlab at gitlab.haskell.org
Wed Jul 12 16:43:08 UTC 2023
Simon Peyton Jones pushed to branch wip/simplifier-tweaks at Glasgow Haskell Compiler / GHC
Commits:
e04b7b3a by Simon Peyton Jones at 2023-07-12T13:17:17+01:00
Several improvements to the handling of coercions
* Make `mkSymCo` and `mkInstCo` smarter
Fixes #23642
* Fix return role of `SelCo` in the coercion optimiser.
Fixes #23617
* Make the coercion optimiser `opt_trans_rule` work better for newtypes
Fixes #23619
- - - - -
e79e49da by Simon Peyton Jones at 2023-07-12T14:29:14+01:00
Simplifier improvements
This MR started as: allow the simplifer to do more in one pass,
arising from places I could see the simplifier taking two iterations
where one would do. But it turned into a larger project, because
these changes unexpectedly made inlining blow up, especially join
points in deeply-nested cases.
The net result is good: a 2% improvement in compile time. The table
below shows changes over 1%.
The main changes are:
* The SimplEnv now has a seInlineDepth field, which says how deep
in unfoldings we are. See Note [Inline depth] in Simplify.Env
* Avoid repeatedly simplifying coercions.
see Note [Avoid re-simplifying coercions] in Simplify.Iteration
As you'll see from the Note, this makes use of the seInlineDepth.
* Allow Simplify.Utils.postInlineUnconditionally to inline variables
that are used exactly once. See Note [Post-inline for single-use things].
* Allow Simplify.Iteration.simplAuxBind to inline used-once things.
This is another part of Note [Post-inline for single-use things], and
is really good for reducing simplifier iterations in situations like
case K e of { K x -> blah }
wher x is used once in blah.
* Make GHC.Core.SimpleOpt.exprIsConApp_maybe do some simple case
elimination. Note [Case elim in exprIsConApp_maybe]
* When making join points, don't do so if the join point is so small
it will immediately be inlined. See Note [Duplicating alternatives]
* Do not add an unfolding to a join point at birth. This is a tricky one
and has a long Note [Do not add unfoldings to join points at birth]
It shows up in two places
- In `mkDupableAlt` do not add an inlining
- (trickier) In `simplLetUnfolding` don't add an unfolding for a
fresh join point
I am not fully satisifed with this, but it works and is well documented.
* Many new or rewritten Notes. E.g. Note [Avoiding simplifying repeatedly]
I discovered that GHC.HsToCore.Pmc.Solver.Types.trvVarInfo was very
delicately balanced. It's a small, heavily used, overloaded function
and it's important that it inlines. By a fluke it was before, but at
various times in my journey it stopped doing so. So I added an INLINE
pragma to it.
Metrics: compile_time/bytes allocated
------------------------------------------------
CoOpt_Singletons(normal) -4.3% GOOD
LargeRecord(normal) -23.3% GOOD
PmSeriesS(normal) -2.4%
T11195(normal) -1.7%
T12227(normal) -20.0% GOOD
T12545(normal) -5.4%
T13253-spj(normal) -50.7% GOOD
T13386(normal) -5.1% GOOD
T14766(normal) -2.4% GOOD
T15164(normal) -1.7%
T15304(normal) +1.0%
T15630(normal) -7.7%
T15630a(normal) NEW
T15703(normal) -7.5% GOOD
T16577(normal) -5.1% GOOD
T17516(normal) -3.6%
T18223(normal) -16.8% GOOD
T18282(normal) -1.5%
T18304(normal) +1.9%
T21839c(normal) -3.5% GOOD
T3064(normal) -1.5%
T5030(normal) -16.2% GOOD
T5321Fun(normal) -1.6%
T6048(optasm) -2.1% GOOD
T8095(normal) -6.1% GOOD
T9630(normal) -5.1% GOOD
WWRec(normal) -1.6%
geo. mean -2.1%
minimum -50.7%
maximum +1.9%
Metric Decrease:
CoOpt_Singletons
LargeRecord
T12227
T13253-spj
T13386
T14766
T15703
T16577
T18223
T21839c
T5030
T6048
T8095
T9630
- - - - -
f32ad188 by Simon Peyton Jones at 2023-07-12T17:35:45+01:00
No postInlineUnconditionally for strict bindings
Does not save allocation!
- - - - -
c9b06dfa by Simon Peyton Jones at 2023-07-12T17:37:05+01:00
No preInlineConditionally for join points
Does not save allocation!
- - - - -
90d96f33 by Simon Peyton Jones at 2023-07-12T17:37:31+01:00
Don't use Plan A for a case continuation
See carryPropagate in digits-of-e2
Really I'm moving more towards Plan B.
- - - - -
be385269 by Simon Peyton Jones at 2023-07-12T17:39:07+01:00
Half way attempt at inlining join points
My idea here is to be more parsimonious about inlining join
points. I was thinking that even
join j x = I# x in
case v of
p1 -> j x1
p2 -> j x2
...
might not inline. Better for consumers.
Also don't inline even in FinalPhase beause we want importing
modules to see this.
- - - - -
21 changed files:
- compiler/GHC/Core/Coercion.hs
- compiler/GHC/Core/Coercion/Opt.hs
- compiler/GHC/Core/Opt/Simplify/Env.hs
- compiler/GHC/Core/Opt/Simplify/Inline.hs
- compiler/GHC/Core/Opt/Simplify/Iteration.hs
- compiler/GHC/Core/Opt/Simplify/Monad.hs
- compiler/GHC/Core/Opt/Simplify/Utils.hs
- compiler/GHC/Core/SimpleOpt.hs
- compiler/GHC/Core/TyCo/Rep.hs
- compiler/GHC/Core/Unfold.hs
- compiler/GHC/Core/Unfold/Make.hs
- compiler/GHC/Core/Utils.hs
- compiler/GHC/HsToCore/Pmc/Solver/Types.hs
- compiler/GHC/Iface/Type.hs
- compiler/GHC/IfaceToCore.hs
- testsuite/tests/perf/compiler/T15630.hs
- + testsuite/tests/perf/compiler/T15630a.hs
- testsuite/tests/simplCore/should_compile/T18730.hs → testsuite/tests/perf/compiler/T18730.hs
- testsuite/tests/simplCore/should_compile/T18730_A.hs → testsuite/tests/perf/compiler/T18730_A.hs
- testsuite/tests/perf/compiler/all.T
- testsuite/tests/simplCore/should_compile/all.T
Changes:
=====================================
compiler/GHC/Core/Coercion.hs
=====================================
@@ -37,7 +37,7 @@ module GHC.Core.Coercion (
mkAxInstLHS, mkUnbranchedAxInstLHS,
mkPiCo, mkPiCos, mkCoCast,
mkSymCo, mkTransCo,
- mkSelCo, getNthFun, getNthFromType, mkLRCo,
+ mkSelCo, mkSelCoResRole, getNthFun, getNthFromType, mkLRCo,
mkInstCo, mkAppCo, mkAppCos, mkTyConAppCo,
mkFunCo, mkFunCo2, mkFunCoNoFTF, mkFunResCo,
mkNakedFunCo,
@@ -556,20 +556,28 @@ splitFunCo_maybe (FunCo { fco_arg = arg, fco_res = res }) = Just (arg, res)
splitFunCo_maybe _ = Nothing
splitForAllCo_maybe :: Coercion -> Maybe (TyCoVar, Coercion, Coercion)
-splitForAllCo_maybe (ForAllCo tv k_co co) = Just (tv, k_co, co)
-splitForAllCo_maybe _ = Nothing
+splitForAllCo_maybe (ForAllCo tv k_co co)
+ = Just (tv, k_co, co)
+splitForAllCo_maybe co
+ | Just (ty, r) <- isReflCo_maybe co
+ , Just (tcv, body_ty) <- splitForAllTyCoVar_maybe ty
+ = Just (tcv, mkNomReflCo (varType tcv), mkReflCo r body_ty)
+splitForAllCo_maybe _
+ = Nothing
-- | Like 'splitForAllCo_maybe', but only returns Just for tyvar binder
splitForAllCo_ty_maybe :: Coercion -> Maybe (TyVar, Coercion, Coercion)
-splitForAllCo_ty_maybe (ForAllCo tv k_co co)
- | isTyVar tv = Just (tv, k_co, co)
-splitForAllCo_ty_maybe _ = Nothing
+splitForAllCo_ty_maybe co
+ | Just stuff@(tv,_,_) <- splitForAllCo_maybe co
+ , isTyVar tv = Just stuff
+ | otherwise = Nothing
-- | Like 'splitForAllCo_maybe', but only returns Just for covar binder
splitForAllCo_co_maybe :: Coercion -> Maybe (CoVar, Coercion, Coercion)
-splitForAllCo_co_maybe (ForAllCo cv k_co co)
- | isCoVar cv = Just (cv, k_co, co)
-splitForAllCo_co_maybe _ = Nothing
+splitForAllCo_co_maybe co
+ | Just stuff@(cv,_,_) <- splitForAllCo_maybe co
+ , isCoVar cv = Just stuff
+ | otherwise = Nothing
-------------------------------------------------------
-- and some coercion kind stuff
@@ -1126,12 +1134,17 @@ mkUnivCo prov role ty1 ty2
-- a kind of @t1 ~ t2@ becomes the kind @t2 ~ t1 at .
mkSymCo :: Coercion -> Coercion
--- Do a few simple optimizations, but don't bother pushing occurrences
--- of symmetry to the leaves; the optimizer will take care of that.
-mkSymCo co | isReflCo co = co
-mkSymCo (SymCo co) = co
-mkSymCo (SubCo (SymCo co)) = SubCo co
-mkSymCo co = SymCo co
+-- Do a few simple optimizations, mainly to expose the underlying
+-- constructors to other 'mk' functions. E.g.
+-- mkInstCo (mkSymCo (ForAllCo ...)) ty
+-- We want to push the SymCo inside the ForallCo, so that we can instantiate
+-- This can make a big difference. E.g without coercion optimisation, GHC.Read
+-- totally explodes; but when we push Sym inside ForAll, it's fine.
+mkSymCo co | isReflCo co = co
+mkSymCo (SymCo co) = co
+mkSymCo (SubCo (SymCo co)) = SubCo co
+mkSymCo (ForAllCo tcv kco co) = ForAllCo tcv (mkSymCo kco) (mkSymCo co)
+mkSymCo co = SymCo co
-- | Create a new 'Coercion' by composing the two given 'Coercion's transitively.
-- (co1 ; co2)
@@ -1142,6 +1155,7 @@ mkTransCo (GRefl r t1 (MCo co1)) (GRefl _ _ (MCo co2))
= GRefl r t1 (MCo $ mkTransCo co1 co2)
mkTransCo co1 co2 = TransCo co1 co2
+--------------------
mkSelCo :: HasDebugCallStack
=> CoSel
-> Coercion
@@ -1161,7 +1175,7 @@ mkSelCo_maybe cs co
go cs co
| Just (ty, r) <- isReflCo_maybe co
- = Just (mkReflCo r (getNthFromType cs ty))
+ = Just (mkReflCo (mkSelCoResRole cs r) (getNthFromType cs ty))
go SelForAll (ForAllCo _ kind_co _)
= Just kind_co
@@ -1212,6 +1226,14 @@ mkSelCo_maybe cs co
good_call _ = False
+mkSelCoResRole :: CoSel -> Role -> Role
+-- What is the role of (SelCo cs co), if co has role 'r'?
+-- It is not just 'r'!
+-- c.f. the SelCo case of coercionRole
+mkSelCoResRole SelForAll _ = Nominal
+mkSelCoResRole (SelTyCon _ r') _ = r'
+mkSelCoResRole (SelFun fs) r = funRole r fs
+
-- | Extract the nth field of a FunCo
getNthFun :: FunSel
-> a -- ^ multiplicity
@@ -1222,6 +1244,24 @@ getNthFun SelMult mult _ _ = mult
getNthFun SelArg _ arg _ = arg
getNthFun SelRes _ _ res = res
+getNthFromType :: HasDebugCallStack => CoSel -> Type -> Type
+getNthFromType (SelFun fs) ty
+ | Just (_af, mult, arg, res) <- splitFunTy_maybe ty
+ = getNthFun fs mult arg res
+
+getNthFromType (SelTyCon n _) ty
+ | Just args <- tyConAppArgs_maybe ty
+ = assertPpr (args `lengthExceeds` n) (ppr n $$ ppr ty) $
+ args `getNth` n
+
+getNthFromType SelForAll ty -- Works for both tyvar and covar
+ | Just (tv,_) <- splitForAllTyCoVar_maybe ty
+ = tyVarKind tv
+
+getNthFromType cs ty
+ = pprPanic "getNthFromType" (ppr cs $$ ppr ty)
+
+--------------------
mkLRCo :: LeftOrRight -> Coercion -> Coercion
mkLRCo lr co
| Just (ty, eq) <- isReflCo_maybe co
@@ -1230,11 +1270,14 @@ mkLRCo lr co
= LRCo lr co
-- | Instantiates a 'Coercion'.
+-- Works for both tyvar and covar
mkInstCo :: Coercion -> CoercionN -> Coercion
-mkInstCo (ForAllCo tcv _kind_co body_co) co
- | Just (arg, _) <- isReflCo_maybe co
- -- works for both tyvar and covar
- = substCoUnchecked (zipTCvSubst [tcv] [arg]) body_co
+mkInstCo co_fun co_arg
+ | Just (tcv, kind_co, body_co) <- splitForAllCo_maybe co_fun
+ , Just (arg, _) <- isReflCo_maybe co_arg
+ = assertPpr (isReflexiveCo kind_co) (ppr co_fun $$ ppr co_arg) $
+ -- If the arg is Refl, then kind_co must be reflexive too
+ substCoUnchecked (zipTCvSubst [tcv] [arg]) body_co
mkInstCo co arg = InstCo co arg
-- | Given @ty :: k1@, @co :: k1 ~ k2@,
@@ -2433,23 +2476,6 @@ coercionLKind co
go_app (InstCo co arg) args = go_app co (go arg:args)
go_app co args = piResultTys (go co) args
-getNthFromType :: HasDebugCallStack => CoSel -> Type -> Type
-getNthFromType (SelFun fs) ty
- | Just (_af, mult, arg, res) <- splitFunTy_maybe ty
- = getNthFun fs mult arg res
-
-getNthFromType (SelTyCon n _) ty
- | Just args <- tyConAppArgs_maybe ty
- = assertPpr (args `lengthExceeds` n) (ppr n $$ ppr ty) $
- args `getNth` n
-
-getNthFromType SelForAll ty -- Works for both tyvar and covar
- | Just (tv,_) <- splitForAllTyCoVar_maybe ty
- = tyVarKind tv
-
-getNthFromType cs ty
- = pprPanic "getNthFromType" (ppr cs $$ ppr ty)
-
coercionRKind :: Coercion -> Type
coercionRKind co
= go co
=====================================
compiler/GHC/Core/Coercion/Opt.hs
=====================================
@@ -23,7 +23,7 @@ import GHC.Core.Unify
import GHC.Types.Var.Set
import GHC.Types.Var.Env
-import GHC.Types.Unique.Set
+-- import GHC.Types.Unique.Set
import GHC.Data.Pair
import GHC.Data.List.SetOps ( getNth )
@@ -132,45 +132,52 @@ optCoercion :: OptCoercionOpts -> Subst -> Coercion -> NormalCo
optCoercion opts env co
| optCoercionEnabled opts
= optCoercion' env co
+
{-
- = pprTrace "optCoercion {" (text "Co:" <+> ppr co) $
+ = pprTrace "optCoercion {" (text "Co:" <> ppr (coercionSize co)) $
let result = optCoercion' env co in
- pprTrace "optCoercion }" (vcat [ text "Co:" <+> ppr co
- , text "Optco:" <+> ppr result ]) $
+ pprTrace "optCoercion }"
+ (vcat [ text "Co:" <+> ppr (coercionSize co)
+ , text "Optco:" <+> ppWhen (isReflCo result) (text "(refl)")
+ <+> ppr (coercionSize result) ]) $
result
-}
| otherwise
= substCo env co
-
optCoercion' :: Subst -> Coercion -> NormalCo
optCoercion' env co
| debugIsOn
= let out_co = opt_co1 lc False co
(Pair in_ty1 in_ty2, in_role) = coercionKindRole co
(Pair out_ty1 out_ty2, out_role) = coercionKindRole out_co
+
+ details = vcat [ text "in_co:" <+> ppr co
+ , text "in_ty1:" <+> ppr in_ty1
+ , text "in_ty2:" <+> ppr in_ty2
+ , text "out_co:" <+> ppr out_co
+ , text "out_ty1:" <+> ppr out_ty1
+ , text "out_ty2:" <+> ppr out_ty2
+ , text "in_role:" <+> ppr in_role
+ , text "out_role:" <+> ppr out_role
+-- , vcat $ map ppr_one $ nonDetEltsUniqSet $ coVarsOfCo co
+-- , text "subst:" <+> ppr env
+ ]
in
+ warnPprTrace (not (isReflCo out_co) && isReflexiveCo out_co)
+ "optCoercion: reflexive but not refl" details $
assertPpr (substTyUnchecked env in_ty1 `eqType` out_ty1 &&
substTyUnchecked env in_ty2 `eqType` out_ty2 &&
in_role == out_role)
- (hang (text "optCoercion changed types!")
- 2 (vcat [ text "in_co:" <+> ppr co
- , text "in_ty1:" <+> ppr in_ty1
- , text "in_ty2:" <+> ppr in_ty2
- , text "out_co:" <+> ppr out_co
- , text "out_ty1:" <+> ppr out_ty1
- , text "out_ty2:" <+> ppr out_ty2
- , text "in_role:" <+> ppr in_role
- , text "out_role:" <+> ppr out_role
- , vcat $ map ppr_one $ nonDetEltsUniqSet $ coVarsOfCo co
- , text "subst:" <+> ppr env ]))
- out_co
-
- | otherwise = opt_co1 lc False co
+ (hang (text "optCoercion changed types!") 2 details) $
+ out_co
+
+ | otherwise
+ = opt_co1 lc False co
where
lc = mkSubstLiftingContext env
- ppr_one cv = ppr cv <+> dcolon <+> ppr (coVarKind cv)
+-- ppr_one cv = ppr cv <+> dcolon <+> ppr (coVarKind cv)
type NormalCo = Coercion
@@ -215,23 +222,37 @@ opt_co3 env sym _ r co = opt_co4_wrap env sym False r co
-- | Optimize a non-phantom coercion.
opt_co4, opt_co4_wrap :: LiftingContext -> SymFlag -> ReprFlag
-> Role -> Coercion -> NormalCo
--- Precondition: In every call (opt_co4 lc sym rep role co)
--- we should have role = coercionRole co
+-- Precondition: In every call (opt_co4 lc sym rep role co)
+-- we should have role = coercionRole co
+-- Postcondition: The resulting coercion is equivalant to
+-- wrapsub (wrapsym (mksub co)
+-- where wrapsym is SymCo if sym=True
+-- wrapsub is SubCo if rep=True
+
+-- opt_co4_wrap is there just to support tracing, when debugging
+-- Usually it just goes straight to opt_co4
opt_co4_wrap = opt_co4
{-
opt_co4_wrap env sym rep r co
= pprTrace "opt_co4_wrap {"
- ( vcat [ text "Sym:" <+> ppr sym
- , text "Rep:" <+> ppr rep
- , text "Role:" <+> ppr r
- , text "Co:" <+> ppr co ]) $
- assert (r == coercionRole co ) $
- let result = opt_co4 env sym rep r co in
- pprTrace "opt_co4_wrap }" (ppr co $$ text "---" $$ ppr result) $
- result
--}
+ ( vcat [ text "Sym:" <+> ppr sym
+ , text "Rep:" <+> ppr rep
+ , text "Role:" <+> ppr r
+ , text "Co:" <+> ppr co ]) $
+ assert (r == coercionRole co ) $
+ let result = opt_co4 env sym rep r co in
+ pprTrace "opt_co4_wrap }" (ppr co $$ text "---" $$ ppr result) $
+ assertPpr (res_role == coercionRole result)
+ (vcat [ text "Role:" <+> ppr r
+ , text "Result: " <+> ppr result
+ , text "Result type:" <+> ppr (coercionType result) ]) $
+ result
+ where
+ res_role | rep = Representational
+ | otherwise = r
+-}
opt_co4 env _ rep r (Refl ty)
= assertPpr (r == Nominal)
@@ -379,11 +400,17 @@ opt_co4 env sym rep _ (SelCo SelForAll (ForAllCo _ eta _))
-- works for both tyvar and covar
= opt_co4_wrap env sym rep Nominal eta
-opt_co4 env sym rep r (SelCo n co)
- | Just nth_co <- case (co', n) of
- (TyConAppCo _ _ cos, SelTyCon n _) -> Just (cos `getNth` n)
- (FunCo _ _ _ w co1 co2, SelFun fs) -> Just (getNthFun fs w co1 co2)
- (ForAllCo _ eta _, SelForAll) -> Just eta
+-- So the /input/ coercion isn't ForAllCo or Refl;
+-- instead look at the /output/ coercion
+opt_co4 env sym rep r (SelCo cs co)
+ | Just (ty, co_role) <- isReflCo_maybe co'
+ = mkReflCo (chooseRole rep (mkSelCoResRole cs co_role))
+ (getNthFromType cs ty)
+
+ | Just nth_co <- case (co', cs) of
+ (TyConAppCo _ _ cos, SelTyCon n _) -> Just (cos `getNth` n)
+ (FunCo _ _ _ w co1 co2, SelFun fs) -> Just (getNthFun fs w co1 co2)
+ (ForAllCo _ eta _, SelForAll) -> Just eta
_ -> Nothing
= if rep && (r == Nominal)
-- keep propagating the SubCo
@@ -391,7 +418,7 @@ opt_co4 env sym rep r (SelCo n co)
else nth_co
| otherwise
- = wrapRole rep r $ SelCo n co'
+ = wrapRole rep r $ SelCo cs co'
where
co' = opt_co1 env sym co
@@ -586,7 +613,6 @@ opt_univ env sym prov role oty1 oty2
(env', tv1', eta') = optForAllCoBndr env sym tv1 eta
in
mkForAllCo tv1' eta' (opt_univ env' sym prov' role ty1 ty2')
-
| Just (cv1, ty1) <- splitForAllCoVar_maybe oty1
, Just (cv2, ty2) <- splitForAllCoVar_maybe oty2
-- NB: prov isn't interesting here either
@@ -628,8 +654,25 @@ opt_transList :: HasDebugCallStack => InScopeSet -> [NormalCo] -> [NormalCo] ->
opt_transList is = zipWithEqual "opt_transList" (opt_trans is)
-- The input lists must have identical length.
-opt_trans :: InScopeSet -> NormalCo -> NormalCo -> NormalCo
+opt_trans, opt_trans' :: InScopeSet -> NormalCo -> NormalCo -> NormalCo
+
+-- opt_trans just allows us to add some debug tracing
+-- Usually it just goes to opt_trans'
+opt_trans is co1 co2 = opt_trans' is co1 co2
+
+{-
opt_trans is co1 co2
+ = assertPpr (r1==r2) (vcat [ ppr r1 <+> ppr co1, ppr r2 <+> ppr co2]) $
+ assertPpr (rres == r1) (vcat [ ppr r1 <+> ppr co1, ppr r2 <+> ppr co2, text "res" <+> ppr rres <+> ppr res ]) $
+ res
+ where
+ res = opt_trans' is co1 co2
+ rres = coercionRole res
+ r1 = coercionRole co1
+ r2 = coercionRole co1
+-}
+
+opt_trans' is co1 co2
| isReflCo co1 = co2
-- optimize when co1 is a Refl Co
| otherwise = opt_trans1 is co1 co2
@@ -803,10 +846,37 @@ opt_trans_rule is co1 co2
-- Push transitivity inside axioms
opt_trans_rule is co1 co2
+ -- TrPushAxSym/TrPushSymAx
+ -- Put this first! Otherwise (#23619) we get
+ -- newtype N a = MkN a
+ -- axN :: forall a. N a ~ a
+ -- Now consider (axN ty ; sym (axN ty))
+ -- If we put TrPushSymAxR first, we'll get
+ -- (axN ty ; sym (axN ty)) :: N ty ~ N ty -- Obviously Refl
+ -- --> axN (sym (axN ty)) :: N ty ~ N ty -- Very stupid
+ | Just (sym1, ax1, ind1, cos1) <- isAxiom_maybe co1
+ , Just (sym2, ax2, ind2, cos2) <- isAxiom_maybe co2
+ , ax1 == ax2
+ , ind1 == ind2
+ , sym1 == not sym2
+ , let branch = coAxiomNthBranch ax1 ind1
+ role = coAxiomRole ax1
+ qtvs = coAxBranchTyVars branch ++ coAxBranchCoVars branch
+ lhs = coAxNthLHS ax1 ind1
+ rhs = coAxBranchRHS branch
+ pivot_tvs = exactTyCoVarsOfType (if sym2 then rhs else lhs)
+ , all (`elemVarSet` pivot_tvs) qtvs
+ = fireTransRule "TrPushAxSym" co1 co2 $
+ if sym2
+ -- TrPushAxSym
+ then liftCoSubstWith role qtvs (opt_transList is cos1 (map mkSymCo cos2)) lhs
+ -- TrPushSymAx
+ else liftCoSubstWith role qtvs (opt_transList is (map mkSymCo cos1) cos2) rhs
+
-- See Note [Push transitivity inside axioms] and
-- Note [Push transitivity inside newtype axioms only]
-- TrPushSymAxR
- | Just (sym, con, ind, cos1) <- co1_is_axiom_maybe
+ | Just (sym, con, ind, cos1) <- isAxiom_maybe co1
, isNewTyCon (coAxiomTyCon con)
, True <- sym
, Just cos2 <- matchAxiom sym con ind co2
@@ -814,7 +884,7 @@ opt_trans_rule is co1 co2
= fireTransRule "TrPushSymAxR" co1 co2 $ SymCo newAxInst
-- TrPushAxR
- | Just (sym, con, ind, cos1) <- co1_is_axiom_maybe
+ | Just (sym, con, ind, cos1) <- isAxiom_maybe co1
, isNewTyCon (coAxiomTyCon con)
, False <- sym
, Just cos2 <- matchAxiom sym con ind co2
@@ -822,7 +892,7 @@ opt_trans_rule is co1 co2
= fireTransRule "TrPushAxR" co1 co2 newAxInst
-- TrPushSymAxL
- | Just (sym, con, ind, cos2) <- co2_is_axiom_maybe
+ | Just (sym, con, ind, cos2) <- isAxiom_maybe co2
, isNewTyCon (coAxiomTyCon con)
, True <- sym
, Just cos1 <- matchAxiom (not sym) con ind co1
@@ -830,35 +900,13 @@ opt_trans_rule is co1 co2
= fireTransRule "TrPushSymAxL" co1 co2 $ SymCo newAxInst
-- TrPushAxL
- | Just (sym, con, ind, cos2) <- co2_is_axiom_maybe
+ | Just (sym, con, ind, cos2) <- isAxiom_maybe co2
, isNewTyCon (coAxiomTyCon con)
, False <- sym
, Just cos1 <- matchAxiom (not sym) con ind co1
, let newAxInst = AxiomInstCo con ind (opt_transList is cos1 cos2)
= fireTransRule "TrPushAxL" co1 co2 newAxInst
- -- TrPushAxSym/TrPushSymAx
- | Just (sym1, con1, ind1, cos1) <- co1_is_axiom_maybe
- , Just (sym2, con2, ind2, cos2) <- co2_is_axiom_maybe
- , con1 == con2
- , ind1 == ind2
- , sym1 == not sym2
- , let branch = coAxiomNthBranch con1 ind1
- qtvs = coAxBranchTyVars branch ++ coAxBranchCoVars branch
- lhs = coAxNthLHS con1 ind1
- rhs = coAxBranchRHS branch
- pivot_tvs = exactTyCoVarsOfType (if sym2 then rhs else lhs)
- , all (`elemVarSet` pivot_tvs) qtvs
- = fireTransRule "TrPushAxSym" co1 co2 $
- if sym2
- -- TrPushAxSym
- then liftCoSubstWith role qtvs (opt_transList is cos1 (map mkSymCo cos2)) lhs
- -- TrPushSymAx
- else liftCoSubstWith role qtvs (opt_transList is (map mkSymCo cos1) cos2) rhs
- where
- co1_is_axiom_maybe = isAxiom_maybe co1
- co2_is_axiom_maybe = isAxiom_maybe co2
- role = coercionRole co1 -- should be the same as coercionRole co2!
opt_trans_rule _ co1 co2 -- Identity rule
| let ty1 = coercionLKind co1
@@ -1108,11 +1156,13 @@ chooseRole _ r = r
-----------
isAxiom_maybe :: Coercion -> Maybe (Bool, CoAxiom Branched, Int, [Coercion])
-isAxiom_maybe (SymCo co)
- | Just (sym, con, ind, cos) <- isAxiom_maybe co
- = Just (not sym, con, ind, cos)
-isAxiom_maybe (AxiomInstCo con ind cos)
- = Just (False, con, ind, cos)
+-- We don't expect to see nested SymCo; and that lets us write a simple,
+-- non-recursive function. (If we see a nested SymCo we'll just fail,
+-- which is ok.)
+isAxiom_maybe (SymCo (AxiomInstCo ax ind cos))
+ = Just (True, ax, ind, cos)
+isAxiom_maybe (AxiomInstCo ax ind cos)
+ = Just (False, ax, ind, cos)
isAxiom_maybe _ = Nothing
matchAxiom :: Bool -- True = match LHS, False = match RHS
=====================================
compiler/GHC/Core/Opt/Simplify/Env.hs
=====================================
@@ -23,6 +23,7 @@ module GHC.Core.Opt.Simplify.Env (
getInScope, setInScopeFromE, setInScopeFromF,
setInScopeSet, modifyInScope, addNewInScopeIds,
getSimplRules, enterRecGroupRHSs,
+ reSimplifying,
-- * Substitution results
SimplSR(..), mkContEx, substId, lookupRecBndr,
@@ -61,27 +62,31 @@ import GHC.Core.Utils
import GHC.Core.Multiplicity ( scaleScaled )
import GHC.Core.Unfold
import GHC.Core.TyCo.Subst (emptyIdSubstEnv)
-import GHC.Types.Var
-import GHC.Types.Var.Env
-import GHC.Types.Var.Set
-import GHC.Data.OrdList
-import GHC.Data.Graph.UnVar
-import GHC.Types.Id as Id
import GHC.Core.Make ( mkWildValBinder, mkCoreLet )
-import GHC.Builtin.Types
-import qualified GHC.Core.Type as Type
import GHC.Core.Type hiding ( substTy, substTyVar, substTyVarBndr, substCo
, extendTvSubst, extendCvSubst )
import qualified GHC.Core.Coercion as Coercion
import GHC.Core.Coercion hiding ( substCo, substCoVar, substCoVarBndr )
-import GHC.Platform ( Platform )
+import qualified GHC.Core.Type as Type
+
+import GHC.Types.Var
+import GHC.Types.Var.Env
+import GHC.Types.Var.Set
+import GHC.Types.Id as Id
import GHC.Types.Basic
+import GHC.Types.Unique.FM ( pprUniqFM )
+
+import GHC.Builtin.Types
+
+import GHC.Data.OrdList
+import GHC.Data.Graph.UnVar
+import GHC.Platform ( Platform )
+
import GHC.Utils.Monad
import GHC.Utils.Outputable
import GHC.Utils.Panic
import GHC.Utils.Panic.Plain
import GHC.Utils.Misc
-import GHC.Types.Unique.FM ( pprUniqFM )
import Data.List ( intersperse, mapAccumL )
@@ -151,6 +156,17 @@ following table:
| Set by user | SimplMode | TopEnvConfig |
| Computed on initialization | SimplEnv | SimplTopEnv |
+Note [Inline depth]
+~~~~~~~~~~~~~~~~~~~
+When we inline an /already-simplified/ unfolding, we
+* Zap the substitution environment; the inlined thing is an OutExpr
+* Bump the seInlineDepth in the SimplEnv
+Both these tasks are done in zapSubstEnv.
+
+The seInlineDepth tells us how deep in inlining we are. Currently,
+seInlineDepth is used for just one purpose: when we encounter a
+coercion we don't apply optCoercion to it if seInlineDepth>0.
+Reason: it has already been optimised once, no point in doing so again.
-}
data SimplEnv
@@ -180,7 +196,11 @@ data SimplEnv
-- They are all OutVars, and all bound in this module
, seInScope :: !InScopeSet -- OutVars only
- , seCaseDepth :: !Int -- Depth of multi-branch case alternatives
+ , seCaseDepth :: !Int -- Depth of multi-branch case alternatives
+
+ , seInlineDepth :: !Int -- 0 initially, 1 when we inline an already-simplified
+ -- unfolding, and simplify again; and so on
+ -- See Note [Inline depth]
}
seArityOpts :: SimplEnv -> ArityOpts
@@ -488,14 +508,15 @@ points we're substituting. -}
mkSimplEnv :: SimplMode -> (FamInstEnv, FamInstEnv) -> SimplEnv
mkSimplEnv mode fam_envs
- = SimplEnv { seMode = mode
- , seFamEnvs = fam_envs
- , seInScope = init_in_scope
- , seTvSubst = emptyVarEnv
- , seCvSubst = emptyVarEnv
- , seIdSubst = emptyVarEnv
- , seRecIds = emptyUnVarSet
- , seCaseDepth = 0 }
+ = SimplEnv { seMode = mode
+ , seFamEnvs = fam_envs
+ , seInScope = init_in_scope
+ , seTvSubst = emptyVarEnv
+ , seCvSubst = emptyVarEnv
+ , seIdSubst = emptyVarEnv
+ , seRecIds = emptyUnVarSet
+ , seCaseDepth = 0
+ , seInlineDepth = 0 }
-- The top level "enclosing CC" is "SUBSUMED".
init_in_scope :: InScopeSet
@@ -531,6 +552,9 @@ updMode upd env
bumpCaseDepth :: SimplEnv -> SimplEnv
bumpCaseDepth env = env { seCaseDepth = seCaseDepth env + 1 }
+reSimplifying :: SimplEnv -> Bool
+reSimplifying (SimplEnv { seInlineDepth = n }) = n>0
+
---------------------
extendIdSubst :: SimplEnv -> Id -> SimplSR -> SimplEnv
extendIdSubst env@(SimplEnv {seIdSubst = subst}) var res
@@ -616,7 +640,12 @@ setInScopeFromE.
---------------------
zapSubstEnv :: SimplEnv -> SimplEnv
-zapSubstEnv env = env {seTvSubst = emptyVarEnv, seCvSubst = emptyVarEnv, seIdSubst = emptyVarEnv}
+-- See Note [Inline depth]
+-- We call zapSubstEnv precisely when we are about to
+-- simplify an already-simplified term
+zapSubstEnv env@(SimplEnv { seInlineDepth = n })
+ = env { seTvSubst = emptyVarEnv, seCvSubst = emptyVarEnv, seIdSubst = emptyVarEnv
+ , seInlineDepth = n+1 }
setSubstEnv :: SimplEnv -> TvSubstEnv -> CvSubstEnv -> SimplIdSubst -> SimplEnv
setSubstEnv env tvs cvs ids = env { seTvSubst = tvs, seCvSubst = cvs, seIdSubst = ids }
=====================================
compiler/GHC/Core/Opt/Simplify/Inline.hs
=====================================
@@ -89,14 +89,18 @@ StrictAnal.addStrictnessInfoToTopId
callSiteInline :: Logger
-> UnfoldingOpts
- -> Int -- Case depth
+ -> Int -> Int -- Case depth and inline depth
-> Id -- The Id
-> Bool -- True <=> unfolding is active
-> Bool -- True if there are no arguments at all (incl type args)
-> [ArgSummary] -- One for each value arg; True if it is interesting
-> CallCtxt -- True <=> continuation is interesting
-> Maybe CoreExpr -- Unfolding, if any
-callSiteInline logger opts !case_depth id active_unfolding lone_variable arg_infos cont_info
+callSiteInline logger opts
+ !case_depth -- See Note [Avoid inlining into deeply nested cases]
+ !inline_depth -- Currently not used to control inlining
+ -- but we pass it for debug-logging purposes
+ id active_unfolding lone_variable arg_infos cont_info
= case idUnfolding id of
-- idUnfolding checks for loop-breakers, returning NoUnfolding
-- Things with an INLINE pragma may have an unfolding *and*
@@ -104,7 +108,7 @@ callSiteInline logger opts !case_depth id active_unfolding lone_variable arg_inf
CoreUnfolding { uf_tmpl = unf_template
, uf_cache = unf_cache
, uf_guidance = guidance }
- | active_unfolding -> tryUnfolding logger opts case_depth id lone_variable
+ | active_unfolding -> tryUnfolding logger opts case_depth inline_depth id lone_variable
arg_infos cont_info unf_template
unf_cache guidance
| otherwise -> traceInline logger opts id "Inactive unfolding:" (ppr id) Nothing
@@ -133,8 +137,9 @@ traceInline logger opts inline_id str doc result
{- Note [Avoid inlining into deeply nested cases]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Also called "exponential inlining".
-Consider a function f like this:
+Consider a function f like this: (#18730)
f arg1 arg2 =
case ...
@@ -145,46 +150,44 @@ This function is small. So should be safe to inline.
However sometimes this doesn't quite work out like that.
Consider this code:
-f1 arg1 arg2 ... = ...
- case _foo of
- alt1 -> ... f2 arg1 ...
- alt2 -> ... f2 arg2 ...
+ f1 arg1 arg2 ... = ...
+ case _foo of
+ alt1 -> ... f2 arg1 ...
+ alt2 -> ... f2 arg2 ...
-f2 arg1 arg2 ... = ...
- case _foo of
- alt1 -> ... f3 arg1 ...
- alt2 -> ... f3 arg2 ...
+ f2 arg1 arg2 ... = ...
+ case _foo of
+ alt1 -> ... f3 arg1 ...
+ alt2 -> ... f3 arg2 ...
-f3 arg1 arg2 ... = ...
+ f3 arg1 arg2 ... = ...
-... repeats up to n times. And then f1 is
-applied to some arguments:
+ ... repeats up to n times. And then f1 is
+ applied to some arguments:
-foo = ... f1 <interestingArgs> ...
+ foo = ... f1 <interestingArgs> ...
-Initially f2..fn are not interesting to inline so we don't.
-However we see that f1 is applied to interesting args.
-So it's an obvious choice to inline those:
+Initially f2..fn are not interesting to inline so we don't. However we see
+that f1 is applied to interesting args. So it's an obvious choice to inline
+those:
-foo =
- ...
- case _foo of
- alt1 -> ... f2 <interestingArg> ...
- alt2 -> ... f2 <interestingArg> ...
+ foo = ...
+ case _foo of
+ alt1 -> ... f2 <interestingArg> ...
+ alt2 -> ... f2 <interestingArg> ...
-As a result we go and inline f2 both mentions of f2 in turn are now applied to interesting
-arguments and f2 is small:
+As a result we go and inline f2 both mentions of f2 in turn are now applied to
+interesting arguments and f2 is small:
-foo =
- ...
- case _foo of
- alt1 -> ... case _foo of
- alt1 -> ... f3 <interestingArg> ...
- alt2 -> ... f3 <interestingArg> ...
+ foo = ...
+ case _foo of
+ alt1 -> ... case _foo of
+ alt1 -> ... f3 <interestingArg> ...
+ alt2 -> ... f3 <interestingArg> ...
- alt2 -> ... case _foo of
- alt1 -> ... f3 <interestingArg> ...
- alt2 -> ... f3 <interestingArg> ...
+ alt2 -> ... case _foo of
+ alt1 -> ... f3 <interestingArg> ...
+ alt2 -> ... f3 <interestingArg> ...
The same thing happens for each binding up to f_n, duplicating the amount of inlining
done in each step. Until at some point we are either done or run out of simplifier
@@ -201,19 +204,73 @@ The heuristic can be tuned in two ways:
* We can ignore the first n levels of case nestings for inlining decisions using
-funfolding-case-threshold.
-* The penalty grows linear with the depth. It's computed as size*(depth-threshold)/scaling.
+
+* The penalty grows linear with the depth. It's computed as
+ size*(depth-threshold)/scaling.
Scaling can be set with -funfolding-case-scaling.
+Reflections and wrinkles
+
+* See also Note [Do not add unfoldings to join points at birth] in
+ GHC.Core.Opt.Simplify.Iteration
+
+* The case total case depth is really the wrong thing; it will inhibit inlining of a
+ local function, just because there is some giant case nest further out. What we
+ want is the /difference/ in case-depth between the binding site and the call site.
+ That could be done quite easily by adding the case-depth to the Unfolding of the
+ function.
+
+* What matters more than /depth/ is total /width/; that is how many alternatives
+ are in the tree. We could perhaps multiply depth by width at each case expression.
+
+* There might be a case nest with many alternatives, but the function is called in
+ only a handful of them. So maybe we should ignore case-depth, and instead penalise
+ funtions that are called many times -- after all, inlining them bloats code.
+
+ But in the scenario above, we are simplifying an inlined fuction, without doing a
+ global occurrence analysis each time. So if we based the penalty on multiple
+ occurences, we should /also/ add a penalty when simplifying an already-simplified
+ expression. We do track this (seInlineDepth) but currently we barely use it.
+
+ An advantage of using occurrences+inline depth is that it'll work when no
+ case expressions are involved. See #15488.
+
+* Test T18730 did not involve join points. But join points are very prone to
+ the same kind of thing. For exampe in #13253, and several related tickets,
+ we got an exponential blowup in code size from a program that looks like
+ this.
+
+ let j1a x = case f y of { True -> p; False -> q }
+ j1b x = case f y of { True -> q; False -> p }
+ j2a x = case f (y+1) of { True -> j1a x; False -> j1b x}
+ j2b x = case f (y+1) of { True -> j1b x; False -> j1a x}
+ ...
+ in case f (y+10) of { True -> j10a 7; False -> j10b 8 }
+
+ The first danger is this: in Simplifier iteration 1 postInlineUnconditionally
+ inlines the last functions, j10a and j10b (they are both small). Now we have
+ two calls to j9a and two to j9b. In the next Simplifer iteration,
+ postInlineUnconditionally inlines all four of these calls, leaving four calls
+ to j8a and j8b. Etc.
+
+ Happily, this probably /won't/ happen because the Simplifier works top down, so it'll
+ inline j1a/j1b into j2a/j2b, which will make the latter bigger; so the process
+ will stop. But we still need to stop the inline cascade described at the head
+ of this Note.
+
Some guidance on setting these defaults:
* A low treshold (<= 2) is needed to prevent exponential cases from spiraling out of
control. We picked 2 for no particular reason.
+
* Scaling the penalty by any more than 30 means the reproducer from
T18730 won't compile even with reasonably small values of n. Instead
it will run out of runs/ticks. This means to positively affect the reproducer
a scaling <= 30 is required.
+
* A scaling of >= 15 still causes a few very large regressions on some nofib benchmarks.
(+80% for gc/fulsom, +90% for real/ben-raytrace, +20% for spectral/fibheaps)
+
* A scaling of >= 25 showed no regressions on nofib. However it showed a number of
(small) regression for compiler perf benchmarks.
@@ -222,15 +279,15 @@ This gives us minimal compiler perf regressions. No nofib runtime regressions an
will still avoid this pattern sometimes. This is a "safe" default, where we err on
the side of compiler blowup instead of risking runtime regressions.
-For cases where the default falls short the flag can be changed to allow more/less inlining as
-needed on a per-module basis.
+For cases where the default falls short the flag can be changed to allow
+more/less inlining as needed on a per-module basis.
-}
-tryUnfolding :: Logger -> UnfoldingOpts -> Int -> Id -> Bool -> [ArgSummary] -> CallCtxt
+tryUnfolding :: Logger -> UnfoldingOpts -> Int -> Int -> Id -> Bool -> [ArgSummary] -> CallCtxt
-> CoreExpr -> UnfoldingCache -> UnfoldingGuidance
-> Maybe CoreExpr
-tryUnfolding logger opts !case_depth id lone_variable arg_infos
+tryUnfolding logger opts !case_depth !inline_depth id lone_variable arg_infos
cont_info unf_template unf_cache guidance
= case guidance of
UnfNever -> traceInline logger opts id str (text "UnfNever") Nothing
@@ -263,8 +320,7 @@ tryUnfolding logger opts !case_depth id lone_variable arg_infos
small_enough = adjusted_size <= unfoldingUseThreshold opts
discount = computeDiscount arg_discounts res_discount arg_infos cont_info
- extra_doc = vcat [ text "case depth =" <+> int case_depth
- , text "depth based penalty =" <+> int depth_penalty
+ extra_doc = vcat [ text "depth based penalty =" <+> int depth_penalty
, text "discounted size =" <+> int adjusted_size ]
where
-- Unpack the UnfoldingCache lazily because it may not be needed, and all
@@ -281,6 +337,8 @@ tryUnfolding logger opts !case_depth id lone_variable arg_infos
, text "is exp:" <+> ppr is_exp
, text "is work-free:" <+> ppr is_wf
, text "guidance" <+> ppr guidance
+ , text "case depth =" <+> int case_depth
+ , text "inline depth =" <+> int inline_depth
, extra_doc
, text "ANSWER =" <+> if yes_or_no then text "YES" else text "NO"]
=====================================
compiler/GHC/Core/Opt/Simplify/Iteration.hs
=====================================
@@ -13,14 +13,12 @@ module GHC.Core.Opt.Simplify.Iteration ( simplTopBinds, simplExpr, simplImpRules
import GHC.Prelude
-import GHC.Platform
-
import GHC.Driver.Flags
import GHC.Core
import GHC.Core.Opt.Simplify.Monad
import GHC.Core.Opt.ConstantFold
-import GHC.Core.Type hiding ( substTy, substTyVar, extendTvSubst, extendCvSubst )
+import GHC.Core.Type hiding ( substCo, substTy, substTyVar, extendTvSubst, extendCvSubst )
import GHC.Core.TyCo.Compare( eqType )
import GHC.Core.Opt.Simplify.Env
import GHC.Core.Opt.Simplify.Inline
@@ -45,6 +43,7 @@ import GHC.Core.Opt.Arity ( ArityType, exprArity, arityTypeBotSigs_maybe
, pushCoTyArg, pushCoValArg, exprIsDeadEnd
, typeArity, arityTypeArity, etaExpandAT )
import GHC.Core.SimpleOpt ( exprIsConApp_maybe, joinPointBinding_maybe, joinPointBindings_maybe )
+-- import GHC.Core.FVs ( mkRuleInfo, exprsFreeIds )
import GHC.Core.FVs ( mkRuleInfo )
import GHC.Core.Rules ( lookupRule, getRules )
import GHC.Core.Multiplicity
@@ -60,10 +59,12 @@ import GHC.Types.Unique ( hasKey )
import GHC.Types.Basic
import GHC.Types.Tickish
import GHC.Types.Var ( isTyCoVar )
+-- import GHC.Types.Var.Set
import GHC.Builtin.PrimOps ( PrimOp (SeqOp) )
import GHC.Builtin.Types.Prim( realWorldStatePrimTy )
import GHC.Builtin.Names( runRWKey )
+-- import GHC.Data.Maybe ( isNothing, isJust, orElse, mapMaybe )
import GHC.Data.Maybe ( isNothing, orElse, mapMaybe )
import GHC.Data.FastString
import GHC.Unit.Module ( moduleName )
@@ -399,7 +400,8 @@ simplJoinBind env is_rec cont old_bndr new_bndr rhs rhs_se
; completeBind env (BC_Join is_rec cont) old_bndr new_bndr rhs' }
--------------------------
-simplAuxBind :: SimplEnv
+simplAuxBind :: String
+ -> SimplEnv
-> InId -- Old binder; not a JoinId
-> OutExpr -- Simplified RHS
-> SimplM (SimplFloats, SimplEnv)
@@ -411,17 +413,22 @@ simplAuxBind :: SimplEnv
--
-- Precondition: rhs satisfies the let-can-float invariant
-simplAuxBind env bndr new_rhs
+simplAuxBind _str env bndr new_rhs
| assertPpr (isId bndr && not (isJoinId bndr)) (ppr bndr) $
isDeadBinder bndr -- Not uncommon; e.g. case (a,b) of c { (p,q) -> p }
= return (emptyFloats env, env) -- Here c is dead, and we avoid
-- creating the binding c = (a,b)
-- The cases would be inlined unconditionally by completeBind:
- -- but it seems not uncommon, and avoids faff to do it here
- -- This is safe because it's only used for auxiliary bindings, which
- -- have no NOLINE pragmas, nor RULEs
+ -- but it seems not uncommon, and it turns to be a little more
+ -- efficient (in compile time allocations) to do it here.
+ -- Effectively this is just a poor man's postInlineUnconditionally
+ -- See Note [Post-inline for single-use things] in GHC.Core.Opt.Simplify.Utils
+ -- Note: auxiliary bindings have no NOLINE pragmas, RULEs, or stable unfoldings
| exprIsTrivial new_rhs -- Short-cut for let x = y in ...
+ || case (idOccInfo bndr) of
+ OneOcc{ occ_n_br = 1, occ_in_lam = NotInsideLam } -> True
+ _ -> False
= return ( emptyFloats env
, case new_rhs of
Coercion co -> extendCvSubst env bndr co
@@ -587,11 +594,10 @@ Note [Concrete types] in GHC.Tc.Utils.Concrete.
-}
tryCastWorkerWrapper :: SimplEnv -> BindContext
- -> InId -> OccInfo
- -> OutId -> OutExpr
+ -> InId -> OutId -> OutExpr
-> SimplM (SimplFloats, SimplEnv)
-- See Note [Cast worker/wrapper]
-tryCastWorkerWrapper env bind_cxt old_bndr occ_info bndr (Cast rhs co)
+tryCastWorkerWrapper env bind_cxt old_bndr bndr (Cast rhs co)
| BC_Let top_lvl is_rec <- bind_cxt -- Not join points
, not (isDFunId bndr) -- nor DFuns; cast w/w is no help, and we can't transform
-- a DFunUnfolding in mk_worker_unfolding
@@ -617,7 +623,7 @@ tryCastWorkerWrapper env bind_cxt old_bndr occ_info bndr (Cast rhs co)
triv_rhs = Cast (Var work_id_w_unf) co
- ; if postInlineUnconditionally env bind_cxt bndr occ_info triv_rhs
+ ; if postInlineUnconditionally env bind_cxt old_bndr bndr triv_rhs
-- Almost always True, because the RHS is trivial
-- In that case we want to eliminate the binding fast
-- We conservatively use postInlineUnconditionally so that we
@@ -660,7 +666,7 @@ tryCastWorkerWrapper env bind_cxt old_bndr occ_info bndr (Cast rhs co)
| isStableSource src -> return (unf { uf_tmpl = mkCast unf_rhs (mkSymCo co) })
_ -> mkLetUnfolding uf_opts top_lvl VanillaSrc work_id work_rhs
-tryCastWorkerWrapper env _ _ _ bndr rhs -- All other bindings
+tryCastWorkerWrapper env _ _ bndr rhs -- All other bindings
= do { traceSmpl "tcww:no" (vcat [ text "bndr:" <+> ppr bndr
, text "rhs:" <+> ppr rhs ])
; return (mkFloatBind env (NonRec bndr rhs)) }
@@ -939,7 +945,6 @@ completeBind env bind_cxt old_bndr new_bndr new_rhs
= assert (isId new_bndr) $
do { let old_info = idInfo old_bndr
old_unf = realUnfoldingInfo old_info
- occ_info = occInfo old_info
-- Do eta-expansion on the RHS of the binding
-- See Note [Eta-expanding at let bindings] in GHC.Core.Opt.Simplify.Utils
@@ -952,7 +957,7 @@ completeBind env bind_cxt old_bndr new_bndr new_rhs
; let new_bndr_w_info = addLetBndrInfo new_bndr new_arity new_unfolding
-- See Note [In-scope set as a substitution]
- ; if postInlineUnconditionally env bind_cxt new_bndr_w_info occ_info eta_rhs
+ ; if postInlineUnconditionally env bind_cxt old_bndr new_bndr_w_info eta_rhs
then -- Inline and discard the binding
do { tick (PostInlineUnconditionally old_bndr)
@@ -966,8 +971,9 @@ completeBind env bind_cxt old_bndr new_bndr new_rhs
-- substitution will happen, since we are going to discard the binding
else -- Keep the binding; do cast worker/wrapper
- -- pprTrace "Binding" (ppr new_bndr <+> ppr new_unfolding) $
- tryCastWorkerWrapper env bind_cxt old_bndr occ_info new_bndr_w_info eta_rhs }
+-- simplTrace "completeBind" (vcat [ text "bndrs" <+> ppr old_bndr <+> ppr new_bndr
+-- , text "eta_rhs" <+> ppr eta_rhs ]) $
+ tryCastWorkerWrapper env bind_cxt old_bndr new_bndr_w_info eta_rhs }
addLetBndrInfo :: OutId -> ArityType -> Unfolding -> OutId
addLetBndrInfo new_bndr new_arity_type new_unf
@@ -1331,10 +1337,16 @@ simplCoercionF env co cont
simplCoercion :: SimplEnv -> InCoercion -> SimplM OutCoercion
simplCoercion env co
- = do { let opt_co = optCoercion opts (getSubst env) co
+ = do { let opt_co | reSimplifying env = substCo env co
+ | otherwise = optCoercion opts subst co
+ -- If (reSimplifying env) is True we have already
+ -- simplified this coercion once, and we don't
+ -- want do so again; doing so repeatedly risks
+ -- non-linear behaviour
; seqCo opt_co `seq` return opt_co }
where
- opts = seOptCoercionOpts env
+ subst = getSubst env
+ opts = seOptCoercionOpts env
-----------------------------------
-- | Push a TickIt context outwards past applications and cases, as
@@ -1443,8 +1455,8 @@ simplTick env tickish expr cont
splitCont :: SimplCont -> (SimplCont, SimplCont)
splitCont cont@(ApplyToTy { sc_cont = tail }) = (cont { sc_cont = inc }, outc)
where (inc,outc) = splitCont tail
- splitCont (CastIt co c) = (CastIt co inc, outc)
- where (inc,outc) = splitCont c
+ splitCont cont@(CastIt { sc_cont = tail }) = (cont { sc_cont = inc }, outc)
+ where (inc,outc) = splitCont tail
splitCont other = (mkBoringStop (contHoleType other), other)
getDoneId (DoneId id) = Just id
@@ -1500,8 +1512,11 @@ rebuild env expr cont
= case cont of
Stop {} -> return (emptyFloats env, expr)
TickIt t cont -> rebuild env (mkTick t expr) cont
- CastIt co cont -> rebuild env (mkCast expr co) cont
- -- NB: mkCast implements the (Coercion co |> g) optimisation
+ CastIt { sc_co = co, sc_opt = opt, sc_cont = cont }
+ -> rebuild env (mkCast expr co') cont
+ -- NB: mkCast implements the (Coercion co |> g) optimisation
+ where
+ co' = optOutCoercion env co opt
Select { sc_bndr = bndr, sc_alts = alts, sc_env = se, sc_cont = cont }
-> rebuildCase (se `setInScopeFromE` env) expr bndr alts cont
@@ -1552,7 +1567,7 @@ completeBindX env from_what bndr rhs body cont
bndr2 (emptyFloats env) rhs
-- NB: it makes a surprisingly big difference (5% in compiler allocation
-- in T9630) to pass 'env' rather than 'env1'. It's fine to pass 'env',
- -- because this is simplNonRecX, so bndr is not in scope in the RHS.
+ -- because this is completeBindX, so bndr is not in scope in the RHS.
; (bind_float, env2) <- completeBind (env2 `setInScopeFromF` rhs_floats)
(BC_Let NotTopLevel NonRecursive)
@@ -1600,36 +1615,73 @@ isReflexiveCo
In investigating this I saw missed opportunities for on-the-fly
coercion shrinkage. See #15090.
+
+Note [Avoid re-simplifying coercions]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In some benchmarks (with deeply nested cases) we successively push
+casts onto the SimplCont. We don't want to call the coercion optimiser
+on each successive composition -- that's at least quadratic. So:
+
+* The CastIt constructor in SimplCont has a `sc_opt :: Bool` flag to
+ record whether the coercion optimiser has been applied to the coercion.
+
+* In `simplCast`, when we see (Cast e co), we simplify `co` to get
+ an OutCoercion, and built a CastIt with sc_opt=True.
+
+ Actually not quite: if we are simplifying the result of inlining an
+ unfolding (seInlineDepth > 0), then instead of /optimising/ it again,
+ just /substitute/ which is cheaper. See `simplCoercion`.
+
+* In `addCoerce` (in `simplCast`) if we combine this new coercion with
+ an existing once, we build a CastIt for (co1 ; co2) with sc_opt=False.
+
+* When unpacking a CastIt, in `rebuildCall` and `rebuild`, we optimise
+ the (presumably composed) coercion if sc_opt=False; this is done
+ by `optOutCoercion`.
+
+* When duplicating a continuation in `mkDupableContWithDmds`, before
+ duplicating a CastIt, optimise the coercion. Otherwise we'll end up
+ optimising it separately in the duplicate copies.
-}
-simplCast :: SimplEnv -> InExpr -> Coercion -> SimplCont
+optOutCoercion :: SimplEnv -> OutCoercion -> Bool -> OutCoercion
+-- See Note [Avoid re-simplifying coercions]
+optOutCoercion env co already_optimised
+ | already_optimised = co -- See Note [Avoid re-simplifying coercions]
+ | otherwise = optCoercion opts empty_subst co
+ where
+ empty_subst = mkEmptySubst (seInScope env)
+ opts = seOptCoercionOpts env
+
+simplCast :: SimplEnv -> InExpr -> InCoercion -> SimplCont
-> SimplM (SimplFloats, OutExpr)
simplCast env body co0 cont0
= do { co1 <- {-#SCC "simplCast-simplCoercion" #-} simplCoercion env co0
; cont1 <- {-#SCC "simplCast-addCoerce" #-}
if isReflCo co1
then return cont0 -- See Note [Optimising reflexivity]
- else addCoerce co1 cont0
+ else addCoerce co1 True cont0
+ -- True <=> co1 is optimised
; {-#SCC "simplCast-simplExprF" #-} simplExprF env body cont1 }
where
+
-- If the first parameter is MRefl, then simplifying revealed a
-- reflexive coercion. Omit.
- addCoerceM :: MOutCoercion -> SimplCont -> SimplM SimplCont
- addCoerceM MRefl cont = return cont
- addCoerceM (MCo co) cont = addCoerce co cont
-
- addCoerce :: OutCoercion -> SimplCont -> SimplM SimplCont
- addCoerce co1 (CastIt co2 cont) -- See Note [Optimising reflexivity]
- | isReflexiveCo co' = return cont
- | otherwise = addCoerce co' cont
- where
- co' = mkTransCo co1 co2
-
- addCoerce co (ApplyToTy { sc_arg_ty = arg_ty, sc_cont = tail })
+ addCoerceM :: MOutCoercion -> Bool -> SimplCont -> SimplM SimplCont
+ addCoerceM MRefl _ cont = return cont
+ addCoerceM (MCo co) opt cont = addCoerce co opt cont
+
+ addCoerce :: OutCoercion -> Bool -> SimplCont -> SimplM SimplCont
+ addCoerce co1 _ (CastIt { sc_co = co2, sc_cont = cont }) -- See Note [Optimising reflexivity]
+ = addCoerce (mkTransCo co1 co2) False cont
+ -- False: (mkTransCo co1 co2) is not fully optimised
+ -- See Note [Avoid re-simplifying coercions]
+
+ addCoerce co opt (ApplyToTy { sc_arg_ty = arg_ty, sc_cont = tail })
| Just (arg_ty', m_co') <- pushCoTyArg co arg_ty
= {-#SCC "addCoerce-pushCoTyArg" #-}
- do { tail' <- addCoerceM m_co' tail
+ do { tail' <- addCoerceM m_co' opt tail
; return (ApplyToTy { sc_arg_ty = arg_ty'
, sc_cont = tail'
, sc_hole_ty = coercionLKind co }) }
@@ -1640,18 +1692,20 @@ simplCast env body co0 cont0
-- where co :: (s1->s2) ~ (t1->t2)
-- co1 :: t1 ~ s1
-- co2 :: s2 ~ t2
- addCoerce co cont@(ApplyToVal { sc_arg = arg, sc_env = arg_se
- , sc_dup = dup, sc_cont = tail
- , sc_hole_ty = fun_ty })
+ addCoerce co opt cont@(ApplyToVal { sc_arg = arg, sc_env = arg_se
+ , sc_dup = dup, sc_cont = tail
+ , sc_hole_ty = fun_ty })
+ | not opt -- pushCoValArg duplicates the coercion, so optimise first
+ = addCoerce (optOutCoercion env co opt) True cont
+
| Just (m_co1, m_co2) <- pushCoValArg co
, fixed_rep m_co1
= {-#SCC "addCoerce-pushCoValArg" #-}
- do { tail' <- addCoerceM m_co2 tail
+ do { tail' <- addCoerceM m_co2 opt tail
; case m_co1 of {
MRefl -> return (cont { sc_cont = tail'
, sc_hole_ty = coercionLKind co }) ;
- -- Avoid simplifying if possible;
- -- See Note [Avoiding exponential behaviour]
+ -- See Note [Avoiding simplifying repeatedly]
MCo co1 ->
do { (dup', arg_se', arg') <- simplArg env dup fun_ty arg_se arg
@@ -1666,11 +1720,11 @@ simplCast env body co0 cont0
, sc_cont = tail'
, sc_hole_ty = coercionLKind co }) } } }
- addCoerce co cont
- | isReflexiveCo co = return cont -- Having this at the end makes a huge
- -- difference in T12227, for some reason
- -- See Note [Optimising reflexivity]
- | otherwise = return (CastIt co cont)
+ addCoerce co opt cont
+ | isReflCo co = return cont -- Having this at the end makes a huge
+ -- difference in T12227, for some reason
+ -- See Note [Optimising reflexivity]
+ | otherwise = return (CastIt { sc_co = co, sc_opt = opt, sc_cont = cont })
fixed_rep :: MCoercionR -> Bool
fixed_rep MRefl = True
@@ -1732,7 +1786,7 @@ simpl_lam env bndr body (ApplyToVal { sc_arg = arg, sc_env = arg_se
; let arg_ty = funArgTy fun_ty
; if | isSimplified dup -- Don't re-simplify if we've simplified it once
-- Including don't preInlineUnconditionally
- -- See Note [Avoiding exponential behaviour]
+ -- See Note [Avoiding simplifying repeatedly]
-> completeBindX env (FromBeta arg_ty) bndr arg body cont
| Just env' <- preInlineUnconditionally env NotTopLevel bndr arg arg_se
@@ -1866,27 +1920,35 @@ Simplifier without first calling SimpleOpt, so anything involving
GHCi or TH and operator sections will fall over if we don't take
care here.
-Note [Avoiding exponential behaviour]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Note [Avoiding simplifying repeatedly]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
One way in which we can get exponential behaviour is if we simplify a
big expression, and then re-simplify it -- and then this happens in a
deeply-nested way. So we must be jolly careful about re-simplifying
-an expression (#13379). That is why simplNonRecX does not try
-preInlineUnconditionally (unlike simplNonRecE).
+an expression (#13379).
Example:
f BIG, where f has a RULE
Then
* We simplify BIG before trying the rule; but the rule does not fire
- * We inline f = \x. x True
- * So if we did preInlineUnconditionally we'd re-simplify (BIG True)
+ (forcing this simplification is why we have the RULE in this example)
+ * We inline f = \x. g x, in `simpl_lam`
+ * So if `simpl_lam` did preInlineUnconditionally we get (g BIG)
+ * Now if g has a RULE we'll simplify BIG again, and this whole thing can
+ iterate.
+ * However, if `f` did not have a RULE, so that BIG has /not/ already been
+ simplified, we /want/ to do preInlineUnconditionally in simpl_lam.
-However, if BIG has /not/ already been simplified, we'd /like/ to
-simplify BIG True; maybe good things happen. That is why
+So we go to some effort to avoid repeatedly simplifying the same thing:
-* simplLam has
- - a case for (isSimplified dup), which goes via simplNonRecX, and
- - a case for the un-simplified case, which goes via simplNonRecE
+* ApplyToVal has a (sc_dup :: DupFlag) field which records if the argument
+ has been evaluated.
+
+* simplArg checks this flag to avoid re-simplifying.
+
+* simpl_lam has:
+ - a case for (isSimplified dup), which goes via completeBindX, and
+ - a case for an un-simplified argument, which tries preInlineUnconditionally
* We go to some efforts to avoid unnecessarily simplifying ApplyToVal,
in at least two places
@@ -1894,6 +1956,11 @@ simplify BIG True; maybe good things happen. That is why
- In rebuildCall we avoid simplifying arguments before we have to
(see Note [Trying rewrite rules])
+All that said /postInlineUnconditionally/ (called in `completeBind`) does
+fire in the above (f BIG) situation. See Note [Post-inline for single-use
+things] in Simplify.Utils. This certainly risks repeated simplification, but
+in practice seems to be a small win.
+
************************************************************************
* *
@@ -2214,8 +2281,10 @@ rebuildCall env info@(ArgInfo { ai_fun = fun, ai_args = rev_args
_ -> True
---------- Simplify type applications and casts --------------
-rebuildCall env info (CastIt co cont)
- = rebuildCall env (addCastTo info co) cont
+rebuildCall env info (CastIt { sc_co = co, sc_opt = opt, sc_cont = cont })
+ = rebuildCall env (addCastTo info co') cont
+ where
+ co' = optOutCoercion env co opt
rebuildCall env info (ApplyToTy { sc_arg_ty = arg_ty, sc_hole_ty = hole_ty, sc_cont = cont })
= rebuildCall env (addTyArgTo info arg_ty hole_ty) cont
@@ -2297,7 +2366,7 @@ rebuildCall env (ArgInfo { ai_fun = fun, ai_args = rev_args }) cont
-----------------------------------
tryInlining :: SimplEnv -> Logger -> OutId -> SimplCont -> SimplM (Maybe OutExpr)
tryInlining env logger var cont
- | Just expr <- callSiteInline logger uf_opts case_depth var active_unf
+ | Just expr <- callSiteInline logger uf_opts case_depth inline_depth var active_unf
lone_variable arg_infos interesting_cont
= do { dump_inline expr cont
; return (Just expr) }
@@ -2308,6 +2377,7 @@ tryInlining env logger var cont
where
uf_opts = seUnfoldingOpts env
case_depth = seCaseDepth env
+ inline_depth = seInlineDepth env
(lone_variable, arg_infos, call_cont) = contArgs cont
interesting_cont = interestingCallContext env call_cont
active_unf = activeUnfolding (seMode env) var
@@ -2345,7 +2415,7 @@ Then given (f Int e1) we rewrite to
(\x. x True) e1
without simplifying e1. Now we can inline x into its unique call site,
and absorb the True into it all in the same pass. If we simplified
-e1 first, we couldn't do that; see Note [Avoiding exponential behaviour].
+e1 first, we couldn't do that; see Note [Avoiding simplifying repeatedly].
So we try to apply rules if either
(a) no_more_args: we've run out of argument that the rules can "see"
@@ -2961,7 +3031,7 @@ rebuildCase env scrut case_bndr alts cont
where
simple_rhs env wfloats case_bndr_rhs bs rhs =
assert (null bs) $
- do { (floats1, env') <- simplAuxBind env case_bndr case_bndr_rhs
+ do { (floats1, env') <- simplAuxBind "rebuildCase" env case_bndr case_bndr_rhs
-- scrut is a constructor application,
-- hence satisfies let-can-float invariant
; (floats2, expr') <- simplExprF env' rhs cont
@@ -3028,7 +3098,7 @@ rebuildCase env scrut case_bndr alts@[Alt _ bndrs rhs] cont
| all_dead_bndrs
, doCaseToLet scrut case_bndr
= do { tick (CaseElim case_bndr)
- ; (floats1, env') <- simplAuxBind env case_bndr scrut
+ ; (floats1, env') <- simplAuxBind "rebuildCaseAlt1" env case_bndr scrut
; (floats2, expr') <- simplExprF env' rhs cont
; return (floats1 `addFloats` floats2, expr') }
@@ -3241,7 +3311,6 @@ simplAlts env0 scrut case_bndr alts cont'
-- See Note [Shadowing in prepareAlts] in GHC.Core.Opt.Simplify.Utils
; alts' <- mapM (simplAlt alt_env' (Just scrut') imposs_deflt_cons case_bndr' cont') in_alts
--- ; pprTrace "simplAlts" (ppr case_bndr $$ ppr alts $$ ppr cont') $ return ()
; let alts_ty' = contResultType cont'
-- See Note [Avoiding space leaks in OutType]
@@ -3528,8 +3597,8 @@ knownCon env scrut dc_floats dc dc_ty_args dc_args bndr bs rhs cont
-- occur in the RHS; and simplAuxBind may therefore discard it.
-- Nevertheless we must keep it if the case-binder is alive,
-- because it may be used in the con_app. See Note [knownCon occ info]
- ; (floats1, env2) <- simplAuxBind env' b' arg -- arg satisfies let-can-float invariant
- ; (floats2, env3) <- bind_args env2 bs' args
+ ; (floats1, env2) <- simplAuxBind "knownCon" env' b' arg -- arg satisfies let-can-float invariant
+ ; (floats2, env3) <- bind_args env2 bs' args
; return (floats1 `addFloats` floats2, env3) }
bind_args _ _ _ =
@@ -3554,7 +3623,7 @@ knownCon env scrut dc_floats dc dc_ty_args dc_args bndr bs rhs cont
; let con_app = Var (dataConWorkId dc)
`mkTyApps` dc_ty_args
`mkApps` dc_args
- ; simplAuxBind env bndr con_app }
+ ; simplAuxBind "case-bndr" env bndr con_app }
-------------------
missingAlt :: SimplEnv -> Id -> [InAlt] -> SimplCont
@@ -3651,9 +3720,11 @@ mkDupableContWithDmds env _ cont
mkDupableContWithDmds _ _ (Stop {}) = panic "mkDupableCont" -- Handled by previous eqn
-mkDupableContWithDmds env dmds (CastIt ty cont)
+mkDupableContWithDmds env dmds (CastIt { sc_co = co, sc_opt = opt, sc_cont = cont })
= do { (floats, cont') <- mkDupableContWithDmds env dmds cont
- ; return (floats, CastIt ty cont') }
+ ; return (floats, CastIt { sc_co = optOutCoercion env co opt
+ , sc_opt = True, sc_cont = cont' }) }
+ -- optOutCoercion: see Note [Avoid re-simplifying coercions]
-- Duplicating ticks for now, not sure if this is good or not
mkDupableContWithDmds env dmds (TickIt t cont)
@@ -3684,6 +3755,7 @@ mkDupableContWithDmds env _
| isNothing (isDataConId_maybe (ai_fun fun))
, thumbsUpPlanA cont -- See point (3) of Note [Duplicating join points]
= -- Use Plan A of Note [Duplicating StrictArg]
+-- pprTrace "Using plan A" (ppr (ai_fun fun) $$ text "args" <+> ppr (ai_args fun) $$ text "cont" <+> ppr cont) $
do { let (_ : dmds) = ai_dmds fun
; (floats1, cont') <- mkDupableContWithDmds env dmds cont
-- Use the demands from the function to add the right
@@ -3707,14 +3779,15 @@ mkDupableContWithDmds env _
; (floats, join_rhs) <- rebuildCall env' (addValArgTo fun (Var arg_bndr) fun_ty) cont
; mkDupableStrictBind env' arg_bndr (wrapFloats floats join_rhs) rhs_ty }
where
+ thumbsUpPlanA (StrictBind {}) = True
+ thumbsUpPlanA (Stop {}) = True
+ thumbsUpPlanA (Select {}) = False -- Not quite sure of this one, but it
+ -- benefits nofib digits-of-e1 quite a bit
thumbsUpPlanA (StrictArg {}) = False
- thumbsUpPlanA (CastIt _ k) = thumbsUpPlanA k
+ thumbsUpPlanA (CastIt { sc_cont = k }) = thumbsUpPlanA k
thumbsUpPlanA (TickIt _ k) = thumbsUpPlanA k
thumbsUpPlanA (ApplyToVal { sc_cont = k }) = thumbsUpPlanA k
thumbsUpPlanA (ApplyToTy { sc_cont = k }) = thumbsUpPlanA k
- thumbsUpPlanA (Select {}) = True
- thumbsUpPlanA (StrictBind {}) = True
- thumbsUpPlanA (Stop {}) = True
mkDupableContWithDmds env dmds
(ApplyToTy { sc_cont = cont, sc_arg_ty = arg_ty, sc_hole_ty = hole_ty })
@@ -3775,8 +3848,7 @@ mkDupableContWithDmds env _
-- NB: we don't use alt_env further; it has the substEnv for
-- the alternatives, and we don't want that
- ; let platform = sePlatform env
- ; (join_floats, alts'') <- mapAccumLM (mkDupableAlt platform case_bndr')
+ ; (join_floats, alts'') <- mapAccumLM (mkDupableAlt env case_bndr')
emptyJoinFloats alts'
; let all_floats = floats `addJoinFloats` join_floats
@@ -3817,11 +3889,11 @@ mkDupableStrictBind env arg_bndr join_rhs res_ty
, sc_cont = mkBoringStop res_ty
} ) }
-mkDupableAlt :: Platform -> OutId
+mkDupableAlt :: SimplEnv -> OutId
-> JoinFloats -> OutAlt
-> SimplM (JoinFloats, OutAlt)
-mkDupableAlt _platform case_bndr jfloats (Alt con alt_bndrs alt_rhs_in)
- | exprIsTrivial alt_rhs_in -- See point (2) of Note [Duplicating join points]
+mkDupableAlt _env case_bndr jfloats (Alt con alt_bndrs alt_rhs_in)
+ | ok_to_dup_alt case_bndr alt_bndrs alt_rhs_in -- See point (2) of Note [Duplicating join points]
= return (jfloats, Alt con alt_bndrs alt_rhs_in)
| otherwise
@@ -3852,7 +3924,7 @@ mkDupableAlt _platform case_bndr jfloats (Alt con alt_bndrs alt_rhs_in)
filtered_binders = map fst abstracted_binders
-- We want to make any binder with an evaldUnfolding strict in the rhs.
-- See Note [Call-by-value for worker args] (which also applies to join points)
- (rhs_with_seqs) = mkStrictFieldSeqs abstracted_binders alt_rhs_in
+ rhs_with_seqs = mkStrictFieldSeqs abstracted_binders alt_rhs_in
final_args = varsToCoreExprs filtered_binders
-- Note [Join point abstraction]
@@ -3870,15 +3942,98 @@ mkDupableAlt _platform case_bndr jfloats (Alt con alt_bndrs alt_rhs_in)
join_rhs = mkLams (map zapIdUnfolding final_bndrs) rhs_with_seqs
; join_bndr <- newJoinId filtered_binders rhs_ty'
-
- ; let join_call = mkApps (Var join_bndr) final_args
+ ; let -- join_bndr_w_unf = join_bndr `setIdUnfolding`
+ -- mkUnfolding uf_opts VanillaSrc False False join_rhs Nothing
+ -- See Note [Do not add unfoldings to join points at birth]
+ join_call = mkApps (Var join_bndr) final_args
alt' = Alt con alt_bndrs join_call
; return ( jfloats `addJoinFlts` unitJoinFloat (NonRec join_bndr join_rhs)
, alt') }
-- See Note [Duplicated env]
+ok_to_dup_alt :: OutId -> [OutVar] -> OutExpr -> Bool
+-- See Note [Duplicating alternatives]
+ok_to_dup_alt _case_bndr _alt_bndrs alt_rhs
+ | (Var v, args) <- collectArgs alt_rhs
+ , all exprIsTrivial args
+ = isNothing (isDataConId_maybe v)
+ | otherwise
+ = False
+
{-
+Note [Do not add unfoldings to join points at birth]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider this (#15360)
+
+ case (case (case (case ...))) of
+ Left x -> e1
+ Right y -> e2
+
+We will make a join point for e1, e2, thus
+ $j1a x = e1
+ $j1b y = e2
+
+Now those join points count as "duplicable" , so we feel free to duplicate
+them into the loop nest. And each of those calls are then subject to
+callSiteInline, which might inline them, if e1, e2 are reasonably small. Now,
+if this applies recursive to the next `case` inwards, and so on, the net
+effect is that we can get an exponential number of calls to $j1a and $j1b, and
+an exponential number of inlinings (since each is done independently).
+
+This hit #15360 (not a complicated program!) badly. Our simple solution is this:
+when a join point is born, we don't give it an unfolding. So we end up with
+ $j1a x = e1
+ $j1b y = e2
+ $j2a x = ...$j1a ... $j1b...
+ $j2b x = ...$j1a ... $j1b...
+ ... and so on...
+
+Now we are into Note [Avoid inlining into deeply nested cases] in Simplify.Inline,
+which is still a challenge. But at least we have a chance. If we add inlinings at
+birth we never get that chance.
+
+Wrinkle
+
+(JU1) It turns out that the same problem shows up in a different guise, via
+ Note [Post-inline for single-use things] in Simplify.Utils. I think
+ we have something like
+ case K (join $j x = <rhs> in jblah) of K y{OneOcc} -> blah
+ where $j is a freshly-born join point. After case-of-known-constructor
+ wo we end up substituting (join $j x = <rhs> in jblah) for `y` in `blah`;
+ and thus we re-simplify that join binding. In test T15630 this results in
+ masssive duplication.
+
+ So in `simplLetUnfolding` we spot this case a bit hackily; a freshly-born
+ join point will have OccInfo of ManyOccs, unlike an existing join point which
+ will have OneOcc. So in simplLetUnfolding we kill the unfolding of a freshly
+ born join point.
+
+I can't quite articulate precisely why this is so important. But it makes a MASSIVE
+difference in T15630 (a fantastic test case); and at worst it'll merely delay inlining
+join points by one simplifier iteration.
+
+Note [Duplicating alternatives]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When should we duplicate an alternative, and when should we make a join point?
+We don't want to make a join point if it will /definitely/ be inlined; that
+takes extra work to build, and an extra Simplifier iteration to do the inlining.
+So consider
+
+ case (case x of True -> e2; False -> e2) of
+ K1 a b -> f b a
+ K2 x -> g x v
+ K3 v -> Just v
+
+The (f b a) would turn into a join point like
+ $j1 a b = f b a
+which would immediately inline again because the call is not smaller than the RHS.
+On the other hand, the (g x v) turns into
+ $j2 x = g x v
+which won't imediately inline. Finally the (Just v) would turn into
+ $j3 v = Just v
+and you might think that would immediately inline.
+
Note [Fusing case continuations]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It's important to fuse two successive case continuations when the
@@ -3923,7 +4078,7 @@ inlining join points. Consider
Here the join-point RHS is very small, just a constructor
application (K x y). So we might inline it to get
- case (case v of )
+ case (case v of )
( p1 -> K f x1 ) of
( p2 -> K f x2 )
( p3 -> K f x3 )
@@ -3947,14 +4102,12 @@ To achieve this:
phase. (The Final phase is still quite early, so we might consider
delaying still more.)
-2. In mkDupableAlt and mkDupableStrictBind, generate an alterative for
- all alternatives, except for exprIsTrival RHSs. Previously we used
- exprIsDupable. This generates a lot more join points, but makes
- them much more case-of-case friendly.
+2. In mkDupableAlt and mkDupableStrictBind, generate an alterative for all
+ alternatives, /unless/ the join point would be immediately inlined in the
+ following iteration: e.g. if its RHS is trivial.
- It is definitely worth checking for exprIsTrivial, otherwise we get
- an extra Simplifier iteration, because it is inlined in the next
- round.
+ (Previously we used exprIsDupable.) This generates a lot more join points,
+ but makes them much more case-of-case friendly.
3. By the same token we want to use Plan B in
Note [Duplicating StrictArg] when the RHS of the new join point
@@ -3978,7 +4131,7 @@ the join point only when the RHS is
* a constructor application? or
* just non-trivial?
Currently, a bit ad-hoc, but we definitely want to retain the join
-point for data constructors in mkDupalbleALt (point 2); that is the
+point for data constructors in mkDupableAlt (point 2); that is the
whole point of #19996 described above.
Historical Note [Case binders and join points]
@@ -4241,30 +4394,40 @@ simplLetUnfolding :: SimplEnv
simplLetUnfolding env bind_cxt id new_rhs rhs_ty arity unf
| isStableUnfolding unf
= simplStableUnfolding env bind_cxt id rhs_ty arity unf
+
| isExitJoinId id
- = return noUnfolding -- See Note [Do not inline exit join points] in GHC.Core.Opt.Exitify
+ = -- See Note [Do not inline exit join points] in GHC.Core.Opt.Exitify
+ return noUnfolding
+
+ | isJoinId id
+ , too_many_occs (idOccInfo id)
+ = -- This is a tricky one!
+ -- See wrinkle (JU1) in Note [Do not add unfoldings to join points at birth]
+ return noUnfolding
+
| otherwise
= -- Otherwise, we end up retaining all the SimpleEnv
let !opts = seUnfoldingOpts env
in mkLetUnfolding opts (bindContextLevel bind_cxt) VanillaSrc id new_rhs
+ where
+ too_many_occs (ManyOccs {}) = True
+ too_many_occs (OneOcc { occ_n_br = n }) = n > 10 -- See #23627
+ too_many_occs IAmDead = False
+ too_many_occs (IAmALoopBreaker {}) = False
+
-------------------
mkLetUnfolding :: UnfoldingOpts -> TopLevelFlag -> UnfoldingSource
-> InId -> OutExpr -> SimplM Unfolding
mkLetUnfolding !uf_opts top_lvl src id new_rhs
- = return (mkUnfolding uf_opts src is_top_lvl is_bottoming new_rhs Nothing)
- -- We make an unfolding *even for loop-breakers*.
- -- Reason: (a) It might be useful to know that they are WHNF
- -- (b) In GHC.Iface.Tidy we currently assume that, if we want to
- -- expose the unfolding then indeed we *have* an unfolding
- -- to expose. (We could instead use the RHS, but currently
- -- we don't.) The simple thing is always to have one.
+ = return (mkCoreUnfolding src is_top_lvl new_rhs Nothing guidance)
where
- -- Might as well force this, profiles indicate up to 0.5MB of thunks
- -- just from this site.
- !is_top_lvl = isTopLevel top_lvl
- -- See Note [Force bottoming field]
- !is_bottoming = isDeadEndId id
+ guidance = calcUnfoldingGuidance uf_opts (isJoinId id) is_top_bottoming new_rhs
+
+ -- Strict binding; profiles indicate up to 0.5MB of thunks
+ -- just from this site. See Note [Force bottoming field]
+ !is_top_lvl = isTopLevel top_lvl
+ !is_top_bottoming =is_top_lvl && isDeadEndId id
-------------------
simplStableUnfolding :: SimplEnv -> BindContext
@@ -4375,6 +4538,17 @@ Wrinkles
in GHC.Core.Opt.Simplify.Utils. We uphold this because the join-point
case (bind_cxt = BC_Join {}) doesn't use eta_expand.
+Note [Heavily used join points]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+After inining join points we can end up with
+ let $j x = <rhs>
+ in case x1 of
+ True -> case x2 of
+ True -> $j blah1
+ False -> $j blah2
+ False -> case x3 of ....
+with a huge case tree
+
Note [Force bottoming field]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We need to force bottoming, or the new unfolding holds
=====================================
compiler/GHC/Core/Opt/Simplify/Monad.hs
=====================================
@@ -219,7 +219,6 @@ newJoinId bndrs body_ty
join_arity = length bndrs
details = JoinId join_arity Nothing
id_info = vanillaIdInfo `setArityInfo` arity
--- `setOccInfo` strongLoopBreaker
; return (mkLocalVar details name ManyTy join_id_ty id_info) }
=====================================
compiler/GHC/Core/Opt/Simplify/Utils.hs
=====================================
@@ -162,9 +162,12 @@ data SimplCont
| CastIt -- (CastIt co K)[e] = K[ e `cast` co ]
- OutCoercion -- The coercion simplified
+ { sc_co :: OutCoercion -- The coercion simplified
-- Invariant: never an identity coercion
- SimplCont
+ , sc_opt :: Bool -- True <=> sc_co has had optCoercion applied to it
+ -- See Note [Avoid re-simplifying coercions]
+ -- in GHC.Core.Opt.Simplify.Iteration
+ , sc_cont :: SimplCont }
| ApplyToVal -- (ApplyToVal arg K)[e] = K[ e arg ]
{ sc_dup :: DupFlag -- See Note [DupFlag invariants]
@@ -272,8 +275,10 @@ instance Outputable SimplCont where
= text "Stop" <> brackets (sep $ punctuate comma pps) <+> ppr ty
where
pps = [ppr interesting] ++ [ppr eval_sd | eval_sd /= topSubDmd]
- ppr (CastIt co cont ) = (text "CastIt" <+> pprOptCo co) $$ ppr cont
- ppr (TickIt t cont) = (text "TickIt" <+> ppr t) $$ ppr cont
+ ppr (CastIt { sc_co = co, sc_cont = cont })
+ = (text "CastIt" <+> pprOptCo co) $$ ppr cont
+ ppr (TickIt t cont)
+ = (text "TickIt" <+> ppr t) $$ ppr cont
ppr (ApplyToTy { sc_arg_ty = ty, sc_cont = cont })
= (text "ApplyToTy" <+> pprParendType ty) $$ ppr cont
ppr (ApplyToVal { sc_arg = arg, sc_dup = dup, sc_cont = cont, sc_hole_ty = hole_ty })
@@ -284,9 +289,9 @@ instance Outputable SimplCont where
= (text "StrictBind" <+> ppr b) $$ ppr cont
ppr (StrictArg { sc_fun = ai, sc_cont = cont })
= (text "StrictArg" <+> ppr (ai_fun ai)) $$ ppr cont
- ppr (Select { sc_dup = dup, sc_bndr = bndr, sc_alts = alts, sc_env = se, sc_cont = cont })
+ ppr (Select { sc_dup = dup, sc_bndr = bndr, sc_alts = alts, sc_cont = cont })
= (text "Select" <+> ppr dup <+> ppr bndr) $$
- whenPprDebug (nest 2 $ vcat [ppr (seTvSubst se), ppr alts]) $$ ppr cont
+ whenPprDebug (nest 2 $ ppr alts) $$ ppr cont
{- Note [The hole type in ApplyToTy]
@@ -350,6 +355,7 @@ data ArgSpec
, as_hole_ty :: OutType } -- Type of the function (presumably forall a. blah)
| CastBy OutCoercion -- Cast by this; c.f. CastIt
+ -- Coercion is optimised
instance Outputable ArgInfo where
ppr (ArgInfo { ai_fun = fun, ai_args = args, ai_dmds = dmds })
@@ -412,7 +418,8 @@ pushSimplifiedArg env (ValArg { as_arg = arg, as_hole_ty = hole_ty }) cont
= ApplyToVal { sc_arg = arg, sc_env = env, sc_dup = Simplified
-- The SubstEnv will be ignored since sc_dup=Simplified
, sc_hole_ty = hole_ty, sc_cont = cont }
-pushSimplifiedArg _ (CastBy c) cont = CastIt c cont
+pushSimplifiedArg _ (CastBy c) cont
+ = CastIt { sc_co = c, sc_cont = cont, sc_opt = True }
argInfoExpr :: OutId -> [ArgSpec] -> OutExpr
-- NB: the [ArgSpec] is reversed so that the first arg
@@ -469,7 +476,7 @@ mkLazyArgStop ty fun_info = Stop ty (lazyArgContext fun_info) arg_sd
-------------------
contIsRhs :: SimplCont -> Maybe RecFlag
contIsRhs (Stop _ (RhsCtxt is_rec) _) = Just is_rec
-contIsRhs (CastIt _ k) = contIsRhs k -- For f = e |> co, treat e as Rhs context
+contIsRhs (CastIt { sc_cont = k }) = contIsRhs k -- For f = e |> co, treat e as Rhs context
contIsRhs _ = Nothing
-------------------
@@ -483,7 +490,7 @@ contIsDupable (ApplyToTy { sc_cont = k }) = contIsDupable k
contIsDupable (ApplyToVal { sc_dup = OkToDup }) = True -- See Note [DupFlag invariants]
contIsDupable (Select { sc_dup = OkToDup }) = True -- ...ditto...
contIsDupable (StrictArg { sc_dup = OkToDup }) = True -- ...ditto...
-contIsDupable (CastIt _ k) = contIsDupable k
+contIsDupable (CastIt { sc_cont = k }) = contIsDupable k
contIsDupable _ = False
-------------------
@@ -492,13 +499,13 @@ contIsTrivial (Stop {}) = True
contIsTrivial (ApplyToTy { sc_cont = k }) = contIsTrivial k
-- This one doesn't look right. A value application is not trivial
-- contIsTrivial (ApplyToVal { sc_arg = Coercion _, sc_cont = k }) = contIsTrivial k
-contIsTrivial (CastIt _ k) = contIsTrivial k
+contIsTrivial (CastIt { sc_cont = k }) = contIsTrivial k
contIsTrivial _ = False
-------------------
contResultType :: SimplCont -> OutType
contResultType (Stop ty _ _) = ty
-contResultType (CastIt _ k) = contResultType k
+contResultType (CastIt { sc_cont = k }) = contResultType k
contResultType (StrictBind { sc_cont = k }) = contResultType k
contResultType (StrictArg { sc_cont = k }) = contResultType k
contResultType (Select { sc_cont = k }) = contResultType k
@@ -509,7 +516,7 @@ contResultType (TickIt _ k) = contResultType k
contHoleType :: SimplCont -> OutType
contHoleType (Stop ty _ _) = ty
contHoleType (TickIt _ k) = contHoleType k
-contHoleType (CastIt co _) = coercionLKind co
+contHoleType (CastIt { sc_co = co }) = coercionLKind co
contHoleType (StrictBind { sc_bndr = b, sc_dup = dup, sc_env = se })
= perhapsSubstTy dup se (idType b)
contHoleType (StrictArg { sc_fun_ty = ty }) = funArgTy ty
@@ -529,7 +536,8 @@ contHoleType (Select { sc_dup = d, sc_bndr = b, sc_env = se })
-- case-of-case transformation.
contHoleScaling :: SimplCont -> Mult
contHoleScaling (Stop _ _ _) = OneTy
-contHoleScaling (CastIt _ k) = contHoleScaling k
+contHoleScaling (CastIt { sc_cont = k })
+ = contHoleScaling k
contHoleScaling (StrictBind { sc_bndr = id, sc_cont = k })
= idMult id `mkMultMul` contHoleScaling k
contHoleScaling (Select { sc_bndr = id, sc_cont = k })
@@ -548,14 +556,14 @@ countArgs :: SimplCont -> Int
-- and other values; skipping over casts.
countArgs (ApplyToTy { sc_cont = cont }) = 1 + countArgs cont
countArgs (ApplyToVal { sc_cont = cont }) = 1 + countArgs cont
-countArgs (CastIt _ cont) = countArgs cont
+countArgs (CastIt { sc_cont = cont }) = countArgs cont
countArgs _ = 0
countValArgs :: SimplCont -> Int
-- Count value arguments only
countValArgs (ApplyToTy { sc_cont = cont }) = countValArgs cont
countValArgs (ApplyToVal { sc_cont = cont }) = 1 + countValArgs cont
-countValArgs (CastIt _ cont) = countValArgs cont
+countValArgs (CastIt { sc_cont = cont }) = countValArgs cont
countValArgs _ = 0
-------------------
@@ -575,7 +583,7 @@ contArgs cont
go args (ApplyToVal { sc_arg = arg, sc_env = se, sc_cont = k })
= go (is_interesting arg se : args) k
go args (ApplyToTy { sc_cont = k }) = go args k
- go args (CastIt _ k) = go args k
+ go args (CastIt { sc_cont = k }) = go args k
go args k = (False, reverse args, k)
is_interesting arg se = interestingArg se arg
@@ -594,10 +602,10 @@ contArgs cont
-- about what to do then and no call sites so far seem to care.
contEvalContext :: SimplCont -> SubDemand
contEvalContext k = case k of
- (Stop _ _ sd) -> sd
- (TickIt _ k) -> contEvalContext k
- (CastIt _ k) -> contEvalContext k
- ApplyToTy{sc_cont=k} -> contEvalContext k
+ Stop _ _ sd -> sd
+ TickIt _ k -> contEvalContext k
+ CastIt { sc_cont = k } -> contEvalContext k
+ ApplyToTy{ sc_cont = k } -> contEvalContext k
-- ApplyToVal{sc_cont=k} -> mkCalledOnceDmd $ contEvalContext k
-- Not 100% sure that's correct, . Here's an example:
-- f (e x) and f :: <SC(S,C(1,L))>
@@ -881,7 +889,7 @@ interestingCallContext env cont
interesting (Stop _ cci _) = cci
interesting (TickIt _ k) = interesting k
interesting (ApplyToTy { sc_cont = k }) = interesting k
- interesting (CastIt _ k) = interesting k
+ interesting (CastIt { sc_cont = k }) = interesting k
-- If this call is the arg of a strict function, the context
-- is a bit interesting. If we inline here, we may get useful
-- evaluation information to avoid repeated evals: e.g.
@@ -921,7 +929,7 @@ contHasRules cont
where
go (ApplyToVal { sc_cont = cont }) = go cont
go (ApplyToTy { sc_cont = cont }) = go cont
- go (CastIt _ cont) = go cont
+ go (CastIt { sc_cont = cont }) = go cont
go (StrictArg { sc_fun = fun }) = ai_encl fun
go (Stop _ RuleArgCtxt _) = True
go (TickIt _ c) = go c
@@ -1514,15 +1522,14 @@ rules] for details.
postInlineUnconditionally
:: SimplEnv -> BindContext
- -> OutId -- The binder (*not* a CoVar), including its unfolding
- -> OccInfo -- From the InId
+ -> InId -> OutId -- The binder (*not* a CoVar), including its unfolding
-> OutExpr
-> Bool
-- Precondition: rhs satisfies the let-can-float invariant
-- See Note [Core let-can-float invariant] in GHC.Core
-- Reason: we don't want to inline single uses, or discard dead bindings,
-- for unlifted, side-effect-ful bindings
-postInlineUnconditionally env bind_cxt bndr occ_info rhs
+postInlineUnconditionally env bind_cxt old_bndr bndr rhs
| not active = False
| isWeakLoopBreaker occ_info = False -- If it's a loop-breaker of any kind, don't inline
-- because it might be referred to "earlier"
@@ -1530,27 +1537,23 @@ postInlineUnconditionally env bind_cxt bndr occ_info rhs
| isTopLevel (bindContextLevel bind_cxt)
= False -- Note [Top level and postInlineUnconditionally]
| exprIsTrivial rhs = True
- | BC_Join {} <- bind_cxt -- See point (1) of Note [Duplicating join points]
- , not (phase == FinalPhase) = False -- in Simplify.hs
+ | BC_Join {} <- bind_cxt = False -- See point (1) of Note [Duplicating join points]
+-- , not (phase == FinalPhase) = False -- in Simplify.hs
| otherwise
= case occ_info of
OneOcc { occ_in_lam = in_lam, occ_int_cxt = int_cxt, occ_n_br = n_br }
-- See Note [Inline small things to avoid creating a thunk]
- -> n_br < 100 -- See Note [Suppress exponential blowup]
+ | let not_inside_lam = in_lam == NotInsideLam
+ -> n_br < 100 -- See #23627
- && smallEnoughToInline uf_opts unfolding -- Small enough to dup
- -- ToDo: consider discount on smallEnoughToInline if int_cxt is true
- --
- -- NB: Do NOT inline arbitrarily big things, even if occ_n_br=1
- -- Reason: doing so risks exponential behaviour. We simplify a big
- -- expression, inline it, and simplify it again. But if the
- -- very same thing happens in the big expression, we get
- -- exponential cost!
- -- PRINCIPLE: when we've already simplified an expression once,
- -- make sure that we only inline it if it's reasonably small.
-
- && (in_lam == NotInsideLam ||
+ && ( (n_br == 1 && not_inside_lam)
+ -- See Note [Post-inline for single-use things]
+ || (is_lazy && smallEnoughToInline uf_opts unfolding))
+ -- Lazy, and small enough to dup
+ -- ToDo: consider discount on smallEnoughToInline if int_cxt is true
+
+ && (not_inside_lam ||
-- Outside a lambda, we want to be reasonably aggressive
-- about inlining into multiple branches of case
-- e.g. let x = <non-value>
@@ -1571,19 +1574,9 @@ postInlineUnconditionally env bind_cxt bndr occ_info rhs
_ -> False
--- Here's an example that we don't handle well:
--- let f = if b then Left (\x.BIG) else Right (\y.BIG)
--- in \y. ....case f of {...} ....
--- Here f is used just once, and duplicating the case work is fine (exprIsCheap).
--- But
--- - We can't preInlineUnconditionally because that would invalidate
--- the occ info for b.
--- - We can't postInlineUnconditionally because the RHS is big, and
--- that risks exponential behaviour
--- - We can't call-site inline, because the rhs is big
--- Alas!
-
where
+ is_lazy = not (isStrictId bndr)
+ occ_info = idOccInfo old_bndr
unfolding = idUnfolding bndr
uf_opts = seUnfoldingOpts env
phase = sePhase env
@@ -1608,37 +1601,51 @@ in allocation if you miss this out. And bits of GHC itself start
to allocate more. An egregious example is test perf/compiler/T14697,
where GHC.Driver.CmdLine.$wprocessArgs allocated hugely more.
-Note [Suppress exponential blowup]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-In #13253, and several related tickets, we got an exponential blowup
-in code size from postInlineUnconditionally. The trouble comes when
-we have
- let j1a = case f y of { True -> p; False -> q }
- j1b = case f y of { True -> q; False -> p }
- j2a = case f (y+1) of { True -> j1a; False -> j1b }
- j2b = case f (y+1) of { True -> j1b; False -> j1a }
- ...
- in case f (y+10) of { True -> j10a; False -> j10b }
-
-when there are many branches. In pass 1, postInlineUnconditionally
-inlines j10a and j10b (they are both small). Now we have two calls
-to j9a and two to j9b. In pass 2, postInlineUnconditionally inlines
-all four of these calls, leaving four calls to j8a and j8b. Etc.
-Yikes! This is exponential!
-
-A possible plan: stop doing postInlineUnconditionally
-for some fixed, smallish number of branches, say 4. But that turned
-out to be bad: see Note [Inline small things to avoid creating a thunk].
-And, as it happened, the problem with #13253 was solved in a
-different way (Note [Duplicating StrictArg] in Simplify).
-
-So I just set an arbitrary, high limit of 100, to stop any
-totally exponential behaviour.
-
-This still leaves the nasty possibility that /ordinary/ inlining (not
-postInlineUnconditionally) might inline these join points, each of
-which is individually quiet small. I'm still not sure what to do
-about this (e.g. see #15488).
+Note [Post-inline for single-use things]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If we have
+
+ let x = rhs in ...x...
+
+and `x` is used exactly once, and not inside a lambda, then we will usually
+preInlineUnconditinally. But we can still get this situation in
+postInlineUnconditionally:
+
+ case K rhs of K x -> ...x....
+
+Here we'll use `simplAuxBind` to bind `x` to (the already-simplified) `rhs`;
+and `x` is used exactly once. It's beneficial to inline right away; otherwise
+we risk creating
+
+ let x = rhs in ...x...
+
+which will take another iteration of the Simplifier to eliminate. We do this in
+two places
+
+1. In the full `postInlineUnconditionally` look for the special case
+ of "one occurrence, not under a lambda", and inline unconditionally then.
+
+ This is a bit risky: see Note [Avoiding simplifying repeatedly] in
+ Simplify.Iteration. But in practice it seems to be a small win.
+
+2. `simplAuxBind` does a kind of poor-man's `postInlineUnconditionally`. It
+ does not need to account for many of the cases (e.g. top level) that the
+ full `postInlineUnconditionally` does. Moreover, we don't have an
+ OutId, which `postInlineUnconditionally` needs. I got a slight improvement
+ in compiler performance when I added this test.
+
+Here's an example that we don't currently handle well:
+ let f = if b then Left (\x.BIG) else Right (\y.BIG)
+ in \y. ....case f of {...} ....
+Here f is used just once, and duplicating the case work is fine (exprIsCheap).
+But
+ - We can't preInlineUnconditionally because that would invalidate
+ the occ info for b.
+ - We can't postInlineUnconditionally because the RHS is big, and
+ that risks exponential behaviour
+ - We can't call-site inline, because the rhs is big
+Alas!
+
Note [Top level and postInlineUnconditionally]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
=====================================
compiler/GHC/Core/SimpleOpt.hs
=====================================
@@ -512,6 +512,12 @@ do_beta_by_substitution bndr rhs
= exprIsTrivial rhs -- Can duplicate
|| safe_to_inline (idOccInfo bndr) -- Occurs at most once
+do_case_elim :: CoreExpr -> Id -> [Id] -> Bool
+do_case_elim scrut case_bndr alt_bndrs
+ = exprIsHNF scrut
+ && safe_to_inline (idOccInfo case_bndr)
+ && all isDeadBinder alt_bndrs
+
-------------------
simple_out_bind :: TopLevelFlag
-> SimpleOptEnv
@@ -1290,13 +1296,17 @@ exprIsConApp_maybe ise@(ISE in_scope id_unf) expr
in go subst' (float:floats) expr cont
go subst floats (Case scrut b _ [Alt con vars expr]) cont
+ | do_case_elim scrut' b vars -- See Note [Case elim in exprIsConApp_maybe]
+ = go (extend subst b scrut') floats expr cont
+ | otherwise
= let
- scrut' = subst_expr subst scrut
(subst', b') = subst_bndr subst b
(subst'', vars') = subst_bndrs subst' vars
float = FloatCase scrut' b' con vars'
in
go subst'' (float:floats) expr cont
+ where
+ scrut' = subst_expr subst scrut
go (Right sub) floats (Var v) cont
= go (Left (getSubstInScope sub))
@@ -1417,6 +1427,27 @@ dealWithStringLiteral fun str co =
in pushCoDataCon consDataCon [Type charTy, char_expr, rest] co
{-
+Note [Case elim in exprIsConApp_maybe]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Suppose we have
+ data K a = MkK !a
+
+ $WMkK x = case x of y -> K y -- Wrapper for MkK
+
+ ...case $WMkK v of K w -> <rhs>
+
+We call `exprIsConApp_maybe` on ($WMkK v); we inline the wrapper
+and beta-reduce, so we get to
+ exprIsConApp_maybe (case v of y -> K y)
+
+So we may float the case, and end up with
+ case v of y -> <rhs>[y/w]
+
+But if `v` is already evaluated, the next run of the Simplifier will
+eliminate the case, and we may then make more progress with <rhs>.
+Better to do it in one iteration. Hence the `do_case_elim`
+check in `exprIsConApp_maybe`.
+
Note [Unfolding DFuns]
~~~~~~~~~~~~~~~~~~~~~~
DFuns look like
=====================================
compiler/GHC/Core/TyCo/Rep.hs
=====================================
@@ -152,7 +152,7 @@ data Type
-- for example unsaturated type synonyms
-- can appear as the right hand side of a type synonym.
- | ForAllTy
+ | ForAllTy -- See Note [Weird typing rule for ForAllTy]
{-# UNPACK #-} !ForAllTyBinder
Type -- ^ A Π type.
-- Note [When we quantify over a coercion variable]
@@ -938,9 +938,15 @@ instance Outputable Coercion where
ppr = pprCo
instance Outputable CoSel where
- ppr (SelTyCon n _r) = text "Tc" <> parens (int n)
- ppr SelForAll = text "All"
- ppr (SelFun fs) = text "Fun" <> parens (ppr fs)
+ ppr (SelTyCon n r) = text "Tc" <> parens (int n <> comma <> pprOneCharRole r)
+ ppr SelForAll = text "All"
+ ppr (SelFun fs) = text "Fun" <> parens (ppr fs)
+
+
+pprOneCharRole :: Role -> SDoc
+pprOneCharRole Nominal = char 'N'
+pprOneCharRole Representational = char 'R'
+pprOneCharRole Phantom = char 'P'
instance Outputable FunSel where
ppr SelMult = text "mult"
@@ -1054,7 +1060,7 @@ SelTyCon, SelForAll, and SelFun.
r = tyConRole tc r0 i
i < n (i is zero-indexed)
----------------------------------
- SelCo (SelTyCon i r) : si ~r ti
+ SelCo (SelTyCon i r) co : si ~r ti
"Not a newtype": see Note [SelCo and newtypes]
"Not an arrow type": see SelFun below
@@ -1074,17 +1080,17 @@ SelTyCon, SelForAll, and SelFun.
co : (s1 %{m1}-> t1) ~r0 (s2 %{m2}-> t2)
r = funRole r0 SelMult
----------------------------------
- SelCo (SelFun SelMult) : m1 ~r m2
+ SelCo (SelFun SelMult) co : m1 ~r m2
co : (s1 %{m1}-> t1) ~r0 (s2 %{m2}-> t2)
r = funRole r0 SelArg
----------------------------------
- SelCo (SelFun SelArg) : s1 ~r s2
+ SelCo (SelFun SelArg) co : s1 ~r s2
co : (s1 %{m1}-> t1) ~r0 (s2 %{m2}-> t2)
r = funRole r0 SelRes
----------------------------------
- SelCo (SelFun SelRes) : t1 ~r t2
+ SelCo (SelFun SelRes) co : t1 ~r t2
Note [FunCo]
~~~~~~~~~~~~
@@ -1167,11 +1173,11 @@ because the kinds of the bound tyvars can be different.
The typing rule is:
- kind_co : k1 ~ k2
- tv1:k1 |- co : t1 ~ t2
+ kind_co : k1 ~N k2
+ tv1:k1 |- co : t1 ~r t2
-------------------------------------------------------------------
- ForAllCo tv1 kind_co co : all tv1:k1. t1 ~
- all tv1:k2. (t2[tv1 |-> tv1 |> sym kind_co])
+ ForAllCo tv1 kind_co co : all tv1:k1. t1 ~r
+ all tv1:k2. (t2[tv1 := (tv1 |> sym kind_co)])
First, the TyCoVar stored in a ForAllCo is really an optimisation: this field
should be a Name, as its kind is redundant. Thinking of the field as a Name
=====================================
compiler/GHC/Core/Unfold.hs
=====================================
@@ -256,17 +256,18 @@ inlineBoringOk e
calcUnfoldingGuidance
:: UnfoldingOpts
+ -> Bool -- This is a join point
-> Bool -- Definitely a top-level, bottoming binding
-> CoreExpr -- Expression to look at
-> UnfoldingGuidance
-calcUnfoldingGuidance opts is_top_bottoming (Tick t expr)
+calcUnfoldingGuidance opts is_join is_top_bottoming (Tick t expr)
| not (tickishIsCode t) -- non-code ticks don't matter for unfolding
- = calcUnfoldingGuidance opts is_top_bottoming expr
-calcUnfoldingGuidance opts is_top_bottoming expr
+ = calcUnfoldingGuidance opts is_join is_top_bottoming expr
+calcUnfoldingGuidance opts is_join is_top_bottoming expr
= case sizeExpr opts bOMB_OUT_SIZE val_bndrs body of
TooBig -> UnfNever
SizeIs size cased_bndrs scrut_discount
- | uncondInline expr n_val_bndrs size
+ | uncondInline is_join expr n_val_bndrs size
-> UnfWhen { ug_unsat_ok = unSaturatedOk
, ug_boring_ok = boringCxtOk
, ug_arity = n_val_bndrs } -- Note [INLINE for small functions]
@@ -432,11 +433,12 @@ sharing the wrapper closure.
The solution: don’t ignore coercion arguments after all.
-}
-uncondInline :: CoreExpr -> Arity -> Int -> Bool
+uncondInline :: Bool -> CoreExpr -> Arity -> Int -> Bool
-- Inline unconditionally if there no size increase
-- Size of call is arity (+1 for the function)
-- See Note [INLINE for small functions]
-uncondInline rhs arity size
+uncondInline is_join rhs arity size
+ | is_join = size < 10
| arity > 0 = size <= 10 * (arity + 1) -- See Note [INLINE for small functions] (1)
| otherwise = exprIsTrivial rhs -- See Note [INLINE for small functions] (4)
@@ -594,6 +596,7 @@ sizeExpr opts !bOMB_OUT_SIZE top_args expr
DataConWorkId dc -> conSize dc (length val_args)
PrimOpId op _ -> primOpSize op (length val_args)
ClassOpId {} -> classOpSize opts top_args val_args
+ JoinId {} -> sizeZero -- See Note [Inlining join points]
_ -> funSize opts top_args fun (length val_args) voids
------------
@@ -685,6 +688,7 @@ callSize n_val_args voids = 10 * (1 + n_val_args - voids)
-- Add 1 for each non-trivial arg;
-- the allocation cost, as in let(rec)
+{-
-- | The size of a jump to a join point
jumpSize
:: Int -- ^ number of value args
@@ -695,6 +699,7 @@ jumpSize n_val_args voids = 2 * (1 + n_val_args - voids)
-- bug #6048, but making them any more expensive loses a 21% improvement in
-- spectral/puzzle. TODO Perhaps adjusting the default threshold would be a
-- better solution?
+-}
funSize :: UnfoldingOpts -> [Id] -> Id -> Int -> Int -> ExprSize
-- Size for functions that are not constructors or primops
@@ -705,9 +710,9 @@ funSize opts top_args fun n_val_args voids
| otherwise = SizeIs size arg_discount res_discount
where
some_val_args = n_val_args > 0
- is_join = isJoinId fun
+-- is_join = isJoinId fun
- size | is_join = jumpSize n_val_args voids
+ size -- | is_join = jumpSize n_val_args voids
| not some_val_args = 0
| otherwise = callSize n_val_args voids
@@ -772,6 +777,21 @@ win", but its terribly dangerous because a function with many many
case branches, each finishing with a constructor, can have an
arbitrarily large discount. This led to terrible code bloat: see #6099.
+Note [Inlining join points]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Suppose we have
+ join j1 a b c d = blah
+ join j2 x = j1 x v x w
+ in ...(jump j2 t)....
+
+Then j1 is just an indirection to j1 with a bit of argument shuffling.
+We want to inline it even though it has more arguments:
+ join j1 a b c d = blah
+ in ...(jump j1 t v t w)...
+
+So we charge nothing for join-point calls; a bit like we make constructor
+applications cheap (see Note [Constructor size and result discount]).
+
Note [Unboxed tuple size and result discount]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
However, unboxed tuples count as size zero. I found occasions where we had
=====================================
compiler/GHC/Core/Unfold/Make.hs
=====================================
@@ -117,7 +117,7 @@ mkWorkerUnfolding opts work_fn
= mkCoreUnfolding src top_lvl new_tmpl Nothing guidance
where
new_tmpl = simpleOptExpr opts (work_fn tmpl)
- guidance = calcUnfoldingGuidance (so_uf_opts opts) False new_tmpl
+ guidance = calcUnfoldingGuidance (so_uf_opts opts) False False new_tmpl
mkWorkerUnfolding _ _ _ = noUnfolding
@@ -328,7 +328,7 @@ mkUnfolding opts src top_lvl is_bottoming expr cache
= mkCoreUnfolding src top_lvl expr cache guidance
where
is_top_bottoming = top_lvl && is_bottoming
- guidance = calcUnfoldingGuidance opts is_top_bottoming expr
+ guidance = calcUnfoldingGuidance opts False is_top_bottoming expr
-- NB: *not* (calcUnfoldingGuidance (occurAnalyseExpr expr))!
-- See Note [Calculate unfolding guidance on the non-occ-anal'd expression]
=====================================
compiler/GHC/Core/Utils.hs
=====================================
@@ -263,41 +263,28 @@ mkPiMCo v (MCo co) = MCo (mkPiCo Representational v co)
-- | Wrap the given expression in the coercion safely, dropping
-- identity coercions and coalescing nested coercions
mkCast :: HasDebugCallStack => CoreExpr -> CoercionR -> CoreExpr
-mkCast e co
- | assertPpr (coercionRole co == Representational)
- (text "coercion" <+> ppr co <+> text "passed to mkCast"
- <+> ppr e <+> text "has wrong role" <+> ppr (coercionRole co)) $
- isReflCo co
- = e
-
-mkCast (Coercion e_co) co
- | isCoVarType (coercionRKind co)
- -- The guard here checks that g has a (~#) on both sides,
- -- otherwise decomposeCo fails. Can in principle happen
- -- with unsafeCoerce
- = Coercion (mkCoCast e_co co)
-
-mkCast (Cast expr co2) co
- = warnPprTrace (let { from_ty = coercionLKind co;
- to_ty2 = coercionRKind co2 } in
- not (from_ty `eqType` to_ty2))
- "mkCast"
- (vcat ([ text "expr:" <+> ppr expr
- , text "co2:" <+> ppr co2
- , text "co:" <+> ppr co ])) $
- mkCast expr (mkTransCo co2 co)
-
-mkCast (Tick t expr) co
- = Tick t (mkCast expr co)
mkCast expr co
- = let from_ty = coercionLKind co in
- warnPprTrace (not (from_ty `eqType` exprType expr))
+ = assertPpr (coercionRole co == Representational)
+ (text "coercion" <+> ppr co <+> text "passed to mkCast"
+ <+> ppr expr <+> text "has wrong role" <+> ppr (coercionRole co)) $
+ warnPprTrace (not (coercionLKind co `eqType` exprType expr))
"Trying to coerce" (text "(" <> ppr expr
$$ text "::" <+> ppr (exprType expr) <> text ")"
$$ ppr co $$ ppr (coercionType co)
$$ callStackDoc) $
- (Cast expr co)
+ case expr of
+ Cast expr co2 -> mkCast expr (mkTransCo co2 co)
+ Tick t expr -> Tick t (mkCast expr co)
+
+ Coercion e_co | isCoVarType (coercionRKind co)
+ -- The guard here checks that g has a (~#) on both sides,
+ -- otherwise decomposeCo fails. Can in principle happen
+ -- with unsafeCoerce
+ -> Coercion (mkCoCast e_co co)
+
+ _ | isReflCo co -> expr
+ | otherwise -> Cast expr co
{- *********************************************************************
=====================================
compiler/GHC/HsToCore/Pmc/Solver/Types.hs
=====================================
@@ -325,6 +325,11 @@ lookupVarInfoNT ts x = case lookupVarInfo ts x of
go _ = Nothing
trvVarInfo :: Functor f => (VarInfo -> f (a, VarInfo)) -> Nabla -> Id -> f (a, Nabla)
+{-# INLINE trvVarInfo #-}
+-- This function is called a lot and we want to specilise it, not only
+-- for the type class, but also for its 'f' function argument.
+-- Before the INLINE pragma it sometimes inlined and sometimes didn't,
+-- depending delicately on GHC's optimisations. Better to use a pragma.
trvVarInfo f nabla at MkNabla{ nabla_tm_st = ts at TmSt{ts_facts = env} } x
= set_vi <$> f (lookupVarInfo ts x)
where
=====================================
compiler/GHC/Iface/Type.hs
=====================================
@@ -1872,8 +1872,8 @@ ppr_co _ (IfaceUnivCo prov role ty1 ty2)
ppr_co ctxt_prec (IfaceInstCo co ty)
= maybeParen ctxt_prec appPrec $
- text "Inst" <+> pprParendIfaceCoercion co
- <+> pprParendIfaceCoercion ty
+ text "Inst" <+> sep [ pprParendIfaceCoercion co
+ , pprParendIfaceCoercion ty ]
ppr_co ctxt_prec (IfaceAxiomRuleCo tc cos)
= maybeParen ctxt_prec appPrec $ ppr tc <+> parens (interpp'SP cos)
=====================================
compiler/GHC/IfaceToCore.hs
=====================================
@@ -1778,7 +1778,7 @@ tcUnfolding toplvl name _ info (IfCoreUnfold src cache if_guidance if_expr)
; expr <- tcUnfoldingRhs (isCompulsorySource src) toplvl name if_expr
; let guidance = case if_guidance of
IfWhen arity unsat_ok boring_ok -> UnfWhen arity unsat_ok boring_ok
- IfNoGuidance -> calcUnfoldingGuidance uf_opts is_top_bottoming expr
+ IfNoGuidance -> calcUnfoldingGuidance uf_opts False is_top_bottoming expr
-- See Note [Tying the 'CoreUnfolding' knot]
; return $ mkCoreUnfolding src True expr (Just cache) guidance }
where
=====================================
testsuite/tests/perf/compiler/T15630.hs
=====================================
@@ -1,5 +1,28 @@
module T15630 where
+{- This is a fansastic test cose.
+
+* It scales really easily (just add or remove fields).
+
+* It can demonstrate massive (exponental) blow up if you get inlining
+ for join points wrong.
+
+* I found that a more monomorphic variant, T15630a, tickled a very similar
+ exponential -blowup, but somehow in a slighlty different way. To be specific,
+ at the time of writing, HEAD was fine on T15630, but blew up on T15630a.
+ So both tests are valuable.
+
+* Also worth noting: even if it doesn't blow up, it can result in two
+ very different programs. Below are the good and bad versions for 5
+ fields. Note that the good version passes Maybes to the join points,
+ the ultimate values of the fields. But the bad version passes an
+ accumulating *function* to the join points. Lots of PAPs much less
+ efficient.
+
+See Note [Do not add unfoldings to join points at birth] in
+GHc.Core.Opt.Simplify.Iteration.
+-}
+
data IValue = IDefault
| IInt Int
| IBlob String
@@ -54,3 +77,220 @@ getMenuItem vs = fst $ (pure TestStructure, vs)
<+> (getInt ?)
<+> (getInt ?)
<+> (getString ?)
+
+
+{-
+------------- The good version (5 fields) ----------------
+getMenuItem
+ = \ (vs_az6 :: [IValue]) ->
+ case vs_az6 of {
+ [] -> case T15630.<+>1 of wild1_00 { };
+ : v_az3 vs1_az4 ->
+ case vs1_az4 of {
+ [] -> case T15630.<+>1 of wild2_00 { };
+ : v1_X4 vs2_X5 ->
+ case vs2_X5 of {
+ [] -> case T15630.<+>1 of wild3_00 { };
+ : v2_X7 vs3_X8 ->
+ case vs3_X8 of {
+ [] -> case T15630.<+>1 of wild4_00 { };
+ : v3_Xa vs4_Xb ->
+ case vs4_Xb of {
+ [] -> case T15630.<+>1 of wild5_00 { };
+ : v4_Xd vs5_Xe ->
+ case v_az3 of {
+ __DEFAULT -> T15630.getMenuItem1;
+ IInt i_ayQ ->
+ join {
+ $j_sPO [Dmd=MC(1,L)] :: Maybe String -> Either () TestStructure
+ [LclId[JoinId(1)(Nothing)], Arity=1, Str=<L>, Unf=OtherCon []]
+ $j_sPO (y_Xf [OS=OneShot] :: Maybe String)
+ = join {
+ $j1_sPR [Dmd=MC(1,L)] :: Maybe Int -> Either () TestStructure
+ [LclId[JoinId(1)(Nothing)], Arity=1, Str=<L>, Unf=OtherCon []]
+ $j1_sPR (y1_Xg [OS=OneShot] :: Maybe Int)
+ = case v3_Xa of {
+ IDefault ->
+ case v4_Xd of {
+ IDefault ->
+ Data.Either.Right
+ @()
+ @TestStructure
+ (T15630.TestStructure
+ i_ayQ
+ y_Xf
+ y1_Xg
+ (Nothing @String)
+ (Nothing @Int));
+ IInt i1_Xk ->
+ Data.Either.Right
+ @()
+ @TestStructure
+ (T15630.TestStructure
+ i_ayQ
+ y_Xf
+ y1_Xg
+ (Nothing @String)
+ (Just @Int i1_Xk));
+ IBlob ipv_sPo -> T15630.getMenuItem1
+ };
+ IInt ipv_sPm -> T15630.getMenuItem1;
+ IBlob b_ayW ->
+ case v4_Xd of {
+ IDefault ->
+ Data.Either.Right
+ @()
+ @TestStructure
+ (T15630.TestStructure
+ i_ayQ
+ y_Xf
+ y1_Xg
+ (Just @String b_ayW)
+ (Nothing @Int));
+ IInt i1_Xk ->
+ Data.Either.Right
+ @()
+ @TestStructure
+ (T15630.TestStructure
+ i_ayQ
+ y_Xf
+ y1_Xg
+ (Just @String b_ayW)
+ (Just @Int i1_Xk));
+ IBlob ipv_sPo -> T15630.getMenuItem1
+ }
+ } } in
+ case v2_X7 of {
+ IDefault -> jump $j1_sPR (Nothing @Int);
+ IInt i1_Xi -> jump $j1_sPR (Just @Int i1_Xi);
+ IBlob ipv_sPk -> T15630.getMenuItem1
+ } } in
+ case v1_X4 of {
+ IDefault -> jump $j_sPO (Nothing @String);
+ IInt ipv_sPi -> T15630.getMenuItem1;
+ IBlob b_ayW -> jump $j_sPO (Just @String b_ayW)
+ }}}}}}}
+
+
+------------- The bad version ----------------
+getMenuItem
+ = \ (vs_azD :: [IValue]) ->
+ case vs_azD of {
+ [] -> case T15630.<+>1 of wild1_00 { };
+ : v_azA vs1_azB ->
+ case vs1_azB of {
+ [] -> case T15630.<+>1 of wild2_00 { };
+ : v1_X5 vs2_X6 ->
+ case vs2_X6 of {
+ [] -> case T15630.<+>1 of wild3_00 { };
+ : v2_X9 vs3_Xa ->
+ case vs3_Xa of {
+ [] -> case T15630.<+>1 of wild4_00 { };
+ : v3_Xd vs4_Xe ->
+ case vs4_Xe of {
+ [] -> case T15630.<+>1 of wild5_00 { };
+ : v4_Xh vs5_Xi ->
+ case v_azA of {
+ __DEFAULT -> T15630.getMenuItem1;
+ IInt i_azn ->
+ join {
+ $j_sQw [Dmd=MC(1,L)]
+ :: (Maybe String -> Maybe Int -> TestStructure)
+ -> Either () TestStructure
+ [LclId[JoinId(1)(Nothing)],
+ Arity=1,
+ Str=<MC(1,C(1,L))>,
+ Unf=OtherCon []]
+ $j_sQw (f_aPr [OS=OneShot]
+ :: Maybe String -> Maybe Int -> TestStructure)
+ = case v3_Xd of {
+ IDefault ->
+ case v4_Xh of {
+ IDefault ->
+ Data.Either.Right
+ @()
+ @TestStructure
+ (f_aPr
+ (Nothing @String)
+ (Nothing @Int));
+ IInt i1_Xl ->
+ Data.Either.Right
+ @()
+ @TestStructure
+ (f_aPr
+ (Nothing @String)
+ (Just @Int i1_Xl));
+ IBlob ipv_sPM -> T15630.getMenuItem1
+ };
+ IInt ipv_sPK -> T15630.getMenuItem1;
+ IBlob b_azt ->
+ case v4_Xh of {
+ IDefault ->
+ Data.Either.Right
+ @()
+ @TestStructure
+ (f_aPr
+ (Just @String b_azt)
+ (Nothing @Int));
+ IInt i1_Xl ->
+ Data.Either.Right
+ @()
+ @TestStructure
+ (f_aPr
+ (Just @String b_azt)
+ (Just @Int i1_Xl));
+ IBlob ipv_sPM -> T15630.getMenuItem1
+ }
+ } } in
+ case v1_X5 of {
+ IDefault ->
+ case v2_X9 of {
+ IDefault ->
+ jump $j_sQw
+ (\ (ds_dNN [OS=OneShot] :: Maybe String)
+ (ds1_dNO [OS=OneShot] :: Maybe Int) ->
+ T15630.TestStructure
+ i_azn
+ (Nothing @String)
+ (Nothing @Int)
+ ds_dNN
+ ds1_dNO);
+ IInt i1_Xk ->
+ jump $j_sQw
+ (\ (ds_dNN [OS=OneShot] :: Maybe String)
+ (ds1_dNO [OS=OneShot] :: Maybe Int) ->
+ T15630.TestStructure
+ i_azn
+ (Nothing @String)
+ (Just @Int i1_Xk)
+ ds_dNN
+ ds1_dNO);
+ IBlob ipv_sPI -> T15630.getMenuItem1
+ };
+ IInt ipv_sPG -> T15630.getMenuItem1;
+ IBlob b_azt ->
+ case v2_X9 of {
+ IDefault ->
+ jump $j_sQw
+ (\ (ds_Xl [OS=OneShot] :: Maybe String)
+ (ds1_Xm [OS=OneShot] :: Maybe Int) ->
+ T15630.TestStructure
+ i_azn
+ (Just @String b_azt)
+ (Nothing @Int)
+ ds_Xl
+ ds1_Xm);
+ IInt i1_Xk ->
+ jump $j_sQw
+ (\ (ds_Xm [OS=OneShot] :: Maybe String)
+ (ds1_Xn [OS=OneShot] :: Maybe Int) ->
+ T15630.TestStructure
+ i_azn
+ (Just @String b_azt)
+ (Just @Int i1_Xk)
+ ds_Xm
+ ds1_Xn);
+ IBlob ipv_sPI -> T15630.getMenuItem1
+ }}}}}}}}
+
+-}
=====================================
testsuite/tests/perf/compiler/T15630a.hs
=====================================
@@ -0,0 +1,64 @@
+module T15630a where
+
+data IValue = IDefault
+ | IInt Int
+ | IBlob String
+
+(?) :: (IValue -> Either x a) -> IValue -> Either x (Maybe a)
+-- With this NOINLINE pragma we get good behaviour, but disastrous without
+-- {-# NOINLINE [0] (?) #-}
+(?) _ IDefault = pure Nothing
+(?) p x = Just <$> p x
+
+getInt :: IValue -> Either () Int
+{-# NOINLINE getInt #-}
+getInt (IInt i) = Right i
+getInt v = Left ()
+
+getString :: IValue -> Either () String
+{-# NOINLINE getString #-}
+getString (IBlob b) = Right $ b
+getString v = Left ()
+
+(<+>) :: (Either x (a -> b), [IValue]) -> (IValue -> Either x a) -> (Either x b, [IValue])
+(<+>) (f, (v:vs)) p = (f <*> (p v), vs)
+
+data TestStructure = TestStructure
+ { _param1 :: Int
+ , _param2 :: Maybe String
+ , _param3 :: Maybe Int
+ , _param4 :: Maybe String
+ , _param5 :: Maybe Int
+ , _param6 :: Maybe Int
+
+ , _param7 :: Maybe String
+ , _param8 :: Maybe String
+ , _param9 :: Maybe Int
+ , _param10 :: Maybe Int
+ , _param11 :: Maybe String
+ , _param12 :: Maybe String
+ , _param13 :: Maybe Int
+ , _param14 :: Maybe Int
+ , _param15 :: Maybe String
+
+ }
+
+getMenuItem :: [IValue] -> Either () TestStructure
+getMenuItem vs = fst $ (pure TestStructure, vs)
+ <+> getInt
+ <+> (getString ?)
+ <+> (getInt ?)
+ <+> (getString ?)
+ <+> (getInt ?)
+ <+> (getInt ?)
+
+ <+> (getString ?)
+ <+> (getString ?)
+ <+> (getInt ?)
+ <+> (getInt ?)
+ <+> (getString ?)
+ <+> (getString ?)
+ <+> (getInt ?)
+ <+> (getInt ?)
+ <+> (getString ?)
+
=====================================
testsuite/tests/simplCore/should_compile/T18730.hs → testsuite/tests/perf/compiler/T18730.hs
=====================================
=====================================
testsuite/tests/simplCore/should_compile/T18730_A.hs → testsuite/tests/perf/compiler/T18730_A.hs
=====================================
=====================================
testsuite/tests/perf/compiler/all.T
=====================================
@@ -202,6 +202,15 @@ test('CoOpt_Singletons',
#########
+# Moved from simplCore/should_compile
+test('T18730',
+ [ only_ways(['optasm'])
+ , collect_compiler_stats('bytes allocated',1)
+ , extra_files(['T8730_aux.hs'])
+ ],
+ multimod_compile,
+ ['T18730_A', '-v0 -O'])
+
test ('LargeRecord',
[ only_ways(['normal']),
collect_compiler_stats('bytes allocated',1)
@@ -527,6 +536,11 @@ test('T15630',
],
compile,
['-O2'])
+test('T15630a',
+ [collect_compiler_stats()
+ ],
+ compile,
+ ['-O2'])
# See https://gitlab.haskell.org/ghc/ghc/merge_requests/312#note_186960
test ('WWRec',
=====================================
testsuite/tests/simplCore/should_compile/all.T
=====================================
@@ -341,7 +341,6 @@ test('T18603', normal, compile, ['-dcore-lint -O'])
# T18649 should /not/ generate a specialisation rule
test('T18649', normal, compile, ['-O -ddump-rules -Wno-simplifiable-class-constraints'])
-test('T18730', normal, multimod_compile, ['T18730_A', '-dcore-lint -O'])
test('T18747A', normal, compile, [''])
test('T18747B', normal, compile, [''])
test('T18815', only_ways(['optasm']), makefile_test, ['T18815'])
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/8fc0f2b29ed83b26d59c59fdebffeea97225cc1c...be385269dafbad2d3af3347afac4690b4f9f4933
--
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/8fc0f2b29ed83b26d59c59fdebffeea97225cc1c...be385269dafbad2d3af3347afac4690b4f9f4933
You're receiving this email because of your account on gitlab.haskell.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20230712/b1ed23f8/attachment-0001.html>
More information about the ghc-commits
mailing list