[Git][ghc/ghc][wip/simplifier-tweaks] 6 commits: Several improvements to the handling of coercions

Simon Peyton Jones (@simonpj) gitlab at gitlab.haskell.org
Wed Jul 12 16:43:08 UTC 2023



Simon Peyton Jones pushed to branch wip/simplifier-tweaks at Glasgow Haskell Compiler / GHC


Commits:
e04b7b3a by Simon Peyton Jones at 2023-07-12T13:17:17+01:00
Several improvements to the handling of coercions

* Make `mkSymCo` and `mkInstCo` smarter
  Fixes #23642

* Fix return role of `SelCo` in the coercion optimiser.
  Fixes #23617

* Make the coercion optimiser `opt_trans_rule` work better for newtypes
  Fixes #23619

- - - - -
e79e49da by Simon Peyton Jones at 2023-07-12T14:29:14+01:00
Simplifier improvements

This MR started as: allow the simplifer to do more in one pass,
arising from places I could see the simplifier taking two iterations
where one would do.  But it turned into a larger project, because
these changes unexpectedly made inlining blow up, especially join
points in deeply-nested cases.

The net result is good: a 2% improvement in compile time.  The table
below shows changes over 1%.

The main changes are:

* The SimplEnv now has a seInlineDepth field, which says how deep
  in unfoldings we are.  See Note [Inline depth] in Simplify.Env

* Avoid repeatedly simplifying coercions.
  see Note [Avoid re-simplifying coercions] in Simplify.Iteration
  As you'll see from the Note, this makes use of the seInlineDepth.

* Allow Simplify.Utils.postInlineUnconditionally to inline variables
  that are used exactly once. See Note [Post-inline for single-use things].

* Allow Simplify.Iteration.simplAuxBind to inline used-once things.
  This is another part of Note [Post-inline for single-use things], and
  is really good for reducing simplifier iterations in situations like
      case K e of { K x -> blah }
  wher x is used once in blah.

* Make GHC.Core.SimpleOpt.exprIsConApp_maybe do some simple case
  elimination.  Note [Case elim in exprIsConApp_maybe]

* When making join points, don't do so if the join point is so small
  it will immediately be inlined.  See Note [Duplicating alternatives]

* Do not add an unfolding to a join point at birth.  This is a tricky one
  and has a long Note [Do not add unfoldings to join points at birth]
  It shows up in two places
  - In `mkDupableAlt` do not add an inlining
  - (trickier) In `simplLetUnfolding` don't add an unfolding for a
    fresh join point
  I am not fully satisifed with this, but it works and is well documented.

* Many new or rewritten Notes.  E.g. Note [Avoiding simplifying repeatedly]

I discovered that GHC.HsToCore.Pmc.Solver.Types.trvVarInfo was very
delicately balanced.  It's a small, heavily used, overloaded function
and it's important that it inlines. By a fluke it was before, but at
various times in my journey it stopped doing so.  So I added an INLINE
pragma to it.

Metrics: compile_time/bytes allocated
------------------------------------------------
           CoOpt_Singletons(normal)   -4.3% GOOD
                LargeRecord(normal)  -23.3% GOOD
                  PmSeriesS(normal)   -2.4%
                     T11195(normal)   -1.7%
                     T12227(normal)  -20.0% GOOD
                     T12545(normal)   -5.4%
                 T13253-spj(normal)  -50.7% GOOD
                     T13386(normal)   -5.1% GOOD
                     T14766(normal)   -2.4% GOOD
                     T15164(normal)   -1.7%
                     T15304(normal)   +1.0%
                     T15630(normal)   -7.7%
                    T15630a(normal)          NEW
                     T15703(normal)   -7.5% GOOD
                     T16577(normal)   -5.1% GOOD
                     T17516(normal)   -3.6%
                     T18223(normal)  -16.8% GOOD
                     T18282(normal)   -1.5%
                     T18304(normal)   +1.9%
                    T21839c(normal)   -3.5% GOOD
                      T3064(normal)   -1.5%
                      T5030(normal)  -16.2% GOOD
                   T5321Fun(normal)   -1.6%
                      T6048(optasm)   -2.1% GOOD
                      T8095(normal)   -6.1% GOOD
                      T9630(normal)   -5.1% GOOD
                      WWRec(normal)   -1.6%

                          geo. mean   -2.1%
                          minimum    -50.7%
                          maximum     +1.9%

Metric Decrease:
    CoOpt_Singletons
    LargeRecord
    T12227
    T13253-spj
    T13386
    T14766
    T15703
    T16577
    T18223
    T21839c
    T5030
    T6048
    T8095
    T9630

- - - - -
f32ad188 by Simon Peyton Jones at 2023-07-12T17:35:45+01:00
No postInlineUnconditionally for strict bindings

Does not save allocation!

- - - - -
c9b06dfa by Simon Peyton Jones at 2023-07-12T17:37:05+01:00
No preInlineConditionally for join points

Does not save allocation!

- - - - -
90d96f33 by Simon Peyton Jones at 2023-07-12T17:37:31+01:00
Don't use Plan A for a case continuation

See carryPropagate in digits-of-e2

Really I'm moving more towards Plan B.

- - - - -
be385269 by Simon Peyton Jones at 2023-07-12T17:39:07+01:00
Half way attempt at inlining join points

My idea here is to be more parsimonious about inlining join
points.   I was thinking that even

  join j x = I# x in
  case v of
     p1 -> j x1
     p2 -> j x2
     ...

might not inline. Better for consumers.

Also don't inline even in FinalPhase beause we want importing
modules to see this.

- - - - -


21 changed files:

- compiler/GHC/Core/Coercion.hs
- compiler/GHC/Core/Coercion/Opt.hs
- compiler/GHC/Core/Opt/Simplify/Env.hs
- compiler/GHC/Core/Opt/Simplify/Inline.hs
- compiler/GHC/Core/Opt/Simplify/Iteration.hs
- compiler/GHC/Core/Opt/Simplify/Monad.hs
- compiler/GHC/Core/Opt/Simplify/Utils.hs
- compiler/GHC/Core/SimpleOpt.hs
- compiler/GHC/Core/TyCo/Rep.hs
- compiler/GHC/Core/Unfold.hs
- compiler/GHC/Core/Unfold/Make.hs
- compiler/GHC/Core/Utils.hs
- compiler/GHC/HsToCore/Pmc/Solver/Types.hs
- compiler/GHC/Iface/Type.hs
- compiler/GHC/IfaceToCore.hs
- testsuite/tests/perf/compiler/T15630.hs
- + testsuite/tests/perf/compiler/T15630a.hs
- testsuite/tests/simplCore/should_compile/T18730.hs → testsuite/tests/perf/compiler/T18730.hs
- testsuite/tests/simplCore/should_compile/T18730_A.hs → testsuite/tests/perf/compiler/T18730_A.hs
- testsuite/tests/perf/compiler/all.T
- testsuite/tests/simplCore/should_compile/all.T


Changes:

=====================================
compiler/GHC/Core/Coercion.hs
=====================================
@@ -37,7 +37,7 @@ module GHC.Core.Coercion (
         mkAxInstLHS, mkUnbranchedAxInstLHS,
         mkPiCo, mkPiCos, mkCoCast,
         mkSymCo, mkTransCo,
-        mkSelCo, getNthFun, getNthFromType, mkLRCo,
+        mkSelCo, mkSelCoResRole, getNthFun, getNthFromType, mkLRCo,
         mkInstCo, mkAppCo, mkAppCos, mkTyConAppCo,
         mkFunCo, mkFunCo2, mkFunCoNoFTF, mkFunResCo,
         mkNakedFunCo,
@@ -556,20 +556,28 @@ splitFunCo_maybe (FunCo { fco_arg = arg, fco_res = res }) = Just (arg, res)
 splitFunCo_maybe _ = Nothing
 
 splitForAllCo_maybe :: Coercion -> Maybe (TyCoVar, Coercion, Coercion)
-splitForAllCo_maybe (ForAllCo tv k_co co) = Just (tv, k_co, co)
-splitForAllCo_maybe _                     = Nothing
+splitForAllCo_maybe (ForAllCo tv k_co co)
+  = Just (tv, k_co, co)
+splitForAllCo_maybe co
+  | Just (ty, r)        <- isReflCo_maybe co
+  , Just (tcv, body_ty) <- splitForAllTyCoVar_maybe ty
+  = Just (tcv, mkNomReflCo (varType tcv), mkReflCo r body_ty)
+splitForAllCo_maybe _
+  = Nothing
 
 -- | Like 'splitForAllCo_maybe', but only returns Just for tyvar binder
 splitForAllCo_ty_maybe :: Coercion -> Maybe (TyVar, Coercion, Coercion)
-splitForAllCo_ty_maybe (ForAllCo tv k_co co)
-  | isTyVar tv = Just (tv, k_co, co)
-splitForAllCo_ty_maybe _ = Nothing
+splitForAllCo_ty_maybe co
+  | Just stuff@(tv,_,_) <- splitForAllCo_maybe co
+  , isTyVar tv = Just stuff
+  | otherwise  = Nothing
 
 -- | Like 'splitForAllCo_maybe', but only returns Just for covar binder
 splitForAllCo_co_maybe :: Coercion -> Maybe (CoVar, Coercion, Coercion)
-splitForAllCo_co_maybe (ForAllCo cv k_co co)
-  | isCoVar cv = Just (cv, k_co, co)
-splitForAllCo_co_maybe _ = Nothing
+splitForAllCo_co_maybe co
+  | Just stuff@(cv,_,_) <- splitForAllCo_maybe co
+  , isCoVar cv = Just stuff
+  | otherwise  = Nothing
 
 -------------------------------------------------------
 -- and some coercion kind stuff
@@ -1126,12 +1134,17 @@ mkUnivCo prov role ty1 ty2
 --   a kind of @t1 ~ t2@ becomes the kind @t2 ~ t1 at .
 mkSymCo :: Coercion -> Coercion
 
--- Do a few simple optimizations, but don't bother pushing occurrences
--- of symmetry to the leaves; the optimizer will take care of that.
-mkSymCo co | isReflCo co          = co
-mkSymCo    (SymCo co)             = co
-mkSymCo    (SubCo (SymCo co))     = SubCo co
-mkSymCo co                        = SymCo co
+-- Do a few simple optimizations, mainly to expose the underlying
+-- constructors to other 'mk' functions.  E.g.
+--   mkInstCo (mkSymCo (ForAllCo ...)) ty
+-- We want to push the SymCo inside the ForallCo, so that we can instantiate
+-- This can make a big difference.  E.g without coercion optimisation, GHC.Read
+-- totally explodes; but when we push Sym inside ForAll, it's fine.
+mkSymCo co | isReflCo co       = co
+mkSymCo (SymCo co)             = co
+mkSymCo (SubCo (SymCo co))     = SubCo co
+mkSymCo (ForAllCo tcv kco co)  = ForAllCo tcv (mkSymCo kco) (mkSymCo co)
+mkSymCo co                     = SymCo co
 
 -- | Create a new 'Coercion' by composing the two given 'Coercion's transitively.
 --   (co1 ; co2)
@@ -1142,6 +1155,7 @@ mkTransCo (GRefl r t1 (MCo co1)) (GRefl _ _ (MCo co2))
   = GRefl r t1 (MCo $ mkTransCo co1 co2)
 mkTransCo co1 co2                = TransCo co1 co2
 
+--------------------
 mkSelCo :: HasDebugCallStack
         => CoSel
         -> Coercion
@@ -1161,7 +1175,7 @@ mkSelCo_maybe cs co
 
     go cs co
       | Just (ty, r) <- isReflCo_maybe co
-      = Just (mkReflCo r (getNthFromType cs ty))
+      = Just (mkReflCo (mkSelCoResRole cs r) (getNthFromType cs ty))
 
     go SelForAll (ForAllCo _ kind_co _)
       = Just kind_co
@@ -1212,6 +1226,14 @@ mkSelCo_maybe cs co
 
     good_call _ = False
 
+mkSelCoResRole :: CoSel -> Role -> Role
+-- What is the role of (SelCo cs co), if co has role 'r'?
+-- It is not just 'r'!
+-- c.f. the SelCo case of coercionRole
+mkSelCoResRole SelForAll       _ = Nominal
+mkSelCoResRole (SelTyCon _ r') _ = r'
+mkSelCoResRole (SelFun fs)     r = funRole r fs
+
 -- | Extract the nth field of a FunCo
 getNthFun :: FunSel
           -> a    -- ^ multiplicity
@@ -1222,6 +1244,24 @@ getNthFun SelMult mult _   _   = mult
 getNthFun SelArg _     arg _   = arg
 getNthFun SelRes _     _   res = res
 
+getNthFromType :: HasDebugCallStack => CoSel -> Type -> Type
+getNthFromType (SelFun fs) ty
+  | Just (_af, mult, arg, res) <- splitFunTy_maybe ty
+  = getNthFun fs mult arg res
+
+getNthFromType (SelTyCon n _) ty
+  | Just args <- tyConAppArgs_maybe ty
+  = assertPpr (args `lengthExceeds` n) (ppr n $$ ppr ty) $
+    args `getNth` n
+
+getNthFromType SelForAll ty       -- Works for both tyvar and covar
+  | Just (tv,_) <- splitForAllTyCoVar_maybe ty
+  = tyVarKind tv
+
+getNthFromType cs ty
+  = pprPanic "getNthFromType" (ppr cs $$ ppr ty)
+
+--------------------
 mkLRCo :: LeftOrRight -> Coercion -> Coercion
 mkLRCo lr co
   | Just (ty, eq) <- isReflCo_maybe co
@@ -1230,11 +1270,14 @@ mkLRCo lr co
   = LRCo lr co
 
 -- | Instantiates a 'Coercion'.
+-- Works for both tyvar and covar
 mkInstCo :: Coercion -> CoercionN -> Coercion
-mkInstCo (ForAllCo tcv _kind_co body_co) co
-  | Just (arg, _) <- isReflCo_maybe co
-      -- works for both tyvar and covar
-  = substCoUnchecked (zipTCvSubst [tcv] [arg]) body_co
+mkInstCo co_fun co_arg
+  | Just (tcv, kind_co, body_co) <- splitForAllCo_maybe co_fun
+  , Just (arg, _) <- isReflCo_maybe co_arg
+  = assertPpr (isReflexiveCo kind_co) (ppr co_fun $$ ppr co_arg) $
+       -- If the arg is Refl, then kind_co must be reflexive too
+    substCoUnchecked (zipTCvSubst [tcv] [arg]) body_co
 mkInstCo co arg = InstCo co arg
 
 -- | Given @ty :: k1@, @co :: k1 ~ k2@,
@@ -2433,23 +2476,6 @@ coercionLKind co
     go_app (InstCo co arg) args = go_app co (go arg:args)
     go_app co              args = piResultTys (go co) args
 
-getNthFromType :: HasDebugCallStack => CoSel -> Type -> Type
-getNthFromType (SelFun fs) ty
-  | Just (_af, mult, arg, res) <- splitFunTy_maybe ty
-  = getNthFun fs mult arg res
-
-getNthFromType (SelTyCon n _) ty
-  | Just args <- tyConAppArgs_maybe ty
-  = assertPpr (args `lengthExceeds` n) (ppr n $$ ppr ty) $
-    args `getNth` n
-
-getNthFromType SelForAll ty       -- Works for both tyvar and covar
-  | Just (tv,_) <- splitForAllTyCoVar_maybe ty
-  = tyVarKind tv
-
-getNthFromType cs ty
-  = pprPanic "getNthFromType" (ppr cs $$ ppr ty)
-
 coercionRKind :: Coercion -> Type
 coercionRKind co
   = go co


=====================================
compiler/GHC/Core/Coercion/Opt.hs
=====================================
@@ -23,7 +23,7 @@ import GHC.Core.Unify
 
 import GHC.Types.Var.Set
 import GHC.Types.Var.Env
-import GHC.Types.Unique.Set
+-- import GHC.Types.Unique.Set
 
 import GHC.Data.Pair
 import GHC.Data.List.SetOps ( getNth )
@@ -132,45 +132,52 @@ optCoercion :: OptCoercionOpts -> Subst -> Coercion -> NormalCo
 optCoercion opts env co
   | optCoercionEnabled opts
   = optCoercion' env co
+
 {-
-  = pprTrace "optCoercion {" (text "Co:" <+> ppr co) $
+  = pprTrace "optCoercion {" (text "Co:" <> ppr (coercionSize co)) $
     let result = optCoercion' env co in
-    pprTrace "optCoercion }" (vcat [ text "Co:" <+> ppr co
-                                   , text "Optco:" <+> ppr result ]) $
+    pprTrace "optCoercion }"
+       (vcat [ text "Co:"    <+> ppr (coercionSize co)
+             , text "Optco:" <+> ppWhen (isReflCo result) (text "(refl)")
+                             <+> ppr (coercionSize result) ]) $
     result
 -}
 
   | otherwise
   = substCo env co
 
-
 optCoercion' :: Subst -> Coercion -> NormalCo
 optCoercion' env co
   | debugIsOn
   = let out_co = opt_co1 lc False co
         (Pair in_ty1  in_ty2,  in_role)  = coercionKindRole co
         (Pair out_ty1 out_ty2, out_role) = coercionKindRole out_co
+
+        details = vcat [ text "in_co:" <+> ppr co
+                       , text "in_ty1:" <+> ppr in_ty1
+                       , text "in_ty2:" <+> ppr in_ty2
+                       , text "out_co:" <+> ppr out_co
+                       , text "out_ty1:" <+> ppr out_ty1
+                       , text "out_ty2:" <+> ppr out_ty2
+                       , text "in_role:" <+> ppr in_role
+                       , text "out_role:" <+> ppr out_role
+--                       , vcat $ map ppr_one $ nonDetEltsUniqSet $ coVarsOfCo co
+--                       , text "subst:" <+> ppr env
+                       ]
     in
+    warnPprTrace (not (isReflCo out_co) && isReflexiveCo out_co)
+                 "optCoercion: reflexive but not refl" details $
     assertPpr (substTyUnchecked env in_ty1 `eqType` out_ty1 &&
                substTyUnchecked env in_ty2 `eqType` out_ty2 &&
                in_role == out_role)
-              (hang (text "optCoercion changed types!")
-                  2 (vcat [ text "in_co:" <+> ppr co
-                          , text "in_ty1:" <+> ppr in_ty1
-                          , text "in_ty2:" <+> ppr in_ty2
-                          , text "out_co:" <+> ppr out_co
-                          , text "out_ty1:" <+> ppr out_ty1
-                          , text "out_ty2:" <+> ppr out_ty2
-                          , text "in_role:" <+> ppr in_role
-                          , text "out_role:" <+> ppr out_role
-                          , vcat $ map ppr_one $ nonDetEltsUniqSet $ coVarsOfCo co
-                          , text "subst:" <+> ppr env ]))
-               out_co
-
-  | otherwise         = opt_co1 lc False co
+              (hang (text "optCoercion changed types!") 2 details) $
+    out_co
+
+  | otherwise
+  = opt_co1 lc False co
   where
     lc = mkSubstLiftingContext env
-    ppr_one cv = ppr cv <+> dcolon <+> ppr (coVarKind cv)
+--    ppr_one cv = ppr cv <+> dcolon <+> ppr (coVarKind cv)
 
 
 type NormalCo    = Coercion
@@ -215,23 +222,37 @@ opt_co3 env sym _                       r co = opt_co4_wrap env sym False r co
 -- | Optimize a non-phantom coercion.
 opt_co4, opt_co4_wrap :: LiftingContext -> SymFlag -> ReprFlag
                       -> Role -> Coercion -> NormalCo
--- Precondition: In every call (opt_co4 lc sym rep role co)
---               we should have role = coercionRole co
+-- Precondition:  In every call (opt_co4 lc sym rep role co)
+--                we should have role = coercionRole co
+-- Postcondition: The resulting coercion is equivalant to
+--                     wrapsub (wrapsym (mksub co)
+--                 where wrapsym is SymCo if sym=True
+--                       wrapsub is SubCo if rep=True
+
+-- opt_co4_wrap is there just to support tracing, when debugging
+-- Usually it just goes straight to opt_co4
 opt_co4_wrap = opt_co4
 
 {-
 opt_co4_wrap env sym rep r co
   = pprTrace "opt_co4_wrap {"
-    ( vcat [ text "Sym:" <+> ppr sym
-           , text "Rep:" <+> ppr rep
-           , text "Role:" <+> ppr r
-           , text "Co:" <+> ppr co ]) $
-    assert (r == coercionRole co )    $
-    let result = opt_co4 env sym rep r co in
-    pprTrace "opt_co4_wrap }" (ppr co $$ text "---" $$ ppr result) $
-    result
--}
+   ( vcat [ text "Sym:" <+> ppr sym
+          , text "Rep:" <+> ppr rep
+          , text "Role:" <+> ppr r
+          , text "Co:" <+> ppr co ]) $
+   assert (r == coercionRole co )    $
+   let result = opt_co4 env sym rep r co in
+   pprTrace "opt_co4_wrap }" (ppr co $$ text "---" $$ ppr result) $
+   assertPpr (res_role == coercionRole result)
+             (vcat [ text "Role:" <+> ppr r
+                   , text "Result: " <+>  ppr result
+                   , text "Result type:" <+> ppr (coercionType result) ]) $
+   result
 
+  where
+    res_role | rep       = Representational
+             | otherwise = r
+-}
 
 opt_co4 env _   rep r (Refl ty)
   = assertPpr (r == Nominal)
@@ -379,11 +400,17 @@ opt_co4 env sym rep _ (SelCo SelForAll (ForAllCo _ eta _))
       -- works for both tyvar and covar
   = opt_co4_wrap env sym rep Nominal eta
 
-opt_co4 env sym rep r (SelCo n co)
-  | Just nth_co <- case (co', n) of
-      (TyConAppCo _ _ cos, SelTyCon n _) -> Just (cos `getNth` n)
-      (FunCo _ _ _ w co1 co2, SelFun fs) -> Just (getNthFun fs w co1 co2)
-      (ForAllCo _ eta _, SelForAll)      -> Just eta
+-- So the /input/ coercion isn't ForAllCo or Refl;
+-- instead look at the /output/ coercion
+opt_co4 env sym rep r (SelCo cs co)
+  | Just (ty, co_role) <- isReflCo_maybe co'
+  = mkReflCo (chooseRole rep (mkSelCoResRole cs co_role))
+             (getNthFromType cs ty)
+
+  | Just nth_co <- case (co', cs) of
+      (TyConAppCo _ _ cos,    SelTyCon n _) -> Just (cos `getNth` n)
+      (FunCo _ _ _ w co1 co2, SelFun fs)    -> Just (getNthFun fs w co1 co2)
+      (ForAllCo _ eta _,      SelForAll)    -> Just eta
       _                  -> Nothing
   = if rep && (r == Nominal)
       -- keep propagating the SubCo
@@ -391,7 +418,7 @@ opt_co4 env sym rep r (SelCo n co)
     else nth_co
 
   | otherwise
-  = wrapRole rep r $ SelCo n co'
+  = wrapRole rep r $ SelCo cs co'
   where
     co' = opt_co1 env sym co
 
@@ -586,7 +613,6 @@ opt_univ env sym prov role oty1 oty2
         (env', tv1', eta') = optForAllCoBndr env sym tv1 eta
     in
     mkForAllCo tv1' eta' (opt_univ env' sym prov' role ty1 ty2')
-
   | Just (cv1, ty1) <- splitForAllCoVar_maybe oty1
   , Just (cv2, ty2) <- splitForAllCoVar_maybe oty2
       -- NB: prov isn't interesting here either
@@ -628,8 +654,25 @@ opt_transList :: HasDebugCallStack => InScopeSet -> [NormalCo] -> [NormalCo] ->
 opt_transList is = zipWithEqual "opt_transList" (opt_trans is)
   -- The input lists must have identical length.
 
-opt_trans :: InScopeSet -> NormalCo -> NormalCo -> NormalCo
+opt_trans, opt_trans' :: InScopeSet -> NormalCo -> NormalCo -> NormalCo
+
+-- opt_trans just allows us to add some debug tracing
+-- Usually it just goes to opt_trans'
+opt_trans is co1 co2 = opt_trans' is co1 co2
+
+{-
 opt_trans is co1 co2
+  = assertPpr (r1==r2) (vcat [ ppr r1 <+> ppr co1, ppr r2 <+> ppr co2]) $
+    assertPpr (rres == r1) (vcat [ ppr r1 <+> ppr co1, ppr r2 <+> ppr co2, text "res" <+> ppr rres <+> ppr res ]) $
+    res
+  where
+    res = opt_trans' is co1 co2
+    rres = coercionRole res
+    r1 = coercionRole co1
+    r2 = coercionRole co1
+-}
+
+opt_trans' is co1 co2
   | isReflCo co1 = co2
     -- optimize when co1 is a Refl Co
   | otherwise    = opt_trans1 is co1 co2
@@ -803,10 +846,37 @@ opt_trans_rule is co1 co2
 -- Push transitivity inside axioms
 opt_trans_rule is co1 co2
 
+  -- TrPushAxSym/TrPushSymAx
+  -- Put this first!  Otherwise (#23619) we get
+  --    newtype N a = MkN a
+  --    axN :: forall a. N a ~ a
+  -- Now consider (axN ty ; sym (axN ty))
+  -- If we put TrPushSymAxR first, we'll get
+  --    (axN ty ; sym (axN ty)) :: N ty ~ N ty -- Obviously Refl
+  --    --> axN (sym (axN ty))  :: N ty ~ N ty -- Very stupid
+  | Just (sym1, ax1, ind1, cos1) <- isAxiom_maybe co1
+  , Just (sym2, ax2, ind2, cos2) <- isAxiom_maybe co2
+  , ax1 == ax2
+  , ind1 == ind2
+  , sym1 == not sym2
+  , let branch = coAxiomNthBranch ax1 ind1
+        role   = coAxiomRole ax1
+        qtvs   = coAxBranchTyVars branch ++ coAxBranchCoVars branch
+        lhs    = coAxNthLHS ax1 ind1
+        rhs    = coAxBranchRHS branch
+        pivot_tvs = exactTyCoVarsOfType (if sym2 then rhs else lhs)
+  , all (`elemVarSet` pivot_tvs) qtvs
+  = fireTransRule "TrPushAxSym" co1 co2 $
+    if sym2
+       -- TrPushAxSym
+    then liftCoSubstWith role qtvs (opt_transList is cos1 (map mkSymCo cos2)) lhs
+       -- TrPushSymAx
+    else liftCoSubstWith role qtvs (opt_transList is (map mkSymCo cos1) cos2) rhs
+
   -- See Note [Push transitivity inside axioms] and
   -- Note [Push transitivity inside newtype axioms only]
   -- TrPushSymAxR
-  | Just (sym, con, ind, cos1) <- co1_is_axiom_maybe
+  | Just (sym, con, ind, cos1) <- isAxiom_maybe co1
   , isNewTyCon (coAxiomTyCon con)
   , True <- sym
   , Just cos2 <- matchAxiom sym con ind co2
@@ -814,7 +884,7 @@ opt_trans_rule is co1 co2
   = fireTransRule "TrPushSymAxR" co1 co2 $ SymCo newAxInst
 
   -- TrPushAxR
-  | Just (sym, con, ind, cos1) <- co1_is_axiom_maybe
+  | Just (sym, con, ind, cos1) <- isAxiom_maybe co1
   , isNewTyCon (coAxiomTyCon con)
   , False <- sym
   , Just cos2 <- matchAxiom sym con ind co2
@@ -822,7 +892,7 @@ opt_trans_rule is co1 co2
   = fireTransRule "TrPushAxR" co1 co2 newAxInst
 
   -- TrPushSymAxL
-  | Just (sym, con, ind, cos2) <- co2_is_axiom_maybe
+  | Just (sym, con, ind, cos2) <- isAxiom_maybe co2
   , isNewTyCon (coAxiomTyCon con)
   , True <- sym
   , Just cos1 <- matchAxiom (not sym) con ind co1
@@ -830,35 +900,13 @@ opt_trans_rule is co1 co2
   = fireTransRule "TrPushSymAxL" co1 co2 $ SymCo newAxInst
 
   -- TrPushAxL
-  | Just (sym, con, ind, cos2) <- co2_is_axiom_maybe
+  | Just (sym, con, ind, cos2) <- isAxiom_maybe co2
   , isNewTyCon (coAxiomTyCon con)
   , False <- sym
   , Just cos1 <- matchAxiom (not sym) con ind co1
   , let newAxInst = AxiomInstCo con ind (opt_transList is cos1 cos2)
   = fireTransRule "TrPushAxL" co1 co2 newAxInst
 
-  -- TrPushAxSym/TrPushSymAx
-  | Just (sym1, con1, ind1, cos1) <- co1_is_axiom_maybe
-  , Just (sym2, con2, ind2, cos2) <- co2_is_axiom_maybe
-  , con1 == con2
-  , ind1 == ind2
-  , sym1 == not sym2
-  , let branch = coAxiomNthBranch con1 ind1
-        qtvs = coAxBranchTyVars branch ++ coAxBranchCoVars branch
-        lhs  = coAxNthLHS con1 ind1
-        rhs  = coAxBranchRHS branch
-        pivot_tvs = exactTyCoVarsOfType (if sym2 then rhs else lhs)
-  , all (`elemVarSet` pivot_tvs) qtvs
-  = fireTransRule "TrPushAxSym" co1 co2 $
-    if sym2
-       -- TrPushAxSym
-    then liftCoSubstWith role qtvs (opt_transList is cos1 (map mkSymCo cos2)) lhs
-       -- TrPushSymAx
-    else liftCoSubstWith role qtvs (opt_transList is (map mkSymCo cos1) cos2) rhs
-  where
-    co1_is_axiom_maybe = isAxiom_maybe co1
-    co2_is_axiom_maybe = isAxiom_maybe co2
-    role = coercionRole co1 -- should be the same as coercionRole co2!
 
 opt_trans_rule _ co1 co2        -- Identity rule
   | let ty1 = coercionLKind co1
@@ -1108,11 +1156,13 @@ chooseRole _    r = r
 
 -----------
 isAxiom_maybe :: Coercion -> Maybe (Bool, CoAxiom Branched, Int, [Coercion])
-isAxiom_maybe (SymCo co)
-  | Just (sym, con, ind, cos) <- isAxiom_maybe co
-  = Just (not sym, con, ind, cos)
-isAxiom_maybe (AxiomInstCo con ind cos)
-  = Just (False, con, ind, cos)
+-- We don't expect to see nested SymCo; and that lets us write a simple,
+-- non-recursive function. (If we see a nested SymCo we'll just fail,
+-- which is ok.)
+isAxiom_maybe (SymCo (AxiomInstCo ax ind cos))
+  = Just (True, ax, ind, cos)
+isAxiom_maybe (AxiomInstCo ax ind cos)
+  = Just (False, ax, ind, cos)
 isAxiom_maybe _ = Nothing
 
 matchAxiom :: Bool -- True = match LHS, False = match RHS


=====================================
compiler/GHC/Core/Opt/Simplify/Env.hs
=====================================
@@ -23,6 +23,7 @@ module GHC.Core.Opt.Simplify.Env (
         getInScope, setInScopeFromE, setInScopeFromF,
         setInScopeSet, modifyInScope, addNewInScopeIds,
         getSimplRules, enterRecGroupRHSs,
+        reSimplifying,
 
         -- * Substitution results
         SimplSR(..), mkContEx, substId, lookupRecBndr,
@@ -61,27 +62,31 @@ import GHC.Core.Utils
 import GHC.Core.Multiplicity     ( scaleScaled )
 import GHC.Core.Unfold
 import GHC.Core.TyCo.Subst (emptyIdSubstEnv)
-import GHC.Types.Var
-import GHC.Types.Var.Env
-import GHC.Types.Var.Set
-import GHC.Data.OrdList
-import GHC.Data.Graph.UnVar
-import GHC.Types.Id as Id
 import GHC.Core.Make            ( mkWildValBinder, mkCoreLet )
-import GHC.Builtin.Types
-import qualified GHC.Core.Type as Type
 import GHC.Core.Type hiding     ( substTy, substTyVar, substTyVarBndr, substCo
                                 , extendTvSubst, extendCvSubst )
 import qualified GHC.Core.Coercion as Coercion
 import GHC.Core.Coercion hiding ( substCo, substCoVar, substCoVarBndr )
-import GHC.Platform ( Platform )
+import qualified GHC.Core.Type as Type
+
+import GHC.Types.Var
+import GHC.Types.Var.Env
+import GHC.Types.Var.Set
+import GHC.Types.Id as Id
 import GHC.Types.Basic
+import GHC.Types.Unique.FM      ( pprUniqFM )
+
+import GHC.Builtin.Types
+
+import GHC.Data.OrdList
+import GHC.Data.Graph.UnVar
+import GHC.Platform ( Platform )
+
 import GHC.Utils.Monad
 import GHC.Utils.Outputable
 import GHC.Utils.Panic
 import GHC.Utils.Panic.Plain
 import GHC.Utils.Misc
-import GHC.Types.Unique.FM      ( pprUniqFM )
 
 import Data.List ( intersperse, mapAccumL )
 
@@ -151,6 +156,17 @@ following table:
     | Set by user                | SimplMode    | TopEnvConfig    |
     | Computed on initialization | SimplEnv     | SimplTopEnv     |
 
+Note [Inline depth]
+~~~~~~~~~~~~~~~~~~~
+When we inline an /already-simplified/ unfolding, we
+* Zap the substitution environment; the inlined thing is an OutExpr
+* Bump the seInlineDepth in the SimplEnv
+Both these tasks are done in zapSubstEnv.
+
+The seInlineDepth tells us how deep in inlining we are.  Currently,
+seInlineDepth is used for just one purpose: when we encounter a
+coercion we don't apply optCoercion to it if seInlineDepth>0.
+Reason: it has already been optimised once, no point in doing so again.
 -}
 
 data SimplEnv
@@ -180,7 +196,11 @@ data SimplEnv
         -- They are all OutVars, and all bound in this module
       , seInScope   :: !InScopeSet       -- OutVars only
 
-      , seCaseDepth :: !Int  -- Depth of multi-branch case alternatives
+      , seCaseDepth   :: !Int  -- Depth of multi-branch case alternatives
+
+      , seInlineDepth :: !Int  -- 0 initially, 1 when we inline an already-simplified
+                               -- unfolding, and simplify again; and so on
+                               -- See Note [Inline depth]
     }
 
 seArityOpts :: SimplEnv -> ArityOpts
@@ -488,14 +508,15 @@ points we're substituting. -}
 
 mkSimplEnv :: SimplMode -> (FamInstEnv, FamInstEnv) -> SimplEnv
 mkSimplEnv mode fam_envs
-  = SimplEnv { seMode      = mode
-             , seFamEnvs   = fam_envs
-             , seInScope   = init_in_scope
-             , seTvSubst   = emptyVarEnv
-             , seCvSubst   = emptyVarEnv
-             , seIdSubst   = emptyVarEnv
-             , seRecIds    = emptyUnVarSet
-             , seCaseDepth = 0 }
+  = SimplEnv { seMode        = mode
+             , seFamEnvs     = fam_envs
+             , seInScope     = init_in_scope
+             , seTvSubst     = emptyVarEnv
+             , seCvSubst     = emptyVarEnv
+             , seIdSubst     = emptyVarEnv
+             , seRecIds      = emptyUnVarSet
+             , seCaseDepth   = 0
+             , seInlineDepth = 0 }
         -- The top level "enclosing CC" is "SUBSUMED".
 
 init_in_scope :: InScopeSet
@@ -531,6 +552,9 @@ updMode upd env
 bumpCaseDepth :: SimplEnv -> SimplEnv
 bumpCaseDepth env = env { seCaseDepth = seCaseDepth env + 1 }
 
+reSimplifying :: SimplEnv -> Bool
+reSimplifying (SimplEnv { seInlineDepth = n }) = n>0
+
 ---------------------
 extendIdSubst :: SimplEnv -> Id -> SimplSR -> SimplEnv
 extendIdSubst env@(SimplEnv {seIdSubst = subst}) var res
@@ -616,7 +640,12 @@ setInScopeFromE.
 
 ---------------------
 zapSubstEnv :: SimplEnv -> SimplEnv
-zapSubstEnv env = env {seTvSubst = emptyVarEnv, seCvSubst = emptyVarEnv, seIdSubst = emptyVarEnv}
+-- See Note [Inline depth]
+-- We call zapSubstEnv precisely when we are about to
+-- simplify an already-simplified term
+zapSubstEnv env@(SimplEnv { seInlineDepth = n })
+  = env { seTvSubst = emptyVarEnv, seCvSubst = emptyVarEnv, seIdSubst = emptyVarEnv
+        , seInlineDepth = n+1 }
 
 setSubstEnv :: SimplEnv -> TvSubstEnv -> CvSubstEnv -> SimplIdSubst -> SimplEnv
 setSubstEnv env tvs cvs ids = env { seTvSubst = tvs, seCvSubst = cvs, seIdSubst = ids }


=====================================
compiler/GHC/Core/Opt/Simplify/Inline.hs
=====================================
@@ -89,14 +89,18 @@ StrictAnal.addStrictnessInfoToTopId
 
 callSiteInline :: Logger
                -> UnfoldingOpts
-               -> Int                   -- Case depth
+               -> Int -> Int            -- Case depth and inline depth
                -> Id                    -- The Id
                -> Bool                  -- True <=> unfolding is active
                -> Bool                  -- True if there are no arguments at all (incl type args)
                -> [ArgSummary]          -- One for each value arg; True if it is interesting
                -> CallCtxt              -- True <=> continuation is interesting
                -> Maybe CoreExpr        -- Unfolding, if any
-callSiteInline logger opts !case_depth id active_unfolding lone_variable arg_infos cont_info
+callSiteInline logger opts
+               !case_depth     -- See Note [Avoid inlining into deeply nested cases]
+               !inline_depth   -- Currently not used to control inlining
+                               -- but we pass it for debug-logging purposes
+               id active_unfolding lone_variable arg_infos cont_info
   = case idUnfolding id of
       -- idUnfolding checks for loop-breakers, returning NoUnfolding
       -- Things with an INLINE pragma may have an unfolding *and*
@@ -104,7 +108,7 @@ callSiteInline logger opts !case_depth id active_unfolding lone_variable arg_inf
         CoreUnfolding { uf_tmpl = unf_template
                       , uf_cache = unf_cache
                       , uf_guidance = guidance }
-          | active_unfolding -> tryUnfolding logger opts case_depth id lone_variable
+          | active_unfolding -> tryUnfolding logger opts case_depth inline_depth id lone_variable
                                     arg_infos cont_info unf_template
                                     unf_cache guidance
           | otherwise -> traceInline logger opts id "Inactive unfolding:" (ppr id) Nothing
@@ -133,8 +137,9 @@ traceInline logger opts inline_id str doc result
 
 {- Note [Avoid inlining into deeply nested cases]
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Also called "exponential inlining".
 
-Consider a function f like this:
+Consider a function f like this: (#18730)
 
   f arg1 arg2 =
     case ...
@@ -145,46 +150,44 @@ This function is small. So should be safe to inline.
 However sometimes this doesn't quite work out like that.
 Consider this code:
 
-f1 arg1 arg2 ... = ...
-    case _foo of
-      alt1 -> ... f2 arg1 ...
-      alt2 -> ... f2 arg2 ...
+    f1 arg1 arg2 ... = ...
+        case _foo of
+          alt1 -> ... f2 arg1 ...
+          alt2 -> ... f2 arg2 ...
 
-f2 arg1 arg2 ... = ...
-    case _foo of
-      alt1 -> ... f3 arg1 ...
-      alt2 -> ... f3 arg2 ...
+    f2 arg1 arg2 ... = ...
+        case _foo of
+          alt1 -> ... f3 arg1 ...
+          alt2 -> ... f3 arg2 ...
 
-f3 arg1 arg2 ... = ...
+    f3 arg1 arg2 ... = ...
 
-... repeats up to n times. And then f1 is
-applied to some arguments:
+    ... repeats up to n times. And then f1 is
+    applied to some arguments:
 
-foo = ... f1 <interestingArgs> ...
+    foo = ... f1 <interestingArgs> ...
 
-Initially f2..fn are not interesting to inline so we don't.
-However we see that f1 is applied to interesting args.
-So it's an obvious choice to inline those:
+Initially f2..fn are not interesting to inline so we don't.  However we see
+that f1 is applied to interesting args.  So it's an obvious choice to inline
+those:
 
-foo =
-    ...
-      case _foo of
-        alt1 -> ... f2 <interestingArg> ...
-        alt2 -> ... f2 <interestingArg> ...
+    foo = ...
+          case _foo of
+            alt1 -> ... f2 <interestingArg> ...
+            alt2 -> ... f2 <interestingArg> ...
 
-As a result we go and inline f2 both mentions of f2 in turn are now applied to interesting
-arguments and f2 is small:
+As a result we go and inline f2 both mentions of f2 in turn are now applied to
+interesting arguments and f2 is small:
 
-foo =
-    ...
-      case _foo of
-        alt1 -> ... case _foo of
-            alt1 -> ... f3 <interestingArg> ...
-            alt2 -> ... f3 <interestingArg> ...
+    foo = ...
+          case _foo of
+            alt1 -> ... case _foo of
+                alt1 -> ... f3 <interestingArg> ...
+                alt2 -> ... f3 <interestingArg> ...
 
-        alt2 -> ... case _foo of
-            alt1 -> ... f3 <interestingArg> ...
-            alt2 -> ... f3 <interestingArg> ...
+            alt2 -> ... case _foo of
+                alt1 -> ... f3 <interestingArg> ...
+                alt2 -> ... f3 <interestingArg> ...
 
 The same thing happens for each binding up to f_n, duplicating the amount of inlining
 done in each step. Until at some point we are either done or run out of simplifier
@@ -201,19 +204,73 @@ The heuristic can be tuned in two ways:
 
 * We can ignore the first n levels of case nestings for inlining decisions using
   -funfolding-case-threshold.
-* The penalty grows linear with the depth. It's computed as size*(depth-threshold)/scaling.
+
+* The penalty grows linear with the depth. It's computed as
+     size*(depth-threshold)/scaling.
   Scaling can be set with -funfolding-case-scaling.
 
+Reflections and wrinkles
+
+* See also Note [Do not add unfoldings to join points at birth] in
+  GHC.Core.Opt.Simplify.Iteration
+
+* The case total case depth is really the wrong thing; it will inhibit inlining of a
+  local function, just because there is some giant case nest further out.  What we
+  want is the /difference/ in case-depth between the binding site and the call site.
+  That could be done quite easily by adding the case-depth to the Unfolding of the
+  function.
+
+* What matters more than /depth/ is total /width/; that is how many alternatives
+  are in the tree.  We could perhaps multiply depth by width at each case expression.
+
+* There might be a case nest with many alternatives, but the function is called in
+  only a handful of them.  So maybe we should ignore case-depth, and instead penalise
+  funtions that are called many times -- after all, inlining them bloats code.
+
+  But in the scenario above, we are simplifying an inlined fuction, without doing a
+  global occurrence analysis each time.  So if we based the penalty on multiple
+  occurences, we should /also/ add a penalty when simplifying an already-simplified
+  expression.  We do track this (seInlineDepth) but currently we barely use it.
+
+  An advantage of using occurrences+inline depth is that it'll work when no
+  case expressions are involved.  See #15488.
+
+* Test T18730 did not involve join points.  But join points are very prone to
+  the same kind of thing.  For exampe in #13253, and several related tickets,
+  we got an exponential blowup in code size from a program that looks like
+  this.
+
+  let j1a x = case f y     of { True -> p;   False -> q }
+      j1b x = case f y     of { True -> q;   False -> p }
+      j2a x = case f (y+1) of { True -> j1a x; False -> j1b x}
+      j2b x = case f (y+1) of { True -> j1b x; False -> j1a x}
+      ...
+  in case f (y+10) of { True -> j10a 7; False -> j10b 8 }
+
+  The first danger is this: in Simplifier iteration 1 postInlineUnconditionally
+  inlines the last functions, j10a and j10b (they are both small).  Now we have
+  two calls to j9a and two to j9b.  In the next Simplifer iteration,
+  postInlineUnconditionally inlines all four of these calls, leaving four calls
+  to j8a and j8b. Etc.
+
+  Happily, this probably /won't/ happen because the Simplifier works top down, so it'll
+  inline j1a/j1b into j2a/j2b, which will make the latter bigger; so the process
+  will stop.  But we still need to stop the inline cascade described at the head
+  of this Note.
+
 Some guidance on setting these defaults:
 
 * A low treshold (<= 2) is needed to prevent exponential cases from spiraling out of
   control. We picked 2 for no particular reason.
+
 * Scaling the penalty by any more than 30 means the reproducer from
   T18730 won't compile even with reasonably small values of n. Instead
   it will run out of runs/ticks. This means to positively affect the reproducer
   a scaling <= 30 is required.
+
 * A scaling of >= 15 still causes a few very large regressions on some nofib benchmarks.
   (+80% for gc/fulsom, +90% for real/ben-raytrace, +20% for spectral/fibheaps)
+
 * A scaling of >= 25 showed no regressions on nofib. However it showed a number of
   (small) regression for compiler perf benchmarks.
 
@@ -222,15 +279,15 @@ This gives us minimal compiler perf regressions. No nofib runtime regressions an
 will still avoid this pattern sometimes. This is a "safe" default, where we err on
 the side of compiler blowup instead of risking runtime regressions.
 
-For cases where the default falls short the flag can be changed to allow more/less inlining as
-needed on a per-module basis.
+For cases where the default falls short the flag can be changed to allow
+more/less inlining as needed on a per-module basis.
 
 -}
 
-tryUnfolding :: Logger -> UnfoldingOpts -> Int -> Id -> Bool -> [ArgSummary] -> CallCtxt
+tryUnfolding :: Logger -> UnfoldingOpts -> Int -> Int -> Id -> Bool -> [ArgSummary] -> CallCtxt
              -> CoreExpr -> UnfoldingCache -> UnfoldingGuidance
              -> Maybe CoreExpr
-tryUnfolding logger opts !case_depth id lone_variable arg_infos
+tryUnfolding logger opts !case_depth !inline_depth id lone_variable arg_infos
              cont_info unf_template unf_cache guidance
  = case guidance of
      UnfNever -> traceInline logger opts id str (text "UnfNever") Nothing
@@ -263,8 +320,7 @@ tryUnfolding logger opts !case_depth id lone_variable arg_infos
           small_enough = adjusted_size <= unfoldingUseThreshold opts
           discount = computeDiscount arg_discounts res_discount arg_infos cont_info
 
-          extra_doc = vcat [ text "case depth =" <+> int case_depth
-                           , text "depth based penalty =" <+> int depth_penalty
+          extra_doc = vcat [ text "depth based penalty =" <+> int depth_penalty
                            , text "discounted size =" <+> int adjusted_size ]
   where
     -- Unpack the UnfoldingCache lazily because it may not be needed, and all
@@ -281,6 +337,8 @@ tryUnfolding logger opts !case_depth id lone_variable arg_infos
              , text "is exp:" <+> ppr is_exp
              , text "is work-free:" <+> ppr is_wf
              , text "guidance" <+> ppr guidance
+             , text "case depth =" <+> int case_depth
+             , text "inline depth =" <+> int inline_depth
              , extra_doc
              , text "ANSWER =" <+> if yes_or_no then text "YES" else text "NO"]
 


=====================================
compiler/GHC/Core/Opt/Simplify/Iteration.hs
=====================================
@@ -13,14 +13,12 @@ module GHC.Core.Opt.Simplify.Iteration ( simplTopBinds, simplExpr, simplImpRules
 
 import GHC.Prelude
 
-import GHC.Platform
-
 import GHC.Driver.Flags
 
 import GHC.Core
 import GHC.Core.Opt.Simplify.Monad
 import GHC.Core.Opt.ConstantFold
-import GHC.Core.Type hiding ( substTy, substTyVar, extendTvSubst, extendCvSubst )
+import GHC.Core.Type hiding ( substCo, substTy, substTyVar, extendTvSubst, extendCvSubst )
 import GHC.Core.TyCo.Compare( eqType )
 import GHC.Core.Opt.Simplify.Env
 import GHC.Core.Opt.Simplify.Inline
@@ -45,6 +43,7 @@ import GHC.Core.Opt.Arity ( ArityType, exprArity, arityTypeBotSigs_maybe
                           , pushCoTyArg, pushCoValArg, exprIsDeadEnd
                           , typeArity, arityTypeArity, etaExpandAT )
 import GHC.Core.SimpleOpt ( exprIsConApp_maybe, joinPointBinding_maybe, joinPointBindings_maybe )
+-- import GHC.Core.FVs     ( mkRuleInfo, exprsFreeIds )
 import GHC.Core.FVs     ( mkRuleInfo )
 import GHC.Core.Rules   ( lookupRule, getRules )
 import GHC.Core.Multiplicity
@@ -60,10 +59,12 @@ import GHC.Types.Unique ( hasKey )
 import GHC.Types.Basic
 import GHC.Types.Tickish
 import GHC.Types.Var    ( isTyCoVar )
+-- import GHC.Types.Var.Set
 import GHC.Builtin.PrimOps ( PrimOp (SeqOp) )
 import GHC.Builtin.Types.Prim( realWorldStatePrimTy )
 import GHC.Builtin.Names( runRWKey )
 
+-- import GHC.Data.Maybe   ( isNothing, isJust, orElse, mapMaybe )
 import GHC.Data.Maybe   ( isNothing, orElse, mapMaybe )
 import GHC.Data.FastString
 import GHC.Unit.Module ( moduleName )
@@ -399,7 +400,8 @@ simplJoinBind env is_rec cont old_bndr new_bndr rhs rhs_se
         ; completeBind env (BC_Join is_rec cont) old_bndr new_bndr rhs' }
 
 --------------------------
-simplAuxBind :: SimplEnv
+simplAuxBind :: String
+             -> SimplEnv
              -> InId            -- Old binder; not a JoinId
              -> OutExpr         -- Simplified RHS
              -> SimplM (SimplFloats, SimplEnv)
@@ -411,17 +413,22 @@ simplAuxBind :: SimplEnv
 --
 -- Precondition: rhs satisfies the let-can-float invariant
 
-simplAuxBind env bndr new_rhs
+simplAuxBind _str env bndr new_rhs
   | assertPpr (isId bndr && not (isJoinId bndr)) (ppr bndr) $
     isDeadBinder bndr   -- Not uncommon; e.g. case (a,b) of c { (p,q) -> p }
   = return (emptyFloats env, env)    --  Here c is dead, and we avoid
                                      --  creating the binding c = (a,b)
 
   -- The cases would be inlined unconditionally by completeBind:
-  -- but it seems not uncommon, and avoids faff to do it here
-  -- This is safe because it's only used for auxiliary bindings, which
-  -- have no NOLINE pragmas, nor RULEs
+  -- but it seems not uncommon, and it turns to be a little more
+  -- efficient (in compile time allocations) to do it here.
+  -- Effectively this is just a poor man's postInlineUnconditionally
+  -- See Note [Post-inline for single-use things] in GHC.Core.Opt.Simplify.Utils
+  -- Note: auxiliary bindings have no NOLINE pragmas, RULEs, or stable unfoldings
   | exprIsTrivial new_rhs  -- Short-cut for let x = y in ...
+    || case (idOccInfo bndr) of
+          OneOcc{ occ_n_br = 1, occ_in_lam = NotInsideLam } -> True
+          _                                                 -> False
   = return ( emptyFloats env
            , case new_rhs of
                 Coercion co -> extendCvSubst env bndr co
@@ -587,11 +594,10 @@ Note [Concrete types] in GHC.Tc.Utils.Concrete.
 -}
 
 tryCastWorkerWrapper :: SimplEnv -> BindContext
-                     -> InId -> OccInfo
-                     -> OutId -> OutExpr
+                     -> InId -> OutId -> OutExpr
                      -> SimplM (SimplFloats, SimplEnv)
 -- See Note [Cast worker/wrapper]
-tryCastWorkerWrapper env bind_cxt old_bndr occ_info bndr (Cast rhs co)
+tryCastWorkerWrapper env bind_cxt old_bndr bndr (Cast rhs co)
   | BC_Let top_lvl is_rec <- bind_cxt  -- Not join points
   , not (isDFunId bndr) -- nor DFuns; cast w/w is no help, and we can't transform
                         --            a DFunUnfolding in mk_worker_unfolding
@@ -617,7 +623,7 @@ tryCastWorkerWrapper env bind_cxt old_bndr occ_info bndr (Cast rhs co)
 
                triv_rhs = Cast (Var work_id_w_unf) co
 
-        ; if postInlineUnconditionally env bind_cxt bndr occ_info triv_rhs
+        ; if postInlineUnconditionally env bind_cxt old_bndr bndr triv_rhs
              -- Almost always True, because the RHS is trivial
              -- In that case we want to eliminate the binding fast
              -- We conservatively use postInlineUnconditionally so that we
@@ -660,7 +666,7 @@ tryCastWorkerWrapper env bind_cxt old_bndr occ_info bndr (Cast rhs co)
              | isStableSource src -> return (unf { uf_tmpl = mkCast unf_rhs (mkSymCo co) })
            _ -> mkLetUnfolding uf_opts top_lvl VanillaSrc work_id work_rhs
 
-tryCastWorkerWrapper env _ _ _ bndr rhs  -- All other bindings
+tryCastWorkerWrapper env _ _ bndr rhs  -- All other bindings
   = do { traceSmpl "tcww:no" (vcat [ text "bndr:" <+> ppr bndr
                                    , text "rhs:" <+> ppr rhs ])
         ; return (mkFloatBind env (NonRec bndr rhs)) }
@@ -939,7 +945,6 @@ completeBind env bind_cxt old_bndr new_bndr new_rhs
  = assert (isId new_bndr) $
    do { let old_info = idInfo old_bndr
             old_unf  = realUnfoldingInfo old_info
-            occ_info = occInfo old_info
 
          -- Do eta-expansion on the RHS of the binding
          -- See Note [Eta-expanding at let bindings] in GHC.Core.Opt.Simplify.Utils
@@ -952,7 +957,7 @@ completeBind env bind_cxt old_bndr new_bndr new_rhs
       ; let new_bndr_w_info = addLetBndrInfo new_bndr new_arity new_unfolding
         -- See Note [In-scope set as a substitution]
 
-      ; if postInlineUnconditionally env bind_cxt new_bndr_w_info occ_info eta_rhs
+      ; if postInlineUnconditionally env bind_cxt old_bndr new_bndr_w_info eta_rhs
 
         then -- Inline and discard the binding
              do  { tick (PostInlineUnconditionally old_bndr)
@@ -966,8 +971,9 @@ completeBind env bind_cxt old_bndr new_bndr new_rhs
                 -- substitution will happen, since we are going to discard the binding
 
         else -- Keep the binding; do cast worker/wrapper
-             -- pprTrace "Binding" (ppr new_bndr <+> ppr new_unfolding) $
-             tryCastWorkerWrapper env bind_cxt old_bndr occ_info new_bndr_w_info eta_rhs }
+--             simplTrace "completeBind" (vcat [ text "bndrs" <+> ppr old_bndr <+> ppr new_bndr
+--                                             , text "eta_rhs" <+> ppr eta_rhs ]) $
+             tryCastWorkerWrapper env bind_cxt old_bndr new_bndr_w_info eta_rhs }
 
 addLetBndrInfo :: OutId -> ArityType -> Unfolding -> OutId
 addLetBndrInfo new_bndr new_arity_type new_unf
@@ -1331,10 +1337,16 @@ simplCoercionF env co cont
 
 simplCoercion :: SimplEnv -> InCoercion -> SimplM OutCoercion
 simplCoercion env co
-  = do { let opt_co = optCoercion opts (getSubst env) co
+  = do { let opt_co | reSimplifying env = substCo env co
+                    | otherwise         = optCoercion opts subst co
+             -- If (reSimplifying env) is True we have already
+             -- simplified this coercion once, and we don't
+             -- want do so again; doing so repeatedly risks
+             -- non-linear behaviour
        ; seqCo opt_co `seq` return opt_co }
   where
-    opts = seOptCoercionOpts env
+    subst = getSubst env
+    opts  = seOptCoercionOpts env
 
 -----------------------------------
 -- | Push a TickIt context outwards past applications and cases, as
@@ -1443,8 +1455,8 @@ simplTick env tickish expr cont
   splitCont :: SimplCont -> (SimplCont, SimplCont)
   splitCont cont@(ApplyToTy { sc_cont = tail }) = (cont { sc_cont = inc }, outc)
     where (inc,outc) = splitCont tail
-  splitCont (CastIt co c) = (CastIt co inc, outc)
-    where (inc,outc) = splitCont c
+  splitCont cont@(CastIt { sc_cont = tail }) = (cont { sc_cont = inc }, outc)
+    where (inc,outc) = splitCont tail
   splitCont other = (mkBoringStop (contHoleType other), other)
 
   getDoneId (DoneId id)  = Just id
@@ -1500,8 +1512,11 @@ rebuild env expr cont
   = case cont of
       Stop {}          -> return (emptyFloats env, expr)
       TickIt t cont    -> rebuild env (mkTick t expr) cont
-      CastIt co cont   -> rebuild env (mkCast expr co) cont
-                       -- NB: mkCast implements the (Coercion co |> g) optimisation
+      CastIt { sc_co = co, sc_opt = opt, sc_cont = cont }
+        -> rebuild env (mkCast expr co') cont
+           -- NB: mkCast implements the (Coercion co |> g) optimisation
+        where
+          co' = optOutCoercion env co opt
 
       Select { sc_bndr = bndr, sc_alts = alts, sc_env = se, sc_cont = cont }
         -> rebuildCase (se `setInScopeFromE` env) expr bndr alts cont
@@ -1552,7 +1567,7 @@ completeBindX env from_what bndr rhs body cont
                                                bndr2 (emptyFloats env) rhs
               -- NB: it makes a surprisingly big difference (5% in compiler allocation
               -- in T9630) to pass 'env' rather than 'env1'.  It's fine to pass 'env',
-              -- because this is simplNonRecX, so bndr is not in scope in the RHS.
+              -- because this is completeBindX, so bndr is not in scope in the RHS.
 
         ; (bind_float, env2) <- completeBind (env2 `setInScopeFromF` rhs_floats)
                                              (BC_Let NotTopLevel NonRecursive)
@@ -1600,36 +1615,73 @@ isReflexiveCo
 
 In investigating this I saw missed opportunities for on-the-fly
 coercion shrinkage. See #15090.
+
+Note [Avoid re-simplifying coercions]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In some benchmarks (with deeply nested cases) we successively push
+casts onto the SimplCont.  We don't want to call the coercion optimiser
+on each successive composition -- that's at least quadratic.  So:
+
+* The CastIt constructor in SimplCont has a `sc_opt :: Bool` flag to
+  record whether the coercion optimiser has been applied to the coercion.
+
+* In `simplCast`, when we see (Cast e co), we simplify `co` to get
+  an OutCoercion, and built a CastIt with sc_opt=True.
+
+  Actually not quite: if we are simplifying the result of inlining an
+  unfolding (seInlineDepth > 0), then instead of /optimising/ it again,
+  just /substitute/ which is cheaper.  See `simplCoercion`.
+
+* In `addCoerce` (in `simplCast`) if we combine this new coercion with
+  an existing once, we build a CastIt for (co1 ; co2) with sc_opt=False.
+
+* When unpacking a CastIt, in `rebuildCall` and `rebuild`, we optimise
+  the (presumably composed) coercion if sc_opt=False; this is done
+  by `optOutCoercion`.
+
+* When duplicating a continuation in `mkDupableContWithDmds`, before
+  duplicating a CastIt, optimise the coercion. Otherwise we'll end up
+  optimising it separately in the duplicate copies.
 -}
 
 
-simplCast :: SimplEnv -> InExpr -> Coercion -> SimplCont
+optOutCoercion :: SimplEnv -> OutCoercion -> Bool -> OutCoercion
+-- See Note [Avoid re-simplifying coercions]
+optOutCoercion env co already_optimised
+  | already_optimised = co  -- See Note [Avoid re-simplifying coercions]
+  | otherwise         = optCoercion opts empty_subst co
+  where
+    empty_subst = mkEmptySubst (seInScope env)
+    opts = seOptCoercionOpts env
+
+simplCast :: SimplEnv -> InExpr -> InCoercion -> SimplCont
           -> SimplM (SimplFloats, OutExpr)
 simplCast env body co0 cont0
   = do  { co1   <- {-#SCC "simplCast-simplCoercion" #-} simplCoercion env co0
         ; cont1 <- {-#SCC "simplCast-addCoerce" #-}
                    if isReflCo co1
                    then return cont0  -- See Note [Optimising reflexivity]
-                   else addCoerce co1 cont0
+                   else addCoerce co1 True cont0
+                        -- True <=> co1 is optimised
         ; {-#SCC "simplCast-simplExprF" #-} simplExprF env body cont1 }
   where
+
         -- If the first parameter is MRefl, then simplifying revealed a
         -- reflexive coercion. Omit.
-        addCoerceM :: MOutCoercion -> SimplCont -> SimplM SimplCont
-        addCoerceM MRefl   cont = return cont
-        addCoerceM (MCo co) cont = addCoerce co cont
-
-        addCoerce :: OutCoercion -> SimplCont -> SimplM SimplCont
-        addCoerce co1 (CastIt co2 cont)  -- See Note [Optimising reflexivity]
-          | isReflexiveCo co' = return cont
-          | otherwise         = addCoerce co' cont
-          where
-            co' = mkTransCo co1 co2
-
-        addCoerce co (ApplyToTy { sc_arg_ty = arg_ty, sc_cont = tail })
+        addCoerceM :: MOutCoercion -> Bool -> SimplCont -> SimplM SimplCont
+        addCoerceM MRefl    _   cont = return cont
+        addCoerceM (MCo co) opt cont = addCoerce co opt cont
+
+        addCoerce :: OutCoercion -> Bool -> SimplCont -> SimplM SimplCont
+        addCoerce co1 _ (CastIt { sc_co = co2, sc_cont = cont })  -- See Note [Optimising reflexivity]
+          = addCoerce (mkTransCo co1 co2) False cont
+                      -- False: (mkTransCo co1 co2) is not fully optimised
+                      -- See Note [Avoid re-simplifying coercions]
+
+        addCoerce co opt (ApplyToTy { sc_arg_ty = arg_ty, sc_cont = tail })
           | Just (arg_ty', m_co') <- pushCoTyArg co arg_ty
           = {-#SCC "addCoerce-pushCoTyArg" #-}
-            do { tail' <- addCoerceM m_co' tail
+            do { tail' <- addCoerceM m_co' opt tail
                ; return (ApplyToTy { sc_arg_ty  = arg_ty'
                                    , sc_cont    = tail'
                                    , sc_hole_ty = coercionLKind co }) }
@@ -1640,18 +1692,20 @@ simplCast env body co0 cont0
         -- where   co :: (s1->s2) ~ (t1->t2)
         --         co1 :: t1 ~ s1
         --         co2 :: s2 ~ t2
-        addCoerce co cont@(ApplyToVal { sc_arg = arg, sc_env = arg_se
-                                      , sc_dup = dup, sc_cont = tail
-                                      , sc_hole_ty = fun_ty })
+        addCoerce co opt cont@(ApplyToVal { sc_arg = arg, sc_env = arg_se
+                                          , sc_dup = dup, sc_cont = tail
+                                          , sc_hole_ty = fun_ty })
+          | not opt  -- pushCoValArg duplicates the coercion, so optimise first
+          = addCoerce (optOutCoercion env co opt) True cont
+
           | Just (m_co1, m_co2) <- pushCoValArg co
           , fixed_rep m_co1
           = {-#SCC "addCoerce-pushCoValArg" #-}
-            do { tail' <- addCoerceM m_co2 tail
+            do { tail' <- addCoerceM m_co2 opt tail
                ; case m_co1 of {
                    MRefl -> return (cont { sc_cont = tail'
                                          , sc_hole_ty = coercionLKind co }) ;
-                      -- Avoid simplifying if possible;
-                      -- See Note [Avoiding exponential behaviour]
+                      -- See Note [Avoiding simplifying repeatedly]
 
                    MCo co1 ->
             do { (dup', arg_se', arg') <- simplArg env dup fun_ty arg_se arg
@@ -1666,11 +1720,11 @@ simplCast env body co0 cont0
                                     , sc_cont = tail'
                                     , sc_hole_ty = coercionLKind co }) } } }
 
-        addCoerce co cont
-          | isReflexiveCo co = return cont  -- Having this at the end makes a huge
-                                            -- difference in T12227, for some reason
-                                            -- See Note [Optimising reflexivity]
-          | otherwise        = return (CastIt co cont)
+        addCoerce co opt cont
+          | isReflCo co = return cont  -- Having this at the end makes a huge
+                                       -- difference in T12227, for some reason
+                                       -- See Note [Optimising reflexivity]
+          | otherwise = return (CastIt { sc_co = co, sc_opt = opt, sc_cont = cont })
 
         fixed_rep :: MCoercionR -> Bool
         fixed_rep MRefl    = True
@@ -1732,7 +1786,7 @@ simpl_lam env bndr body (ApplyToVal { sc_arg = arg, sc_env = arg_se
        ; let arg_ty = funArgTy fun_ty
        ; if | isSimplified dup  -- Don't re-simplify if we've simplified it once
                                 -- Including don't preInlineUnconditionally
-                                -- See Note [Avoiding exponential behaviour]
+                                -- See Note [Avoiding simplifying repeatedly]
             -> completeBindX env (FromBeta arg_ty) bndr arg body cont
 
             | Just env' <- preInlineUnconditionally env NotTopLevel bndr arg arg_se
@@ -1866,27 +1920,35 @@ Simplifier without first calling SimpleOpt, so anything involving
 GHCi or TH and operator sections will fall over if we don't take
 care here.
 
-Note [Avoiding exponential behaviour]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Note [Avoiding simplifying repeatedly]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 One way in which we can get exponential behaviour is if we simplify a
 big expression, and then re-simplify it -- and then this happens in a
 deeply-nested way.  So we must be jolly careful about re-simplifying
-an expression (#13379).  That is why simplNonRecX does not try
-preInlineUnconditionally (unlike simplNonRecE).
+an expression (#13379).
 
 Example:
   f BIG, where f has a RULE
 Then
  * We simplify BIG before trying the rule; but the rule does not fire
- * We inline f = \x. x True
- * So if we did preInlineUnconditionally we'd re-simplify (BIG True)
+   (forcing this simplification is why we have the RULE in this example)
+ * We inline f = \x. g x, in `simpl_lam`
+ * So if `simpl_lam` did preInlineUnconditionally we get (g BIG)
+ * Now if g has a RULE we'll simplify BIG again, and this whole thing can
+   iterate.
+ * However, if `f` did not have a RULE, so that BIG has /not/ already been
+   simplified, we /want/ to do preInlineUnconditionally in simpl_lam.
 
-However, if BIG has /not/ already been simplified, we'd /like/ to
-simplify BIG True; maybe good things happen.  That is why
+So we go to some effort to avoid repeatedly simplifying the same thing:
 
-* simplLam has
-    - a case for (isSimplified dup), which goes via simplNonRecX, and
-    - a case for the un-simplified case, which goes via simplNonRecE
+* ApplyToVal has a (sc_dup :: DupFlag) field which records if the argument
+  has been evaluated.
+
+* simplArg checks this flag to avoid re-simplifying.
+
+* simpl_lam has:
+    - a case for (isSimplified dup), which goes via completeBindX, and
+    - a case for an un-simplified argument, which tries preInlineUnconditionally
 
 * We go to some efforts to avoid unnecessarily simplifying ApplyToVal,
   in at least two places
@@ -1894,6 +1956,11 @@ simplify BIG True; maybe good things happen.  That is why
     - In rebuildCall we avoid simplifying arguments before we have to
       (see Note [Trying rewrite rules])
 
+All that said /postInlineUnconditionally/ (called in `completeBind`) does
+fire in the above (f BIG) situation.  See Note [Post-inline for single-use
+things] in Simplify.Utils.  This certainly risks repeated simplification, but
+in practice seems to be a small win.
+
 
 ************************************************************************
 *                                                                      *
@@ -2214,8 +2281,10 @@ rebuildCall env info@(ArgInfo { ai_fun = fun, ai_args = rev_args
                       _             -> True
 
 ---------- Simplify type applications and casts --------------
-rebuildCall env info (CastIt co cont)
-  = rebuildCall env (addCastTo info co) cont
+rebuildCall env info (CastIt { sc_co = co, sc_opt = opt, sc_cont = cont })
+  = rebuildCall env (addCastTo info co') cont
+  where
+    co' = optOutCoercion env co opt
 
 rebuildCall env info (ApplyToTy { sc_arg_ty = arg_ty, sc_hole_ty = hole_ty, sc_cont = cont })
   = rebuildCall env (addTyArgTo info arg_ty hole_ty) cont
@@ -2297,7 +2366,7 @@ rebuildCall env (ArgInfo { ai_fun = fun, ai_args = rev_args }) cont
 -----------------------------------
 tryInlining :: SimplEnv -> Logger -> OutId -> SimplCont -> SimplM (Maybe OutExpr)
 tryInlining env logger var cont
-  | Just expr <- callSiteInline logger uf_opts case_depth var active_unf
+  | Just expr <- callSiteInline logger uf_opts case_depth inline_depth var active_unf
                                 lone_variable arg_infos interesting_cont
   = do { dump_inline expr cont
        ; return (Just expr) }
@@ -2308,6 +2377,7 @@ tryInlining env logger var cont
   where
     uf_opts    = seUnfoldingOpts env
     case_depth = seCaseDepth env
+    inline_depth = seInlineDepth env
     (lone_variable, arg_infos, call_cont) = contArgs cont
     interesting_cont = interestingCallContext env call_cont
     active_unf       = activeUnfolding (seMode env) var
@@ -2345,7 +2415,7 @@ Then given (f Int e1) we rewrite to
    (\x. x True) e1
 without simplifying e1.  Now we can inline x into its unique call site,
 and absorb the True into it all in the same pass.  If we simplified
-e1 first, we couldn't do that; see Note [Avoiding exponential behaviour].
+e1 first, we couldn't do that; see Note [Avoiding simplifying repeatedly].
 
 So we try to apply rules if either
   (a) no_more_args: we've run out of argument that the rules can "see"
@@ -2961,7 +3031,7 @@ rebuildCase env scrut case_bndr alts cont
   where
     simple_rhs env wfloats case_bndr_rhs bs rhs =
       assert (null bs) $
-      do { (floats1, env') <- simplAuxBind env case_bndr case_bndr_rhs
+      do { (floats1, env') <- simplAuxBind "rebuildCase" env case_bndr case_bndr_rhs
              -- scrut is a constructor application,
              -- hence satisfies let-can-float invariant
          ; (floats2, expr') <- simplExprF env' rhs cont
@@ -3028,7 +3098,7 @@ rebuildCase env scrut case_bndr alts@[Alt _ bndrs rhs] cont
   | all_dead_bndrs
   , doCaseToLet scrut case_bndr
   = do { tick (CaseElim case_bndr)
-       ; (floats1, env')  <- simplAuxBind env case_bndr scrut
+       ; (floats1, env')  <- simplAuxBind "rebuildCaseAlt1" env case_bndr scrut
        ; (floats2, expr') <- simplExprF env' rhs cont
        ; return (floats1 `addFloats` floats2, expr') }
 
@@ -3241,7 +3311,6 @@ simplAlts env0 scrut case_bndr alts cont'
           --     See Note [Shadowing in prepareAlts] in GHC.Core.Opt.Simplify.Utils
 
         ; alts' <- mapM (simplAlt alt_env' (Just scrut') imposs_deflt_cons case_bndr' cont') in_alts
---      ; pprTrace "simplAlts" (ppr case_bndr $$ ppr alts $$ ppr cont') $ return ()
 
         ; let alts_ty' = contResultType cont'
         -- See Note [Avoiding space leaks in OutType]
@@ -3528,8 +3597,8 @@ knownCon env scrut dc_floats dc dc_ty_args dc_args bndr bs rhs cont
              -- occur in the RHS; and simplAuxBind may therefore discard it.
              -- Nevertheless we must keep it if the case-binder is alive,
              -- because it may be used in the con_app.  See Note [knownCon occ info]
-           ; (floats1, env2) <- simplAuxBind env' b' arg  -- arg satisfies let-can-float invariant
-           ; (floats2, env3)  <- bind_args env2 bs' args
+           ; (floats1, env2) <- simplAuxBind "knownCon" env' b' arg  -- arg satisfies let-can-float invariant
+           ; (floats2, env3) <- bind_args env2 bs' args
            ; return (floats1 `addFloats` floats2, env3) }
 
     bind_args _ _ _ =
@@ -3554,7 +3623,7 @@ knownCon env scrut dc_floats dc dc_ty_args dc_args bndr bs rhs cont
                                  ; let con_app = Var (dataConWorkId dc)
                                                  `mkTyApps` dc_ty_args
                                                  `mkApps`   dc_args
-                                 ; simplAuxBind env bndr con_app }
+                                 ; simplAuxBind "case-bndr" env bndr con_app }
 
 -------------------
 missingAlt :: SimplEnv -> Id -> [InAlt] -> SimplCont
@@ -3651,9 +3720,11 @@ mkDupableContWithDmds env _ cont
 
 mkDupableContWithDmds _ _ (Stop {}) = panic "mkDupableCont"     -- Handled by previous eqn
 
-mkDupableContWithDmds env dmds (CastIt ty cont)
+mkDupableContWithDmds env dmds (CastIt { sc_co = co, sc_opt = opt, sc_cont = cont })
   = do  { (floats, cont') <- mkDupableContWithDmds env dmds cont
-        ; return (floats, CastIt ty cont') }
+        ; return (floats, CastIt { sc_co = optOutCoercion env co opt
+                                 , sc_opt = True, sc_cont = cont' }) }
+                 -- optOutCoercion: see Note [Avoid re-simplifying coercions]
 
 -- Duplicating ticks for now, not sure if this is good or not
 mkDupableContWithDmds env dmds (TickIt t cont)
@@ -3684,6 +3755,7 @@ mkDupableContWithDmds env _
   | isNothing (isDataConId_maybe (ai_fun fun))
   , thumbsUpPlanA cont  -- See point (3) of Note [Duplicating join points]
   = -- Use Plan A of Note [Duplicating StrictArg]
+--    pprTrace "Using plan A" (ppr (ai_fun fun) $$ text "args" <+> ppr (ai_args fun) $$ text "cont" <+> ppr cont) $
     do { let (_ : dmds) = ai_dmds fun
        ; (floats1, cont')  <- mkDupableContWithDmds env dmds cont
                               -- Use the demands from the function to add the right
@@ -3707,14 +3779,15 @@ mkDupableContWithDmds env _
        ; (floats, join_rhs) <- rebuildCall env' (addValArgTo fun (Var arg_bndr) fun_ty) cont
        ; mkDupableStrictBind env' arg_bndr (wrapFloats floats join_rhs) rhs_ty }
   where
+    thumbsUpPlanA (StrictBind {})              = True
+    thumbsUpPlanA (Stop {})                    = True
+    thumbsUpPlanA (Select {})                  = False  -- Not quite sure of this one, but it
+                                                       -- benefits nofib digits-of-e1 quite a bit
     thumbsUpPlanA (StrictArg {})               = False
-    thumbsUpPlanA (CastIt _ k)                 = thumbsUpPlanA k
+    thumbsUpPlanA (CastIt { sc_cont = k })     = thumbsUpPlanA k
     thumbsUpPlanA (TickIt _ k)                 = thumbsUpPlanA k
     thumbsUpPlanA (ApplyToVal { sc_cont = k }) = thumbsUpPlanA k
     thumbsUpPlanA (ApplyToTy  { sc_cont = k }) = thumbsUpPlanA k
-    thumbsUpPlanA (Select {})                  = True
-    thumbsUpPlanA (StrictBind {})              = True
-    thumbsUpPlanA (Stop {})                    = True
 
 mkDupableContWithDmds env dmds
     (ApplyToTy { sc_cont = cont, sc_arg_ty = arg_ty, sc_hole_ty = hole_ty })
@@ -3775,8 +3848,7 @@ mkDupableContWithDmds env _
         -- NB: we don't use alt_env further; it has the substEnv for
         --     the alternatives, and we don't want that
 
-        ; let platform = sePlatform env
-        ; (join_floats, alts'') <- mapAccumLM (mkDupableAlt platform case_bndr')
+        ; (join_floats, alts'') <- mapAccumLM (mkDupableAlt env case_bndr')
                                               emptyJoinFloats alts'
 
         ; let all_floats = floats `addJoinFloats` join_floats
@@ -3817,11 +3889,11 @@ mkDupableStrictBind env arg_bndr join_rhs res_ty
                             , sc_cont   = mkBoringStop res_ty
                             } ) }
 
-mkDupableAlt :: Platform -> OutId
+mkDupableAlt :: SimplEnv -> OutId
              -> JoinFloats -> OutAlt
              -> SimplM (JoinFloats, OutAlt)
-mkDupableAlt _platform case_bndr jfloats (Alt con alt_bndrs alt_rhs_in)
-  | exprIsTrivial alt_rhs_in   -- See point (2) of Note [Duplicating join points]
+mkDupableAlt _env case_bndr jfloats (Alt con alt_bndrs alt_rhs_in)
+  | ok_to_dup_alt case_bndr alt_bndrs alt_rhs_in   -- See point (2) of Note [Duplicating join points]
   = return (jfloats, Alt con alt_bndrs alt_rhs_in)
 
   | otherwise
@@ -3852,7 +3924,7 @@ mkDupableAlt _platform case_bndr jfloats (Alt con alt_bndrs alt_rhs_in)
               filtered_binders = map fst abstracted_binders
               -- We want to make any binder with an evaldUnfolding strict in the rhs.
               -- See Note [Call-by-value for worker args] (which also applies to join points)
-              (rhs_with_seqs) = mkStrictFieldSeqs abstracted_binders alt_rhs_in
+              rhs_with_seqs = mkStrictFieldSeqs abstracted_binders alt_rhs_in
 
               final_args = varsToCoreExprs filtered_binders
                            -- Note [Join point abstraction]
@@ -3870,15 +3942,98 @@ mkDupableAlt _platform case_bndr jfloats (Alt con alt_bndrs alt_rhs_in)
               join_rhs   = mkLams (map zapIdUnfolding final_bndrs) rhs_with_seqs
 
         ; join_bndr <- newJoinId filtered_binders rhs_ty'
-
-        ; let join_call = mkApps (Var join_bndr) final_args
+        ; let -- join_bndr_w_unf = join_bndr `setIdUnfolding`
+              --                   mkUnfolding uf_opts VanillaSrc False False join_rhs Nothing
+              -- See Note [Do not add unfoldings to join points at birth]
+              join_call = mkApps (Var join_bndr) final_args
               alt'      = Alt con alt_bndrs join_call
 
         ; return ( jfloats `addJoinFlts` unitJoinFloat (NonRec join_bndr join_rhs)
                  , alt') }
                 -- See Note [Duplicated env]
 
+ok_to_dup_alt :: OutId -> [OutVar] -> OutExpr -> Bool
+-- See Note [Duplicating alternatives]
+ok_to_dup_alt _case_bndr _alt_bndrs alt_rhs
+  | (Var v, args) <- collectArgs alt_rhs
+  , all exprIsTrivial args
+  = isNothing (isDataConId_maybe v)
+  | otherwise
+  = False
+
 {-
+Note [Do not add unfoldings to join points at birth]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider this (#15360)
+
+   case (case (case (case ...))) of
+      Left x  -> e1
+      Right y -> e2
+
+We will make a join point for e1, e2, thus
+    $j1a x = e1
+    $j1b y = e2
+
+Now those join points count as "duplicable" , so we feel free to duplicate
+them into the loop nest.  And each of those calls are then subject to
+callSiteInline, which might inline them, if e1, e2 are reasonably small.  Now,
+if this applies recursive to the next `case` inwards, and so on, the net
+effect is that we can get an exponential number of calls to $j1a and $j1b, and
+an exponential number of inlinings (since each is done independently).
+
+This hit #15360 (not a complicated program!) badly.  Our simple solution is this:
+when a join point is born, we don't give it an unfolding.  So we end up with
+    $j1a x = e1
+    $j1b y = e2
+    $j2a x = ...$j1a ... $j1b...
+    $j2b x = ...$j1a ... $j1b...
+    ... and so on...
+
+Now we are into Note [Avoid inlining into deeply nested cases] in Simplify.Inline,
+which is still a challenge.  But at least we have a chance. If we add inlinings at
+birth we never get that chance.
+
+Wrinkle
+
+(JU1) It turns out that the same problem shows up in a different guise, via
+      Note [Post-inline for single-use things] in Simplify.Utils.  I think
+      we have something like
+         case K (join $j x = <rhs> in jblah) of K y{OneOcc} -> blah
+      where $j is a freshly-born join point.  After case-of-known-constructor
+      wo we end up substituting (join $j x = <rhs> in jblah) for `y` in `blah`;
+      and thus we re-simplify that join binding.  In test T15630 this results in
+      masssive duplication.
+
+      So in `simplLetUnfolding` we spot this case a bit hackily; a freshly-born
+      join point will have OccInfo of ManyOccs, unlike an existing join point which
+      will have OneOcc.  So in simplLetUnfolding we kill the unfolding of a freshly
+      born join point.
+
+I can't quite articulate precisely why this is so important.  But it makes a MASSIVE
+difference in T15630 (a fantastic test case); and at worst it'll merely delay inlining
+join points by one simplifier iteration.
+
+Note [Duplicating alternatives]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When should we duplicate an alternative, and when should we make a join point?
+We don't want to make a join point if it will /definitely/ be inlined; that
+takes extra work to build, and an extra Simplifier iteration to do the inlining.
+So consider
+
+   case (case x of True -> e2; False -> e2) of
+     K1 a b -> f b a
+     K2 x   -> g x v
+     K3 v   -> Just v
+
+The (f b a) would turn into a join point like
+   $j1 a b = f b a
+which would immediately inline again because the call is not smaller than the RHS.
+On the other hand, the (g x v) turns into
+   $j2 x = g x v
+which won't imediately inline.  Finally the (Just v) would turn into
+   $j3 v = Just v
+and you might think that would immediately inline.
+
 Note [Fusing case continuations]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 It's important to fuse two successive case continuations when the
@@ -3923,7 +4078,7 @@ inlining join points.   Consider
 
 Here the join-point RHS is very small, just a constructor
 application (K x y).  So we might inline it to get
-    case (case v of        )
+    case (case v of          )
          (     p1 -> K f x1  ) of
          (     p2 -> K f x2  )
          (     p3 -> K f x3  )
@@ -3947,14 +4102,12 @@ To achieve this:
    phase.  (The Final phase is still quite early, so we might consider
    delaying still more.)
 
-2. In mkDupableAlt and mkDupableStrictBind, generate an alterative for
-   all alternatives, except for exprIsTrival RHSs. Previously we used
-   exprIsDupable.  This generates a lot more join points, but makes
-   them much more case-of-case friendly.
+2. In mkDupableAlt and mkDupableStrictBind, generate an alterative for all
+   alternatives, /unless/ the join point would be immediately inlined in the
+   following iteration: e.g. if its RHS is trivial.
 
-   It is definitely worth checking for exprIsTrivial, otherwise we get
-   an extra Simplifier iteration, because it is inlined in the next
-   round.
+   (Previously we used exprIsDupable.)  This generates a lot more join points,
+   but makes them much more case-of-case friendly.
 
 3. By the same token we want to use Plan B in
    Note [Duplicating StrictArg] when the RHS of the new join point
@@ -3978,7 +4131,7 @@ the join point only when the RHS is
 * a constructor application? or
 * just non-trivial?
 Currently, a bit ad-hoc, but we definitely want to retain the join
-point for data constructors in mkDupalbleALt (point 2); that is the
+point for data constructors in mkDupableAlt (point 2); that is the
 whole point of #19996 described above.
 
 Historical Note [Case binders and join points]
@@ -4241,30 +4394,40 @@ simplLetUnfolding :: SimplEnv
 simplLetUnfolding env bind_cxt id new_rhs rhs_ty arity unf
   | isStableUnfolding unf
   = simplStableUnfolding env bind_cxt id rhs_ty arity unf
+
   | isExitJoinId id
-  = return noUnfolding -- See Note [Do not inline exit join points] in GHC.Core.Opt.Exitify
+  = -- See Note [Do not inline exit join points] in GHC.Core.Opt.Exitify
+    return noUnfolding
+
+  | isJoinId id
+  , too_many_occs (idOccInfo id)
+  = -- This is a tricky one!
+    -- See wrinkle (JU1) in Note [Do not add unfoldings to join points at birth]
+    return noUnfolding
+
   | otherwise
   = -- Otherwise, we end up retaining all the SimpleEnv
     let !opts = seUnfoldingOpts env
     in mkLetUnfolding opts (bindContextLevel bind_cxt) VanillaSrc id new_rhs
 
+  where
+    too_many_occs (ManyOccs {})             = True
+    too_many_occs (OneOcc { occ_n_br = n }) = n > 10 -- See #23627
+    too_many_occs IAmDead                   = False
+    too_many_occs (IAmALoopBreaker {})      = False
+
 -------------------
 mkLetUnfolding :: UnfoldingOpts -> TopLevelFlag -> UnfoldingSource
                -> InId -> OutExpr -> SimplM Unfolding
 mkLetUnfolding !uf_opts top_lvl src id new_rhs
-  = return (mkUnfolding uf_opts src is_top_lvl is_bottoming new_rhs Nothing)
-            -- We make an  unfolding *even for loop-breakers*.
-            -- Reason: (a) It might be useful to know that they are WHNF
-            --         (b) In GHC.Iface.Tidy we currently assume that, if we want to
-            --             expose the unfolding then indeed we *have* an unfolding
-            --             to expose.  (We could instead use the RHS, but currently
-            --             we don't.)  The simple thing is always to have one.
+  = return (mkCoreUnfolding src is_top_lvl new_rhs Nothing guidance)
   where
-    -- Might as well force this, profiles indicate up to 0.5MB of thunks
-    -- just from this site.
-    !is_top_lvl   = isTopLevel top_lvl
-    -- See Note [Force bottoming field]
-    !is_bottoming = isDeadEndId id
+    guidance = calcUnfoldingGuidance uf_opts (isJoinId id) is_top_bottoming new_rhs
+
+    -- Strict binding; profiles indicate up to 0.5MB of thunks
+    -- just from this site. See Note [Force bottoming field]
+    !is_top_lvl       = isTopLevel top_lvl
+    !is_top_bottoming =is_top_lvl && isDeadEndId id
 
 -------------------
 simplStableUnfolding :: SimplEnv -> BindContext
@@ -4375,6 +4538,17 @@ Wrinkles
   in GHC.Core.Opt.Simplify.Utils.  We uphold this because the join-point
   case (bind_cxt = BC_Join {}) doesn't use eta_expand.
 
+Note [Heavily used join points]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+After inining join points we can end up with
+  let $j x = <rhs>
+  in case x1 of
+     True -> case x2 of
+                True -> $j blah1
+                False -> $j blah2
+     False -> case x3 of ....
+with a huge case tree
+
 Note [Force bottoming field]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 We need to force bottoming, or the new unfolding holds


=====================================
compiler/GHC/Core/Opt/Simplify/Monad.hs
=====================================
@@ -219,7 +219,6 @@ newJoinId bndrs body_ty
              join_arity = length bndrs
              details    = JoinId join_arity Nothing
              id_info    = vanillaIdInfo `setArityInfo` arity
---                                        `setOccInfo` strongLoopBreaker
 
        ; return (mkLocalVar details name ManyTy join_id_ty id_info) }
 


=====================================
compiler/GHC/Core/Opt/Simplify/Utils.hs
=====================================
@@ -162,9 +162,12 @@ data SimplCont
 
 
   | CastIt              -- (CastIt co K)[e] = K[ e `cast` co ]
-        OutCoercion             -- The coercion simplified
+      { sc_co   :: OutCoercion  -- The coercion simplified
                                 -- Invariant: never an identity coercion
-        SimplCont
+      , sc_opt  :: Bool         -- True <=> sc_co has had optCoercion applied to it
+                                --      See Note [Avoid re-simplifying coercions]
+                                --      in GHC.Core.Opt.Simplify.Iteration
+      , sc_cont :: SimplCont }
 
   | ApplyToVal         -- (ApplyToVal arg K)[e] = K[ e arg ]
       { sc_dup     :: DupFlag   -- See Note [DupFlag invariants]
@@ -272,8 +275,10 @@ instance Outputable SimplCont where
     = text "Stop" <> brackets (sep $ punctuate comma pps) <+> ppr ty
     where
       pps = [ppr interesting] ++ [ppr eval_sd | eval_sd /= topSubDmd]
-  ppr (CastIt co cont  )    = (text "CastIt" <+> pprOptCo co) $$ ppr cont
-  ppr (TickIt t cont)       = (text "TickIt" <+> ppr t) $$ ppr cont
+  ppr (CastIt { sc_co = co, sc_cont = cont })
+    = (text "CastIt" <+> pprOptCo co) $$ ppr cont
+  ppr (TickIt t cont)
+    = (text "TickIt" <+> ppr t) $$ ppr cont
   ppr (ApplyToTy  { sc_arg_ty = ty, sc_cont = cont })
     = (text "ApplyToTy" <+> pprParendType ty) $$ ppr cont
   ppr (ApplyToVal { sc_arg = arg, sc_dup = dup, sc_cont = cont, sc_hole_ty = hole_ty })
@@ -284,9 +289,9 @@ instance Outputable SimplCont where
     = (text "StrictBind" <+> ppr b) $$ ppr cont
   ppr (StrictArg { sc_fun = ai, sc_cont = cont })
     = (text "StrictArg" <+> ppr (ai_fun ai)) $$ ppr cont
-  ppr (Select { sc_dup = dup, sc_bndr = bndr, sc_alts = alts, sc_env = se, sc_cont = cont })
+  ppr (Select { sc_dup = dup, sc_bndr = bndr, sc_alts = alts, sc_cont = cont })
     = (text "Select" <+> ppr dup <+> ppr bndr) $$
-       whenPprDebug (nest 2 $ vcat [ppr (seTvSubst se), ppr alts]) $$ ppr cont
+      whenPprDebug (nest 2 $ ppr alts) $$ ppr cont
 
 
 {- Note [The hole type in ApplyToTy]
@@ -350,6 +355,7 @@ data ArgSpec
           , as_hole_ty :: OutType }   -- Type of the function (presumably forall a. blah)
 
   | CastBy OutCoercion                -- Cast by this; c.f. CastIt
+                                      -- Coercion is optimised
 
 instance Outputable ArgInfo where
   ppr (ArgInfo { ai_fun = fun, ai_args = args, ai_dmds = dmds })
@@ -412,7 +418,8 @@ pushSimplifiedArg env (ValArg { as_arg = arg, as_hole_ty = hole_ty }) cont
   = ApplyToVal { sc_arg = arg, sc_env = env, sc_dup = Simplified
                  -- The SubstEnv will be ignored since sc_dup=Simplified
                , sc_hole_ty = hole_ty, sc_cont = cont }
-pushSimplifiedArg _ (CastBy c) cont = CastIt c cont
+pushSimplifiedArg _ (CastBy c) cont
+  = CastIt { sc_co = c, sc_cont = cont, sc_opt = True }
 
 argInfoExpr :: OutId -> [ArgSpec] -> OutExpr
 -- NB: the [ArgSpec] is reversed so that the first arg
@@ -469,7 +476,7 @@ mkLazyArgStop ty fun_info = Stop ty (lazyArgContext fun_info) arg_sd
 -------------------
 contIsRhs :: SimplCont -> Maybe RecFlag
 contIsRhs (Stop _ (RhsCtxt is_rec) _) = Just is_rec
-contIsRhs (CastIt _ k)                = contIsRhs k   -- For f = e |> co, treat e as Rhs context
+contIsRhs (CastIt { sc_cont = k })    = contIsRhs k   -- For f = e |> co, treat e as Rhs context
 contIsRhs _                           = Nothing
 
 -------------------
@@ -483,7 +490,7 @@ contIsDupable (ApplyToTy  { sc_cont = k })      = contIsDupable k
 contIsDupable (ApplyToVal { sc_dup = OkToDup }) = True -- See Note [DupFlag invariants]
 contIsDupable (Select { sc_dup = OkToDup })     = True -- ...ditto...
 contIsDupable (StrictArg { sc_dup = OkToDup })  = True -- ...ditto...
-contIsDupable (CastIt _ k)                      = contIsDupable k
+contIsDupable (CastIt { sc_cont = k })          = contIsDupable k
 contIsDupable _                                 = False
 
 -------------------
@@ -492,13 +499,13 @@ contIsTrivial (Stop {})                                         = True
 contIsTrivial (ApplyToTy { sc_cont = k })                       = contIsTrivial k
 -- This one doesn't look right.  A value application is not trivial
 -- contIsTrivial (ApplyToVal { sc_arg = Coercion _, sc_cont = k }) = contIsTrivial k
-contIsTrivial (CastIt _ k)                                      = contIsTrivial k
+contIsTrivial (CastIt { sc_cont = k })                          = contIsTrivial k
 contIsTrivial _                                                 = False
 
 -------------------
 contResultType :: SimplCont -> OutType
 contResultType (Stop ty _ _)                = ty
-contResultType (CastIt _ k)                 = contResultType k
+contResultType (CastIt { sc_cont = k })     = contResultType k
 contResultType (StrictBind { sc_cont = k }) = contResultType k
 contResultType (StrictArg { sc_cont = k })  = contResultType k
 contResultType (Select { sc_cont = k })     = contResultType k
@@ -509,7 +516,7 @@ contResultType (TickIt _ k)                 = contResultType k
 contHoleType :: SimplCont -> OutType
 contHoleType (Stop ty _ _)                    = ty
 contHoleType (TickIt _ k)                     = contHoleType k
-contHoleType (CastIt co _)                    = coercionLKind co
+contHoleType (CastIt { sc_co = co })          = coercionLKind co
 contHoleType (StrictBind { sc_bndr = b, sc_dup = dup, sc_env = se })
   = perhapsSubstTy dup se (idType b)
 contHoleType (StrictArg  { sc_fun_ty = ty })  = funArgTy ty
@@ -529,7 +536,8 @@ contHoleType (Select { sc_dup = d, sc_bndr =  b, sc_env = se })
 -- case-of-case transformation.
 contHoleScaling :: SimplCont -> Mult
 contHoleScaling (Stop _ _ _) = OneTy
-contHoleScaling (CastIt _ k) = contHoleScaling k
+contHoleScaling (CastIt { sc_cont = k })
+  = contHoleScaling k
 contHoleScaling (StrictBind { sc_bndr = id, sc_cont = k })
   = idMult id `mkMultMul` contHoleScaling k
 contHoleScaling (Select { sc_bndr = id, sc_cont = k })
@@ -548,14 +556,14 @@ countArgs :: SimplCont -> Int
 -- and other values; skipping over casts.
 countArgs (ApplyToTy  { sc_cont = cont }) = 1 + countArgs cont
 countArgs (ApplyToVal { sc_cont = cont }) = 1 + countArgs cont
-countArgs (CastIt _ cont)                 = countArgs cont
+countArgs (CastIt     { sc_cont = cont }) = countArgs cont
 countArgs _                               = 0
 
 countValArgs :: SimplCont -> Int
 -- Count value arguments only
 countValArgs (ApplyToTy  { sc_cont = cont }) = countValArgs cont
 countValArgs (ApplyToVal { sc_cont = cont }) = 1 + countValArgs cont
-countValArgs (CastIt _ cont)                 = countValArgs cont
+countValArgs (CastIt     { sc_cont = cont }) = countValArgs cont
 countValArgs _                               = 0
 
 -------------------
@@ -575,7 +583,7 @@ contArgs cont
     go args (ApplyToVal { sc_arg = arg, sc_env = se, sc_cont = k })
                                         = go (is_interesting arg se : args) k
     go args (ApplyToTy { sc_cont = k }) = go args k
-    go args (CastIt _ k)                = go args k
+    go args (CastIt { sc_cont = k })    = go args k
     go args k                           = (False, reverse args, k)
 
     is_interesting arg se = interestingArg se arg
@@ -594,10 +602,10 @@ contArgs cont
 -- about what to do then and no call sites so far seem to care.
 contEvalContext :: SimplCont -> SubDemand
 contEvalContext k = case k of
-  (Stop _ _ sd)              -> sd
-  (TickIt _ k)               -> contEvalContext k
-  (CastIt _ k)               -> contEvalContext k
-  ApplyToTy{sc_cont=k}       -> contEvalContext k
+  Stop _ _ sd              -> sd
+  TickIt _ k               -> contEvalContext k
+  CastIt   { sc_cont = k } -> contEvalContext k
+  ApplyToTy{ sc_cont = k } -> contEvalContext k
     --  ApplyToVal{sc_cont=k}      -> mkCalledOnceDmd $ contEvalContext k
     -- Not 100% sure that's correct, . Here's an example:
     --   f (e x) and f :: <SC(S,C(1,L))>
@@ -881,7 +889,7 @@ interestingCallContext env cont
     interesting (Stop _ cci _)               = cci
     interesting (TickIt _ k)                 = interesting k
     interesting (ApplyToTy { sc_cont = k })  = interesting k
-    interesting (CastIt _ k)                 = interesting k
+    interesting (CastIt { sc_cont = k })     = interesting k
         -- If this call is the arg of a strict function, the context
         -- is a bit interesting.  If we inline here, we may get useful
         -- evaluation information to avoid repeated evals: e.g.
@@ -921,7 +929,7 @@ contHasRules cont
   where
     go (ApplyToVal { sc_cont = cont }) = go cont
     go (ApplyToTy  { sc_cont = cont }) = go cont
-    go (CastIt _ cont)                 = go cont
+    go (CastIt { sc_cont = cont })     = go cont
     go (StrictArg { sc_fun = fun })    = ai_encl fun
     go (Stop _ RuleArgCtxt _)          = True
     go (TickIt _ c)                    = go c
@@ -1514,15 +1522,14 @@ rules] for details.
 
 postInlineUnconditionally
     :: SimplEnv -> BindContext
-    -> OutId            -- The binder (*not* a CoVar), including its unfolding
-    -> OccInfo          -- From the InId
+    -> InId -> OutId    -- The binder (*not* a CoVar), including its unfolding
     -> OutExpr
     -> Bool
 -- Precondition: rhs satisfies the let-can-float invariant
 -- See Note [Core let-can-float invariant] in GHC.Core
 -- Reason: we don't want to inline single uses, or discard dead bindings,
 --         for unlifted, side-effect-ful bindings
-postInlineUnconditionally env bind_cxt bndr occ_info rhs
+postInlineUnconditionally env bind_cxt old_bndr bndr rhs
   | not active                  = False
   | isWeakLoopBreaker occ_info  = False -- If it's a loop-breaker of any kind, don't inline
                                         -- because it might be referred to "earlier"
@@ -1530,27 +1537,23 @@ postInlineUnconditionally env bind_cxt bndr occ_info rhs
   | isTopLevel (bindContextLevel bind_cxt)
                                 = False -- Note [Top level and postInlineUnconditionally]
   | exprIsTrivial rhs           = True
-  | BC_Join {} <- bind_cxt              -- See point (1) of Note [Duplicating join points]
-  , not (phase == FinalPhase)   = False -- in Simplify.hs
+  | BC_Join {} <- bind_cxt      = False -- See point (1) of Note [Duplicating join points]
+--   , not (phase == FinalPhase)   = False -- in Simplify.hs
   | otherwise
   = case occ_info of
       OneOcc { occ_in_lam = in_lam, occ_int_cxt = int_cxt, occ_n_br = n_br }
         -- See Note [Inline small things to avoid creating a thunk]
 
-        -> n_br < 100  -- See Note [Suppress exponential blowup]
+        | let not_inside_lam = in_lam == NotInsideLam
+        -> n_br < 100  -- See #23627
 
-           && smallEnoughToInline uf_opts unfolding     -- Small enough to dup
-                        -- ToDo: consider discount on smallEnoughToInline if int_cxt is true
-                        --
-                        -- NB: Do NOT inline arbitrarily big things, even if occ_n_br=1
-                        -- Reason: doing so risks exponential behaviour.  We simplify a big
-                        --         expression, inline it, and simplify it again.  But if the
-                        --         very same thing happens in the big expression, we get
-                        --         exponential cost!
-                        -- PRINCIPLE: when we've already simplified an expression once,
-                        -- make sure that we only inline it if it's reasonably small.
-
-           && (in_lam == NotInsideLam ||
+           && (  (n_br == 1 && not_inside_lam)
+                      -- See Note [Post-inline for single-use things]
+              || (is_lazy && smallEnoughToInline uf_opts unfolding))
+                      -- Lazy, and small enough to dup
+                      -- ToDo: consider discount on smallEnoughToInline if int_cxt is true
+
+           && (not_inside_lam ||
                         -- Outside a lambda, we want to be reasonably aggressive
                         -- about inlining into multiple branches of case
                         -- e.g. let x = <non-value>
@@ -1571,19 +1574,9 @@ postInlineUnconditionally env bind_cxt bndr occ_info rhs
 
       _ -> False
 
--- Here's an example that we don't handle well:
---      let f = if b then Left (\x.BIG) else Right (\y.BIG)
---      in \y. ....case f of {...} ....
--- Here f is used just once, and duplicating the case work is fine (exprIsCheap).
--- But
---  - We can't preInlineUnconditionally because that would invalidate
---    the occ info for b.
---  - We can't postInlineUnconditionally because the RHS is big, and
---    that risks exponential behaviour
---  - We can't call-site inline, because the rhs is big
--- Alas!
-
   where
+    is_lazy   = not (isStrictId bndr)
+    occ_info  = idOccInfo old_bndr
     unfolding = idUnfolding bndr
     uf_opts   = seUnfoldingOpts env
     phase     = sePhase env
@@ -1608,37 +1601,51 @@ in allocation if you miss this out.  And bits of GHC itself start
 to allocate more.  An egregious example is test perf/compiler/T14697,
 where GHC.Driver.CmdLine.$wprocessArgs allocated hugely more.
 
-Note [Suppress exponential blowup]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-In #13253, and several related tickets, we got an exponential blowup
-in code size from postInlineUnconditionally.  The trouble comes when
-we have
-  let j1a = case f y     of { True -> p;   False -> q }
-      j1b = case f y     of { True -> q;   False -> p }
-      j2a = case f (y+1) of { True -> j1a; False -> j1b }
-      j2b = case f (y+1) of { True -> j1b; False -> j1a }
-      ...
-  in case f (y+10) of { True -> j10a; False -> j10b }
-
-when there are many branches. In pass 1, postInlineUnconditionally
-inlines j10a and j10b (they are both small).  Now we have two calls
-to j9a and two to j9b.  In pass 2, postInlineUnconditionally inlines
-all four of these calls, leaving four calls to j8a and j8b. Etc.
-Yikes!  This is exponential!
-
-A possible plan: stop doing postInlineUnconditionally
-for some fixed, smallish number of branches, say 4. But that turned
-out to be bad: see Note [Inline small things to avoid creating a thunk].
-And, as it happened, the problem with #13253 was solved in a
-different way (Note [Duplicating StrictArg] in Simplify).
-
-So I just set an arbitrary, high limit of 100, to stop any
-totally exponential behaviour.
-
-This still leaves the nasty possibility that /ordinary/ inlining (not
-postInlineUnconditionally) might inline these join points, each of
-which is individually quiet small.  I'm still not sure what to do
-about this (e.g. see #15488).
+Note [Post-inline for single-use things]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+If we have
+
+   let x = rhs in ...x...
+
+and `x` is used exactly once, and not inside a lambda, then we will usually
+preInlineUnconditinally. But we can still get this situation in
+postInlineUnconditionally:
+
+  case K rhs of K x -> ...x....
+
+Here we'll use `simplAuxBind` to bind `x` to (the already-simplified) `rhs`;
+and `x` is used exactly once.  It's beneficial to inline right away; otherwise
+we risk creating
+
+   let x = rhs in ...x...
+
+which will take another iteration of the Simplifier to eliminate.  We do this in
+two places
+
+1. In the full `postInlineUnconditionally` look for the special case
+   of "one occurrence, not under a lambda", and inline unconditionally then.
+
+   This is a bit risky: see Note [Avoiding simplifying repeatedly] in
+   Simplify.Iteration.  But in practice it seems to be a small win.
+
+2. `simplAuxBind` does a kind of poor-man's `postInlineUnconditionally`.  It
+   does not need to account for many of the cases (e.g. top level) that the
+   full `postInlineUnconditionally` does.  Moreover, we don't have an
+   OutId, which `postInlineUnconditionally` needs.  I got a slight improvement
+   in compiler performance when I added this test.
+
+Here's an example that we don't currently handle well:
+     let f = if b then Left (\x.BIG) else Right (\y.BIG)
+     in \y. ....case f of {...} ....
+Here f is used just once, and duplicating the case work is fine (exprIsCheap).
+But
+ - We can't preInlineUnconditionally because that would invalidate
+   the occ info for b.
+ - We can't postInlineUnconditionally because the RHS is big, and
+   that risks exponential behaviour
+ - We can't call-site inline, because the rhs is big
+Alas!
+
 
 Note [Top level and postInlineUnconditionally]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


=====================================
compiler/GHC/Core/SimpleOpt.hs
=====================================
@@ -512,6 +512,12 @@ do_beta_by_substitution bndr rhs
   = exprIsTrivial rhs                   -- Can duplicate
     || safe_to_inline (idOccInfo bndr)  -- Occurs at most once
 
+do_case_elim :: CoreExpr -> Id -> [Id] -> Bool
+do_case_elim scrut case_bndr alt_bndrs
+  =  exprIsHNF scrut
+  && safe_to_inline (idOccInfo case_bndr)
+  && all isDeadBinder alt_bndrs
+
 -------------------
 simple_out_bind :: TopLevelFlag
                 -> SimpleOptEnv
@@ -1290,13 +1296,17 @@ exprIsConApp_maybe ise@(ISE in_scope id_unf) expr
          in go subst' (float:floats) expr cont
 
     go subst floats (Case scrut b _ [Alt con vars expr]) cont
+       | do_case_elim scrut' b vars  -- See Note [Case elim in exprIsConApp_maybe]
+       = go (extend subst b scrut') floats expr cont
+       | otherwise
        = let
-          scrut'           = subst_expr subst scrut
           (subst', b')     = subst_bndr subst b
           (subst'', vars') = subst_bndrs subst' vars
           float            = FloatCase scrut' b' con vars'
          in
            go subst'' (float:floats) expr cont
+       where
+          scrut'           = subst_expr subst scrut
 
     go (Right sub) floats (Var v) cont
        = go (Left (getSubstInScope sub))
@@ -1417,6 +1427,27 @@ dealWithStringLiteral fun str co =
       in pushCoDataCon consDataCon [Type charTy, char_expr, rest] co
 
 {-
+Note [Case elim in exprIsConApp_maybe]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Suppose we have
+   data K a = MkK !a
+
+   $WMkK x = case x of y -> K y   -- Wrapper for MkK
+
+   ...case $WMkK v of K w -> <rhs>
+
+We call `exprIsConApp_maybe` on ($WMkK v); we inline the wrapper
+and beta-reduce, so we get to
+   exprIsConApp_maybe (case v of y -> K y)
+
+So we may float the case, and end up with
+   case v of y -> <rhs>[y/w]
+
+But if `v` is already evaluated, the next run of the Simplifier will
+eliminate the case, and we may then make more progress with <rhs>.
+Better to do it in one iteration.  Hence the `do_case_elim`
+check in `exprIsConApp_maybe`.
+
 Note [Unfolding DFuns]
 ~~~~~~~~~~~~~~~~~~~~~~
 DFuns look like


=====================================
compiler/GHC/Core/TyCo/Rep.hs
=====================================
@@ -152,7 +152,7 @@ data Type
                         --    for example unsaturated type synonyms
                         --    can appear as the right hand side of a type synonym.
 
-  | ForAllTy
+  | ForAllTy            -- See Note [Weird typing rule for ForAllTy]
         {-# UNPACK #-} !ForAllTyBinder
         Type            -- ^ A Π type.
              -- Note [When we quantify over a coercion variable]
@@ -938,9 +938,15 @@ instance Outputable Coercion where
   ppr = pprCo
 
 instance Outputable CoSel where
-  ppr (SelTyCon n _r) = text "Tc" <> parens (int n)
-  ppr SelForAll       = text "All"
-  ppr (SelFun fs)     = text "Fun" <> parens (ppr fs)
+  ppr (SelTyCon n r) = text "Tc" <> parens (int n <> comma <> pprOneCharRole r)
+  ppr SelForAll      = text "All"
+  ppr (SelFun fs)    = text "Fun" <> parens (ppr fs)
+
+
+pprOneCharRole :: Role -> SDoc
+pprOneCharRole Nominal          = char 'N'
+pprOneCharRole Representational = char 'R'
+pprOneCharRole Phantom          = char 'P'
 
 instance Outputable FunSel where
   ppr SelMult = text "mult"
@@ -1054,7 +1060,7 @@ SelTyCon, SelForAll, and SelFun.
       r = tyConRole tc r0 i
       i < n    (i is zero-indexed)
       ----------------------------------
-      SelCo (SelTyCon i r) : si ~r ti
+      SelCo (SelTyCon i r) co : si ~r ti
 
   "Not a newtype": see Note [SelCo and newtypes]
   "Not an arrow type": see SelFun below
@@ -1074,17 +1080,17 @@ SelTyCon, SelForAll, and SelFun.
       co : (s1 %{m1}-> t1) ~r0 (s2 %{m2}-> t2)
       r = funRole r0 SelMult
       ----------------------------------
-      SelCo (SelFun SelMult) : m1 ~r m2
+      SelCo (SelFun SelMult) co : m1 ~r m2
 
       co : (s1 %{m1}-> t1) ~r0 (s2 %{m2}-> t2)
       r = funRole r0 SelArg
       ----------------------------------
-      SelCo (SelFun SelArg) : s1 ~r s2
+      SelCo (SelFun SelArg) co : s1 ~r s2
 
       co : (s1 %{m1}-> t1) ~r0 (s2 %{m2}-> t2)
       r = funRole r0 SelRes
       ----------------------------------
-      SelCo (SelFun SelRes) : t1 ~r t2
+      SelCo (SelFun SelRes) co : t1 ~r t2
 
 Note [FunCo]
 ~~~~~~~~~~~~
@@ -1167,11 +1173,11 @@ because the kinds of the bound tyvars can be different.
 The typing rule is:
 
 
-  kind_co : k1 ~ k2
-  tv1:k1 |- co : t1 ~ t2
+  kind_co : k1 ~N k2
+  tv1:k1 |- co : t1 ~r t2
   -------------------------------------------------------------------
-  ForAllCo tv1 kind_co co : all tv1:k1. t1  ~
-                            all tv1:k2. (t2[tv1 |-> tv1 |> sym kind_co])
+  ForAllCo tv1 kind_co co : all tv1:k1. t1  ~r
+                            all tv1:k2. (t2[tv1 := (tv1 |> sym kind_co)])
 
 First, the TyCoVar stored in a ForAllCo is really an optimisation: this field
 should be a Name, as its kind is redundant. Thinking of the field as a Name


=====================================
compiler/GHC/Core/Unfold.hs
=====================================
@@ -256,17 +256,18 @@ inlineBoringOk e
 
 calcUnfoldingGuidance
         :: UnfoldingOpts
+        -> Bool          -- This is a join point
         -> Bool          -- Definitely a top-level, bottoming binding
         -> CoreExpr      -- Expression to look at
         -> UnfoldingGuidance
-calcUnfoldingGuidance opts is_top_bottoming (Tick t expr)
+calcUnfoldingGuidance opts is_join is_top_bottoming (Tick t expr)
   | not (tickishIsCode t)  -- non-code ticks don't matter for unfolding
-  = calcUnfoldingGuidance opts is_top_bottoming expr
-calcUnfoldingGuidance opts is_top_bottoming expr
+  = calcUnfoldingGuidance opts is_join is_top_bottoming expr
+calcUnfoldingGuidance opts is_join is_top_bottoming expr
   = case sizeExpr opts bOMB_OUT_SIZE val_bndrs body of
       TooBig -> UnfNever
       SizeIs size cased_bndrs scrut_discount
-        | uncondInline expr n_val_bndrs size
+        | uncondInline is_join expr n_val_bndrs size
         -> UnfWhen { ug_unsat_ok = unSaturatedOk
                    , ug_boring_ok =  boringCxtOk
                    , ug_arity = n_val_bndrs }   -- Note [INLINE for small functions]
@@ -432,11 +433,12 @@ sharing the wrapper closure.
 The solution: don’t ignore coercion arguments after all.
 -}
 
-uncondInline :: CoreExpr -> Arity -> Int -> Bool
+uncondInline :: Bool -> CoreExpr -> Arity -> Int -> Bool
 -- Inline unconditionally if there no size increase
 -- Size of call is arity (+1 for the function)
 -- See Note [INLINE for small functions]
-uncondInline rhs arity size
+uncondInline is_join rhs arity size
+  | is_join   = size < 10
   | arity > 0 = size <= 10 * (arity + 1) -- See Note [INLINE for small functions] (1)
   | otherwise = exprIsTrivial rhs        -- See Note [INLINE for small functions] (4)
 
@@ -594,6 +596,7 @@ sizeExpr opts !bOMB_OUT_SIZE top_args expr
            DataConWorkId dc -> conSize    dc (length val_args)
            PrimOpId op _    -> primOpSize op (length val_args)
            ClassOpId {}     -> classOpSize opts top_args val_args
+           JoinId {}        -> sizeZero  -- See Note [Inlining join points]
            _                -> funSize opts top_args fun (length val_args) voids
 
     ------------
@@ -685,6 +688,7 @@ callSize n_val_args voids = 10 * (1 + n_val_args - voids)
         -- Add 1 for each non-trivial arg;
         -- the allocation cost, as in let(rec)
 
+{-
 -- | The size of a jump to a join point
 jumpSize
  :: Int  -- ^ number of value args
@@ -695,6 +699,7 @@ jumpSize n_val_args voids = 2 * (1 + n_val_args - voids)
   -- bug #6048, but making them any more expensive loses a 21% improvement in
   -- spectral/puzzle. TODO Perhaps adjusting the default threshold would be a
   -- better solution?
+-}
 
 funSize :: UnfoldingOpts -> [Id] -> Id -> Int -> Int -> ExprSize
 -- Size for functions that are not constructors or primops
@@ -705,9 +710,9 @@ funSize opts top_args fun n_val_args voids
   | otherwise = SizeIs size arg_discount res_discount
   where
     some_val_args = n_val_args > 0
-    is_join = isJoinId fun
+--    is_join = isJoinId fun
 
-    size | is_join              = jumpSize n_val_args voids
+    size -- | is_join              = jumpSize n_val_args voids
          | not some_val_args    = 0
          | otherwise            = callSize n_val_args voids
 
@@ -772,6 +777,21 @@ win", but its terribly dangerous because a function with many many
 case branches, each finishing with a constructor, can have an
 arbitrarily large discount.  This led to terrible code bloat: see #6099.
 
+Note [Inlining join points]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Suppose we have
+   join j1 a b c d = blah
+   join j2 x = j1 x v x w
+   in ...(jump j2 t)....
+
+Then j1 is just an indirection to j1 with a bit of argument shuffling.
+We want to inline it even though it has more arguments:
+   join j1 a b c d = blah
+   in ...(jump j1 t v t w)...
+
+So we charge nothing for join-point calls; a bit like we make constructor
+applications cheap (see Note [Constructor size and result discount]).
+
 Note [Unboxed tuple size and result discount]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 However, unboxed tuples count as size zero. I found occasions where we had


=====================================
compiler/GHC/Core/Unfold/Make.hs
=====================================
@@ -117,7 +117,7 @@ mkWorkerUnfolding opts work_fn
   = mkCoreUnfolding src top_lvl new_tmpl Nothing guidance
   where
     new_tmpl = simpleOptExpr opts (work_fn tmpl)
-    guidance = calcUnfoldingGuidance (so_uf_opts opts) False new_tmpl
+    guidance = calcUnfoldingGuidance (so_uf_opts opts) False False new_tmpl
 
 mkWorkerUnfolding _ _ _ = noUnfolding
 
@@ -328,7 +328,7 @@ mkUnfolding opts src top_lvl is_bottoming expr cache
   = mkCoreUnfolding src top_lvl expr cache guidance
   where
     is_top_bottoming = top_lvl && is_bottoming
-    guidance         = calcUnfoldingGuidance opts is_top_bottoming expr
+    guidance         = calcUnfoldingGuidance opts False is_top_bottoming expr
         -- NB: *not* (calcUnfoldingGuidance (occurAnalyseExpr expr))!
         -- See Note [Calculate unfolding guidance on the non-occ-anal'd expression]
 


=====================================
compiler/GHC/Core/Utils.hs
=====================================
@@ -263,41 +263,28 @@ mkPiMCo v (MCo co) = MCo (mkPiCo Representational v co)
 -- | Wrap the given expression in the coercion safely, dropping
 -- identity coercions and coalescing nested coercions
 mkCast :: HasDebugCallStack => CoreExpr -> CoercionR -> CoreExpr
-mkCast e co
-  | assertPpr (coercionRole co == Representational)
-              (text "coercion" <+> ppr co <+> text "passed to mkCast"
-               <+> ppr e <+> text "has wrong role" <+> ppr (coercionRole co)) $
-    isReflCo co
-  = e
-
-mkCast (Coercion e_co) co
-  | isCoVarType (coercionRKind co)
-       -- The guard here checks that g has a (~#) on both sides,
-       -- otherwise decomposeCo fails.  Can in principle happen
-       -- with unsafeCoerce
-  = Coercion (mkCoCast e_co co)
-
-mkCast (Cast expr co2) co
-  = warnPprTrace (let { from_ty = coercionLKind co;
-                        to_ty2  = coercionRKind co2 } in
-                     not (from_ty `eqType` to_ty2))
-             "mkCast"
-             (vcat ([ text "expr:" <+> ppr expr
-                   , text "co2:" <+> ppr co2
-                   , text "co:" <+> ppr co ])) $
-    mkCast expr (mkTransCo co2 co)
-
-mkCast (Tick t expr) co
-   = Tick t (mkCast expr co)
 
 mkCast expr co
-  = let from_ty = coercionLKind co in
-    warnPprTrace (not (from_ty `eqType` exprType expr))
+  = assertPpr (coercionRole co == Representational)
+              (text "coercion" <+> ppr co <+> text "passed to mkCast"
+               <+> ppr expr <+> text "has wrong role" <+> ppr (coercionRole co)) $
+    warnPprTrace (not (coercionLKind co `eqType` exprType expr))
           "Trying to coerce" (text "(" <> ppr expr
           $$ text "::" <+> ppr (exprType expr) <> text ")"
           $$ ppr co $$ ppr (coercionType co)
           $$ callStackDoc) $
-    (Cast expr co)
+    case expr of
+      Cast expr co2 -> mkCast expr (mkTransCo co2 co)
+      Tick t expr   -> Tick t (mkCast expr co)
+
+      Coercion e_co | isCoVarType (coercionRKind co)
+         -- The guard here checks that g has a (~#) on both sides,
+         -- otherwise decomposeCo fails.  Can in principle happen
+         -- with unsafeCoerce
+                      -> Coercion (mkCoCast e_co co)
+
+      _ | isReflCo co -> expr
+        | otherwise   -> Cast expr co
 
 
 {- *********************************************************************


=====================================
compiler/GHC/HsToCore/Pmc/Solver/Types.hs
=====================================
@@ -325,6 +325,11 @@ lookupVarInfoNT ts x = case lookupVarInfo ts x of
     go _                = Nothing
 
 trvVarInfo :: Functor f => (VarInfo -> f (a, VarInfo)) -> Nabla -> Id -> f (a, Nabla)
+{-# INLINE trvVarInfo #-}
+-- This function is called a lot and we want to specilise it, not only
+-- for the type class, but also for its 'f' function argument.
+-- Before the INLINE pragma it sometimes inlined and sometimes didn't,
+-- depending delicately on GHC's optimisations.  Better to use a pragma.
 trvVarInfo f nabla at MkNabla{ nabla_tm_st = ts at TmSt{ts_facts = env} } x
   = set_vi <$> f (lookupVarInfo ts x)
   where


=====================================
compiler/GHC/Iface/Type.hs
=====================================
@@ -1872,8 +1872,8 @@ ppr_co _ (IfaceUnivCo prov role ty1 ty2)
 
 ppr_co ctxt_prec (IfaceInstCo co ty)
   = maybeParen ctxt_prec appPrec $
-    text "Inst" <+> pprParendIfaceCoercion co
-                        <+> pprParendIfaceCoercion ty
+    text "Inst" <+> sep [ pprParendIfaceCoercion co
+                        , pprParendIfaceCoercion ty ]
 
 ppr_co ctxt_prec (IfaceAxiomRuleCo tc cos)
   = maybeParen ctxt_prec appPrec $ ppr tc <+> parens (interpp'SP cos)


=====================================
compiler/GHC/IfaceToCore.hs
=====================================
@@ -1778,7 +1778,7 @@ tcUnfolding toplvl name _ info (IfCoreUnfold src cache if_guidance if_expr)
         ; expr <- tcUnfoldingRhs (isCompulsorySource src) toplvl name if_expr
         ; let guidance = case if_guidance of
                  IfWhen arity unsat_ok boring_ok -> UnfWhen arity unsat_ok boring_ok
-                 IfNoGuidance -> calcUnfoldingGuidance uf_opts is_top_bottoming expr
+                 IfNoGuidance -> calcUnfoldingGuidance uf_opts False is_top_bottoming expr
           -- See Note [Tying the 'CoreUnfolding' knot]
         ; return $ mkCoreUnfolding src True expr (Just cache) guidance }
   where


=====================================
testsuite/tests/perf/compiler/T15630.hs
=====================================
@@ -1,5 +1,28 @@
 module T15630 where
 
+{- This is a fansastic test cose.
+
+* It scales really easily (just add or remove fields).
+
+* It can demonstrate massive (exponental) blow up if you get inlining
+  for join points wrong.
+
+* I found that a more monomorphic variant, T15630a, tickled a very similar
+  exponential -blowup, but somehow in a slighlty different way.  To be specific,
+  at the time of writing, HEAD was fine on T15630, but blew up on T15630a.
+  So both tests are valuable.
+
+* Also worth noting: even if it doesn't blow up, it can result in two
+  very different programs.  Below are the good and bad versions for 5
+  fields.  Note that the good version passes Maybes to the join points,
+  the ultimate values of the fields.  But the bad version passes an
+  accumulating *function* to the join points. Lots of PAPs much less
+  efficient.
+
+See Note [Do not add unfoldings to join points at birth] in
+GHc.Core.Opt.Simplify.Iteration.
+-}
+
 data IValue = IDefault
             | IInt Int
             | IBlob String
@@ -54,3 +77,220 @@ getMenuItem vs = fst $ (pure TestStructure, vs)
              <+> (getInt ?)
              <+> (getInt ?)
              <+> (getString ?)
+
+
+{-
+------------- The good version (5 fields) ----------------
+getMenuItem
+  = \ (vs_az6 :: [IValue]) ->
+      case vs_az6 of {
+        [] -> case T15630.<+>1 of wild1_00 { };
+        : v_az3 vs1_az4 ->
+          case vs1_az4 of {
+            [] -> case T15630.<+>1 of wild2_00 { };
+            : v1_X4 vs2_X5 ->
+              case vs2_X5 of {
+                [] -> case T15630.<+>1 of wild3_00 { };
+                : v2_X7 vs3_X8 ->
+                  case vs3_X8 of {
+                    [] -> case T15630.<+>1 of wild4_00 { };
+                    : v3_Xa vs4_Xb ->
+                      case vs4_Xb of {
+                        [] -> case T15630.<+>1 of wild5_00 { };
+                        : v4_Xd vs5_Xe ->
+                          case v_az3 of {
+                            __DEFAULT -> T15630.getMenuItem1;
+                            IInt i_ayQ ->
+                              join {
+                                $j_sPO [Dmd=MC(1,L)] :: Maybe String -> Either () TestStructure
+                                [LclId[JoinId(1)(Nothing)], Arity=1, Str=<L>, Unf=OtherCon []]
+                                $j_sPO (y_Xf [OS=OneShot] :: Maybe String)
+                                  = join {
+                                      $j1_sPR [Dmd=MC(1,L)] :: Maybe Int -> Either () TestStructure
+                                      [LclId[JoinId(1)(Nothing)], Arity=1, Str=<L>, Unf=OtherCon []]
+                                      $j1_sPR (y1_Xg [OS=OneShot] :: Maybe Int)
+                                        = case v3_Xa of {
+                                            IDefault ->
+                                              case v4_Xd of {
+                                                IDefault ->
+                                                  Data.Either.Right
+                                                    @()
+                                                    @TestStructure
+                                                    (T15630.TestStructure
+                                                       i_ayQ
+                                                       y_Xf
+                                                       y1_Xg
+                                                       (Nothing @String)
+                                                       (Nothing @Int));
+                                                IInt i1_Xk ->
+                                                  Data.Either.Right
+                                                    @()
+                                                    @TestStructure
+                                                    (T15630.TestStructure
+                                                       i_ayQ
+                                                       y_Xf
+                                                       y1_Xg
+                                                       (Nothing @String)
+                                                       (Just @Int i1_Xk));
+                                                IBlob ipv_sPo -> T15630.getMenuItem1
+                                              };
+                                            IInt ipv_sPm -> T15630.getMenuItem1;
+                                            IBlob b_ayW ->
+                                              case v4_Xd of {
+                                                IDefault ->
+                                                  Data.Either.Right
+                                                    @()
+                                                    @TestStructure
+                                                    (T15630.TestStructure
+                                                       i_ayQ
+                                                       y_Xf
+                                                       y1_Xg
+                                                       (Just @String b_ayW)
+                                                       (Nothing @Int));
+                                                IInt i1_Xk ->
+                                                  Data.Either.Right
+                                                    @()
+                                                    @TestStructure
+                                                    (T15630.TestStructure
+                                                       i_ayQ
+                                                       y_Xf
+                                                       y1_Xg
+                                                       (Just @String b_ayW)
+                                                       (Just @Int i1_Xk));
+                                                IBlob ipv_sPo -> T15630.getMenuItem1
+                                              }
+                                          } } in
+                                    case v2_X7 of {
+                                      IDefault -> jump $j1_sPR (Nothing @Int);
+                                      IInt i1_Xi -> jump $j1_sPR (Just @Int i1_Xi);
+                                      IBlob ipv_sPk -> T15630.getMenuItem1
+                                    } } in
+                              case v1_X4 of {
+                                IDefault -> jump $j_sPO (Nothing @String);
+                                IInt ipv_sPi -> T15630.getMenuItem1;
+                                IBlob b_ayW -> jump $j_sPO (Just @String b_ayW)
+                              }}}}}}}
+
+
+------------- The bad version ----------------
+getMenuItem
+  = \ (vs_azD :: [IValue]) ->
+      case vs_azD of {
+        [] -> case T15630.<+>1 of wild1_00 { };
+        : v_azA vs1_azB ->
+          case vs1_azB of {
+            [] -> case T15630.<+>1 of wild2_00 { };
+            : v1_X5 vs2_X6 ->
+              case vs2_X6 of {
+                [] -> case T15630.<+>1 of wild3_00 { };
+                : v2_X9 vs3_Xa ->
+                  case vs3_Xa of {
+                    [] -> case T15630.<+>1 of wild4_00 { };
+                    : v3_Xd vs4_Xe ->
+                      case vs4_Xe of {
+                        [] -> case T15630.<+>1 of wild5_00 { };
+                        : v4_Xh vs5_Xi ->
+                          case v_azA of {
+                            __DEFAULT -> T15630.getMenuItem1;
+                            IInt i_azn ->
+                              join {
+                                $j_sQw [Dmd=MC(1,L)]
+                                  :: (Maybe String -> Maybe Int -> TestStructure)
+                                     -> Either () TestStructure
+                                [LclId[JoinId(1)(Nothing)],
+                                 Arity=1,
+                                 Str=<MC(1,C(1,L))>,
+                                 Unf=OtherCon []]
+                                $j_sQw (f_aPr [OS=OneShot]
+                                          :: Maybe String -> Maybe Int -> TestStructure)
+                                  = case v3_Xd of {
+                                      IDefault ->
+                                        case v4_Xh of {
+                                          IDefault ->
+                                            Data.Either.Right
+                                              @()
+                                              @TestStructure
+                                              (f_aPr
+                                                 (Nothing @String)
+                                                 (Nothing @Int));
+                                          IInt i1_Xl ->
+                                            Data.Either.Right
+                                              @()
+                                              @TestStructure
+                                              (f_aPr
+                                                 (Nothing @String)
+                                                 (Just @Int i1_Xl));
+                                          IBlob ipv_sPM -> T15630.getMenuItem1
+                                        };
+                                      IInt ipv_sPK -> T15630.getMenuItem1;
+                                      IBlob b_azt ->
+                                        case v4_Xh of {
+                                          IDefault ->
+                                            Data.Either.Right
+                                              @()
+                                              @TestStructure
+                                              (f_aPr
+                                                 (Just @String b_azt)
+                                                 (Nothing @Int));
+                                          IInt i1_Xl ->
+                                            Data.Either.Right
+                                              @()
+                                              @TestStructure
+                                              (f_aPr
+                                                 (Just @String b_azt)
+                                                 (Just @Int i1_Xl));
+                                          IBlob ipv_sPM -> T15630.getMenuItem1
+                                        }
+                                    } } in
+                              case v1_X5 of {
+                                IDefault ->
+                                  case v2_X9 of {
+                                    IDefault ->
+                                      jump $j_sQw
+                                        (\ (ds_dNN [OS=OneShot] :: Maybe String)
+                                           (ds1_dNO [OS=OneShot] :: Maybe Int) ->
+                                           T15630.TestStructure
+                                             i_azn
+                                             (Nothing @String)
+                                             (Nothing @Int)
+                                             ds_dNN
+                                             ds1_dNO);
+                                    IInt i1_Xk ->
+                                      jump $j_sQw
+                                        (\ (ds_dNN [OS=OneShot] :: Maybe String)
+                                           (ds1_dNO [OS=OneShot] :: Maybe Int) ->
+                                           T15630.TestStructure
+                                             i_azn
+                                             (Nothing @String)
+                                             (Just @Int i1_Xk)
+                                             ds_dNN
+                                             ds1_dNO);
+                                    IBlob ipv_sPI -> T15630.getMenuItem1
+                                  };
+                                IInt ipv_sPG -> T15630.getMenuItem1;
+                                IBlob b_azt ->
+                                  case v2_X9 of {
+                                    IDefault ->
+                                      jump $j_sQw
+                                        (\ (ds_Xl [OS=OneShot] :: Maybe String)
+                                           (ds1_Xm [OS=OneShot] :: Maybe Int) ->
+                                           T15630.TestStructure
+                                             i_azn
+                                             (Just @String b_azt)
+                                             (Nothing @Int)
+                                             ds_Xl
+                                             ds1_Xm);
+                                    IInt i1_Xk ->
+                                      jump $j_sQw
+                                        (\ (ds_Xm [OS=OneShot] :: Maybe String)
+                                           (ds1_Xn [OS=OneShot] :: Maybe Int) ->
+                                           T15630.TestStructure
+                                             i_azn
+                                             (Just @String b_azt)
+                                             (Just @Int i1_Xk)
+                                             ds_Xm
+                                             ds1_Xn);
+                                    IBlob ipv_sPI -> T15630.getMenuItem1
+    }}}}}}}}
+
+-}


=====================================
testsuite/tests/perf/compiler/T15630a.hs
=====================================
@@ -0,0 +1,64 @@
+module T15630a where
+
+data IValue = IDefault
+            | IInt Int
+            | IBlob String
+
+(?) :: (IValue -> Either x a) -> IValue -> Either x (Maybe a)
+-- With this NOINLINE pragma we get good behaviour, but disastrous without
+-- {-# NOINLINE [0] (?) #-}
+(?) _ IDefault = pure Nothing
+(?) p x        = Just <$> p x
+
+getInt :: IValue -> Either () Int
+{-# NOINLINE getInt #-}
+getInt (IInt i) = Right i
+getInt v = Left ()
+
+getString :: IValue -> Either () String
+{-# NOINLINE getString #-}
+getString (IBlob b) = Right $ b
+getString v = Left ()
+
+(<+>) :: (Either x (a -> b), [IValue]) -> (IValue -> Either x a) -> (Either x b, [IValue])
+(<+>) (f, (v:vs)) p = (f <*> (p v), vs)
+
+data TestStructure = TestStructure
+    { _param1 :: Int
+    , _param2 :: Maybe String
+    , _param3 :: Maybe Int
+    , _param4 :: Maybe String
+    , _param5 :: Maybe Int
+    , _param6 :: Maybe Int
+
+    , _param7 :: Maybe String
+    , _param8 :: Maybe String
+    , _param9 :: Maybe Int
+    , _param10 :: Maybe Int
+    , _param11 :: Maybe String
+    , _param12 :: Maybe String
+    , _param13 :: Maybe Int
+    , _param14 :: Maybe Int
+    , _param15 :: Maybe String
+
+    }
+
+getMenuItem :: [IValue] -> Either () TestStructure
+getMenuItem vs = fst $ (pure TestStructure, vs)
+             <+> getInt
+             <+> (getString ?)
+             <+> (getInt ?)
+             <+> (getString ?)
+             <+> (getInt ?)
+             <+> (getInt ?)
+
+             <+> (getString ?)
+             <+> (getString ?)
+             <+> (getInt ?)
+             <+> (getInt ?)
+             <+> (getString ?)
+             <+> (getString ?)
+             <+> (getInt ?)
+             <+> (getInt ?)
+             <+> (getString ?)
+


=====================================
testsuite/tests/simplCore/should_compile/T18730.hs → testsuite/tests/perf/compiler/T18730.hs
=====================================


=====================================
testsuite/tests/simplCore/should_compile/T18730_A.hs → testsuite/tests/perf/compiler/T18730_A.hs
=====================================


=====================================
testsuite/tests/perf/compiler/all.T
=====================================
@@ -202,6 +202,15 @@ test('CoOpt_Singletons',
 
 #########
 
+# Moved from simplCore/should_compile
+test('T18730',
+      [ only_ways(['optasm'])
+      , collect_compiler_stats('bytes allocated',1)
+      , extra_files(['T8730_aux.hs'])
+      ],
+      multimod_compile,
+      ['T18730_A', '-v0 -O'])
+
 test ('LargeRecord',
       [ only_ways(['normal']),
         collect_compiler_stats('bytes allocated',1)
@@ -527,6 +536,11 @@ test('T15630',
       ],
       compile,
       ['-O2'])
+test('T15630a',
+      [collect_compiler_stats()
+      ],
+      compile,
+      ['-O2'])
 
 # See https://gitlab.haskell.org/ghc/ghc/merge_requests/312#note_186960
 test ('WWRec',


=====================================
testsuite/tests/simplCore/should_compile/all.T
=====================================
@@ -341,7 +341,6 @@ test('T18603', normal, compile, ['-dcore-lint -O'])
 # T18649 should /not/ generate a specialisation rule
 test('T18649', normal, compile, ['-O -ddump-rules -Wno-simplifiable-class-constraints'])
 
-test('T18730', normal, multimod_compile, ['T18730_A', '-dcore-lint -O'])
 test('T18747A', normal, compile, [''])
 test('T18747B', normal, compile, [''])
 test('T18815', only_ways(['optasm']), makefile_test, ['T18815'])



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/8fc0f2b29ed83b26d59c59fdebffeea97225cc1c...be385269dafbad2d3af3347afac4690b4f9f4933

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/8fc0f2b29ed83b26d59c59fdebffeea97225cc1c...be385269dafbad2d3af3347afac4690b4f9f4933
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20230712/b1ed23f8/attachment-0001.html>


More information about the ghc-commits mailing list