[Git][ghc/ghc][wip/T18078] Implement cast worker/wrapper properly

Simon Peyton Jones gitlab at gitlab.haskell.org
Fri May 22 15:26:17 UTC 2020



Simon Peyton Jones pushed to branch wip/T18078 at Glasgow Haskell Compiler / GHC


Commits:
10b0b9b1 by Simon Peyton Jones at 2020-05-22T16:24:52+01:00
Implement cast worker/wrapper properly

The cast worker/wrapper transformation transforms
   x = e |> co
into
   y = e
   x = y |> co

This is done by the simplifier, but we were being
careless about transferring IdInfo from x to y,
and about what to do if x is a NOINLNE function.
This resulted in a series of bugs:
     #17673, #18093, #18078.

This patch fixes all that:

* Main change is in GHC.Core.Opt.Simplify, and
  the new prepareBinding function, which does this
  cast worker/wrapper transform.
  See Note [Cast worker/wrappers].

* There is quite a bit of refactoring around
  prepareRhs, makeTrivial etc.  It's nicer now.

* Some wrappers from strictness and cast w/w, notably those for
  a function with a NOINLINE, should inline very late. There
  wasn't really a mechanism for that, which was an existing bug
  really; so I invented a new finalPhase = Phase (-1).  It's used
  for all simplifier runs after the user-visible phase 2,1,0 have
  run.  (No new runs of the simplifier are introduced thereby.)

  See new Note [Compiler phases] in GHC.Types.Basic;
  the main changes are in GHC.Core.Opt.Driver

* Doing this made me trip over two places where the AnonArgFlag on a
  FunTy was being lost so we could end up with (Num a -> ty)
  rather than (Num a => ty)
    - In coercionLKind/coercionRKind
    - In contHoleType in the Simplifier

  I fixed the former by defining mkFunctionType and using it in
  coercionLKind/RKind.

  I could have done the same for the latter, but the information
  is almost to hand.  So I fixed the latter by
    - adding sc_hole_ty to ApplyToVal (like ApplyToTy),
    - adding as_hole_ty to ValArg (like TyArg)
    - adding sc_fun_ty to StrictArg
  Turned out I could then remove ai_type from ArgInfo.  This is
  just moving the deck chairs around, but it worked out nicely.

  See the new Note [AnonArgFlag] in GHC.Types.Var

* When looking at the 'arity decrease' thing (#18093) I discovered
  that stable unfoldings had a much lower arity than the actual
  optimised function.  That's what led to the arity-decrease
  message.  Simple solution: eta-expand.

  It's described in Note [Eta-expand stable unfoldings]
  in GHC.Core.Opt.Simplify

* I also discovered that unsafeCoerce wasn't being inlined if
  the context was boring.  So (\x. f (unsafeCoerce x)) would
  create a thunk -- yikes!  I fixed that by making inlineBoringOK
  a bit cleverer: see Note [Inline unsafeCoerce] in GHC.Core.Unfold.

  I also found that unsafeCoerceName was unused, so I removed it.

I made a test case for #18078, and a very similar one for #17673.

The net effect of all this on nofib is very modest, but positive:

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
           anna          -0.4%     -0.1%     -3.1%     -3.1%      0.0%
 fannkuch-redux          -0.4%     -0.3%     -0.1%     -0.1%      0.0%
       maillist          -0.4%     -0.1%     -7.8%     -1.0%    -14.3%
      primetest          -0.4%    -15.6%     -7.1%     -6.6%      0.0%
--------------------------------------------------------------------------------
            Min          -0.9%    -15.6%    -13.3%    -14.2%    -14.3%
            Max          -0.3%      0.0%    +12.1%    +12.4%      0.0%
 Geometric Mean          -0.4%     -0.2%     -2.3%     -2.2%     -0.1%

- - - - -


29 changed files:

- compiler/GHC/Builtin/Names.hs
- compiler/GHC/Core/Coercion.hs
- compiler/GHC/Core/Opt/Driver.hs
- compiler/GHC/Core/Opt/Simplify.hs
- compiler/GHC/Core/Opt/Simplify/Utils.hs
- compiler/GHC/Core/Opt/SpecConstr.hs
- compiler/GHC/Core/Opt/WorkWrap.hs
- compiler/GHC/Core/SimpleOpt.hs
- compiler/GHC/Core/Type.hs
- compiler/GHC/Core/Unfold.hs
- compiler/GHC/Core/Utils.hs
- compiler/GHC/CoreToStg/Prep.hs
- compiler/GHC/Types/Basic.hs
- compiler/GHC/Types/Var.hs
- libraries/base/Unsafe/Coerce.hs
- testsuite/tests/codeGen/should_compile/debug.stdout
- testsuite/tests/deSugar/should_compile/T2431.stderr
- testsuite/tests/perf/compiler/T16473.stdout
- testsuite/tests/simplCore/should_compile/T13143.stderr
- testsuite/tests/simplCore/should_compile/T3772.stdout
- testsuite/tests/simplCore/should_compile/T7360.stderr
- testsuite/tests/simplCore/should_compile/T7865.stdout
- testsuite/tests/stranal/should_compile/Makefile
- testsuite/tests/stranal/should_compile/T16029.stdout
- + testsuite/tests/stranal/should_compile/T17673.hs
- + testsuite/tests/stranal/should_compile/T17673.stdout
- + testsuite/tests/stranal/should_compile/T18078.hs
- + testsuite/tests/stranal/should_compile/T18078.stdout
- testsuite/tests/stranal/should_compile/all.T


Changes:

=====================================
compiler/GHC/Builtin/Names.hs
=====================================
@@ -475,7 +475,6 @@ basicKnownKeyNames
         , unsafeEqualityTyConName
         , unsafeReflDataConName
         , unsafeCoercePrimName
-        , unsafeCoerceName
     ]
 
 genericTyConNames :: [Name]
@@ -1333,12 +1332,11 @@ typeErrorShowTypeDataConName =
 
 -- Unsafe coercion proofs
 unsafeEqualityProofName, unsafeEqualityTyConName, unsafeCoercePrimName,
-  unsafeCoerceName, unsafeReflDataConName :: Name
+  unsafeReflDataConName :: Name
 unsafeEqualityProofName = varQual uNSAFE_COERCE (fsLit "unsafeEqualityProof") unsafeEqualityProofIdKey
 unsafeEqualityTyConName = tcQual uNSAFE_COERCE (fsLit "UnsafeEquality") unsafeEqualityTyConKey
 unsafeReflDataConName   = dcQual uNSAFE_COERCE (fsLit "UnsafeRefl")     unsafeReflDataConKey
 unsafeCoercePrimName    = varQual uNSAFE_COERCE (fsLit "unsafeCoerce#") unsafeCoercePrimIdKey
-unsafeCoerceName        = varQual uNSAFE_COERCE (fsLit "unsafeCoerce")  unsafeCoerceIdKey
 
 -- Dynamic
 toDynName :: Name
@@ -2411,10 +2409,9 @@ naturalSDataConKey      = mkPreludeMiscIdUnique 568
 wordToNaturalIdKey      = mkPreludeMiscIdUnique 569
 
 -- Unsafe coercion proofs
-unsafeEqualityProofIdKey, unsafeCoercePrimIdKey, unsafeCoerceIdKey :: Unique
+unsafeEqualityProofIdKey, unsafeCoercePrimIdKey :: Unique
 unsafeEqualityProofIdKey = mkPreludeMiscIdUnique 570
 unsafeCoercePrimIdKey    = mkPreludeMiscIdUnique 571
-unsafeCoerceIdKey        = mkPreludeMiscIdUnique 572
 
 {-
 ************************************************************************


=====================================
compiler/GHC/Core/Coercion.hs
=====================================
@@ -2187,7 +2187,7 @@ coercionLKind co
     go (TyConAppCo _ tc cos)    = mkTyConApp tc (map go cos)
     go (AppCo co1 co2)          = mkAppTy (go co1) (go co2)
     go (ForAllCo tv1 _ co1)     = mkTyCoInvForAllTy tv1 (go co1)
-    go (FunCo _ co1 co2)        = mkVisFunTy (go co1) (go co2)
+    go (FunCo _ co1 co2)        = mkFunctionType (go co1) (go co2)
     go (CoVarCo cv)             = coVarLType cv
     go (HoleCo h)               = coVarLType (coHoleCoVar h)
     go (UnivCo _ _ ty1 _)       = ty1
@@ -2244,7 +2244,7 @@ coercionRKind co
     go (AppCo co1 co2)          = mkAppTy (go co1) (go co2)
     go (CoVarCo cv)             = coVarRType cv
     go (HoleCo h)               = coVarRType (coHoleCoVar h)
-    go (FunCo _ co1 co2)        = mkVisFunTy (go co1) (go co2)
+    go (FunCo _ co1 co2)        = mkFunctionType (go co1) (go co2)
     go (UnivCo _ _ _ ty2)       = ty2
     go (SymCo co)               = coercionLKind co
     go (TransCo _ co2)          = go co2


=====================================
compiler/GHC/Core/Opt/Driver.hs
=====================================
@@ -37,7 +37,7 @@ import GHC.Core.Opt.FloatOut ( floatOutwards )
 import GHC.Core.FamInstEnv
 import GHC.Types.Id
 import GHC.Utils.Error  ( withTiming, withTimingD, DumpFormat (..) )
-import GHC.Types.Basic  ( CompilerPhase(..), isDefaultInlinePragma, defaultInlinePragma )
+import GHC.Types.Basic
 import GHC.Types.Var.Set
 import GHC.Types.Var.Env
 import GHC.Core.Opt.LiberateCase ( liberateCase )
@@ -141,8 +141,10 @@ getCoreToDo dflags
 
     maybe_rule_check phase = runMaybe rule_check (CoreDoRuleCheck phase)
 
-    maybe_strictness_before phase
-      = runWhen (phase `elem` strictnessBefore dflags) CoreDoDemand
+    maybe_strictness_before (Phase phase)
+      | phase `elem` strictnessBefore dflags = CoreDoDemand
+    maybe_strictness_before _
+      = CoreDoNothing
 
     base_mode = SimplMode { sm_phase      = panic "base_mode"
                           , sm_names      = []
@@ -152,20 +154,20 @@ getCoreToDo dflags
                           , sm_inline     = True
                           , sm_case_case  = True }
 
-    simpl_phase phase names iter
+    simpl_phase phase name iter
       = CoreDoPasses
       $   [ maybe_strictness_before phase
           , CoreDoSimplify iter
-                (base_mode { sm_phase = Phase phase
-                           , sm_names = names })
+                (base_mode { sm_phase = phase
+                           , sm_names = [name] })
 
-          , maybe_rule_check (Phase phase) ]
+          , maybe_rule_check phase ]
 
-    simpl_phases = CoreDoPasses [ simpl_phase phase ["main"] max_iter
-                                | phase <- [phases, phases-1 .. 1] ]
+    -- Run GHC's internal simplification phase, after all rules have run.
+    -- See Note [Compiler phases] in GHC.Types.Basic
+    simplify name = simpl_phase finalPhase name max_iter
 
-
-        -- initial simplify: mk specialiser happy: minimum effort please
+    -- initial simplify: mk specialiser happy: minimum effort please
     simpl_gently = CoreDoSimplify max_iter
                        (base_mode { sm_phase = InitialPhase
                                   , sm_names = ["Gentle"]
@@ -182,7 +184,7 @@ getCoreToDo dflags
 
     demand_analyser = (CoreDoPasses (
                            dmd_cpr_ww ++
-                           [simpl_phase 0 ["post-worker-wrapper"] max_iter]
+                           [simplify "post-worker-wrapper"]
                            ))
 
     -- Static forms are moved to the top level with the FloatOut pass.
@@ -203,7 +205,7 @@ getCoreToDo dflags
      if opt_level == 0 then
        [ static_ptrs_float_outwards,
          CoreDoSimplify max_iter
-             (base_mode { sm_phase = Phase 0
+             (base_mode { sm_phase = finalPhase
                         , sm_names = ["Non-opt simplification"] })
        ]
 
@@ -251,8 +253,10 @@ getCoreToDo dflags
            -- GHC.Iface.Tidy.StaticPtrTable.
            static_ptrs_float_outwards,
 
-        simpl_phases,
-
+        -- Run the simplier phases 2,1,0 to allow rewrite rules to fire
+        CoreDoPasses [ simpl_phase (Phase phase) "main" max_iter
+                     | phase <- [phases, phases-1 .. 1] ],
+        simpl_phase (Phase 0) "main" (max max_iter 3),
                 -- Phase 0: allow all Ids to be inlined now
                 -- This gets foldr inlined before strictness analysis
 
@@ -263,7 +267,6 @@ getCoreToDo dflags
                 -- ==>  let k = BIG in letrec go = \xs -> ...(k x).... in go xs
                 -- ==>  let k = BIG in letrec go = \xs -> ...(BIG x).... in go xs
                 -- Don't stop now!
-        simpl_phase 0 ["main"] (max max_iter 3),
 
         runWhen do_float_in CoreDoFloatInwards,
             -- Run float-inwards immediately before the strictness analyser
@@ -274,9 +277,10 @@ getCoreToDo dflags
 
         runWhen call_arity $ CoreDoPasses
             [ CoreDoCallArity
-            , simpl_phase 0 ["post-call-arity"] max_iter
+            , simplify "post-call-arity"
             ],
 
+        -- Strictness analysis
         runWhen strictness demand_analyser,
 
         runWhen exitification CoreDoExitify,
@@ -302,24 +306,24 @@ getCoreToDo dflags
 
         runWhen do_float_in CoreDoFloatInwards,
 
-        maybe_rule_check (Phase 0),
+        maybe_rule_check finalPhase,
 
                 -- Case-liberation for -O2.  This should be after
                 -- strictness analysis and the simplification which follows it.
         runWhen liberate_case (CoreDoPasses [
             CoreLiberateCase,
-            simpl_phase 0 ["post-liberate-case"] max_iter
+            simplify "post-liberate-case"
             ]),         -- Run the simplifier after LiberateCase to vastly
                         -- reduce the possibility of shadowing
                         -- Reason: see Note [Shadowing] in GHC.Core.Opt.SpecConstr
 
         runWhen spec_constr CoreDoSpecConstr,
 
-        maybe_rule_check (Phase 0),
+        maybe_rule_check finalPhase,
 
         runWhen late_specialise
           (CoreDoPasses [ CoreDoSpecialising
-                        , simpl_phase 0 ["post-late-spec"] max_iter]),
+                        , simplify "post-late-spec"]),
 
         -- LiberateCase can yield new CSE opportunities because it peels
         -- off one layer of a recursive function (concretely, I saw this
@@ -328,11 +332,10 @@ getCoreToDo dflags
         runWhen ((liberate_case || spec_constr) && cse) CoreCSE,
 
         -- Final clean-up simplification:
-        simpl_phase 0 ["final"] max_iter,
+        simplify "final",
 
         runWhen late_dmd_anal $ CoreDoPasses (
-            dmd_cpr_ww ++
-            [simpl_phase 0 ["post-late-ww"] max_iter]
+            dmd_cpr_ww ++ [simplify "post-late-ww"]
           ),
 
         -- Final run of the demand_analyser, ensures that one-shot thunks are
@@ -342,7 +345,7 @@ getCoreToDo dflags
         -- can become /exponentially/ more expensive. See #11731, #12996.
         runWhen (strictness || late_dmd_anal) CoreDoDemand,
 
-        maybe_rule_check (Phase 0)
+        maybe_rule_check finalPhase
      ]
 
     -- Remove 'CoreDoNothing' and flatten 'CoreDoPasses' for clarity.


=====================================
compiler/GHC/Core/Opt/Simplify.hs
=====================================
@@ -43,12 +43,15 @@ import GHC.Types.Cpr    ( mkCprSig, botCpr )
 import GHC.Core.Ppr     ( pprCoreExpr )
 import GHC.Core.Unfold
 import GHC.Core.Utils
+import GHC.Core.Arity     ( etaExpand )
 import GHC.Core.SimpleOpt ( pushCoTyArg, pushCoValArg
                           , joinPointBinding_maybe, joinPointBindings_maybe )
 import GHC.Core.FVs     ( mkRuleInfo )
 import GHC.Core.Rules   ( lookupRule, getRules )
 import GHC.Types.Basic  ( TopLevelFlag(..), isNotTopLevel, isTopLevel,
-                          RecFlag(..), Arity )
+                          RecFlag(..), InlinePragma(..), Activation(..),
+                          SourceText(..), InlineSpec(..), activeDuringFinal,
+                          Arity )
 import GHC.Utils.Monad  ( mapAccumLM, liftIO )
 import GHC.Types.Var    ( isTyCoVar )
 import GHC.Data.Maybe   ( orElse )
@@ -315,8 +318,8 @@ simplLazyBind env top_lvl is_rec bndr bndr1 rhs rhs_se
 
         -- ANF-ise a constructor or PAP rhs
         -- We get at most one float per argument here
-        ; (let_floats, body2) <- {-#SCC "prepareRhs" #-} prepareRhs (getMode env) top_lvl
-                                            (getOccFS bndr1) (idInfo bndr1) body1
+        ; (let_floats, bndr2, body2) <- {-#SCC "prepareBinding" #-}
+                                        prepareBinding env top_lvl bndr bndr1 body1
         ; let body_floats2 = body_floats1 `addLetFloats` let_floats
 
         ; (rhs_floats, rhs')
@@ -341,7 +344,7 @@ simplLazyBind env top_lvl is_rec bndr bndr1 rhs rhs_se
                         ; return (floats, rhs') }
 
         ; (bind_float, env2) <- completeBind (env `setInScopeFromF` rhs_floats)
-                                             top_lvl Nothing bndr bndr1 rhs'
+                                             top_lvl Nothing bndr bndr2 rhs'
         ; return (rhs_floats `addFloats` bind_float, env2) }
 
 --------------------------
@@ -393,16 +396,16 @@ completeNonRecX :: TopLevelFlag -> SimplEnv
 
 completeNonRecX top_lvl env is_strict old_bndr new_bndr new_rhs
   = ASSERT2( not (isJoinId new_bndr), ppr new_bndr )
-    do  { (prepd_floats, rhs1) <- prepareRhs (getMode env) top_lvl (getOccFS new_bndr)
-                                             (idInfo new_bndr) new_rhs
+    do  { (prepd_floats, new_bndr, new_rhs)
+              <- prepareBinding env top_lvl old_bndr new_bndr new_rhs
         ; let floats = emptyFloats env `addLetFloats` prepd_floats
         ; (rhs_floats, rhs2) <-
-                if doFloatFromRhs NotTopLevel NonRecursive is_strict floats rhs1
+                if doFloatFromRhs NotTopLevel NonRecursive is_strict floats new_rhs
                 then    -- Add the floats to the main env
                      do { tick LetFloatFromLet
-                        ; return (floats, rhs1) }
+                        ; return (floats, new_rhs) }
                 else    -- Do not float; wrap the floats around the RHS
-                     return (emptyFloats env, wrapFloats floats rhs1)
+                     return (emptyFloats env, wrapFloats floats new_rhs)
 
         ; (bind_float, env2) <- completeBind (env `setInScopeFromF` rhs_floats)
                                              NotTopLevel Nothing
@@ -412,12 +415,146 @@ completeNonRecX top_lvl env is_strict old_bndr new_bndr new_rhs
 
 {- *********************************************************************
 *                                                                      *
-           prepareRhs, makeTrivial
+           prepareBinding, prepareRhs, makeTrivial
 *                                                                      *
 ************************************************************************
 
-Note [prepareRhs]
-~~~~~~~~~~~~~~~~~
+Note [Cast worker/wrappers]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When we have a binding
+   x = e |> co
+we want to do something very similar to worker/wrapper:
+   $wx = e
+   x = $wx |> co
+
+So now x can be inlined freely.  There's a chance that e will be a
+constructor application or function, or something like that, so moving
+the coercion to the usage site may well cancel the coercions and lead
+to further optimisation.  Example:
+
+     data family T a :: *
+     data instance T Int = T Int
+
+     foo :: Int -> Int -> Int
+     foo m n = ...
+        where
+          t = T m
+          go 0 = 0
+          go n = case t of { T m -> go (n-m) }
+                -- This case should optimise
+
+We call this making a cast worker/wrapper, and it'd done by prepareBinding.
+
+We need to be careful with inline/noinline pragmas:
+  rec { {-# NOINLINE f #-}
+        f = (...g...) |> co
+      ; g = ...f... }
+This is legitimate -- it tells GHC to use f as the loop breaker
+rather than g.  Now we do the cast thing, to get something like
+  rec { $wf = ...g...
+      ; f = $wf |> co
+      ; g = ...f... }
+Where should the NOINLINE pragma go?  If we leave it on f we'll get
+  rec { $wf = ...g...
+      ; {-# NOINLINE f #-}
+        f = $wf |> co
+      ; g = ...f... }
+and that is bad bad: the whole point is that we want to inline that
+cast!  We want to transfer the pagma to $wf:
+  rec { {-# NOINLINE $wf #-}
+        $wf = ...g...
+      ; f = $wf |> co
+      ; g = ...f... }
+It's exactly like worker/wrapper for strictness analysis:
+  f is the wrapper and must inline like crazy
+  $wf is the worker and must carry f's original pragma
+See Note [Worker-wrapper for NOINLINE functions] in
+GHC.Core.Opt.WorkWrap.
+
+See #17673, #18093, #18078.
+
+Note [Preserve strictness in cast w/w]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In the Note [Cast worker/wrappers] transformation, keep the strictness info.
+Eg
+        f = e `cast` co    -- f has strictness SSL
+When we transform to
+        f' = e             -- f' also has strictness SSL
+        f = f' `cast` co   -- f still has strictness SSL
+
+Its not wrong to drop it on the floor, but better to keep it.
+
+Note [Cast w/w: unlifted]
+~~~~~~~~~~~~~~~~~~~~~~~~~
+BUT don't do cast worker/wrapper if 'e' has an unlifted type.
+This *can* happen:
+
+     foo :: Int = (error (# Int,Int #) "urk")
+                  `cast` CoUnsafe (# Int,Int #) Int
+
+If do the makeTrivial thing to the error call, we'll get
+    foo = case error (# Int,Int #) "urk" of v -> v `cast` ...
+But 'v' isn't in scope!
+
+These strange casts can happen as a result of case-of-case
+        bar = case (case x of { T -> (# 2,3 #); F -> error "urk" }) of
+                (# p,q #) -> p+q
+
+NOTE: Nowadays we don't use casts for these error functions;
+instead, we use (case erorr ... of {}). So I'm not sure
+this Note makes much sense any more.
+-}
+
+prepareBinding :: SimplEnv -> TopLevelFlag
+               -> InId -> OutId -> OutExpr
+               -> SimplM (LetFloats, OutId, OutExpr)
+
+prepareBinding env top_lvl old_bndr bndr rhs
+  | Cast rhs1 co <- rhs
+    -- Try for cast worker/wrapper
+    -- See Note [Cast worker/wrappers]
+  , not (isStableUnfolding (realIdUnfolding old_bndr))
+        -- Don't make a cast w/w if the thing is going to be inlined anyway
+  , not (exprIsTrivial rhs1)
+        -- Nor if the RHS is trivial; then again it'll be inlined
+  , let ty1 = coercionLKind co
+  , not (isUnliftedType ty1)
+        -- Not if rhs has an unlifted type; see Note [Cast w/w: unlifted]
+  = do { (floats, new_id) <- makeTrivialBinding (getMode env) top_lvl
+                                   (getOccFS bndr) worker_info rhs1 ty1
+       ; let bndr' = bndr `setInlinePragma` mkCastWrapperInlinePrag (idInlinePragma bndr)
+       ; return (floats, bndr', Cast (Var new_id) co) }
+
+  | otherwise
+  = do { (floats, rhs') <- prepareRhs (getMode env) top_lvl (getOccFS bndr) rhs
+       ; return (floats, bndr, rhs') }
+ where
+   info = idInfo bndr
+   worker_info = vanillaIdInfo `setStrictnessInfo` strictnessInfo info
+                               `setCprInfo`        cprInfo info
+                               `setDemandInfo`     demandInfo info
+                               `setInlinePragInfo` inlinePragInfo info
+                               `setArityInfo`      arityInfo info
+          -- We do /not/ want to transfer OccInfo, Rules, Unfolding
+          -- Note [Preserve strictness in cast w/w]
+
+mkCastWrapperInlinePrag :: InlinePragma -> InlinePragma
+-- See Note [Cast wrappers]
+mkCastWrapperInlinePrag (InlinePragma { inl_act = act, inl_rule = rule_info })
+  = InlinePragma { inl_src    = SourceText "{-# INLINE"
+                 , inl_inline = NoUserInline -- See Note [Wrapper NoUserInline]
+                 , inl_sat    = Nothing      --     in GHC.Core.Opt.WorkWrap
+                 , inl_act    = wrap_act     -- See Note [Wrapper activation]
+                 , inl_rule   = rule_info }  --     in GHC.Core.Opt.WorkWrap
+                                -- RuleMatchInfo is (and must be) unaffected
+  where
+    -- See Note [Wrapper activation] in GHC.Core.Opt.WorkWrap
+    wrap_act  = case act of
+                   NeverActive -> activeDuringFinal  -- Inline late because of rules
+                   _           -> act
+
+{- Note [prepareRhs]
+~~~~~~~~~~~~~~~~~~~~
 prepareRhs takes a putative RHS, checks whether it's a PAP or
 constructor application and, if so, converts it to ANF, so that the
 resulting thing can be inlined more easily.  Thus
@@ -435,26 +572,16 @@ That's what the 'go' loop in prepareRhs does
 -}
 
 prepareRhs :: SimplMode -> TopLevelFlag
-           -> FastString   -- Base for any new variables
-           -> IdInfo       -- IdInfo for the LHS of this binding
+           -> FastString    -- Base for any new variables
            -> OutExpr
            -> SimplM (LetFloats, OutExpr)
--- Transforms a RHS into a better RHS by adding floats
+-- Transforms a RHS into a better RHS by ANF'ing args
+-- for expandable RHSs: constructors and PAPs
 -- e.g        x = Just e
 -- becomes    a = e
 --            x = Just a
 -- See Note [prepareRhs]
-prepareRhs mode top_lvl occ info (Cast rhs co)  -- Note [Float coercions]
-  | let ty1 = coercionLKind co         -- Do *not* do this if rhs has an unlifted type
-  , not (isUnliftedType ty1)                 -- see Note [Float coercions (unlifted)]
-  = do  { (floats, rhs') <- makeTrivialWithInfo mode top_lvl occ sanitised_info rhs
-        ; return (floats, Cast rhs' co) }
-  where
-    sanitised_info = vanillaIdInfo `setStrictnessInfo` strictnessInfo info
-                                   `setCprInfo`        cprInfo info
-                                   `setDemandInfo`     demandInfo info
-
-prepareRhs mode top_lvl occ _ rhs0
+prepareRhs mode top_lvl occ rhs0
   = do  { (_is_exp, floats, rhs1) <- go 0 rhs0
         ; return (floats, rhs1) }
   where
@@ -498,61 +625,10 @@ prepareRhs mode top_lvl occ _ rhs0
     go _ other
         = return (False, emptyLetFloats, other)
 
-{-
-Note [Float coercions]
-~~~~~~~~~~~~~~~~~~~~~~
-When we find the binding
-        x = e `cast` co
-we'd like to transform it to
-        x' = e
-        x = x `cast` co         -- A trivial binding
-There's a chance that e will be a constructor application or function, or something
-like that, so moving the coercion to the usage site may well cancel the coercions
-and lead to further optimisation.  Example:
-
-     data family T a :: *
-     data instance T Int = T Int
-
-     foo :: Int -> Int -> Int
-     foo m n = ...
-        where
-          x = T m
-          go 0 = 0
-          go n = case x of { T m -> go (n-m) }
-                -- This case should optimise
-
-Note [Preserve strictness when floating coercions]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-In the Note [Float coercions] transformation, keep the strictness info.
-Eg
-        f = e `cast` co    -- f has strictness SSL
-When we transform to
-        f' = e             -- f' also has strictness SSL
-        f = f' `cast` co   -- f still has strictness SSL
-
-Its not wrong to drop it on the floor, but better to keep it.
-
-Note [Float coercions (unlifted)]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-BUT don't do [Float coercions] if 'e' has an unlifted type.
-This *can* happen:
-
-     foo :: Int = (error (# Int,Int #) "urk")
-                  `cast` CoUnsafe (# Int,Int #) Int
-
-If do the makeTrivial thing to the error call, we'll get
-    foo = case error (# Int,Int #) "urk" of v -> v `cast` ...
-But 'v' isn't in scope!
-
-These strange casts can happen as a result of case-of-case
-        bar = case (case x of { T -> (# 2,3 #); F -> error "urk" }) of
-                (# p,q #) -> p+q
--}
-
 makeTrivialArg :: SimplMode -> ArgSpec -> SimplM (LetFloats, ArgSpec)
-makeTrivialArg mode (ValArg e)
+makeTrivialArg mode arg@(ValArg { as_arg = e })
   = do { (floats, e') <- makeTrivial mode NotTopLevel (fsLit "arg") e
-       ; return (floats, ValArg e') }
+       ; return (floats, arg { as_arg = e' }) }
 makeTrivialArg _ arg
   = return (emptyLetFloats, arg)  -- CastBy, TyArg
 
@@ -561,29 +637,32 @@ makeTrivial :: SimplMode -> TopLevelFlag
             -> OutExpr     -- ^ This expression satisfies the let/app invariant
             -> SimplM (LetFloats, OutExpr)
 -- Binds the expression to a variable, if it's not trivial, returning the variable
-makeTrivial mode top_lvl context expr
- = makeTrivialWithInfo mode top_lvl context vanillaIdInfo expr
-
-makeTrivialWithInfo :: SimplMode -> TopLevelFlag
-                    -> FastString  -- ^ a "friendly name" to build the new binder from
-                    -> IdInfo
-                    -> OutExpr     -- ^ This expression satisfies the let/app invariant
-                    -> SimplM (LetFloats, OutExpr)
--- Propagate strictness and demand info to the new binder
--- Note [Preserve strictness when floating coercions]
--- Returned SimplEnv has same substitution as incoming one
-makeTrivialWithInfo mode top_lvl occ_fs info expr
+makeTrivial mode top_lvl occ_fs expr
   | exprIsTrivial expr                          -- Already trivial
   || not (bindingOk top_lvl expr expr_ty)       -- Cannot trivialise
                                                 --   See Note [Cannot trivialise]
   = return (emptyLetFloats, expr)
 
+  | Cast expr' co <- expr
+  = do { (floats, triv_expr) <- makeTrivial mode top_lvl occ_fs expr'
+       ; return (floats, Cast triv_expr co) }
+
   | otherwise
-  = do  { (floats, expr1) <- prepareRhs mode top_lvl occ_fs info expr
-        ; if   exprIsTrivial expr1  -- See Note [Trivial after prepareRhs]
-          then return (floats, expr1)
-          else do
-        { uniq <- getUniqueM
+  = do { (floats, new_id) <- makeTrivialBinding mode top_lvl occ_fs
+                                                vanillaIdInfo expr expr_ty
+       ; return (floats, Var new_id) }
+  where
+    expr_ty = exprType expr
+
+makeTrivialBinding :: SimplMode -> TopLevelFlag
+                   -> FastString  -- ^ a "friendly name" to build the new binder from
+                   -> IdInfo
+                   -> OutExpr     -- ^ This expression satisfies the let/app invariant
+                   -> OutType     -- Type of the expression
+                   -> SimplM (LetFloats, OutId)
+makeTrivialBinding mode top_lvl occ_fs info expr expr_ty
+  = do  { (floats, expr1) <- prepareRhs mode top_lvl occ_fs expr
+        ; uniq <- getUniqueM
         ; let name = mkSystemVarName uniq occ_fs
               var  = mkLocalIdWithInfo name expr_ty info
 
@@ -595,9 +674,7 @@ makeTrivialWithInfo mode top_lvl occ_fs info expr
         ; let final_id = addLetBndrInfo var arity is_bot unf
               bind     = NonRec final_id expr2
 
-        ; return ( floats `addLetFlts` unitLetFloat bind, Var final_id ) }}
-   where
-     expr_ty = exprType expr
+        ; return ( floats `addLetFlts` unitLetFloat bind, final_id ) }
 
 bindingOk :: TopLevelFlag -> CoreExpr -> Type -> Bool
 -- True iff we can have a binding of this expression at this level
@@ -606,15 +683,8 @@ bindingOk top_lvl expr expr_ty
   | isTopLevel top_lvl = exprIsTopLevelBindable expr expr_ty
   | otherwise          = True
 
-{- Note [Trivial after prepareRhs]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-If we call makeTrival on (e |> co), the recursive use of prepareRhs
-may leave us with
-   { a1 = e }  and   (a1 |> co)
-Now the latter is trivial, so we don't want to let-bind it.
-
-Note [Cannot trivialise]
-~~~~~~~~~~~~~~~~~~~~~~~~
+{- Note [Cannot trivialise]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Consider:
    f :: Int -> Addr#
 
@@ -696,7 +766,7 @@ completeBind env top_lvl mb_cont old_bndr new_bndr new_rhs
 
         -- Simplify the unfolding
       ; new_unfolding <- simplLetUnfolding env top_lvl mb_cont old_bndr
-                                           final_rhs (idType new_bndr) old_unf
+                          final_rhs (idType new_bndr) new_arity old_unf
 
       ; let final_bndr = addLetBndrInfo new_bndr new_arity is_bot new_unfolding
         -- See Note [In-scope set as a substitution]
@@ -928,6 +998,7 @@ simplExprF1 env (App fun arg) cont
                                 , sc_cont    = cont } }
       _       -> simplExprF env fun $
                  ApplyToVal { sc_arg = arg, sc_env = env
+                            , sc_hole_ty = substTy env (exprType fun)
                             , sc_dup = NoDup, sc_cont = cont }
 
 simplExprF1 env expr@(Lam {}) cont
@@ -1232,8 +1303,8 @@ rebuild env expr cont
       Select { sc_bndr = bndr, sc_alts = alts, sc_env = se, sc_cont = cont }
         -> rebuildCase (se `setInScopeFromE` env) expr bndr alts cont
 
-      StrictArg { sc_fun = fun, sc_cont = cont }
-        -> rebuildCall env (fun `addValArgTo` expr) cont
+      StrictArg { sc_fun = fun, sc_cont = cont, sc_fun_ty = fun_ty }
+        -> rebuildCall env (addValArgTo fun expr fun_ty ) cont
       StrictBind { sc_bndr = b, sc_bndrs = bs, sc_body = body
                  , sc_env = se, sc_cont = cont }
         -> do { (floats1, env') <- simplNonRecX (se `setInScopeFromE` env) b expr
@@ -1271,7 +1342,7 @@ In particular, we want to behave well on
 
 
  * (f |> co) @t1 @t2 ... @tn x1 .. xm
-   Here we wil use pushCoTyArg and pushCoValArg successively, which
+   Here we will use pushCoTyArg and pushCoValArg successively, which
    build up NthCo stacks.  Silly to do that if co is reflexive.
 
 However, we don't want to call isReflexiveCo too much, because it uses
@@ -1310,20 +1381,20 @@ simplCast env body co0 cont0
           where
             co' = mkTransCo co1 co2
 
-        addCoerce co cont@(ApplyToTy { sc_arg_ty = arg_ty, sc_cont = tail })
+        addCoerce co (ApplyToTy { sc_arg_ty = arg_ty, sc_cont = tail })
           | Just (arg_ty', m_co') <- pushCoTyArg co arg_ty
-            -- N.B. As mentioned in Note [The hole type in ApplyToTy] this is
-            -- only needed by `sc_hole_ty` which is often not forced.
-            -- Consequently it is worthwhile using a lazy pattern match here to
-            -- avoid unnecessary coercionKind evaluations.
-          , let hole_ty = coercionLKind co
           = {-#SCC "addCoerce-pushCoTyArg" #-}
             do { tail' <- addCoerceM m_co' tail
-               ; return (cont { sc_arg_ty  = arg_ty'
-                              , sc_hole_ty = hole_ty  -- NB!  As the cast goes past, the
-                                                      -- type of the hole changes (#16312)
-                              , sc_cont    = tail' }) }
-
+               ; return (ApplyToTy { sc_arg_ty  = arg_ty'
+                                   , sc_cont    = tail'
+                                   , sc_hole_ty = coercionLKind co }) }
+                                        -- NB!  As the cast goes past, the
+                                        -- type of the hole changes (#16312)
+
+        -- (f |> co) e   ===>   (f (e |> co1)) |> co2
+        -- where   co :: (s1->s2) ~ (t1~t2)
+        --         co1 :: t1 ~ s1
+        --         co2 :: s2 ~ t2
         addCoerce co cont@(ApplyToVal { sc_arg = arg, sc_env = arg_se
                                       , sc_dup = dup, sc_cont = tail })
           | Just (co1, m_co2) <- pushCoValArg co
@@ -1347,7 +1418,8 @@ simplCast env body co0 cont0
                ; return (ApplyToVal { sc_arg  = mkCast arg' co1
                                     , sc_env  = arg_se'
                                     , sc_dup  = dup'
-                                    , sc_cont = tail' }) } }
+                                    , sc_cont = tail'
+                                    , sc_hole_ty = coercionLKind co }) } }
 
         addCoerce co cont
           | isReflexiveCo co = return cont  -- Having this at the end makes a huge
@@ -1426,7 +1498,7 @@ simplLamBndr env bndr
   | isId bndr && hasCoreUnfolding old_unf   -- Special case
   = do { (env1, bndr1) <- simplBinder env bndr
        ; unf'          <- simplStableUnfolding env1 NotTopLevel Nothing bndr
-                                               old_unf (idType bndr1)
+                                      (idType bndr1) (idArity bndr1) old_unf
        ; let bndr2 = bndr1 `setIdUnfolding` unf'
        ; return (modifyInScope env1 bndr2, bndr2) }
 
@@ -1874,22 +1946,24 @@ rebuildCall env info@(ArgInfo { ai_fun = fun, ai_args = rev_args
 rebuildCall env info (CastIt co cont)
   = rebuildCall env (addCastTo info co) cont
 
-rebuildCall env info (ApplyToTy { sc_arg_ty = arg_ty, sc_cont = cont })
-  = rebuildCall env (addTyArgTo info arg_ty) cont
+rebuildCall env info (ApplyToTy { sc_arg_ty = arg_ty, sc_hole_ty = hole_ty, sc_cont = cont })
+  = rebuildCall env (addTyArgTo info arg_ty hole_ty) cont
 
-rebuildCall env info@(ArgInfo { ai_encl = encl_rules, ai_type = fun_ty
+rebuildCall env info@(ArgInfo { ai_encl = encl_rules
                               , ai_strs = str:strs, ai_discs = disc:discs })
             (ApplyToVal { sc_arg = arg, sc_env = arg_se
-                        , sc_dup = dup_flag, sc_cont = cont })
+                        , sc_dup = dup_flag, sc_hole_ty = fun_ty
+                        , sc_cont = cont })
   | isSimplified dup_flag     -- See Note [Avoid redundant simplification]
-  = rebuildCall env (addValArgTo info' arg) cont
+  = rebuildCall env (addValArgTo info' arg fun_ty) cont
 
   | str         -- Strict argument
   , sm_case_case (getMode env)
   = -- pprTrace "Strict Arg" (ppr arg $$ ppr (seIdSubst env) $$ ppr (seInScope env)) $
     simplExprF (arg_se `setInScopeFromE` env) arg
                (StrictArg { sc_fun = info', sc_cci = cci_strict
-                          , sc_dup = Simplified, sc_cont = cont })
+                          , sc_dup = Simplified, sc_fun_ty = fun_ty
+                          , sc_cont = cont })
                 -- Note [Shadowing]
 
   | otherwise                           -- Lazy argument
@@ -1899,7 +1973,7 @@ rebuildCall env info@(ArgInfo { ai_encl = encl_rules, ai_type = fun_ty
         -- floating a demanded let.
   = do  { arg' <- simplExprC (arg_se `setInScopeFromE` env) arg
                              (mkLazyArgStop arg_ty cci_lazy)
-        ; rebuildCall env (addValArgTo info' arg') cont }
+        ; rebuildCall env (addValArgTo info' arg' fun_ty) cont }
   where
     info'  = info { ai_strs = strs, ai_discs = discs }
     arg_ty = funArgTy fun_ty
@@ -2107,9 +2181,11 @@ trySeqRules in_env scrut rhs cont
   where
     no_cast_scrut = drop_casts scrut
     scrut_ty  = exprType no_cast_scrut
-    seq_id_ty = idType seqId
-    res1_ty   = piResultTy seq_id_ty rhs_rep
-    res2_ty   = piResultTy res1_ty   scrut_ty
+    seq_id_ty = idType seqId                    -- forall r a (b::TYPE r). a -> b -> b
+    res1_ty   = piResultTy seq_id_ty rhs_rep    -- forall a (b::TYPE rhs_rep). a -> b -> b
+    res2_ty   = piResultTy res1_ty   scrut_ty   -- forall (b::TYPE rhs_rep). scrut_ty -> b -> b
+    res3_ty   = piResultTy res2_ty   rhs_ty     -- scrut_ty -> rhs_ty -> rhs_ty
+    res4_ty   = funResultTy res3_ty             -- rhs_ty -> rhs_ty
     rhs_ty    = substTy in_env (exprType rhs)
     rhs_rep   = getRuntimeRep rhs_ty
     out_args  = [ TyArg { as_arg_ty  = rhs_rep
@@ -2118,9 +2194,11 @@ trySeqRules in_env scrut rhs cont
                         , as_hole_ty = res1_ty }
                 , TyArg { as_arg_ty  = rhs_ty
                         , as_hole_ty = res2_ty }
-                , ValArg no_cast_scrut]
+                , ValArg { as_arg = no_cast_scrut
+                         , as_hole_ty = res3_ty } ]
     rule_cont = ApplyToVal { sc_dup = NoDup, sc_arg = rhs
-                           , sc_env = in_env, sc_cont = cont }
+                           , sc_env = in_env, sc_cont = cont
+                           , sc_hole_ty = res4_ty }
     -- Lazily evaluated, so we don't do most of this
 
     drop_casts (Cast e _) = drop_casts e
@@ -3110,7 +3188,8 @@ mkDupableCont env (StrictBind { sc_bndr = bndr, sc_bndrs = bndrs
                              , sc_dup  = OkToDup
                              , sc_cont = mkBoringStop res_ty } ) }
 
-mkDupableCont env (StrictArg { sc_fun = info, sc_cci = cci, sc_cont = cont })
+mkDupableCont env (StrictArg { sc_fun = info, sc_cci = cci
+                             , sc_cont = cont, sc_fun_ty = fun_ty })
         -- See Note [Duplicating StrictArg]
         -- NB: sc_dup /= OkToDup; that is caught earlier by contIsDupable
   = do { (floats1, cont') <- mkDupableCont env cont
@@ -3118,8 +3197,9 @@ mkDupableCont env (StrictArg { sc_fun = info, sc_cci = cci, sc_cont = cont })
                                            (ai_args info)
        ; return ( foldl' addLetFloats floats1 floats_s
                 , StrictArg { sc_fun = info { ai_args = args' }
-                            , sc_cci = cci
                             , sc_cont = cont'
+                            , sc_cci = cci
+                            , sc_fun_ty = fun_ty
                             , sc_dup = OkToDup} ) }
 
 mkDupableCont env (ApplyToTy { sc_cont = cont
@@ -3129,7 +3209,8 @@ mkDupableCont env (ApplyToTy { sc_cont = cont
                                     , sc_arg_ty = arg_ty, sc_hole_ty = hole_ty }) }
 
 mkDupableCont env (ApplyToVal { sc_arg = arg, sc_dup = dup
-                              , sc_env = se, sc_cont = cont })
+                              , sc_env = se, sc_cont = cont
+                              , sc_hole_ty = hole_ty })
   =     -- e.g.         [...hole...] (...arg...)
         --      ==>
         --              let a = ...arg...
@@ -3147,7 +3228,8 @@ mkDupableCont env (ApplyToVal { sc_arg = arg, sc_dup = dup
                                          -- arg'' in its in-scope set, even if makeTrivial
                                          -- has turned arg'' into a fresh variable
                                          -- See Note [StaticEnv invariant] in GHC.Core.Opt.Simplify.Utils
-                              , sc_dup = OkToDup, sc_cont = cont' }) }
+                              , sc_dup = OkToDup, sc_cont = cont'
+                              , sc_hole_ty = hole_ty }) }
 
 mkDupableCont env (Select { sc_bndr = case_bndr, sc_alts = alts
                           , sc_env = se, sc_cont = cont })
@@ -3491,11 +3573,11 @@ because we don't know its usage in each RHS separately
 simplLetUnfolding :: SimplEnv-> TopLevelFlag
                   -> MaybeJoinCont
                   -> InId
-                  -> OutExpr -> OutType
+                  -> OutExpr -> OutType -> Arity
                   -> Unfolding -> SimplM Unfolding
-simplLetUnfolding env top_lvl cont_mb id new_rhs rhs_ty unf
+simplLetUnfolding env top_lvl cont_mb id new_rhs rhs_ty arity unf
   | isStableUnfolding unf
-  = simplStableUnfolding env top_lvl cont_mb id unf rhs_ty
+  = simplStableUnfolding env top_lvl cont_mb id rhs_ty arity unf
   | isExitJoinId id
   = return noUnfolding -- See Note [Do not inline exit join points] in GHC.Core.Opt.Exitify
   | otherwise
@@ -3521,9 +3603,10 @@ mkLetUnfolding dflags top_lvl src id new_rhs
 simplStableUnfolding :: SimplEnv -> TopLevelFlag
                      -> MaybeJoinCont  -- Just k => a join point with continuation k
                      -> InId
-                     -> Unfolding -> OutType -> SimplM Unfolding
+                     -> OutType -> Arity -> Unfolding
+                     ->SimplM Unfolding
 -- Note [Setting the new unfolding]
-simplStableUnfolding env top_lvl mb_cont id unf rhs_ty
+simplStableUnfolding env top_lvl mb_cont id rhs_ty id_arity unf
   = case unf of
       NoUnfolding   -> return unf
       BootUnfolding -> return unf
@@ -3536,9 +3619,13 @@ simplStableUnfolding env top_lvl mb_cont id unf rhs_ty
 
       CoreUnfolding { uf_tmpl = expr, uf_src = src, uf_guidance = guide }
         | isStableSource src
-        -> do { expr' <- case mb_cont of -- See Note [Rules and unfolding for join points]
-                           Just cont -> simplJoinRhs unf_env id expr cont
-                           Nothing   -> simplExprC unf_env expr (mkBoringStop rhs_ty)
+        -> do { expr' <- case mb_cont of
+                           Just cont -> -- Binder is a join point
+                                        -- See Note [Rules and unfolding for join points]
+                                        simplJoinRhs unf_env id expr cont
+                           Nothing   -> -- Binder is not a join point
+                                        do { expr' <- simplExprC unf_env expr (mkBoringStop rhs_ty)
+                                           ; return (eta_expand expr') }
               ; case guide of
                   UnfWhen { ug_arity = arity
                           , ug_unsat_ok = sat_ok
@@ -3575,7 +3662,41 @@ simplStableUnfolding env top_lvl mb_cont id unf rhs_ty
     unf_env    = updMode (updModeForStableUnfoldings act) env
          -- See Note [Simplifying inside stable unfoldings] in GHC.Core.Opt.Simplify.Utils
 
-{-
+    -- See Note [Eta-expand stable unfoldings]
+    eta_expand expr
+      | not eta_on         = expr
+      | exprIsTrivial expr = expr
+      | otherwise          = etaExpand id_arity expr
+    eta_on = sm_eta_expand (getMode env)
+
+{- Note [Eta-expand stable unfoldings]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+For INLINE/INLINABLE things (which get stable unfoldings) there's a danger
+of getting
+   f :: Int -> Int -> Int -> Blah
+   [ Arity = 3                 -- Good arity
+   , Unf=Stable (\xy. blah)    -- Less good arity, only 2
+   f = \pqr. e
+
+This can happen because f's RHS is optimised more vigorously than
+its stable unfolding.  Now suppose we have a call
+   g = f x
+Because f has arity=3, g will have arity=2.  But if we inline f (using
+its stable unfolding) g's arity will reduce to 1, because <blah>
+hasn't been optimised yet.  This happened in the 'parsec' library,
+for Text.Pasec.Char.string.
+
+Generally, if we know that 'f' has arity N, it seems sensible to
+eta-expand the stable unfolding to arity N too. Simple and consistent.
+
+Wrinkles
+* Don't eta-expand a trivial expr, else each pass will eta-reduce it,
+  and then eta-expand again. See Note [Do not eta-expand trivial expressions]
+  in GHC.Core.Opt.Simplify.Utils.
+* Don't eta-expand join points; see Note [Do not eta-expand join points]
+  in GHC.Core.Opt.Simplify.Utils.  We uphold this because the join-point
+  case (mb_cont = Just _) doesn't use eta_expand.
+
 Note [Force bottoming field]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 We need to force bottoming, or the new unfolding holds


=====================================
compiler/GHC/Core/Opt/Simplify/Utils.hs
=====================================
@@ -118,7 +118,9 @@ data SimplCont
         SimplCont
 
   | ApplyToVal         -- (ApplyToVal arg K)[e] = K[ e arg ]
-      { sc_dup  :: DupFlag      -- See Note [DupFlag invariants]
+      { sc_dup     :: DupFlag   -- See Note [DupFlag invariants]
+      , sc_hole_ty :: OutType   -- Type of the function, presumably (forall a. blah)
+                                -- See Note [The hole type in ApplyToTy/Val]
       , sc_arg  :: InExpr       -- The argument,
       , sc_env  :: StaticEnv    -- see Note [StaticEnv invariant]
       , sc_cont :: SimplCont }
@@ -126,7 +128,7 @@ data SimplCont
   | ApplyToTy          -- (ApplyToTy ty K)[e] = K[ e ty ]
       { sc_arg_ty  :: OutType     -- Argument type
       , sc_hole_ty :: OutType     -- Type of the function, presumably (forall a. blah)
-                                  -- See Note [The hole type in ApplyToTy]
+                                  -- See Note [The hole type in ApplyToTy/Val]
       , sc_cont    :: SimplCont }
 
   | Select             -- (Select alts K)[e] = K[ case e of alts ]
@@ -151,6 +153,9 @@ data SimplCont
       , sc_fun  :: ArgInfo     -- Specifies f, e1..en, Whether f has rules, etc
                                --     plus strictness flags for *further* args
       , sc_cci  :: CallCtxt    -- Whether *this* argument position is interesting
+      , sc_fun_ty :: OutType   -- Type of the function (f e1 .. en),
+                               -- presumably (arg_ty -> res_ty)
+                               -- where res_ty is expected by sc_cont
       , sc_cont :: SimplCont }
 
   | TickIt              -- (TickIt t K)[e] = K[ tick t e ]
@@ -254,8 +259,6 @@ data ArgInfo
         ai_fun   :: OutId,      -- The function
         ai_args  :: [ArgSpec],  -- ...applied to these args (which are in *reverse* order)
 
-        ai_type  :: OutType,    -- Type of (f a1 ... an)
-
         ai_rules :: FunRules,   -- Rules for this function
 
         ai_encl :: Bool,        -- Flag saying whether this function
@@ -271,37 +274,36 @@ data ArgInfo
     }
 
 data ArgSpec
-  = ValArg OutExpr                    -- Apply to this (coercion or value); c.f. ApplyToVal
+  = ValArg { as_arg :: OutExpr        -- Apply to this (coercion or value); c.f. ApplyToVal
+           , as_hole_ty :: OutType }  -- Type of the function (presumably t1 -> t2)
   | TyArg { as_arg_ty  :: OutType     -- Apply to this type; c.f. ApplyToTy
           , as_hole_ty :: OutType }   -- Type of the function (presumably forall a. blah)
   | CastBy OutCoercion                -- Cast by this; c.f. CastIt
 
 instance Outputable ArgSpec where
-  ppr (ValArg e)                 = text "ValArg" <+> ppr e
+  ppr (ValArg { as_arg = arg })  = text "ValArg" <+> ppr arg
   ppr (TyArg { as_arg_ty = ty }) = text "TyArg" <+> ppr ty
   ppr (CastBy c)                 = text "CastBy" <+> ppr c
 
-addValArgTo :: ArgInfo -> OutExpr -> ArgInfo
-addValArgTo ai arg = ai { ai_args = ValArg arg : ai_args ai
-                        , ai_type = applyTypeToArg (ai_type ai) arg
-                        , ai_rules = decRules (ai_rules ai) }
+addValArgTo :: ArgInfo -> OutExpr -> OutType -> ArgInfo
+addValArgTo ai arg hole_ty = ai { ai_args = arg_spec : ai_args ai
+                                , ai_rules = decRules (ai_rules ai) }
+  where
+    arg_spec = ValArg { as_arg = arg, as_hole_ty = hole_ty }
 
-addTyArgTo :: ArgInfo -> OutType -> ArgInfo
-addTyArgTo ai arg_ty = ai { ai_args = arg_spec : ai_args ai
-                          , ai_type = piResultTy poly_fun_ty arg_ty
-                          , ai_rules = decRules (ai_rules ai) }
+addTyArgTo :: ArgInfo -> OutType -> OutType -> ArgInfo
+addTyArgTo ai arg_ty hole_ty = ai { ai_args = arg_spec : ai_args ai
+                                  , ai_rules = decRules (ai_rules ai) }
   where
-    poly_fun_ty = ai_type ai
-    arg_spec    = TyArg { as_arg_ty = arg_ty, as_hole_ty = poly_fun_ty }
+    arg_spec = TyArg { as_arg_ty = arg_ty, as_hole_ty = hole_ty }
 
 addCastTo :: ArgInfo -> OutCoercion -> ArgInfo
-addCastTo ai co = ai { ai_args = CastBy co : ai_args ai
-                     , ai_type = coercionRKind co }
+addCastTo ai co = ai { ai_args = CastBy co : ai_args ai }
 
 argInfoAppArgs :: [ArgSpec] -> [OutExpr]
 argInfoAppArgs []                              = []
 argInfoAppArgs (CastBy {}                : _)  = []  -- Stop at a cast
-argInfoAppArgs (ValArg e                 : as) = e       : argInfoAppArgs as
+argInfoAppArgs (ValArg { as_arg = arg }  : as) = arg     : argInfoAppArgs as
 argInfoAppArgs (TyArg { as_arg_ty = ty } : as) = Type ty : argInfoAppArgs as
 
 pushSimplifiedArgs :: SimplEnv -> [ArgSpec] -> SimplCont -> SimplCont
@@ -310,7 +312,9 @@ pushSimplifiedArgs env  (arg : args) k
   = case arg of
       TyArg { as_arg_ty = arg_ty, as_hole_ty = hole_ty }
                -> ApplyToTy  { sc_arg_ty = arg_ty, sc_hole_ty = hole_ty, sc_cont = rest }
-      ValArg e -> ApplyToVal { sc_arg = e, sc_env = env, sc_dup = Simplified, sc_cont = rest }
+      ValArg { as_arg = arg, as_hole_ty = hole_ty }
+             -> ApplyToVal { sc_arg = arg, sc_env = env, sc_dup = Simplified
+                           , sc_hole_ty = hole_ty, sc_cont = rest }
       CastBy c -> CastIt c rest
   where
     rest = pushSimplifiedArgs env args k
@@ -323,7 +327,7 @@ argInfoExpr fun rev_args
   = go rev_args
   where
     go []                              = Var fun
-    go (ValArg a                 : as) = go as `App` a
+    go (ValArg { as_arg = arg }  : as) = go as `App` arg
     go (TyArg { as_arg_ty = ty } : as) = go as `App` Type ty
     go (CastBy co                : as) = mkCast (go as) co
 
@@ -409,11 +413,9 @@ contHoleType (TickIt _ k)                     = contHoleType k
 contHoleType (CastIt co _)                    = coercionLKind co
 contHoleType (StrictBind { sc_bndr = b, sc_dup = dup, sc_env = se })
   = perhapsSubstTy dup se (idType b)
-contHoleType (StrictArg { sc_fun = ai })      = funArgTy (ai_type ai)
-contHoleType (ApplyToTy  { sc_hole_ty = ty }) = ty  -- See Note [The hole type in ApplyToTy]
-contHoleType (ApplyToVal { sc_arg = e, sc_env = se, sc_dup = dup, sc_cont = k })
-  = mkVisFunTy (perhapsSubstTy dup se (exprType e))
-               (contHoleType k)
+contHoleType (StrictArg  { sc_fun_ty = ty })  = funArgTy ty
+contHoleType (ApplyToTy  { sc_hole_ty = ty }) = ty  -- See Note [The hole type in ApplyToTy/Val]
+contHoleType (ApplyToVal { sc_hole_ty = ty }) = ty  -- See Note [The hole type in ApplyToTy/Val]
 contHoleType (Select { sc_dup = d, sc_bndr =  b, sc_env = se })
   = perhapsSubstTy d se (idType b)
 
@@ -458,13 +460,13 @@ mkArgInfo :: SimplEnv
 
 mkArgInfo env fun rules n_val_args call_cont
   | n_val_args < idArity fun            -- Note [Unsaturated functions]
-  = ArgInfo { ai_fun = fun, ai_args = [], ai_type = fun_ty
+  = ArgInfo { ai_fun = fun, ai_args = []
             , ai_rules = fun_rules
             , ai_encl = False
             , ai_strs = vanilla_stricts
             , ai_discs = vanilla_discounts }
   | otherwise
-  = ArgInfo { ai_fun = fun, ai_args = [], ai_type = fun_ty
+  = ArgInfo { ai_fun = fun, ai_args = []
             , ai_rules = fun_rules
             , ai_encl  = interestingArgContext rules call_cont
             , ai_strs  = arg_stricts
@@ -1076,7 +1078,7 @@ seems to be to do a callSiteInline based on the fact that there is
 something interesting about the call site (it's strict).  Hmm.  That
 seems a bit fragile.
 
-Conclusion: inline top level things gaily until Phase 0 (the last
+Conclusion: inline top level things gaily until finalPhase (the last
 phase), at which point don't.
 
 Note [pre/postInlineUnconditionally in gentle mode]
@@ -1199,23 +1201,21 @@ preInlineUnconditionally env top_lvl bndr rhs rhs_env
       -- not ticks.  Counting ticks cannot be duplicated, and non-counting
       -- ticks around a Lam will disappear anyway.
 
-    early_phase = case sm_phase mode of
-                    Phase 0 -> False
-                    _       -> True
--- If we don't have this early_phase test, consider
---      x = length [1,2,3]
--- The full laziness pass carefully floats all the cons cells to
--- top level, and preInlineUnconditionally floats them all back in.
--- Result is (a) static allocation replaced by dynamic allocation
---           (b) many simplifier iterations because this tickles
---               a related problem; only one inlining per pass
---
--- On the other hand, I have seen cases where top-level fusion is
--- lost if we don't inline top level thing (e.g. string constants)
--- Hence the test for phase zero (which is the phase for all the final
--- simplifications).  Until phase zero we take no special notice of
--- top level things, but then we become more leery about inlining
--- them.
+    early_phase = not (isFinalPhase (sm_phase mode))
+    -- If we don't have this early_phase test, consider
+    --      x = length [1,2,3]
+    -- The full laziness pass carefully floats all the cons cells to
+    -- top level, and preInlineUnconditionally floats them all back in.
+    -- Result is (a) static allocation replaced by dynamic allocation
+    --           (b) many simplifier iterations because this tickles
+    --               a related problem; only one inlining per pass
+    --
+    -- On the other hand, I have seen cases where top-level fusion is
+    -- lost if we don't inline top level thing (e.g. string constants)
+    -- Hence the test for phase zero (which is the phase for all the final
+    -- simplifications).  Until phase zero we take no special notice of
+    -- top level things, but then we become more leery about inlining
+    -- them.
 
 {-
 ************************************************************************
@@ -1530,7 +1530,7 @@ tryEtaExpandRhs mode bndr rhs
          return (new_arity, is_bot, new_rhs) }
   where
     try_expand
-      | exprIsTrivial rhs
+      | exprIsTrivial rhs  -- See Note [Do not eta-expand trivial expressions]
       = return (exprArity rhs, False, rhs)
 
       | sm_eta_expand mode      -- Provided eta-expansion is on
@@ -1574,9 +1574,17 @@ because then 'genMap' will inline, and it really shouldn't: at least
 as far as the programmer is concerned, it's not applied to two
 arguments!
 
+Note [Do not eta-expand trivial expressions]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Do not eta-expand a trivial RHS like
+  f = g
+If we eta expand do
+  f = \x. g x
+we'll just eta-reduce again, and so on; so the
+simplifier never terminates.
+
 Note [Do not eta-expand join points]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
 Similarly to CPR (see Note [Don't w/w join points for CPR] in
 GHC.Core.Opt.WorkWrap), a join point stands well to gain from its outer binding's
 eta-expansion, and eta-expanding a join point is fraught with issues like how to


=====================================
compiler/GHC/Core/Opt/SpecConstr.hs
=====================================
@@ -1760,8 +1760,8 @@ Note [Transfer activation]
 In which phase should the specialise-constructor rules be active?
 Originally I made them always-active, but Manuel found that this
 defeated some clever user-written rules.  Then I made them active only
-in Phase 0; after all, currently, the specConstr transformation is
-only run after the simplifier has reached Phase 0, but that meant
+in finalPhase; after all, currently, the specConstr transformation is
+only run after the simplifier has reached finalPhase, but that meant
 that specialisations didn't fire inside wrappers; see test
 simplCore/should_compile/spec-inline.
 


=====================================
compiler/GHC/Core/Opt/WorkWrap.hs
=====================================
@@ -245,8 +245,8 @@ NOINLINE pragma to the worker.
 (See #13143 for a real-world example.)
 
 It is crucial that we do this for *all* NOINLINE functions. #10069
-demonstrates what happens when we promise to w/w a (NOINLINE) leaf function, but
-fail to deliver:
+demonstrates what happens when we promise to w/w a (NOINLINE) leaf
+function, but fail to deliver:
 
   data C = C Int# Int#
 
@@ -426,8 +426,15 @@ Reminder: Note [Don't w/w INLINE things], so we don't need to worry
 
 Conclusion:
   - If the user said NOINLINE[n], respect that
-  - If the user said NOINLINE, inline the wrapper as late as
-    poss (phase 0). This is a compromise driven by (2) above
+
+  - If the user said NOINLINE, inline the wrapper only after
+    phase 0, the last user-visible phase.  That means that all
+    rules will have had a chance to fire.
+
+    What phase is after phase 0?  Answer: finalPhase, phase (-1).
+    That's the reason finalPhase exists. NB: user's can't write
+    INLINE[-1] f; it's syntactically illegal.
+
   - Otherwise inline wrapper in phase 2.  That allows the
     'gentle' simplification pass to apply specialisation rules
 
@@ -575,8 +582,8 @@ splitFun dflags fam_envs fn_id fn_info wrap_dmds div cpr rhs
         work_uniq <- getUniqueM
         let work_rhs = work_fn rhs
             work_act = case fn_inline_spec of  -- See Note [Worker activation]
-                          NoInline -> fn_act
-                          _        -> wrap_act
+                          NoInline -> inl_act fn_inl_prag
+                          _        -> inl_act wrap_prag
 
             work_prag = InlinePragma { inl_src = SourceText "{-# INLINE"
                                      , inl_inline = fn_inline_spec
@@ -626,19 +633,7 @@ splitFun dflags fam_envs fn_id fn_info wrap_dmds div cpr rhs
                           | otherwise   = topDmd
 
             wrap_rhs  = wrap_fn work_id
-            wrap_act  = case fn_act of  -- See Note [Wrapper activation]
-                           ActiveAfter {} -> fn_act
-                           NeverActive    -> activeDuringFinal
-                           _              -> activeAfterInitial
-            wrap_prag = InlinePragma { inl_src    = SourceText "{-# INLINE"
-                                     , inl_inline = NoUserInline
-                                     , inl_sat    = Nothing
-                                     , inl_act    = wrap_act
-                                     , inl_rule   = rule_match_info }
-                -- inl_act:    see Note [Wrapper activation]
-                -- inl_inline: see Note [Wrapper NoUserInline]
-                -- inl_rule:   RuleMatchInfo is (and must be) unaffected
-
+            wrap_prag = mkStrWrapperInlinePrag fn_inl_prag
             wrap_id   = fn_id `setIdUnfolding`  mkWwInlineRule dflags wrap_rhs arity
                               `setInlinePragma` wrap_prag
                               `setIdOccInfo`    noOccInfo
@@ -655,8 +650,6 @@ splitFun dflags fam_envs fn_id fn_info wrap_dmds div cpr rhs
     rhs_fvs         = exprFreeVars rhs
     fn_inl_prag     = inlinePragInfo fn_info
     fn_inline_spec  = inl_inline fn_inl_prag
-    fn_act          = inl_act fn_inl_prag
-    rule_match_info = inlinePragmaRuleMatchInfo fn_inl_prag
     fn_unfolding    = unfoldingInfo fn_info
     arity           = arityInfo fn_info
                     -- The arity is set by the simplifier using exprEtaExpandArity
@@ -674,6 +667,20 @@ splitFun dflags fam_envs fn_id fn_info wrap_dmds div cpr rhs
                   | otherwise      = topCpr
 
 
+mkStrWrapperInlinePrag :: InlinePragma -> InlinePragma
+mkStrWrapperInlinePrag (InlinePragma { inl_act = act, inl_rule = rule_info })
+  = InlinePragma { inl_src    = SourceText "{-# INLINE"
+                 , inl_inline = NoUserInline -- See Note [Wrapper NoUserInline]
+                 , inl_sat    = Nothing
+                 , inl_act    = wrap_act     -- See Note [Wrapper activation]
+                 , inl_rule   = rule_info }  -- RuleMatchInfo is (and must be) unaffected
+  where
+    wrap_act  = case act of  -- See Note [Wrapper activation]
+                   ActiveAfter {} -> act
+                   NeverActive    -> activeDuringFinal
+                   _              -> activeAfterInitial
+
+
 {-
 Note [Demand on the worker]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~


=====================================
compiler/GHC/Core/SimpleOpt.hs
=====================================
@@ -1340,8 +1340,7 @@ pushCoTyArg co ty
   | otherwise
   = Nothing
   where
-    tyL = coercionLKind co
-    tyR = coercionRKind co
+    Pair tyL tyR = coercionKind co
        -- co :: tyL ~ tyR
        -- tyL = forall (a1 :: k1). ty1
        -- tyR = forall (a2 :: k2). ty2


=====================================
compiler/GHC/Core/Type.hs
=====================================
@@ -51,7 +51,7 @@ module GHC.Core.Type (
         splitPiTy_maybe, splitPiTy, splitPiTys,
         mkTyConBindersPreferAnon,
         mkPiTy, mkPiTys,
-        mkLamType, mkLamTypes,
+        mkLamType, mkLamTypes, mkFunctionType,
         piResultTy, piResultTys,
         applyTysX, dropForAlls,
         mkFamilyTyConApp,
@@ -256,7 +256,7 @@ import {-# SOURCE #-} GHC.Core.Coercion
    , mkTyConAppCo, mkAppCo, mkCoVarCo, mkAxiomRuleCo
    , mkForAllCo, mkFunCo, mkAxiomInstCo, mkUnivCo
    , mkSymCo, mkTransCo, mkNthCo, mkLRCo, mkInstCo
-   , mkKindCo, mkSubCo, mkFunCo, mkAxiomInstCo
+   , mkKindCo, mkSubCo
    , decomposePiCos, coercionKind, coercionLKind
    , coercionRKind, coercionType
    , isReflexiveCo, seqCo )
@@ -1517,6 +1517,8 @@ mkLamType  :: Var -> Type -> Type
 mkLamTypes :: [Var] -> Type -> Type
 -- ^ 'mkLamType' for multiple type or value arguments
 
+mkLamTypes vs ty = foldr mkLamType ty vs
+
 mkLamType v body_ty
    | isTyVar v
    = ForAllTy (Bndr v Inferred) body_ty
@@ -1525,43 +1527,19 @@ mkLamType v body_ty
    , v `elemVarSet` tyCoVarsOfType body_ty
    = ForAllTy (Bndr v Required) body_ty
 
-   | isPredTy arg_ty  -- See Note [mkLamType: dictionary arguments]
-   = mkInvisFunTy arg_ty body_ty
-
    | otherwise
-   = mkVisFunTy arg_ty body_ty
-   where
-     arg_ty = varType v
-
-mkLamTypes vs ty = foldr mkLamType ty vs
+   = mkFunctionType (varType v) body_ty
 
-{- Note [mkLamType: dictionary arguments]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-If we have (\ (d :: Ord a). blah), we want to give it type
-           (Ord a => blah_ty)
-with a fat arrow; that is, using mkInvisFunTy, not mkVisFunTy.
 
-Why? After all, we are in Core, where (=>) and (->) behave the same.
-Yes, but the /specialiser/ does treat dictionary arguments specially.
-Suppose we do w/w on 'foo' in module A, thus (#11272, #6056)
-   foo :: Ord a => Int -> blah
-   foo a d x = case x of I# x' -> $wfoo @a d x'
+mkFunctionType :: Type -> Type -> Type
+-- This one works out the AnonArgFlag from the argument type
+-- See GHC.Types.Var Note [AnonArgFlag]
+mkFunctionType arg_ty res_ty
+   | isPredTy arg_ty -- See GHC.Types.Var Note [AnonArgFlag]
+   = mkInvisFunTy arg_ty res_ty
 
-   $wfoo :: Ord a => Int# -> blah
-
-Now in module B we see (foo @Int dOrdInt).  The specialiser will
-specialise this to $sfoo, where
-   $sfoo :: Int -> blah
-   $sfoo x = case x of I# x' -> $wfoo @Int dOrdInt x'
-
-Now we /must/ also specialise $wfoo!  But it wasn't user-written,
-and has a type built with mkLamTypes.
-
-Conclusion: the easiest thing is to make mkLamType build
-            (c => ty)
-when the argument is a predicate type.  See GHC.Core.TyCo.Rep
-Note [Types for coercions, predicates, and evidence]
--}
+   | otherwise
+   = mkVisFunTy arg_ty res_ty
 
 -- | Given a list of type-level vars and the free vars of a result kind,
 -- makes TyCoBinders, preferring anonymous binders


=====================================
compiler/GHC/Core/Unfold.hs
=====================================
@@ -412,6 +412,8 @@ inlineBoringOk e
                         , exprIsTrivial a  = go (credit-1) f
     go credit (Tick _ e)                   = go credit e -- dubious
     go credit (Cast e _)                   = go credit e
+    go credit (Case scrut _ _ [(_,_,rhs)]) -- See Note [Inline unsafeCoerce]
+      | isUnsafeEqualityProof scrut        = go credit rhs
     go _      (Var {})                     = boringCxtOk
     go _      _                            = boringCxtNotOk
 
@@ -459,7 +461,21 @@ calcUnfoldingGuidance dflags is_top_bottoming expr
                        | otherwise             = (+)
              -- See Note [Function and non-function discounts]
 
-{-
+{- Note [Inline unsafeCoerce]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We really want to inline unsafeCoerce, even when applied to boring
+arguments.  It doesn't look as if its RHS is smaller than the call
+   unsafeCoerce x = case unsafeEqualityProof @a @b of UnsafeRefl -> x
+but that case is discarded -- see Note [Implementing unsafeCoerce]
+in base:Unsafe.Coerce.
+
+Moreover, if we /don't/ inline it, we may be left with
+          f (unsafeCoerce x)
+which will build a thunk -- bad, bad, bad.
+
+Conclusion: we really want inlineBoringOk to be True of the RHS of
+unsafeCoerce.  This is (U4a) in Note [Implementing unsafeCoerce].
+
 Note [Computing the size of an expression]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The basic idea of sizeExpr is obvious enough: count nodes.  But getting the


=====================================
compiler/GHC/Core/Utils.hs
=====================================
@@ -56,6 +56,9 @@ module GHC.Core.Utils (
         -- * Join points
         isJoinBind,
 
+        -- * unsafeEqualityProof
+        isUnsafeEqualityProof,
+
         -- * Dumping stuff
         dumpIdInfoOfProgram
     ) where
@@ -66,7 +69,7 @@ import GHC.Prelude
 import GHC.Platform
 
 import GHC.Core
-import GHC.Builtin.Names ( makeStaticName )
+import GHC.Builtin.Names ( makeStaticName, unsafeEqualityProofName )
 import GHC.Core.Ppr
 import GHC.Core.FVs( exprFreeVars )
 import GHC.Types.Var
@@ -2533,3 +2536,20 @@ dumpIdInfoOfProgram ppr_id_info binds = vcat (map printId ids)
   getIds (Rec bs)     = map fst bs
   printId id | isExportedId id = ppr id <> colon <+> (ppr_id_info (idInfo id))
              | otherwise       = empty
+
+
+{- *********************************************************************
+*                                                                      *
+             unsafeEqualityProof
+*                                                                      *
+********************************************************************* -}
+
+isUnsafeEqualityProof :: CoreExpr -> Bool
+-- See (U3) and (U4) in
+-- Note [Implementing unsafeCoerce] in base:Unsafe.Coerce
+isUnsafeEqualityProof e
+  | Var v `App` Type _ `App` Type _ `App` Type _ <- e
+  = idName v == unsafeEqualityProofName
+  | otherwise
+  = False
+


=====================================
compiler/GHC/CoreToStg/Prep.hs
=====================================
@@ -1011,15 +1011,6 @@ cpExprIsTrivial e
   | otherwise
   = exprIsTrivial e
 
-isUnsafeEqualityProof :: CoreExpr -> Bool
--- See (U3) and (U4) in
--- Note [Implementing unsafeCoerce] in base:Unsafe.Coerce
-isUnsafeEqualityProof e
-  | Var v `App` Type _ `App` Type _ `App` Type _ <- e
-  = idName v == unsafeEqualityProofName
-  | otherwise
-  = False
-
 -- This is where we arrange that a non-trivial argument is let-bound
 cpeArg :: CorePrepEnv -> Demand
        -> CoreArg -> Type -> UniqSM (Floats, CpeArg)


=====================================
compiler/GHC/Types/Basic.hs
=====================================
@@ -83,6 +83,7 @@ module GHC.Types.Basic (
         Activation(..), isActive, isActiveIn, competesWith,
         isNeverActive, isAlwaysActive, isEarlyActive,
         activeAfterInitial, activeDuringFinal,
+        finalPhase, isFinalPhase,
 
         RuleMatchInfo(..), isConLike, isFunLike,
         InlineSpec(..), noUserInlineSpec,
@@ -1300,6 +1301,25 @@ pprWithSourceText (SourceText src) _ = text src
 ************************************************************************
 
 When a rule or inlining is active
+
+Note [Compiler phases]
+~~~~~~~~~~~~~~~~~~~~~~
+The CompilerPhase says which phase the simplifier is running in:
+
+* InitialPhase: before all user-visible phases
+
+* Phase 2,1,0: user-visible phases; the phase number
+  controls rule ordering an inlining.
+
+* Phase (-1) = finalPhase: used for all subsequent simplifier
+  runs. By delaying inlining of wrappers to phase (-1) we can
+  ensure that RULE have a good chance to fire. See
+  Note [Wrapper activation] in GHC.Core.Opt.WorkWrap
+
+  Note: users don't have access to phase (-1); it's syntactically
+  illegal to write {-# INLINE[-1] f #-}
+
+The phase sequencing is done by GHC.Opt.Simplify.Driver
 -}
 
 -- | Phase Number
@@ -1317,12 +1337,21 @@ instance Outputable CompilerPhase where
 
 activeAfterInitial :: Activation
 -- Active in the first phase after the initial phase
--- Currently we have just phases [2,1,0]
+-- Currently we have just phases [2,1,0,-1]
+-- Where "-1" means GHC's internal simplification steps
+-- after all rules have run
 activeAfterInitial = ActiveAfter NoSourceText 2
 
 activeDuringFinal :: Activation
 -- Active in the final simplification phase (which is repeated)
-activeDuringFinal = ActiveAfter NoSourceText 0
+activeDuringFinal = ActiveAfter NoSourceText (-1)
+
+finalPhase :: CompilerPhase
+finalPhase = Phase (-1)
+
+isFinalPhase :: CompilerPhase -> Bool
+isFinalPhase (Phase (-1)) = True
+isFinalPhase _            = False
 
 -- See note [Pragma source text]
 data Activation = NeverActive


=====================================
compiler/GHC/Types/Var.hs
=====================================
@@ -465,10 +465,10 @@ instance Binary ArgFlag where
       _ -> return Inferred
 
 -- | The non-dependent version of 'ArgFlag'.
-
--- Appears here partly so that it's together with its friend ArgFlag,
--- but also because it is used in IfaceType, rather early in the
--- compilation chain
+-- See Note [AnonArgFlag]
+-- Appears here partly so that it's together with its friends ArgFlag
+-- and ForallVisFlag, but also because it is used in IfaceType, rather
+-- early in the compilation chain
 -- See Note [AnonArgFlag vs. ForallVisFlag]
 data AnonArgFlag
   = VisArg    -- ^ Used for @(->)@: an ordinary non-dependent arrow.
@@ -511,7 +511,60 @@ argToForallVisFlag Required  = ForallVis
 argToForallVisFlag Specified = ForallInvis
 argToForallVisFlag Inferred  = ForallInvis
 
-{-
+{- Note [AnonArgFlag]
+~~~~~~~~~~~~~~~~~~~~~
+AnonArgFlag is used principally in the FunTy constructor of Type.
+  FunTy VisArg   t1 t2   means   t1 -> t2
+  FunTy InvisArg t1 t2   means   t1 => t2
+
+However, the AnonArgFlag in a FunTy is just redundant, cached
+information.  In (FunTy { ft_af = af, ft_arg = t1, ft_res = t2 })
+  * if (isPredTy t1 = True)  then af = InvisArg
+  * if (isPredTy t1 = False) then af = VisArg
+where isPredTy is defined in GHC.Core.Type, and sees if t1's
+kind is Constraint.  See GHC.Core.TyCo.Rep
+Note [Types for coercions, predicates, and evidence]
+
+GHC.Core.Type.mkFunctionType :: Type -> Type -> Type
+uses isPredTy to decide the AnonArgFlag for the FunTy.
+
+The term (Lam b e), and coercion (FunCo co1 co2) don't carry
+AnonArgFlags; instead they use mkFunctionType when we want to
+get their types; see mkLamType and coercionLKind/RKind resp.
+This is just an engineering choice; we could cache here too
+if we wanted.
+
+Why bother with all this? After all, we are in Core, where (=>) and
+(->) behave the same.  We maintain this distinction throughout Core so
+that we can cheaply and conveniently determine
+* How to print a type
+* How to split up a type: tcSplitSigmaTy
+* How to specialise it (over type classes; GHC.Core.Opt.Specialise)
+
+For the specialisation point, consider
+(\ (d :: Ord a). blah).  We want to give it type
+           (Ord a => blah_ty)
+with a fat arrow; that is, using mkInvisFunTy, not mkVisFunTy.
+Why?  Because the /specialiser/ treats dictionary arguments specially.
+Suppose we do w/w on 'foo', thus (#11272, #6056)
+   foo :: Ord a => Int -> blah
+   foo a d x = case x of I# x' -> $wfoo @a d x'
+
+   $wfoo :: Ord a => Int# -> blah
+
+Now, at a call we see (foo @Int dOrdInt).  The specialiser will
+specialise this to $sfoo, where
+   $sfoo :: Int -> blah
+   $sfoo x = case x of I# x' -> $wfoo @Int dOrdInt x'
+
+Now we /must/ also specialise $wfoo!  But it wasn't user-written,
+and has a type built with mkLamTypes.
+
+Conclusion: the easiest thing is to make mkLamType build
+            (c => ty)
+when the argument is a predicate type.  See GHC.Core.TyCo.Rep
+Note [Types for coercions, predicates, and evidence]
+
 Note [AnonArgFlag vs. ForallVisFlag]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The AnonArgFlag and ForallVisFlag data types are quite similar at a first
@@ -522,15 +575,19 @@ glance:
 
 Both data types keep track of visibility of some sort. AnonArgFlag tracks
 whether a FunTy has a visible argument (->) or an invisible predicate argument
-(=>). ForallVisFlag tracks whether a `forall` quantifier is visible
-(forall a -> {...}) or invisible (forall a. {...}).
-
-Given their similarities, it's tempting to want to combine these two data types
-into one, but they actually represent distinct concepts. AnonArgFlag reflects a
-property of *Core* types, whereas ForallVisFlag reflects a property of the GHC
-AST. In other words, AnonArgFlag is all about internals, whereas ForallVisFlag
-is all about surface syntax. Therefore, they are kept as separate data types.
--}
+(=>). ForallVisFlag tracks whether a `forall` quantifier in a user-specified
+HsType is
+   visible:   forall a -> {...}
+   invisible: forall a. {...}
+In fact the visible form can currently only appear in kinds.
+
+Given their similarities, it's tempting to want to combine these two
+data types into one, but they actually represent distinct
+concepts. AnonArgFlag reflects a property of *Core* types, whereas
+ForallVisFlag reflects a property of the HsSyn source-code AST. In
+other words, AnonArgFlag is all about internals, whereas ForallVisFlag
+is all about surface syntax. Therefore, they are kept as separate data
+types.  -}
 
 {- *********************************************************************
 *                                                                      *


=====================================
libraries/base/Unsafe/Coerce.hs
=====================================
@@ -22,7 +22,6 @@ import GHC.Natural () -- See Note [Depend on GHC.Natural] in GHC.Base
 import GHC.Types
 
 {- Note [Implementing unsafeCoerce]
-
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The implementation of unsafeCoerce is surprisingly subtle.
 This Note describes the moving parts.  You will find more
@@ -126,9 +125,13 @@ several ways
      Flaoting the case is OK here, even though it broardens the
      scope, becuase we are done with simplification.
 
-(U4) GHC.CoreToStg.Prep.cpeExprIsTrivial anticipated the
+(U4) GHC.CoreToStg.Prep.cpeExprIsTrivial anticipates the
      upcoming discard of unsafeEqualityProof.
 
+(U4a) Ditto GHC.Core.Unfold.inlineBoringOk we want to treat
+      the RHS of unsafeCoerce as very small; see
+      Note [Inline unsafeCoerce] in that module.
+
 (U5) The definition of unsafeEqualityProof in Unsafe.Coerce
      looks very strange:
         unsafeEqualityProof = case unsafeEqualityProof @a @b of
@@ -161,7 +164,7 @@ several ways
      to simplify the ase when the two tpyes are equal.
 
 (U8) The is a super-magic RULE in GHC.base
-         map cocerce = coerce
+         map coerce = coerce
      (see Note [Getting the map/coerce RULE to work] in CoreOpt)
      But it's all about turning coerce into a cast, and unsafeCoerce
      no longer does that.  So we need a separate map/unsafeCoerce


=====================================
testsuite/tests/codeGen/should_compile/debug.stdout
=====================================
@@ -18,6 +18,7 @@ src<debug.hs:4:9>
 src<debug.hs:5:21-29>
 src<debug.hs:5:9-29>
 src<debug.hs:6:1-21>
+src<debug.hs:6:16-21>
 == CBE ==
 src<debug.hs:4:9>
 89


=====================================
testsuite/tests/deSugar/should_compile/T2431.stderr
=====================================
@@ -4,7 +4,7 @@ Result size of Tidy Core
   = {terms: 63, types: 43, coercions: 1, joins: 0/0}
 
 -- RHS size: {terms: 2, types: 4, coercions: 1, joins: 0/0}
-T2431.$WRefl [InlPrag=INLINE[0] CONLIKE] :: forall a. a :~: a
+T2431.$WRefl [InlPrag=INLINE[-1] CONLIKE] :: forall a. a :~: a
 [GblId[DataConWrapper],
  Caf=NoCafRefs,
  Cpr=m1,


=====================================
testsuite/tests/perf/compiler/T16473.stdout
=====================================
@@ -1,10 +1,10 @@
 Rule fired: Class op fmap (BUILTIN)
 Rule fired: Class op liftA2 (BUILTIN)
-Rule fired: Class op $p1Applicative (BUILTIN)
 Rule fired: Class op <*> (BUILTIN)
-Rule fired: Class op <$ (BUILTIN)
 Rule fired: Class op $p1Applicative (BUILTIN)
+Rule fired: Class op <$ (BUILTIN)
 Rule fired: Class op <*> (BUILTIN)
+Rule fired: Class op $p1Applicative (BUILTIN)
 Rule fired: Class op fmap (BUILTIN)
 Rule fired: Class op pure (BUILTIN)
 Rule fired: Class op pure (BUILTIN)


=====================================
testsuite/tests/simplCore/should_compile/T13143.stderr
=====================================
@@ -12,7 +12,7 @@ T13143.$wf = \ (@a) _ [Occ=Dead] -> T13143.$wf @a GHC.Prim.void#
 end Rec }
 
 -- RHS size: {terms: 4, types: 4, coercions: 0, joins: 0/0}
-f [InlPrag=NOUSERINLINE[0]] :: forall a. Int -> a
+f [InlPrag=NOUSERINLINE[-1]] :: forall a. Int -> a
 [GblId,
  Arity=1,
  Str=<B,A>b,


=====================================
testsuite/tests/simplCore/should_compile/T3772.stdout
=====================================
@@ -62,7 +62,7 @@ T3772.$wfoo
       }
 
 -- RHS size: {terms: 6, types: 3, coercions: 0, joins: 0/0}
-foo [InlPrag=NOUSERINLINE[0]] :: Int -> ()
+foo [InlPrag=NOUSERINLINE[-1]] :: Int -> ()
 [GblId,
  Arity=1,
  Str=<S,1*U(U)>,


=====================================
testsuite/tests/simplCore/should_compile/T7360.stderr
=====================================
@@ -4,7 +4,7 @@ Result size of Tidy Core
   = {terms: 106, types: 47, coercions: 0, joins: 0/0}
 
 -- RHS size: {terms: 6, types: 3, coercions: 0, joins: 0/0}
-T7360.$WFoo3 [InlPrag=INLINE[0] CONLIKE] :: Int -> Foo
+T7360.$WFoo3 [InlPrag=INLINE[-1] CONLIKE] :: Int -> Foo
 [GblId[DataConWrapper],
  Arity=1,
  Caf=NoCafRefs,


=====================================
testsuite/tests/simplCore/should_compile/T7865.stdout
=====================================
@@ -1,6 +1,6 @@
 T7865.$wexpensive [InlPrag=NOINLINE]
 T7865.$wexpensive
-expensive [InlPrag=NOUSERINLINE[0]] :: Int -> Int
+expensive [InlPrag=NOUSERINLINE[-1]] :: Int -> Int
                  case T7865.$wexpensive ww1 of ww2 [Occ=Once] { __DEFAULT ->
 expensive
       case T7865.$wexpensive ww1 of ww2 { __DEFAULT -> GHC.Types.I# ww2 }


=====================================
testsuite/tests/stranal/should_compile/Makefile
=====================================
@@ -10,3 +10,9 @@ T13031:
 # take only one Int# argument
 T16029:
 	'$(TEST_HC)' $(TEST_HC_OPTS) -c -O -fforce-recomp T16029.hs -dsuppress-uniques -ddump-simpl | grep '::.*Int'
+
+T18078:
+	'$(TEST_HC)' $(TEST_HC_OPTS) -c -O -fforce-recomp T18078.hs -dsuppress-uniques -ddump-simpl | grep 'wf'
+
+T17673:
+	'$(TEST_HC)' $(TEST_HC_OPTS) -c -O -fforce-recomp T17673.hs -dsuppress-uniques -ddump-simpl | grep 'wf'


=====================================
testsuite/tests/stranal/should_compile/T16029.stdout
=====================================
@@ -1,4 +1,4 @@
-T16029.$WMkT [InlPrag=INLINE[0] CONLIKE] :: Int -> Int -> T
+T16029.$WMkT [InlPrag=INLINE[-1] CONLIKE] :: Int -> Int -> T
          Tmpl= \ (dt [Occ=Once!] :: Int) (dt [Occ=Once!] :: Int) ->
   = \ (dt [Occ=Once!] :: Int) (dt [Occ=Once!] :: Int) ->
   :: GHC.Prim.Int# -> GHC.Prim.Int#


=====================================
testsuite/tests/stranal/should_compile/T17673.hs
=====================================
@@ -0,0 +1,6 @@
+module T17673 where
+
+facIO :: Int -> IO Int
+facIO n | n < 2     = return 1
+        | otherwise = do n' <- facIO (n-1); return (n*n')
+{-# NOINLINE facIO #-}


=====================================
testsuite/tests/stranal/should_compile/T17673.stdout
=====================================
@@ -0,0 +1,5 @@
+T17673.$wfacIO [InlPrag=NOINLINE, Occ=LoopBreaker]
+T17673.$wfacIO
+          case T17673.$wfacIO (GHC.Prim.-# ww 1#) w of { (# ipv, ipv1 #) ->
+                 T17673.$wfacIO ww1 w1
+      case w of { GHC.Types.I# ww1 -> T17673.$wfacIO ww1 w1 }


=====================================
testsuite/tests/stranal/should_compile/T18078.hs
=====================================
@@ -0,0 +1,13 @@
+module T18078 where
+
+newtype N = N { unN :: Int -> Int }
+
+-- This an example of a worker/wrapper thing
+-- See Note [Cast worker/wrappers] in Simplify
+-- We should get good code, with a $wf calling itself
+-- but in 8.10 we do not
+f :: N
+{-# NOINLINE f #-}
+f = N (\n -> if n==0 then 0 else unN f (n-1))
+
+g x = unN f (x+1)


=====================================
testsuite/tests/stranal/should_compile/T18078.stdout
=====================================
@@ -0,0 +1,6 @@
+T18078.$wf [InlPrag=NOINLINE, Occ=LoopBreaker]
+T18078.$wf
+        __DEFAULT -> T18078.$wf (GHC.Prim.-# wild 1#);
+                 case T18078.$wf ww1 of ww2 [Occ=Once] { __DEFAULT ->
+      case T18078.$wf ww1 of ww2 { __DEFAULT -> GHC.Types.I# ww2 }
+      case T18078.$wf (GHC.Prim.+# x1 1#) of ww { __DEFAULT ->


=====================================
testsuite/tests/stranal/should_compile/all.T
=====================================
@@ -52,3 +52,6 @@ test('T17852',  [ grep_errmsg(r'\\$wf ::') ], compile, ['-ddump-worker-wrapper -
 test('T16029', normal, makefile_test, [])
 test('T10069',  [ grep_errmsg(r'(wc1).*Int#$') ], compile, ['-dppr-cols=200 -ddump-simpl'])
 test('T13380b',  [ grep_errmsg('bigDeadAction') ], compile, ['-dppr-cols=200 -ddump-simpl'])
+test('T18078', normal, makefile_test, [])
+test('T17673', normal, makefile_test, [])
+



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/10b0b9b1f2f06986d30c4fd511f4e4bbb7e62168

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/10b0b9b1f2f06986d30c4fd511f4e4bbb7e62168
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20200522/0ca5bd16/attachment-0001.html>


More information about the ghc-commits mailing list