[Git][ghc/ghc][wip/T22434] 6 commits: Fire RULES in the Specialiser

Krzysztof Gogolewski (@monoidal) gitlab at gitlab.haskell.org
Fri Nov 11 02:56:25 UTC 2022



Krzysztof Gogolewski pushed to branch wip/T22434 at Glasgow Haskell Compiler / GHC


Commits:
f9f17b68 by Simon Peyton Jones at 2022-11-10T12:20:03+00:00
Fire RULES in the Specialiser

The Specialiser has, for some time, fires class-op RULES in the
specialiser itself: see
   Note [Specialisation modulo dictionary selectors]

This MR beefs it up a bit, so that it fires /all/ RULES in the
specialiser, not just class-op rules.  See
   Note [Fire rules in the specialiser]
The result is a bit more specialisation; see test
   simplCore/should_compile/T21851_2

This pushed me into a bit of refactoring.  I made a new data types
GHC.Core.Rules.RuleEnv, which combines
  - the several source of rules (local, home-package, external)
  - the orphan-module dependencies

in a single record for `getRules` to consult.  That drove a bunch of
follow-on refactoring, including allowing me to remove
cr_visible_orphan_mods from the CoreReader data type.

I moved some of the RuleBase/RuleEnv stuff into GHC.Core.Rule.

The reorganisation in the Simplifier improve compile times a bit
(geom mean -0.1%), but T9961 is an outlier

Metric Decrease:
    T9961

- - - - -
2b3d0bee by Simon Peyton Jones at 2022-11-10T12:21:13+00:00
Make indexError work better

The problem here is described at some length in
Note [Boxity for bottoming functions] and
Note [Reboxed crud for bottoming calls] in GHC.Core.Opt.DmdAnal.

This patch adds a SPECIALISE pragma for indexError, which
makes it much less vulnerable to the problem described in
these Notes.

(This came up in another line of work, where a small change made
indexError do reboxing (in nofib/spectral/simple/table_sort)
that didn't happen before my change.  I've opened #22404
to document the fagility.

- - - - -
399e921b by Simon Peyton Jones at 2022-11-10T12:21:14+00:00
Fix DsUselessSpecialiseForClassMethodSelector msg

The error message for DsUselessSpecialiseForClassMethodSelector
was just wrong (a typo in some earlier work); trivial fix

- - - - -
dac0682a by Sebastian Graf at 2022-11-10T21:16:01-05:00
WorkWrap: Unboxing unboxed tuples is not always useful (#22388)

See Note [Unboxing through unboxed tuples].

Fixes #22388.

- - - - -
1230c268 by Sebastian Graf at 2022-11-10T21:16:01-05:00
Boxity: Handle argument budget of unboxed tuples correctly (#21737)

Now Budget roughly tracks the combined width of all arguments after unarisation.
See the changes to `Note [Worker argument budgets]`.

Fixes #21737.

- - - - -
13b7c9ef by Simon Peyton Jones at 2022-11-11T03:41:37+01:00
Add a fast path for data constructor workers

See Note [Fast path for data constructors] in
GHC.Core.Opt.Simplify.Iteration

This bypasses lots of expensive logic, in the special case of
applications of data constructors.  It is a surprisingly worthwhile
improvement, as you can see in the figures below.

Metrics: compile_time/bytes allocated
------------------------------------------------
          CoOpt_Read(normal)   -2.0%
    CoOpt_Singletons(normal)   -2.0%
    ManyConstructors(normal)   -1.3%
              T10421(normal)   -1.9% GOOD
             T10421a(normal)   -1.5%
              T10858(normal)   -1.6%
              T11545(normal)   -1.7%
              T12234(optasm)   -1.3%
              T12425(optasm)   -1.9% GOOD
              T13035(normal)   -1.0% GOOD
              T13056(optasm)   -1.8%
              T13253(normal)   -3.3% GOOD
              T15164(normal)   -1.7%
              T15304(normal)   -3.4%
              T15630(normal)   -2.8%
              T16577(normal)   -4.3% GOOD
              T17096(normal)   -1.1%
              T17516(normal)   -3.1%
              T18282(normal)   -1.9%
              T18304(normal)   -1.2%
             T18698a(normal)   -1.2% GOOD
             T18698b(normal)   -1.5% GOOD
              T18923(normal)   -1.3%
               T1969(normal)   -1.3% GOOD
              T19695(normal)   -4.4% GOOD
             T21839c(normal)   -2.7% GOOD
             T21839r(normal)   -2.7% GOOD
               T4801(normal)   -3.8% GOOD
               T5642(normal)   -3.1% GOOD
               T6048(optasm)   -2.5% GOOD
               T9020(optasm)   -2.7% GOOD
               T9630(normal)   -2.1% GOOD
               T9961(normal)  -11.7% GOOD
               WWRec(normal)   -1.0%

                   geo. mean   -1.1%
                   minimum    -11.7%
                   maximum     +0.1%

Metric Decrease:
    T10421
    T12425
    T13035
    T13253
    T16577
    T18698a
    T18698b
    T1969
    T19695
    T21839c
    T21839r
    T4801
    T5642
    T6048
    T9020
    T9630
    T9961

- - - - -


29 changed files:

- compiler/GHC/Core.hs
- compiler/GHC/Core/InstEnv.hs
- compiler/GHC/Core/Opt/DmdAnal.hs
- compiler/GHC/Core/Opt/Monad.hs
- compiler/GHC/Core/Opt/Pipeline.hs
- compiler/GHC/Core/Opt/Simplify.hs
- compiler/GHC/Core/Opt/Simplify/Iteration.hs
- compiler/GHC/Core/Opt/Simplify/Monad.hs
- compiler/GHC/Core/Opt/Simplify/Utils.hs
- compiler/GHC/Core/Opt/Specialise.hs
- compiler/GHC/Core/Opt/WorkWrap/Utils.hs
- compiler/GHC/Core/Rules.hs
- compiler/GHC/Driver/Config/Core/Opt/Simplify.hs
- compiler/GHC/HsToCore/Errors/Ppr.hs
- compiler/GHC/Types/Id/Make.hs
- compiler/GHC/Unit/External.hs
- libraries/base/GHC/Ix.hs
- libraries/base/GHC/Real.hs
- testsuite/tests/simplCore/should_compile/T21851.stderr
- + testsuite/tests/simplCore/should_compile/T21851_2.hs
- + testsuite/tests/simplCore/should_compile/T21851_2.stderr
- + testsuite/tests/simplCore/should_compile/T21851_2a.hs
- testsuite/tests/simplCore/should_compile/all.T
- + testsuite/tests/stranal/should_compile/T22388.hs
- + testsuite/tests/stranal/should_compile/T22388.stderr
- testsuite/tests/stranal/should_compile/all.T
- + testsuite/tests/stranal/sigs/T21737.hs
- + testsuite/tests/stranal/sigs/T21737.stderr
- testsuite/tests/stranal/sigs/all.T


Changes:

=====================================
compiler/GHC/Core.hs
=====================================
@@ -85,9 +85,8 @@ module GHC.Core (
         IsOrphan(..), isOrphan, notOrphan, chooseOrphanAnchor,
 
         -- * Core rule data types
-        CoreRule(..), RuleBase,
-        RuleName, RuleFun, IdUnfoldingFun, InScopeEnv,
-        RuleEnv(..), RuleOpts, mkRuleEnv, emptyRuleEnv,
+        CoreRule(..),
+        RuleName, RuleFun, IdUnfoldingFun, InScopeEnv, RuleOpts,
 
         -- ** Operations on 'CoreRule's
         ruleArity, ruleName, ruleIdName, ruleActivation,
@@ -105,7 +104,6 @@ import GHC.Core.Coercion
 import GHC.Core.Rules.Config ( RuleOpts )
 import GHC.Types.Name
 import GHC.Types.Name.Set
-import GHC.Types.Name.Env( NameEnv )
 import GHC.Types.Literal
 import GHC.Types.Tickish
 import GHC.Core.DataCon
@@ -1062,6 +1060,12 @@ has two major consequences
    M.  But it's painful, because it means we need to keep track of all
    the orphan modules below us.
 
+ * The "visible orphan modules" are all the orphan module in the transitive
+   closure of the imports of this module.
+
+ * During instance lookup, we filter orphan instances depending on
+   whether or not the instance is in a visible orphan module.
+
  * A non-orphan is not finger-printed separately.  Instead, for
    fingerprinting purposes it is treated as part of the entity it
    mentions on the LHS.  For example
@@ -1076,12 +1080,20 @@ has two major consequences
 
 Orphan-hood is computed
   * For class instances:
-      when we make a ClsInst
-    (because it is needed during instance lookup)
+    when we make a ClsInst in GHC.Core.InstEnv.mkLocalInstance
+      (because it is needed during instance lookup)
+    See Note [When exactly is an instance decl an orphan?]
+        in GHC.Core.InstEnv
+
+  * For rules
+    when we generate a CoreRule (GHC.Core.Rules.mkRule)
+
+  * For family instances:
+    when we generate an IfaceFamInst (GHC.Iface.Make.instanceToIfaceInst)
+
+Orphan-hood is persisted into interface files, in ClsInst, FamInst,
+and CoreRules.
 
-  * For rules and family instances:
-       when we generate an IfaceRule (GHC.Iface.Make.coreRuleToIfaceRule)
-                     or IfaceFamInst (GHC.Iface.Make.instanceToIfaceInst)
 -}
 
 {-
@@ -1096,49 +1108,6 @@ GHC.Core.FVs, GHC.Core.Subst, GHC.Core.Ppr, GHC.Core.Tidy also inspect the
 representation.
 -}
 
--- | Gathers a collection of 'CoreRule's. Maps (the name of) an 'Id' to its rules
-type RuleBase = NameEnv [CoreRule]
-        -- The rules are unordered;
-        -- we sort out any overlaps on lookup
-
--- | A full rule environment which we can apply rules from.  Like a 'RuleBase',
--- but it also includes the set of visible orphans we use to filter out orphan
--- rules which are not visible (even though we can see them...)
-data RuleEnv
-    = RuleEnv { re_base          :: [RuleBase] -- See Note [Why re_base is a list]
-              , re_visible_orphs :: ModuleSet
-              }
-
-mkRuleEnv :: RuleBase -> [Module] -> RuleEnv
-mkRuleEnv rules vis_orphs = RuleEnv [rules] (mkModuleSet vis_orphs)
-
-emptyRuleEnv :: RuleEnv
-emptyRuleEnv = RuleEnv [] emptyModuleSet
-
-{-
-Note [Why re_base is a list]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-In Note [Overall plumbing for rules], it is explained that the final
-RuleBase which we must consider is combined from 4 different sources.
-
-During simplifier runs, the fourth source of rules is constantly being updated
-as new interfaces are loaded into the EPS. Therefore just before we check to see
-if any rules match we get the EPS RuleBase and combine it with the existing RuleBase
-and then perform exactly 1 lookup into the new map.
-
-It is more efficient to avoid combining the environments and store the uncombined
-environments as we can instead perform 1 lookup into each environment and then combine
-the results.
-
-Essentially we use the identity:
-
-> lookupNameEnv n (plusNameEnv_C (++) rb1 rb2)
->   = lookupNameEnv n rb1 ++ lookupNameEnv n rb2
-
-The latter being more efficient as we don't construct an intermediate
-map.
--}
 
 -- | A 'CoreRule' is:
 --


=====================================
compiler/GHC/Core/InstEnv.hs
=====================================
@@ -323,7 +323,9 @@ mkImportedInstance cls_nm mb_tcs dfun_name dfun oflag orphan
 {-
 Note [When exactly is an instance decl an orphan?]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-  (see GHC.Iface.Make.instanceToIfaceInst, which implements this)
+(See GHC.Iface.Make.instanceToIfaceInst, which implements this.)
+See Note [Orphans] in GHC.Core
+
 Roughly speaking, an instance is an orphan if its head (after the =>)
 mentions nothing defined in this module.
 


=====================================
compiler/GHC/Core/Opt/DmdAnal.hs
=====================================
@@ -45,6 +45,7 @@ import GHC.Builtin.PrimOps
 import GHC.Builtin.Types.Prim ( realWorldStatePrimTy )
 import GHC.Types.Unique.Set
 import GHC.Types.Unique.MemoFun
+import GHC.Types.RepType
 
 
 {-
@@ -1520,6 +1521,9 @@ $wtheresCrud = \ ww ww1 ->
       ...
 ```
 This is currently a bug that we willingly accept and it's documented in #21128.
+
+See also Note [indexError] in base:GHC.Ix, which describes how we use
+SPECIALISE to mitigate this problem for indexError.
 -}
 
 {- *********************************************************************
@@ -1762,7 +1766,7 @@ Note [Worker argument budget]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 In 'finaliseArgBoxities' we don't want to generate workers with zillions of
 argument when, say given a strict record with zillions of fields.  So we
-limit the maximum number of worker args to the maximum of
+limit the maximum number of worker args ('max_wkr_args') to the maximum of
   - -fmax-worker-args=N
   - The number of args in the original function; if it already has has
     zillions of arguments we don't want to seek /fewer/ args in the worker.
@@ -1771,10 +1775,91 @@ limit the maximum number of worker args to the maximum of
 We pursue a "layered" strategy for unboxing: we unbox the top level of the
 argument(s), subject to budget; if there are any arguments left we unbox the
 next layer, using that depleted budget.
+Unboxing an argument *increases* the budget for the inner layer roughly
+according to how many registers that argument takes (unboxed tuples take
+multiple registers, see below), as determined by 'unariseArity'.
+Budget is spent when we have to pass a non-absent field as a parameter.
 
 To achieve this, we use the classic almost-circular programming technique in
 which we we write one pass that takes a lazy list of the Budgets for every
-layer.
+layer. The effect is that of a breadth-first search (over argument type and
+demand structure) to compute Budgets followed by a depth-first search to
+construct the product demands, but laziness allows us to do it all in one
+pass and without intermediate data structures.
+
+Suppose we have -fmax-worker-args=4 for the remainder of this Note.
+Then consider this example function:
+
+  boxed :: (Int, Int) -> (Int, (Int, Int, Int)) -> Int
+  boxed (a,b) (c, (d,e,f)) = a + b + c + d + e + f
+
+With a budget of 4 args to spend (number of args is only 2), we'd be served well
+to unbox both pairs, but not the triple. Indeed, that is what the algorithm
+computes, and the following pictogram shows how the budget layers are computed.
+Each layer is started with `n ~>`, where `n` is the budget at the start of the
+layer. We write -n~> when we spend budget (and n is the remaining budget) and
++n~> when we earn budget. We separate unboxed args with ][ and indicate
+inner budget threads becoming negative in braces {{}}, so that we see which
+unboxing decision we do *not* commit to. Without further ado:
+
+  4 ~> ][     (a,b) -3~>               ][     (c, ...) -2~>
+       ][      | |                     ][      |   |
+       ][      | +-------------+       ][      |   +-----------------+
+       ][      |               |       ][      |                     |
+       ][      v               v       ][      v                     v
+  2 ~> ][ +3~> a  -2~> ][      b  -1~> ][ +2~> c  -1~> ][        (d, e, f) -0~>
+       ][      |       ][      |       ][      |       ][ {{      |  |  |                          }}
+       ][      |       ][      |       ][      |       ][ {{      |  |  +----------------+         }}
+       ][      v       ][      v       ][      v       ][ {{      v  +------v            v         }}
+  0 ~> ][ +1~> I# -0~> ][ +1~> I# -0~> ][ +1~> I# -0~> ][ {{ +1~> d -0~> ][ e -(-1)~> ][ f -(-2)~> }}
+
+Unboxing increments the budget we have on the next layer (because we don't need
+to retain the boxed arg), but in turn the inner layer must afford to retain all
+non-absent fields, each decrementing the budget. Note how the budget becomes
+negative when trying to unbox the triple and the unboxing decision is "rolled
+back". This is done by the 'positiveTopBudget' guard.
+
+There's a bit of complication as a result of handling unboxed tuples correctly;
+specifically, handling nested unboxed tuples. Consider (#21737)
+
+  unboxed :: (Int, Int) -> (# Int, (# Int, Int, Int #) #) -> Int
+  unboxed (a,b) (# c, (# d, e, f #) #) = a + b + c + d + e + f
+
+Recall that unboxed tuples will be flattened to individual arguments during
+unarisation. Here, `unboxed` will have 5 arguments at runtime because of the
+nested unboxed tuple, which will be flattened to 4 args. So it's best to leave
+`(a,b)` boxed (because we already are above our arg threshold), but unbox `c`
+through `f` because that doesn't increase the number of args post unarisation.
+
+Note that the challenge is that syntactically, `(# d, e, f #)` occurs in a
+deeper layer than `(a, b)`. Treating unboxed tuples as a regular data type, we'd
+make the same unboxing decisions as for `boxed` above; although our starting
+budget is 5 (Here, the number of args is greater than -fmax-worker-args), it's
+not enough to unbox the triple (we'd finish with budget -1). So we'd unbox `a`
+through `c`, but not `d` through `f`, which is silly, because then we'd end up
+having 6 arguments at runtime, of which `d` through `f` weren't unboxed.
+
+Hence we pretend that the fields of unboxed tuples appear in the same budget
+layer as the tuple itself. For example at the top-level, `(# x,y #)` is to be
+treated just like two arguments `x` and `y`.
+Of course, for that to work, our budget calculations must initialise
+'max_wkr_args' to 5, based on the 'unariseArity' of each Core arg: That would be
+1 for the pair and 4 for the unboxed pair. Then when we decide whether to unbox
+the unboxed pair, we *directly* recurse into the fields, spending our budget
+on retaining `c` and (after recursing once more) `d` through `f` as arguments,
+depleting our budget completely in the first layer. Pictorially:
+
+  5 ~> ][         (a,b) -4~>             ][         (# c, ... #)
+       ][ {{      | |                 }} ][      c  -3~> ][ (# d, e, f #)
+       ][ {{      | +-------+         }} ][      |       ][      d  -2~> ][      e  -1~> ][      f  -0~>
+       ][ {{      |         |         }} ][      |       ][      |       ][      |       ][      |
+       ][ {{      v         v         }} ][      v       ][      v       ][      v       ][      v
+  0 ~> ][ {{ +1~> a -0~> ][ b -(-1)~> }} ][ +1~> I# -0~> ][ +1~> I# -0~> ][ +1~> I# -0~> ][ +1~> I# -0~>
+
+As you can see, we have no budget left to justify unboxing `(a,b)` on the second
+layer, which is good, because it would increase the number of args. Also note
+that we can still unbox `c` through `f` in this layer, because doing so has a
+net zero effect on budget.
 
 Note [The OPAQUE pragma and avoiding the reboxing of arguments]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1795,10 +1880,17 @@ W/W-transformation code that boxed arguments of 'f' must definitely be passed
 along in boxed form and as such dissuade the creation of reboxing workers.
 -}
 
-data Budgets = MkB Arity Budgets   -- An infinite list of arity budgets
+-- | How many registers does this type take after unarisation?
+unariseArity :: Type -> Arity
+unariseArity ty = length (typePrimRep ty)
+
+data Budgets = MkB !Arity Budgets   -- An infinite list of arity budgets
 
-incTopBudget :: Budgets -> Budgets
-incTopBudget (MkB n bg) = MkB (n+1) bg
+earnTopBudget :: Budgets -> Budgets
+earnTopBudget (MkB n bg) = MkB (n+1) bg
+
+spendTopBudget :: Arity -> Budgets -> Budgets
+spendTopBudget m (MkB n bg) = MkB (n-m) bg
 
 positiveTopBudget :: Budgets -> Bool
 positiveTopBudget (MkB n _) = n >= 0
@@ -1811,7 +1903,8 @@ finaliseArgBoxities env fn arity rhs div
              -- Then there are no binders; we don't worker/wrapper; and we
              -- simply want to give f the same demand signature as g
 
-  | otherwise
+  | otherwise -- NB: arity is the threshold_arity, which might be less than
+              -- manifest arity for join points
   = -- pprTrace "finaliseArgBoxities" (
     --   vcat [text "function:" <+> ppr fn
     --        , text "dmds before:" <+> ppr (map idDemandInfo (filter isId bndrs))
@@ -1823,8 +1916,10 @@ finaliseArgBoxities env fn arity rhs div
   where
     opts            = ae_opts env
     (bndrs, _body)  = collectBinders rhs
-    max_wkr_args    = dmd_max_worker_args opts `max` arity
-                      -- See Note [Worker argument budget]
+    unarise_arity   = sum [ unariseArity (idType b) | b <- bndrs, isId b ]
+    max_wkr_args    = dmd_max_worker_args opts `max` unarise_arity
+                      -- This is the budget initialisation step of
+                      -- Note [Worker argument budget]
 
     -- This is the key line, which uses almost-circular programming
     -- The remaining budget from one layer becomes the initial
@@ -1868,22 +1963,49 @@ finaliseArgBoxities env fn arity rhs div
       = case wantToUnboxArg env ty str_mark dmd of
           DropAbsent -> (bg, dmd)
 
-          DontUnbox | is_bot_fn, isTyVarTy ty -> (decremented_bg, dmd)
-                    | otherwise               -> (decremented_bg, trimBoxity dmd)
+          DontUnbox | is_bot_fn, isTyVarTy ty -> (retain_budget, dmd)
+                    | otherwise               -> (retain_budget, trimBoxity dmd)
             -- If bot: Keep deep boxity even though WW won't unbox
             -- See Note [Boxity for bottoming functions] case (A)
             -- trimBoxity: see Note [No lazy, Unboxed demands in demand signature]
-
-          DoUnbox triples -> (MkB (bg_top-1) final_bg_inner, final_dmd)
             where
-              (bg_inner', dmds') = go_args (incTopBudget bg_inner) triples
-                     -- incTopBudget: give one back for the arg we are unboxing
+              retain_budget = spendTopBudget (unariseArity ty) bg
+                -- spendTopBudget: spend from our budget the cost of the
+                -- retaining the arg
+                -- The unboxed case does happen here, for example
+                --   app g x = g x :: (# Int, Int #)
+                -- here, `x` is used `L`azy and thus Boxed
+
+          DoUnbox triples
+            | isUnboxedTupleType ty
+            , (bg', dmds') <- go_args bg triples
+            -> (bg', n :* (mkProd Unboxed $! dmds'))
+                     -- See Note [Worker argument budget]
+                     -- unboxed tuples are always unboxed, deeply
+                     -- NB: Recurse with bg, *not* bg_inner! The unboxed fields
+                     -- are at the same budget layer.
+
+            | isUnboxedSumType ty
+            -> pprPanic "Unboxing through unboxed sum" (ppr fn <+> ppr ty)
+                     -- We currently don't return DoUnbox for unboxed sums.
+                     -- But hopefully we will at some point. When that happens,
+                     -- it would still be impossible to predict the effect
+                     -- of dropping absent fields and unboxing others on the
+                     -- unariseArity of the sum without losing sanity.
+                     -- We could overwrite bg_top with the one from
+                     -- retain_budget while still unboxing inside the alts as in
+                     -- the tuple case for a conservative solution, though.
+
+            | otherwise
+            -> (spendTopBudget 1 (MkB bg_top final_bg_inner), final_dmd)
+            where
+              (bg_inner', dmds') = go_args (earnTopBudget bg_inner) triples
+                     -- earnTopBudget: give back the cost of retaining the
+                     -- arg we are insted unboxing.
               dmd' = n :* (mkProd Unboxed $! dmds')
-              (final_bg_inner, final_dmd)
+              ~(final_bg_inner, final_dmd) -- "~": This match *must* be lazy!
                  | positiveTopBudget bg_inner' = (bg_inner', dmd')
                  | otherwise                   = (bg_inner,  trimBoxity dmd)
-      where
-        decremented_bg = MkB (bg_top-1) bg_inner
 
     add_demands :: [Demand] -> CoreExpr -> CoreExpr
     -- Attach the demands to the outer lambdas of this expression


=====================================
compiler/GHC/Core/Opt/Monad.hs
=====================================
@@ -19,10 +19,10 @@ module GHC.Core.Opt.Monad (
 
     -- ** Reading from the monad
     getHscEnv, getModule,
-    getRuleBase, getExternalRuleBase,
+    initRuleEnv, getExternalRuleBase,
     getDynFlags, getPackageFamInstEnv,
     getInteractiveContext,
-    getVisibleOrphanMods, getUniqMask,
+    getUniqMask,
     getPrintUnqualified, getSrcSpanM,
 
     -- ** Writing to the monad
@@ -45,7 +45,7 @@ import GHC.Prelude hiding ( read )
 import GHC.Driver.Session
 import GHC.Driver.Env
 
-import GHC.Core
+import GHC.Core.Rules     ( RuleBase, RuleEnv, mkRuleEnv )
 import GHC.Core.Opt.Stats ( SimplCount, zeroSimplCount, plusSimplCount )
 
 import GHC.Types.Annotations
@@ -114,12 +114,11 @@ pprFloatOutSwitches sw
 
 data CoreReader = CoreReader {
         cr_hsc_env             :: HscEnv,
-        cr_rule_base           :: RuleBase,
+        cr_rule_base           :: RuleBase,  -- Home package table rules
         cr_module              :: Module,
         cr_print_unqual        :: PrintUnqualified,
         cr_loc                 :: SrcSpan,   -- Use this for log/error messages so they
                                              -- are at least tagged with the right source file
-        cr_visible_orphan_mods :: !ModuleSet,
         cr_uniq_mask           :: !Char      -- Mask for creating unique values
 }
 
@@ -181,19 +180,17 @@ runCoreM :: HscEnv
          -> RuleBase
          -> Char -- ^ Mask
          -> Module
-         -> ModuleSet
          -> PrintUnqualified
          -> SrcSpan
          -> CoreM a
          -> IO (a, SimplCount)
-runCoreM hsc_env rule_base mask mod orph_imps print_unqual loc m
+runCoreM hsc_env rule_base mask mod print_unqual loc m
   = liftM extract $ runIOEnv reader $ unCoreM m
   where
     reader = CoreReader {
             cr_hsc_env = hsc_env,
             cr_rule_base = rule_base,
             cr_module = mod,
-            cr_visible_orphan_mods = orph_imps,
             cr_print_unqual = print_unqual,
             cr_loc = loc,
             cr_uniq_mask = mask
@@ -245,15 +242,18 @@ liftIOWithCount what = liftIO what >>= (\(count, x) -> addSimplCount count >> re
 getHscEnv :: CoreM HscEnv
 getHscEnv = read cr_hsc_env
 
-getRuleBase :: CoreM RuleBase
-getRuleBase = read cr_rule_base
+getHomeRuleBase :: CoreM RuleBase
+getHomeRuleBase = read cr_rule_base
+
+initRuleEnv :: ModGuts -> CoreM RuleEnv
+initRuleEnv guts
+  = do { hpt_rules <- getHomeRuleBase
+       ; eps_rules <- getExternalRuleBase
+       ; return (mkRuleEnv guts eps_rules hpt_rules) }
 
 getExternalRuleBase :: CoreM RuleBase
 getExternalRuleBase = eps_rule_base <$> get_eps
 
-getVisibleOrphanMods :: CoreM ModuleSet
-getVisibleOrphanMods = read cr_visible_orphan_mods
-
 getPrintUnqualified :: CoreM PrintUnqualified
 getPrintUnqualified = read cr_print_unqual
 


=====================================
compiler/GHC/Core/Opt/Pipeline.hs
=====================================
@@ -22,7 +22,7 @@ import GHC.Platform.Ways  ( hasWay, Way(WayProf) )
 
 import GHC.Core
 import GHC.Core.Opt.CSE  ( cseProgram )
-import GHC.Core.Rules   ( mkRuleBase, ruleCheckProgram, getRules )
+import GHC.Core.Rules   ( RuleBase, mkRuleBase, ruleCheckProgram, getRules )
 import GHC.Core.Ppr     ( pprCoreBindings )
 import GHC.Core.Utils   ( dumpIdInfoOfProgram )
 import GHC.Core.Lint    ( lintAnnots )
@@ -53,9 +53,7 @@ import GHC.Utils.Logger as Logger
 import GHC.Utils.Outputable
 import GHC.Utils.Panic
 
-import GHC.Unit.Module.Env
 import GHC.Unit.Module.ModGuts
-import GHC.Unit.Module.Deps
 
 import GHC.Types.Id.Info
 import GHC.Types.Basic
@@ -78,14 +76,12 @@ import GHC.Unit.Module
 core2core :: HscEnv -> ModGuts -> IO ModGuts
 core2core hsc_env guts@(ModGuts { mg_module  = mod
                                 , mg_loc     = loc
-                                , mg_deps    = deps
                                 , mg_rdr_env = rdr_env })
   = do { let builtin_passes = getCoreToDo dflags hpt_rule_base extra_vars
-             orph_mods = mkModuleSet (mod : dep_orphs deps)
              uniq_mask = 's'
-       ;
+
        ; (guts2, stats) <- runCoreM hsc_env hpt_rule_base uniq_mask mod
-                                    orph_mods print_unqual loc $
+                                    print_unqual loc $
                            do { hsc_env' <- getHscEnv
                               ; all_passes <- withPlugins (hsc_plugins hsc_env')
                                                 installCoreToDos
@@ -121,7 +117,8 @@ core2core hsc_env guts@(ModGuts { mg_module  = mod
 -}
 
 getCoreToDo :: DynFlags -> RuleBase -> [Var] -> [CoreToDo]
-getCoreToDo dflags rule_base extra_vars
+-- This function builds the pipeline of optimisations
+getCoreToDo dflags hpt_rule_base extra_vars
   = flatten_todos core_todo
   where
     phases        = simplPhases        dflags
@@ -176,7 +173,7 @@ getCoreToDo dflags rule_base extra_vars
 
     ----------------------------
     run_simplifier mode iter
-      = CoreDoSimplify $ initSimplifyOpts dflags extra_vars iter mode rule_base
+      = CoreDoSimplify $ initSimplifyOpts dflags extra_vars iter mode hpt_rule_base
 
     simpl_phase phase name iter = CoreDoPasses $
                                   [ maybe_strictness_before phase
@@ -573,11 +570,9 @@ ruleCheckPass current_phase pat guts = do
     logger <- getLogger
     withTiming logger (text "RuleCheck"<+>brackets (ppr $ mg_module guts))
                 (const ()) $ do
-        rb <- getRuleBase
-        vis_orphs <- getVisibleOrphanMods
-        let rule_fn fn = getRules (RuleEnv [rb] vis_orphs) fn
-                          ++ (mg_rules guts)
-        let ropts = initRuleOpts dflags
+        rule_env <- initRuleEnv guts
+        let rule_fn fn = getRules rule_env fn
+            ropts = initRuleOpts dflags
         liftIO $ logDumpMsg logger "Rule check"
                      (ruleCheckProgram ropts current_phase pat
                         rule_fn (mg_binds guts))


=====================================
compiler/GHC/Core/Opt/Simplify.hs
=====================================
@@ -10,7 +10,7 @@ import GHC.Prelude
 import GHC.Driver.Flags
 
 import GHC.Core
-import GHC.Core.Rules   ( extendRuleBaseList, extendRuleEnv, addRuleInfo )
+import GHC.Core.Rules
 import GHC.Core.Ppr     ( pprCoreBindings, pprCoreExpr )
 import GHC.Core.Opt.OccurAnal ( occurAnalysePgm, occurAnalyseExpr )
 import GHC.Core.Stats   ( coreBindsSize, coreBindsStats, exprSize )
@@ -31,7 +31,6 @@ import GHC.Utils.Constants (debugIsOn)
 import GHC.Unit.Env ( UnitEnv, ueEPS )
 import GHC.Unit.External
 import GHC.Unit.Module.ModGuts
-import GHC.Unit.Module.Deps
 
 import GHC.Types.Id
 import GHC.Types.Id.Info
@@ -81,7 +80,7 @@ simplifyExpr logger euc opts expr
               simpl_env = mkSimplEnv (se_mode opts) fam_envs
               top_env_cfg = se_top_env_cfg opts
               read_eps_rules = eps_rule_base <$> eucEPS euc
-              read_ruleenv = extendRuleEnv emptyRuleEnv <$> read_eps_rules
+              read_ruleenv = updExternalPackageRules emptyRuleEnv <$> read_eps_rules
 
         ; let sz = exprSize expr
 
@@ -132,11 +131,11 @@ simplExprGently env expr = do
 -- The values of this datatype are /only/ driven by the demands of that function.
 data SimplifyOpts = SimplifyOpts
   { so_dump_core_sizes :: !Bool
-  , so_iterations :: !Int
-  , so_mode :: !SimplMode
+  , so_iterations      :: !Int
+  , so_mode            :: !SimplMode
   , so_pass_result_cfg :: !(Maybe LintPassResultConfig)
-  , so_rule_base :: !RuleBase
-  , so_top_env_cfg :: !TopEnvConfig
+  , so_hpt_rules       :: !RuleBase
+  , so_top_env_cfg     :: !TopEnvConfig
   }
 
 simplifyPgm :: Logger
@@ -148,11 +147,10 @@ simplifyPgm :: Logger
 simplifyPgm logger unit_env opts
             guts@(ModGuts { mg_module = this_mod
                           , mg_rdr_env = rdr_env
-                          , mg_deps = deps
-                          , mg_binds = binds, mg_rules = rules
+                          , mg_binds = binds, mg_rules = local_rules
                           , mg_fam_inst_env = fam_inst_env })
   = do { (termination_msg, it_count, counts_out, guts')
-            <- do_iteration 1 [] binds rules
+            <- do_iteration 1 [] binds local_rules
 
         ; when (logHasDumpFlag logger Opt_D_verbose_core2core
                 && logHasDumpFlag logger Opt_D_dump_simpl_stats) $
@@ -169,7 +167,6 @@ simplifyPgm logger unit_env opts
     dump_core_sizes = so_dump_core_sizes opts
     mode            = so_mode opts
     max_iterations  = so_iterations opts
-    hpt_rule_base   = so_rule_base opts
     top_env_cfg     = so_top_env_cfg opts
     print_unqual    = mkPrintUnqualified unit_env rdr_env
     active_rule     = activeRule mode
@@ -178,13 +175,18 @@ simplifyPgm logger unit_env opts
     -- the old bindings are retained until the end of all simplifier iterations
     !guts_no_binds = guts { mg_binds = [], mg_rules = [] }
 
+    hpt_rule_env :: RuleEnv
+    hpt_rule_env = mkRuleEnv guts emptyRuleBase (so_hpt_rules opts)
+                   -- emptyRuleBase: no EPS rules yet; we will update
+                   -- them on each iteration to pick up the most up to date set
+
     do_iteration :: Int -- Counts iterations
                  -> [SimplCount] -- Counts from earlier iterations, reversed
-                 -> CoreProgram  -- Bindings in
-                 -> [CoreRule]   -- and orphan rules
+                 -> CoreProgram  -- Bindings
+                 -> [CoreRule]   -- Local rules for imported Ids
                  -> IO (String, Int, SimplCount, ModGuts)
 
-    do_iteration iteration_no counts_so_far binds rules
+    do_iteration iteration_no counts_so_far binds local_rules
         -- iteration_no is the number of the iteration we are
         -- about to begin, with '1' for the first
       | iteration_no > max_iterations   -- Stop if we've run out of iterations
@@ -200,7 +202,7 @@ simplifyPgm logger unit_env opts
                 -- number of iterations we actually completed
         return ( "Simplifier baled out", iteration_no - 1
                , totalise counts_so_far
-               , guts_no_binds { mg_binds = binds, mg_rules = rules } )
+               , guts_no_binds { mg_binds = binds, mg_rules = local_rules } )
 
       -- Try and force thunks off the binds; significantly reduces
       -- space usage, especially with -O.  JRS, 000620.
@@ -209,8 +211,8 @@ simplifyPgm logger unit_env opts
       = do {
                 -- Occurrence analysis
            let { tagged_binds = {-# SCC "OccAnal" #-}
-                     occurAnalysePgm this_mod active_unf active_rule rules
-                                     binds
+                     occurAnalysePgm this_mod active_unf active_rule
+                                     local_rules binds
                } ;
            Logger.putDumpFileMaybe logger Opt_D_dump_occur_anal "Occurrence analysis"
                      FormatCore
@@ -221,24 +223,29 @@ simplifyPgm logger unit_env opts
                 -- poke on IdInfo thunks, which in turn brings in new rules
                 -- behind the scenes.  Otherwise there's a danger we'll simply
                 -- miss the rules for Ids hidden inside imported inlinings
-                -- Hence just before attempting to match rules we read on the EPS
-                -- value and then combine it when the existing rule base.
+                -- Hence just before attempting to match a rule we read the EPS
+                -- value (via read_rule_env) and then combine it with the existing rule base.
                 -- See `GHC.Core.Opt.Simplify.Monad.getSimplRules`.
-           eps <- ueEPS unit_env ;
-           let  { -- Forcing this value to avoid unnessecary allocations.
+          eps <- ueEPS unit_env ;
+           let  { -- base_rule_env contains
+                  --    (a) home package rules, fixed across all iterations
+                  --    (b) local rules (substituted) from `local_rules` arg to do_iteration
+                  -- Forcing base_rule_env to avoid unnecessary allocations.
                   -- Not doing so results in +25.6% allocations of LargeRecord.
-                ; !rule_base = extendRuleBaseList hpt_rule_base rules
-                ; vis_orphs = this_mod : dep_orphs deps
-                ; base_ruleenv = mkRuleEnv rule_base vis_orphs
+                ; !base_rule_env = updLocalRules hpt_rule_env local_rules
+
+                ; read_eps_rules :: IO PackageRuleBase
                 ; read_eps_rules = eps_rule_base <$> ueEPS unit_env
-                ; read_ruleenv = extendRuleEnv base_ruleenv <$> read_eps_rules
+
+                ; read_rule_env :: IO RuleEnv
+                ; read_rule_env = updExternalPackageRules base_rule_env <$> read_eps_rules
 
                 ; fam_envs = (eps_fam_inst_env eps, fam_inst_env)
                 ; simpl_env = mkSimplEnv mode fam_envs } ;
 
                 -- Simplify the program
            ((binds1, rules1), counts1) <-
-             initSmpl logger read_ruleenv top_env_cfg sz $
+             initSmpl logger read_rule_env top_env_cfg sz $
                do { (floats, env1) <- {-# SCC "SimplTopBinds" #-}
                                       simplTopBinds simpl_env tagged_binds
 
@@ -246,7 +253,7 @@ simplifyPgm logger unit_env opts
                       -- for imported Ids.  Eg  RULE map my_f = blah
                       -- If we have a substitution my_f :-> other_f, we'd better
                       -- apply it to the rule to, or it'll never match
-                  ; rules1 <- simplImpRules env1 rules
+                  ; rules1 <- simplImpRules env1 local_rules
 
                   ; return (getTopFloatBinds floats, rules1) } ;
 


=====================================
compiler/GHC/Core/Opt/Simplify/Iteration.hs
=====================================
@@ -1497,9 +1497,10 @@ rebuild env expr cont
       ApplyToTy  { sc_arg_ty = ty, sc_cont = cont}
         -> rebuild env (App expr (Type ty)) cont
 
-      ApplyToVal { sc_arg = arg, sc_env = se, sc_dup = dup_flag, sc_cont = cont}
+      ApplyToVal { sc_arg = arg, sc_env = se, sc_dup = dup_flag
+                 , sc_cont = cont, sc_hole_ty = fun_ty }
         -- See Note [Avoid redundant simplification]
-        -> do { (_, _, arg') <- simplArg env dup_flag se arg
+        -> do { (_, _, arg') <- simplArg env dup_flag fun_ty se arg
               ; rebuild env (App expr arg') cont }
 
 completeBindX :: SimplEnv
@@ -1598,7 +1599,8 @@ simplCast env body co0 cont0
         --         co1 :: t1 ~ s1
         --         co2 :: s2 ~ t2
         addCoerce co cont@(ApplyToVal { sc_arg = arg, sc_env = arg_se
-                                      , sc_dup = dup, sc_cont = tail })
+                                      , sc_dup = dup, sc_cont = tail
+                                      , sc_hole_ty = fun_ty })
           | Just (m_co1, m_co2) <- pushCoValArg co
           , fixed_rep m_co1
           = {-#SCC "addCoerce-pushCoValArg" #-}
@@ -1610,7 +1612,7 @@ simplCast env body co0 cont0
                       -- See Note [Avoiding exponential behaviour]
 
                    MCo co1 ->
-            do { (dup', arg_se', arg') <- simplArg env dup arg_se arg
+            do { (dup', arg_se', arg') <- simplArg env dup fun_ty arg_se arg
                     -- When we build the ApplyTo we can't mix the OutCoercion
                     -- 'co' with the InExpr 'arg', so we simplify
                     -- to make it all consistent.  It's a bit messy.
@@ -1636,14 +1638,16 @@ simplCast env body co0 cont0
           -- See Note [Representation polymorphism invariants] in GHC.Core
           -- test: typecheck/should_run/EtaExpandLevPoly
 
-simplArg :: SimplEnv -> DupFlag -> StaticEnv -> CoreExpr
+simplArg :: SimplEnv -> DupFlag
+         -> OutType                 -- Type of the function applied to this arg
+         -> StaticEnv -> CoreExpr   -- Expression with its static envt
          -> SimplM (DupFlag, StaticEnv, OutExpr)
-simplArg env dup_flag arg_env arg
+simplArg env dup_flag fun_ty arg_env arg
   | isSimplified dup_flag
   = return (dup_flag, arg_env, arg)
   | otherwise
   = do { let arg_env' = arg_env `setInScopeFromE` env
-       ; arg' <- simplExpr arg_env'  arg
+       ; arg' <- simplExprC arg_env' arg (mkBoringStop (funArgTy fun_ty))
        ; return (Simplified, zapSubstEnv arg_env', arg') }
          -- Return a StaticEnv that includes the in-scope set from 'env',
          -- because arg' may well mention those variables (#20639)
@@ -2029,6 +2033,21 @@ zap the SubstEnv.  This is VITAL.  Consider
 
 We'll clone the inner \x, adding x->x' in the id_subst Then when we
 inline y, we must *not* replace x by x' in the inlined copy!!
+
+Note [Fast path for data constructors]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+For applications of a data constructor worker, the full glory of
+rebuildCall is a waste of effort;
+* They never inline, obviously
+* They have no rewrite rules
+* They are not strict (see Note [Data-con worker strictness]
+  in GHC.Core.DataCon)
+So it's fine to zoom straight to `rebuild` which just rebuilds the
+call in a very straightforward way.
+
+Some programs have a /lot/ of data constructors in the source program
+(compiler/perf/T9961 is an example), so this fast path can be very
+valuable.
 -}
 
 simplVar :: SimplEnv -> InVar -> SimplM OutExpr
@@ -2046,6 +2065,9 @@ simplVar env var
 
 simplIdF :: SimplEnv -> InId -> SimplCont -> SimplM (SimplFloats, OutExpr)
 simplIdF env var cont
+  | isDataConWorkId var         -- See Note [Fast path for data constructors]
+  = rebuild env (Var var) cont
+  | otherwise
   = case substId env var of
       ContEx tvs cvs ids e -> simplExprF env' e cont
         -- Don't trimJoinCont; haven't already simplified e,
@@ -2315,6 +2337,8 @@ field of the ArgInfo record is the state of a little state-machine:
   If we inline `f` before simplifying `BIG` well use preInlineUnconditionally,
   and we'll simplify BIG once, at x's occurrence, rather than twice.
 
+* GHC.Core.Opt.Simplify.Utils. mkRewriteCall: if there are no rules, and no
+  unfolding, we can skip both TryRules and TryInlining, which saves work.
 
 Note [Avoid redundant simplification]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -3645,7 +3669,7 @@ mkDupableContWithDmds env dmds
     do  { let (dmd:cont_dmds) = dmds   -- Never fails
         ; (floats1, cont') <- mkDupableContWithDmds env cont_dmds cont
         ; let env' = env `setInScopeFromF` floats1
-        ; (_, se', arg') <- simplArg env' dup se arg
+        ; (_, se', arg') <- simplArg env' dup hole_ty se arg
         ; (let_floats2, arg'') <- makeTrivial env NotTopLevel dmd (fsLit "karg") arg'
         ; let all_floats = floats1 `addLetFloats` let_floats2
         ; return ( all_floats


=====================================
compiler/GHC/Core/Opt/Simplify/Monad.hs
=====================================
@@ -27,8 +27,8 @@ import GHC.Types.Name      ( mkSystemVarName )
 import GHC.Types.Id        ( Id, mkSysLocalOrCoVarM )
 import GHC.Types.Id.Info   ( IdDetails(..), vanillaIdInfo, setArityInfo )
 import GHC.Core.Type       ( Type, Mult )
-import GHC.Core            ( RuleEnv(..) )
 import GHC.Core.Opt.Stats
+import GHC.Core.Rules
 import GHC.Core.Utils      ( mkLamTypes )
 import GHC.Types.Unique.Supply
 import GHC.Driver.Flags


=====================================
compiler/GHC/Core/Opt/Simplify/Utils.hs
=====================================
@@ -53,7 +53,7 @@ import GHC.Core.Ppr
 import GHC.Core.TyCo.Ppr ( pprParendType )
 import GHC.Core.FVs
 import GHC.Core.Utils
-import GHC.Core.Rules( getRules )
+import GHC.Core.Rules( RuleEnv, getRules )
 import GHC.Core.Opt.Arity
 import GHC.Core.Unfold
 import GHC.Core.Unfold.Make
@@ -425,12 +425,22 @@ decArgCount :: RewriteCall -> RewriteCall
 decArgCount (TryRules n rules) = TryRules (n-1) rules
 decArgCount rew                = rew
 
-mkTryRules :: [CoreRule] -> RewriteCall
+mkRewriteCall :: Id -> RuleEnv -> RewriteCall
 -- See Note [Rewrite rules and inlining] in GHC.Core.Opt.Simplify.Iteration
-mkTryRules [] = TryInlining
-mkTryRules rs = TryRules n_required rs
+-- We try to skip any unnecessary stages:
+--    No rules     => skip TryRules
+--    No unfolding => skip TryInlining
+-- This skipping is "just" for efficiency.  But rebuildCall is
+-- quite a heavy hammer, so skipping stages is a good plan.
+-- And it's extremely simple to do.
+mkRewriteCall fun rule_env
+  | not (null rules) = TryRules n_required rules
+  | canUnfold unf    = TryInlining
+  | otherwise        = TryNothing
   where
-    n_required = maximum (map ruleArity rs)
+    n_required = maximum (map ruleArity rules)
+    rules = getRules rule_env fun
+    unf   = idUnfolding fun
 
 {-
 ************************************************************************
@@ -604,21 +614,23 @@ mkArgInfo :: SimplEnv -> RuleEnv -> Id -> SimplCont -> ArgInfo
 mkArgInfo env rule_base fun cont
   | n_val_args < idArity fun            -- Note [Unsaturated functions]
   = ArgInfo { ai_fun = fun, ai_args = []
-            , ai_rewrite = fun_rules
+            , ai_rewrite = fun_rewrite
             , ai_encl = False
             , ai_dmds = vanilla_dmds
             , ai_discs = vanilla_discounts }
   | otherwise
   = ArgInfo { ai_fun   = fun
             , ai_args  = []
-            , ai_rewrite = fun_rules
-            , ai_encl  = notNull rules || contHasRules cont
+            , ai_rewrite = fun_rewrite
+            , ai_encl  = fun_has_rules || contHasRules cont
             , ai_dmds  = add_type_strictness (idType fun) arg_dmds
             , ai_discs = arg_discounts }
   where
-    rules      = getRules rule_base fun
-    fun_rules  = mkTryRules rules
-    n_val_args = countValArgs cont
+    n_val_args    = countValArgs cont
+    fun_rewrite   = mkRewriteCall fun rule_base
+    fun_has_rules = case fun_rewrite of
+                      TryRules {} -> True
+                      _           -> False
 
     vanilla_discounts, arg_discounts :: [Int]
     vanilla_discounts = repeat 0


=====================================
compiler/GHC/Core/Opt/Specialise.hs
=====================================
@@ -17,6 +17,7 @@ import GHC.Driver.Config.Core.Rules ( initRuleOpts )
 
 import GHC.Core.Type  hiding( substTy, substCo, extendTvSubst, zapSubst )
 import GHC.Core.Multiplicity
+import GHC.Core.SimpleOpt( defaultSimpleOpts, simpleOptExprWith )
 import GHC.Core.Predicate
 import GHC.Core.Coercion( Coercion )
 import GHC.Core.Opt.Monad
@@ -636,9 +637,11 @@ Hence, the invariant is this:
 -- | Specialise calls to type-class overloaded functions occurring in a program.
 specProgram :: ModGuts -> CoreM ModGuts
 specProgram guts@(ModGuts { mg_module = this_mod
-                          , mg_rules = local_rules
-                          , mg_binds = binds })
-  = do { dflags <- getDynFlags
+                          , mg_rules  = local_rules
+                          , mg_binds  = binds })
+  = do { dflags   <- getDynFlags
+       ; rule_env <- initRuleEnv guts
+                     -- See Note [Fire rules in the specialiser]
 
               -- We need to start with a Subst that knows all the things
               -- that are in scope, so that the substitution engine doesn't
@@ -650,6 +653,7 @@ specProgram guts@(ModGuts { mg_module = this_mod
                                       --    mkInScopeSetList $
                                       --  bindersOfBinds binds
                           , se_module = this_mod
+                          , se_rules  = rule_env
                           , se_dflags = dflags }
 
              go []           = return ([], emptyUDs)
@@ -660,7 +664,7 @@ specProgram guts@(ModGuts { mg_module = this_mod
              -- Specialise the bindings of this module
        ; (binds', uds) <- runSpecM (go binds)
 
-       ; (spec_rules, spec_binds) <- specImports top_env local_rules uds
+       ; (spec_rules, spec_binds) <- specImports top_env uds
 
        ; return (guts { mg_binds = spec_binds ++ binds'
                       , mg_rules = spec_rules ++ local_rules }) }
@@ -725,21 +729,15 @@ specialisation (see canSpecImport):
 -}
 
 specImports :: SpecEnv
-            -> [CoreRule]
             -> UsageDetails
             -> CoreM ([CoreRule], [CoreBind])
-specImports top_env local_rules
-            (MkUD { ud_binds = dict_binds, ud_calls = calls })
+specImports top_env (MkUD { ud_binds = dict_binds, ud_calls = calls })
   | not $ gopt Opt_CrossModuleSpecialise (se_dflags top_env)
     -- See Note [Disabling cross-module specialisation]
   = return ([], wrapDictBinds dict_binds [])
 
   | otherwise
-  = do { hpt_rules <- getRuleBase
-       ; let rule_base = extendRuleBaseList hpt_rules local_rules
-
-       ; (spec_rules, spec_binds) <- spec_imports top_env [] rule_base
-                                                  dict_binds calls
+  = do { (_env, spec_rules, spec_binds) <- spec_imports top_env [] dict_binds calls
 
              -- Don't forget to wrap the specialized bindings with
              -- bindings for the needed dictionaries.
@@ -757,89 +755,91 @@ specImports top_env local_rules
 spec_imports :: SpecEnv          -- Passed in so that all top-level Ids are in scope
              -> [Id]             -- Stack of imported functions being specialised
                                  -- See Note [specImport call stack]
-             -> RuleBase         -- Rules from this module and the home package
-                                 -- (but not external packages, which can change)
              -> FloatedDictBinds -- Dict bindings, used /only/ for filterCalls
                                  -- See Note [Avoiding loops in specImports]
              -> CallDetails      -- Calls for imported things
-             -> CoreM ( [CoreRule]   -- New rules
+             -> CoreM ( SpecEnv      -- Env contains the new rules
+                      , [CoreRule]   -- New rules
                       , [CoreBind] ) -- Specialised bindings
-spec_imports top_env callers rule_base dict_binds calls
+spec_imports env callers dict_binds calls
   = do { let import_calls = dVarEnvElts calls
 --       ; debugTraceMsg (text "specImports {" <+>
 --                         vcat [ text "calls:" <+> ppr import_calls
 --                              , text "dict_binds:" <+> ppr dict_binds ])
-       ; (rules, spec_binds) <- go rule_base import_calls
+       ; (env, rules, spec_binds) <- go env import_calls
 --       ; debugTraceMsg (text "End specImports }" <+> ppr import_calls)
 
-       ; return (rules, spec_binds) }
+       ; return (env, rules, spec_binds) }
   where
-    go :: RuleBase -> [CallInfoSet] -> CoreM ([CoreRule], [CoreBind])
-    go _ [] = return ([], [])
-    go rb (cis : other_calls)
+    go :: SpecEnv -> [CallInfoSet] -> CoreM (SpecEnv, [CoreRule], [CoreBind])
+    go env [] = return (env, [], [])
+    go env (cis : other_calls)
       = do { -- debugTraceMsg (text "specImport {" <+> ppr cis)
-           ; (rules1, spec_binds1) <- spec_import top_env callers rb dict_binds cis
+           ; (env, rules1, spec_binds1) <- spec_import env callers dict_binds cis
            ; -- debugTraceMsg (text "specImport }" <+> ppr cis)
 
-           ; (rules2, spec_binds2) <- go (extendRuleBaseList rb rules1) other_calls
-           ; return (rules1 ++ rules2, spec_binds1 ++ spec_binds2) }
+           ; (env, rules2, spec_binds2) <- go env other_calls
+           ; return (env, rules1 ++ rules2, spec_binds1 ++ spec_binds2) }
 
 spec_import :: SpecEnv               -- Passed in so that all top-level Ids are in scope
             -> [Id]                  -- Stack of imported functions being specialised
                                      -- See Note [specImport call stack]
-            -> RuleBase              -- Rules from this module
             -> FloatedDictBinds      -- Dict bindings, used /only/ for filterCalls
                                      -- See Note [Avoiding loops in specImports]
             -> CallInfoSet           -- Imported function and calls for it
-            -> CoreM ( [CoreRule]    -- New rules
+            -> CoreM ( SpecEnv
+                     , [CoreRule]    -- New rules
                      , [CoreBind] )  -- Specialised bindings
-spec_import top_env callers rb dict_binds cis@(CIS fn _)
+spec_import env callers dict_binds cis@(CIS fn _)
   | isIn "specImport" fn callers
-  = return ([], [])     -- No warning.  This actually happens all the time
-                        -- when specialising a recursive function, because
-                        -- the RHS of the specialised function contains a recursive
-                        -- call to the original function
+  = return (env, [], [])  -- No warning.  This actually happens all the time
+                          -- when specialising a recursive function, because
+                          -- the RHS of the specialised function contains a recursive
+                          -- call to the original function
 
   | null good_calls
-  = return ([], [])
+  = return (env, [], [])
 
   | Just rhs <- canSpecImport dflags fn
   = do {     -- Get rules from the external package state
              -- We keep doing this in case we "page-fault in"
              -- more rules as we go along
-       ; external_rule_base <- getExternalRuleBase
-       ; vis_orphs <- getVisibleOrphanMods
-       ; let rules_for_fn = getRules (RuleEnv [rb, external_rule_base] vis_orphs) fn
+       ; eps_rules <- getExternalRuleBase
+       ; let rule_env = se_rules env `updExternalPackageRules` eps_rules
 
-       ; -- debugTraceMsg (text "specImport1" <+> vcat [ppr fn, ppr good_calls, ppr rhs])
+--       ; debugTraceMsg (text "specImport1" <+> vcat [ppr fn, ppr good_calls
+--                                                    , ppr (getRules rule_env fn), ppr rhs])
        ; (rules1, spec_pairs, MkUD { ud_binds = dict_binds1, ud_calls = new_calls })
-            <- runSpecM $ specCalls True top_env dict_binds
-                             rules_for_fn good_calls fn rhs
+            <- runSpecM $ specCalls True env dict_binds
+                                    (getRules rule_env fn) good_calls fn rhs
 
        ; let spec_binds1 = [NonRec b r | (b,r) <- spec_pairs]
              -- After the rules kick in we may get recursion, but
              -- we rely on a global GlomBinds to sort that out later
              -- See Note [Glom the bindings if imported functions are specialised]
 
+             new_subst = se_subst env `Core.extendSubstInScopeList` map fst spec_pairs
+             new_env   = env { se_rules = rule_env `addLocalRules` rules1
+                             , se_subst = new_subst }
+
               -- Now specialise any cascaded calls
-       ; -- debugTraceMsg (text "specImport 2" <+> (ppr fn $$ ppr rules1 $$ ppr spec_binds1))
-       ; (rules2, spec_binds2) <- spec_imports top_env
-                                               (fn:callers)
-                                               (extendRuleBaseList rb rules1)
-                                               (dict_binds `thenFDBs` dict_binds1)
-                                               new_calls
+--       ; debugTraceMsg (text "specImport 2" <+> (ppr fn $$ ppr rules1 $$ ppr spec_binds1))
+       ; (env, rules2, spec_binds2)
+            <- spec_imports new_env (fn:callers)
+                                    (dict_binds `thenFDBs` dict_binds1)
+                                    new_calls
 
        ; let final_binds = wrapDictBinds dict_binds1 $
                            spec_binds2 ++ spec_binds1
 
-       ; return (rules2 ++ rules1, final_binds) }
+       ; return (env, rules2 ++ rules1, final_binds) }
 
   | otherwise
   = do { tryWarnMissingSpecs dflags callers fn good_calls
-       ; return ([], [])}
+       ; return (env, [], [])}
 
   where
-    dflags = se_dflags top_env
+    dflags = se_dflags env
     good_calls = filterCalls cis dict_binds
        -- SUPER IMPORTANT!  Drop calls that (directly or indirectly) refer to fn
        -- See Note [Avoiding loops in specImports]
@@ -1134,6 +1134,7 @@ data SpecEnv
              --    the RHS of specialised bindings (no type-let!)
 
        , se_module :: Module
+       , se_rules  :: RuleEnv  -- From the home package and this module
        , se_dflags :: DynFlags
      }
 
@@ -1172,8 +1173,8 @@ specExpr env expr@(App {})
        ; (args_out, uds_args) <- mapAndCombineSM (specExpr env) args_in
        ; let env_args = env `bringFloatedDictsIntoScope` ud_binds uds_args
                 -- Some dicts may have floated out of args_in;
-                -- they should be in scope for rewriteClassOps (#21689)
-             (fun_in', args_out') = rewriteClassOps env_args fun_in args_out
+                -- they should be in scope for fireRewriteRules (#21689)
+             (fun_in', args_out') = fireRewriteRules env_args fun_in args_out
        ; (fun_out', uds_fun) <- specExpr env fun_in'
        ; let uds_call = mkCallUDs env fun_out' args_out'
        ; return (fun_out' `mkApps` args_out', uds_fun `thenUDs` uds_call `thenUDs` uds_args) }
@@ -1208,17 +1209,19 @@ specExpr env (Let bind body)
        ; return (foldr Let body' binds', uds) }
 
 -- See Note [Specialisation modulo dictionary selectors]
--- and Note [ClassOp/DFun selection]
-rewriteClassOps :: SpecEnv -> InExpr -> [OutExpr] -> (InExpr, [OutExpr])
-rewriteClassOps env (Var f) args
-  | isClassOpId f -- If we see `op_sel $fCInt`, we rewrite to `$copInt`
-  , Just (rule, expr) <- -- pprTrace "rewriteClassOps" (ppr f $$ ppr args $$ ppr (se_subst env)) $
-                         specLookupRule env f args (idCoreRules f)
-  , let rest_args = drop (ruleArity rule) args -- See Note [Extra args in the target]
---  , pprTrace "class op rewritten" (ppr f <+> ppr args $$ ppr expr <+> ppr rest_args) True
-  , (fun, args) <- collectArgs expr
-  = rewriteClassOps env fun (args++rest_args)
-rewriteClassOps _ fun args = (fun, args)
+--     Note [ClassOp/DFun selection]
+--     Note [Fire rules in the specialiser]
+fireRewriteRules :: SpecEnv -> InExpr -> [OutExpr] -> (InExpr, [OutExpr])
+fireRewriteRules env (Var f) args
+  | Just (rule, expr) <- specLookupRule env f args InitialPhase (getRules (se_rules env) f)
+  , let rest_args    = drop (ruleArity rule) args -- See Note [Extra args in the target]
+        zapped_subst = Core.zapSubst (se_subst env)
+        expr'        = simpleOptExprWith defaultSimpleOpts zapped_subst expr
+                       -- simplOptExpr needed because lookupRule returns
+                       --   (\x y. rhs) arg1 arg2
+  , (fun, args) <- collectArgs expr'
+  = fireRewriteRules env fun (args++rest_args)
+fireRewriteRules _ fun args = (fun, args)
 
 --------------
 specLam :: SpecEnv -> [OutBndr] -> InExpr -> SpecM (OutExpr, UsageDetails)
@@ -1324,7 +1327,67 @@ specCase env scrut case_bndr alts
         where
           (env_rhs, args') = substBndrs env_alt args
 
-{-
+{- Note [Fire rules in the specialiser]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider this (#21851)
+
+    module A where
+      f :: Num b => b -> (b, b)
+      f x = (x + 1, snd (f x))
+      {-# SPECIALIZE f :: Int -> (Int, Int) #-}
+
+    module B (g') where
+      import A
+
+      g :: Num a => a -> a
+      g x = fst (f x)
+      {-# NOINLINE[99] g #-}
+
+      h :: Int -> Int
+      h = g
+
+Note that `f` has the CPR property, and so will worker/wrapper.
+
+The call to `g` in `h` will make us specialise `g @Int`. And the specialised
+version of `g` will contain the call `f @Int`; but in the subsequent run of
+the Simplifier, there will be a competition between:
+* The user-supplied SPECIALISE rule for `f`
+* The inlining of the wrapper for `f`
+In fact, the latter wins -- see Note [Rewrite rules and inlining] in
+GHC.Core.Opt.Simplify.Iteration.  However, it a bit fragile.
+
+Moreover consider (test T21851_2):
+
+    module A
+      f :: (Ord a, Show b) => a -> b -> blah
+      {-# RULE forall b. f @Int @b = wombat #-}
+
+      wombat :: Show b => Int -> b -> blah
+      wombat = blah
+
+    module B
+      import A
+      g :: forall a. Ord a => blah
+      g @a = ...g...f @a @Char....
+
+      h = ....g @Int....
+
+Now, in module B, GHC will specialise `g @Int`, which will lead to a
+call `f @Int @Char`.  If we immediately (in the specialiser) rewrite
+that to `womabat @Char`, we have a chance to specialise `wombat`.
+
+Conclusion: it's treat if the Specialiser fires RULEs itself.
+It's not hard to achieve: see `fireRewriteRules`. The only tricky bit is
+making sure that we have a reasonably up to date EPS rule base. Currently
+we load it up just once, in `initRuleEnv`, called at the beginning of
+`specProgram`.
+
+NB: you might wonder if running rules in the specialiser (this Note)
+renders Note [Rewrite rules and inlining] in the Simplifier redundant.
+That is, if we run rules in the specialiser, does it matter if we make
+rules "win" over inlining in the Simplifier?  Yes, it does!  See the
+discussion in #21851.
+
 Note [Floating dictionaries out of cases]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Consider
@@ -1415,13 +1478,12 @@ specBind top_lvl env (NonRec fn rhs) do_body
 
              final_binds :: [DictBind]
              -- See Note [From non-recursive to recursive]
-             final_binds
-               | not (isNilOL dump_dbs)
-               , not (null spec_defns)
-               = [recWithDumpedDicts pairs dump_dbs]
-               | otherwise
-               = [mkDB $ NonRec b r | (b,r) <- pairs]
-                 ++ fromOL dump_dbs
+             final_binds | not (isNilOL dump_dbs)
+                         , not (null spec_defns)
+                         = [recWithDumpedDicts pairs dump_dbs]
+                         | otherwise
+                         = [mkDB $ NonRec b r | (b,r) <- pairs]
+                           ++ fromOL dump_dbs
 
        ; if float_all then
              -- Rather than discard the calls mentioning the bound variables
@@ -1553,8 +1615,10 @@ specCalls spec_imp env dict_binds existing_rules calls_for_me fn rhs
     foldlM spec_call ([], [], emptyUDs) calls_for_me
 
   | otherwise   -- No calls or RHS doesn't fit our preconceptions
-  = warnPprTrace (not (exprIsTrivial rhs) && notNull calls_for_me)
+  = warnPprTrace (not (exprIsTrivial rhs) && notNull calls_for_me && not (isClassOpId fn))
           "Missed specialisation opportunity for" (ppr fn $$ trace_doc) $
+          -- isClassOpId: class-op Ids never inline; we specialise them
+          -- through fireRewriteRules. So don't complain about missed opportunities
           -- Note [Specialisation shape]
     -- pprTrace "specCalls: none" (ppr fn <+> ppr calls_for_me) $
     return ([], [], emptyUDs)
@@ -1581,9 +1645,13 @@ specCalls spec_imp env dict_binds existing_rules calls_for_me fn rhs
 
     already_covered :: SpecEnv -> [CoreRule] -> [CoreExpr] -> Bool
     already_covered env new_rules args      -- Note [Specialisations already covered]
-       = isJust (specLookupRule env fn args (new_rules ++ existing_rules))
-         -- NB: we look both in the new_rules (generated by this invocation
-         --     of specCalls), and in existing_rules (passed in to specCalls)
+       = isJust (specLookupRule env fn args (beginPhase inl_act)
+                                (new_rules ++ existing_rules))
+         -- Rules: we look both in the new_rules (generated by this invocation
+         --   of specCalls), and in existing_rules (passed in to specCalls)
+         -- inl_act: is the activation we are going to put in the new SPEC
+         --   rule; so we want to see if it is covered by another rule with
+         --   that same activation.
 
     ----------------------------------------------------------
         -- Specialise to one particular call pattern
@@ -1708,13 +1776,16 @@ specCalls spec_imp env dict_binds existing_rules calls_for_me fn rhs
 
 -- Convenience function for invoking lookupRule from Specialise
 -- The SpecEnv's InScopeSet should include all the Vars in the [CoreExpr]
-specLookupRule :: SpecEnv -> Id -> [CoreExpr] -> [CoreRule] -> Maybe (CoreRule, CoreExpr)
-specLookupRule env fn args rules
-  = lookupRule ropts (in_scope, realIdUnfolding) (const True) fn args rules
+specLookupRule :: SpecEnv -> Id -> [CoreExpr]
+               -> CompilerPhase  -- Look up rules as if we were in this phase
+               -> [CoreRule] -> Maybe (CoreRule, CoreExpr)
+specLookupRule env fn args phase rules
+  = lookupRule ropts (in_scope, realIdUnfolding) is_active fn args rules
   where
-    dflags   = se_dflags env
-    in_scope = getSubstInScope (se_subst env)
-    ropts    = initRuleOpts dflags
+    dflags    = se_dflags env
+    in_scope  = getSubstInScope (se_subst env)
+    ropts     = initRuleOpts dflags
+    is_active = isActive phase
 
 {- Note [Specialising DFuns]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1913,10 +1984,10 @@ We want to specialise this! How? By doing the method-selection rewrite in
 the Specialiser. Hence
 
 1. In the App case of 'specExpr', try to apply the ClassOp/DFun rule on the
-   head of the application, repeatedly, via 'rewriteClassOps'.
+   head of the application, repeatedly, via 'fireRewriteRules'.
 2. Attach an unfolding to freshly-bound dictionary ids such as `$dC` and
    `$dShow` in `bindAuxiliaryDict`, so that we can exploit the unfolding
-   in 'rewriteClassOps' to do the ClassOp/DFun rewrite.
+   in 'fireRewriteRules' to do the ClassOp/DFun rewrite.
 
 NB: Without (2), (1) would be pointless, because 'lookupRule' wouldn't be able
 to look into the RHS of `$dC` to see the DFun.


=====================================
compiler/GHC/Core/Opt/WorkWrap/Utils.hs
=====================================
@@ -15,7 +15,7 @@ module GHC.Core.Opt.WorkWrap.Utils
    , findTypeShape, IsRecDataConResult(..), isRecDataCon
    , mkAbsentFiller
    , isWorkerSmallEnough, dubiousDataConInstArgTys
-   , isGoodWorker, badWorker , goodWorker
+   , boringSplit , usefulSplit
    )
 where
 
@@ -571,23 +571,24 @@ data UnboxingDecision unboxing_info
                             -- returned product was constructed, so unbox it.
   | DropAbsent              -- ^ The argument/field was absent. Drop it.
 
--- Do we want to create workers just for unlifting?
-wwForUnlifting :: WwOpts -> Bool
-wwForUnlifting !opts
+-- | Do we want to create workers just for unlifting?
+wwUseForUnlifting :: WwOpts -> WwUse
+wwUseForUnlifting !opts
     -- Always unlift if possible
-    | wo_unlift_strict opts = goodWorker
+    | wo_unlift_strict opts = usefulSplit
     -- Don't unlift  it would cause additional W/W splits.
-    | otherwise = badWorker
+    | otherwise             = boringSplit
 
-badWorker :: Bool
-badWorker = False
+-- | Is the worker/wrapper split profitable?
+type WwUse = Bool
 
-goodWorker :: Bool
-goodWorker = True
-
-isGoodWorker :: Bool -> Bool
-isGoodWorker = id
+-- | WW split not profitable
+boringSplit :: WwUse
+boringSplit = False
 
+-- | WW split profitable
+usefulSplit :: WwUse
+usefulSplit = True
 
 -- | Unwraps the 'Boxity' decision encoded in the given 'SubDemand' and returns
 -- a 'DataConPatContext' as well the nested demands on fields of the 'DataCon'
@@ -826,7 +827,7 @@ Is this a win?  Not always:
 So there is a flag, `-fworker-wrapper-cbv`, to control whether we do
 w/w on strict arguments (internally `Opt_WorkerWrapperUnlift`).  The
 flag is off by default.  The choice is made in
-GHC.Core.Opt.WorkWrape.Utils.wwForUnlifting
+GHC.Core.Opt.WorkWrape.Utils.wwUseForUnlifting
 
 See also `Note [WW for calling convention]` in GHC.Core.Opt.WorkWrap.Utils
 -}
@@ -843,7 +844,7 @@ mkWWstr :: WwOpts
         -> [Var]                         -- Wrapper args; have their demand info on them
                                          --  *Includes type variables*
         -> [StrictnessMark]              -- Strictness-mark info for arguments
-        -> UniqSM (Bool,                 -- Will this result in a useful worker
+        -> UniqSM (WwUse,                -- Will this result in a useful worker
                    [(Var,StrictnessMark)],      -- Worker args/their call-by-value semantics.
                    CoreExpr -> CoreExpr, -- Wrapper body, lacking the worker call
                                          -- and without its lambdas
@@ -855,7 +856,7 @@ mkWWstr opts args str_marks
   = -- pprTrace "mkWWstr" (ppr args) $
     go args str_marks
   where
-    go [] _ = return (badWorker, [], nop_fn, [])
+    go [] _ = return (boringSplit, [], nop_fn, [])
     go (arg : args) (str:strs)
       = do { (useful1, args1, wrap_fn1, wrap_arg)  <- mkWWstr_one opts arg str
            ; (useful2, args2, wrap_fn2, wrap_args) <- go args strs
@@ -875,7 +876,7 @@ mkWWstr opts args str_marks
 mkWWstr_one :: WwOpts
             -> Var
             -> StrictnessMark
-            -> UniqSM (Bool, [(Var,StrictnessMark)], CoreExpr -> CoreExpr, CoreExpr)
+            -> UniqSM (WwUse, [(Var,StrictnessMark)], CoreExpr -> CoreExpr, CoreExpr)
 mkWWstr_one opts arg str_mark =
   -- pprTrace "mkWWstr_one" (ppr arg <+> (if isId arg then ppr arg_ty  $$ ppr arg_dmd else text "type arg")) $
   case canUnboxArg fam_envs arg_ty arg_dmd of
@@ -887,7 +888,7 @@ mkWWstr_one opts arg str_mark =
          -- We can't always handle absence for arbitrary
          -- unlifted types, so we need to choose just the cases we can
          -- (that's what mkAbsentFiller does)
-      -> return (goodWorker, [], nop_fn, absent_filler)
+      -> return (usefulSplit, [], nop_fn, absent_filler)
       | otherwise -> do_nothing
 
     DoUnbox dcpc -> -- pprTrace "mkWWstr_one:1" (ppr (dcpc_dc dcpc) <+> ppr (dcpc_tc_args dcpc) $$ ppr (dcpc_args dcpc)) $
@@ -895,12 +896,12 @@ mkWWstr_one opts arg str_mark =
 
     DontUnbox
       | isStrictDmd arg_dmd || isMarkedStrict str_mark
-      , wwForUnlifting opts  -- See Note [CBV Function Ids]
+      , wwUseForUnlifting opts  -- See Note [CBV Function Ids]
       , not (isFunTy arg_ty)
       , not (isUnliftedType arg_ty) -- Already unlifted!
         -- NB: function arguments have a fixed RuntimeRep,
         -- so it's OK to call isUnliftedType here
-      -> return  (goodWorker, [(arg, MarkedStrict)], nop_fn, varToCoreExpr arg )
+      -> return  (usefulSplit, [(arg, MarkedStrict)], nop_fn, varToCoreExpr arg )
 
       | otherwise -> do_nothing
 
@@ -910,11 +911,11 @@ mkWWstr_one opts arg str_mark =
     arg_dmd    = idDemandInfo arg
     arg_str    | isTyVar arg = NotMarkedStrict -- Type args don't get strictness marks
                | otherwise   = str_mark
-    do_nothing = return (badWorker, [(arg,arg_str)], nop_fn, varToCoreExpr arg)
+    do_nothing = return (boringSplit, [(arg,arg_str)], nop_fn, varToCoreExpr arg)
 
 unbox_one_arg :: WwOpts
-              -> Var-> DataConPatContext Demand
-              -> UniqSM (Bool, [(Var,StrictnessMark)], CoreExpr -> CoreExpr, CoreExpr)
+              -> Var -> DataConPatContext Demand
+              -> UniqSM (WwUse, [(Var,StrictnessMark)], CoreExpr -> CoreExpr, CoreExpr)
 unbox_one_arg opts arg_var
               DataConPatContext { dcpc_dc = dc, dcpc_tc_args = tc_args
                                 , dcpc_co = co, dcpc_args = ds }
@@ -939,13 +940,14 @@ unbox_one_arg opts arg_var
              -- See Note [Call-by-value for worker args]
              all_str_marks = (map (const NotMarkedStrict) ex_tvs') ++ con_str_marks
 
-       ; (_sub_args_quality, worker_args, wrap_fn, wrap_args)
+       ; (nested_useful, worker_args, wrap_fn, wrap_args)
              <- mkWWstr opts (ex_tvs' ++ arg_ids') all_str_marks
 
        ; let wrap_arg = mkConApp dc (map Type tc_args ++ wrap_args) `mkCast` mkSymCo co
-
-       ; return (goodWorker, worker_args, unbox_fn . wrap_fn, wrap_arg) }
-                          -- Don't pass the arg, rebox instead
+       -- See Note [Unboxing through unboxed tuples]
+       ; return $ if isUnboxedTupleDataCon dc && not nested_useful
+                     then (boringSplit, [(arg_var,NotMarkedStrict)], nop_fn, varToCoreExpr arg_var)
+                     else (usefulSplit, worker_args, unbox_fn . wrap_fn, wrap_arg) }
 
 -- | Tries to find a suitable absent filler to bind the given absent identifier
 -- to. See Note [Absent fillers].
@@ -1195,6 +1197,26 @@ fragile
    because `MkT` is strict in its Int# argument, so we get an absentError
    exception when we shouldn't.  Very annoying!
 
+Note [Unboxing through unboxed tuples]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We should not to a worker/wrapper split just for unboxing the components of
+an unboxed tuple (in the result *or* argument, #22388). Consider
+  boring_res x y = (# y, x #)
+It's entirely pointless to split for the constructed unboxed pair to
+  $wboring_res x y = (# y, x #)
+  boring_res = case $wboring_res x y of (# a, b #) -> (# a, b #)
+`boring_res` will immediately simplify to an alias for `$wboring_res`!
+
+Similarly, the unboxed tuple might occur in argument position
+  boring_arg (# x, y, z #) = (# z, x, y #)
+It's entirely pointless to "unbox" the triple
+  $wboring_arg x y z = (# z, x, y #)
+  boring_arg (# x, y, z #) = $wboring_arg x y z
+because after unarisation, `boring_arg` is just an alias for `$wboring_arg`.
+
+Conclusion: Only consider unboxing an unboxed tuple useful when we will
+also unbox its components. That is governed by the `usefulSplit` mechanism.
+
 ************************************************************************
 *                                                                      *
          Type scrutiny that is specific to demand analysis
@@ -1376,12 +1398,12 @@ mkWWcpr_entry
   :: WwOpts
   -> Type                              -- function body
   -> Cpr                               -- CPR analysis results
-  -> UniqSM (Bool,            -- Is w/w'ing useful?
+  -> UniqSM (WwUse,                    -- Is w/w'ing useful?
              CoreExpr -> CoreExpr,     -- New wrapper. 'nop_fn' if not useful
              CoreExpr -> CoreExpr)     -- New worker.  'nop_fn' if not useful
 -- ^ Entrypoint to CPR W/W. See Note [Worker/wrapper for CPR] for an overview.
 mkWWcpr_entry opts body_ty body_cpr
-  | not (wo_cpr_anal opts) = return (badWorker, nop_fn, nop_fn)
+  | not (wo_cpr_anal opts) = return (boringSplit, nop_fn, nop_fn)
   | otherwise = do
     -- Part (1)
     res_bndr <- mk_res_bndr body_ty
@@ -1398,8 +1420,8 @@ mkWWcpr_entry opts body_ty body_cpr
     let wrap_fn      = unbox_transit_tup rebuilt_result                 -- 3 2
         work_fn body = bind_res_bndr body (work_unpack_res transit_tup) -- 1 2 3
     return $ if not useful
-                then (badWorker, nop_fn, nop_fn)
-                else (goodWorker, wrap_fn, work_fn)
+                then (boringSplit, nop_fn, nop_fn)
+                else (usefulSplit, wrap_fn, work_fn)
 
 -- | Part (1) of Note [Worker/wrapper for CPR].
 mk_res_bndr :: Type -> UniqSM Id
@@ -1411,18 +1433,18 @@ mk_res_bndr body_ty = do
 
 -- | What part (2) of Note [Worker/wrapper for CPR] collects.
 --
---   1. A Bool capturing whether the transformation did anything useful.
+--   1. A 'WwUse' capturing whether the split does anything useful.
 --   2. The list of transit variables (see the Note).
 --   3. The result builder expression for the wrapper.  The original case binder if not useful.
 --   4. The result unpacking expression for the worker. 'nop_fn' if not useful.
-type CprWwResultOne  = (Bool, OrdList Var,  CoreExpr , CoreExpr -> CoreExpr)
-type CprWwResultMany = (Bool, OrdList Var, [CoreExpr], CoreExpr -> CoreExpr)
+type CprWwResultOne  = (WwUse, OrdList Var,  CoreExpr , CoreExpr -> CoreExpr)
+type CprWwResultMany = (WwUse, OrdList Var, [CoreExpr], CoreExpr -> CoreExpr)
 
 mkWWcpr :: WwOpts -> [Id] -> [Cpr] -> UniqSM CprWwResultMany
 mkWWcpr _opts vars []   =
   -- special case: No CPRs means all top (for example from FlatConCpr),
   -- hence stop WW.
-  return (badWorker, toOL vars, map varToCoreExpr vars, nop_fn)
+  return (boringSplit, toOL vars, map varToCoreExpr vars, nop_fn)
 mkWWcpr opts  vars cprs = do
   -- No existentials in 'vars'. 'canUnboxResult' should have checked that.
   massertPpr (not (any isTyVar vars)) (ppr vars $$ ppr cprs)
@@ -1441,7 +1463,7 @@ mkWWcpr_one opts res_bndr cpr
   , DoUnbox dcpc <- canUnboxResult (wo_fam_envs opts) (idType res_bndr) cpr
   = unbox_one_result opts res_bndr dcpc
   | otherwise
-  = return (badWorker, unitOL res_bndr, varToCoreExpr res_bndr, nop_fn)
+  = return (boringSplit, unitOL res_bndr, varToCoreExpr res_bndr, nop_fn)
 
 unbox_one_result
   :: WwOpts -> Id -> DataConPatContext Cpr -> UniqSM CprWwResultOne
@@ -1467,11 +1489,10 @@ unbox_one_result opts res_bndr
       -- this_work_unbox_res alt = (case res_bndr |> co of C a b -> <alt>[a,b])
       this_work_unbox_res = mkUnpackCase (Var res_bndr) co cprCaseBndrMult dc arg_ids
 
-  -- Don't try to WW an unboxed tuple return type when there's nothing inside
-  -- to unbox further.
+  -- See Note [Unboxing through unboxed tuples]
   return $ if isUnboxedTupleDataCon dc && not nested_useful
-              then ( badWorker, unitOL res_bndr, Var res_bndr, nop_fn )
-              else ( goodWorker
+              then ( boringSplit, unitOL res_bndr, Var res_bndr, nop_fn )
+              else ( usefulSplit
                    , transit_vars
                    , rebuilt_result
                    , this_work_unbox_res . work_unbox_res


=====================================
compiler/GHC/Core/Rules.hs
=====================================
@@ -12,8 +12,10 @@ module GHC.Core.Rules (
         lookupRule,
 
         -- ** RuleBase, RuleEnv
+        RuleBase, RuleEnv(..), mkRuleEnv, emptyRuleEnv,
+        updExternalPackageRules, addLocalRules, updLocalRules,
         emptyRuleBase, mkRuleBase, extendRuleBaseList,
-        pprRuleBase, extendRuleEnv,
+        pprRuleBase,
 
         -- ** Checking rule applications
         ruleCheckProgram,
@@ -22,6 +24,8 @@ module GHC.Core.Rules (
         extendRuleInfo, addRuleInfo,
         addIdSpecialisations,
 
+        -- ** RuleBase and RuleEnv
+
         -- * Misc. CoreRule helpers
         rulesOfBinds, getRules, pprRulesForUser,
 
@@ -34,6 +38,8 @@ import GHC.Prelude
 
 import GHC.Unit.Module   ( Module )
 import GHC.Unit.Module.Env
+import GHC.Unit.Module.ModGuts( ModGuts(..) )
+import GHC.Unit.Module.Deps( Dependencies(..) )
 
 import GHC.Driver.Session( DynFlags )
 import GHC.Driver.Ppr( showSDoc )
@@ -135,7 +141,7 @@ Note [Overall plumbing for rules]
 * At the moment (c) is carried in a reader-monad way by the GHC.Core.Opt.Monad.
   The HomePackageTable doesn't have a single RuleBase because technically
   we should only be able to "see" rules "below" this module; so we
-  generate a RuleBase for (c) by combing rules from all the modules
+  generate a RuleBase for (c) by combining rules from all the modules
   "below" us.  That's why we can't just select the home-package RuleBase
   from HscEnv.
 
@@ -339,12 +345,106 @@ addIdSpecialisations id rules
 rulesOfBinds :: [CoreBind] -> [CoreRule]
 rulesOfBinds binds = concatMap (concatMap idCoreRules . bindersOf) binds
 
+
+{-
+************************************************************************
+*                                                                      *
+                RuleBase
+*                                                                      *
+************************************************************************
+-}
+
+-- | Gathers a collection of 'CoreRule's. Maps (the name of) an 'Id' to its rules
+type RuleBase = NameEnv [CoreRule]
+        -- The rules are unordered;
+        -- we sort out any overlaps on lookup
+
+emptyRuleBase :: RuleBase
+emptyRuleBase = emptyNameEnv
+
+mkRuleBase :: [CoreRule] -> RuleBase
+mkRuleBase rules = extendRuleBaseList emptyRuleBase rules
+
+extendRuleBaseList :: RuleBase -> [CoreRule] -> RuleBase
+extendRuleBaseList rule_base new_guys
+  = foldl' extendRuleBase rule_base new_guys
+
+extendRuleBase :: RuleBase -> CoreRule -> RuleBase
+extendRuleBase rule_base rule
+  = extendNameEnv_Acc (:) Utils.singleton rule_base (ruleIdName rule) rule
+
+pprRuleBase :: RuleBase -> SDoc
+pprRuleBase rules = pprUFM rules $ \rss ->
+  vcat [ pprRules (tidyRules emptyTidyEnv rs)
+       | rs <- rss ]
+
+-- | A full rule environment which we can apply rules from.  Like a 'RuleBase',
+-- but it also includes the set of visible orphans we use to filter out orphan
+-- rules which are not visible (even though we can see them...)
+-- See Note [Orphans] in GHC.Core
+data RuleEnv
+    = RuleEnv { re_local_rules   :: !RuleBase -- Rules from this module
+              , re_home_rules    :: !RuleBase -- Rule from the home package
+                                              --   (excl this module)
+              , re_eps_rules     :: !RuleBase -- Rules from other packages
+                                              --   see Note [External package rules]
+              , re_visible_orphs :: !ModuleSet
+              }
+
+mkRuleEnv :: ModGuts -> RuleBase -> RuleBase -> RuleEnv
+mkRuleEnv (ModGuts { mg_module = this_mod
+                   , mg_deps   = deps
+                   , mg_rules  = local_rules })
+          eps_rules hpt_rules
+  = RuleEnv { re_local_rules   = mkRuleBase local_rules
+            , re_home_rules    = hpt_rules
+            , re_eps_rules     = eps_rules
+            , re_visible_orphs = mkModuleSet vis_orphs }
+  where
+    vis_orphs = this_mod : dep_orphs deps
+
+updExternalPackageRules :: RuleEnv -> RuleBase -> RuleEnv
+-- Completely over-ride the external rules in RuleEnv
+updExternalPackageRules rule_env eps_rules
+  = rule_env { re_eps_rules = eps_rules }
+
+updLocalRules :: RuleEnv -> [CoreRule] -> RuleEnv
+-- Completely over-ride the local rules in RuleEnv
+updLocalRules rule_env local_rules
+  = rule_env { re_local_rules = mkRuleBase local_rules }
+
+addLocalRules :: RuleEnv -> [CoreRule] -> RuleEnv
+-- Add new local rules
+addLocalRules rule_env rules
+  = rule_env { re_local_rules = extendRuleBaseList (re_local_rules rule_env) rules }
+
+emptyRuleEnv :: RuleEnv
+emptyRuleEnv = RuleEnv { re_local_rules   = emptyNameEnv
+                       , re_home_rules    = emptyNameEnv
+                       , re_eps_rules     = emptyNameEnv
+                       , re_visible_orphs = emptyModuleSet }
+
 getRules :: RuleEnv -> Id -> [CoreRule]
+-- Given a RuleEnv and an Id, find the visible rules for that Id
 -- See Note [Where rules are found]
-getRules (RuleEnv { re_base = rule_base, re_visible_orphs = orphs }) fn
-  = idCoreRules fn ++ concatMap imp_rules rule_base
+getRules (RuleEnv { re_local_rules   = local_rules
+                  , re_home_rules    = home_rules
+                  , re_eps_rules     = eps_rules
+                  , re_visible_orphs = orphs }) fn
+
+  | Just {} <- isDataConId_maybe fn   -- Short cut for data constructor workers
+  = []                                -- and wrappers, which never have any rules
+
+  | otherwise
+  = idCoreRules fn          ++
+    get local_rules         ++
+    find_visible home_rules ++
+    find_visible eps_rules
+
   where
-    imp_rules rb = filter (ruleIsVisible orphs) (lookupNameEnv rb (idName fn) `orElse` [])
+    fn_name = idName fn
+    find_visible rb = filter (ruleIsVisible orphs) (get rb)
+    get rb = lookupNameEnv rb fn_name `orElse` []
 
 ruleIsVisible :: ModuleSet -> CoreRule -> Bool
 ruleIsVisible _ BuiltinRule{} = True
@@ -370,37 +470,28 @@ but that isn't quite right:
        in the module defining the Id (when it's a LocalId), but
        the rules are kept in the global RuleBase
 
+ Note [External package rules]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In Note [Overall plumbing for rules], it is explained that the final
+RuleBase which we must consider is combined from 4 different sources.
 
-************************************************************************
-*                                                                      *
-                RuleBase
-*                                                                      *
-************************************************************************
--}
-
--- RuleBase itself is defined in GHC.Core, along with CoreRule
-
-emptyRuleBase :: RuleBase
-emptyRuleBase = emptyNameEnv
-
-mkRuleBase :: [CoreRule] -> RuleBase
-mkRuleBase rules = extendRuleBaseList emptyRuleBase rules
+During simplifier runs, the fourth source of rules is constantly being updated
+as new interfaces are loaded into the EPS. Therefore just before we check to see
+if any rules match we get the EPS RuleBase and combine it with the existing RuleBase
+and then perform exactly 1 lookup into the new map.
 
-extendRuleBaseList :: RuleBase -> [CoreRule] -> RuleBase
-extendRuleBaseList rule_base new_guys
-  = foldl' extendRuleBase rule_base new_guys
+It is more efficient to avoid combining the environments and store the uncombined
+environments as we can instead perform 1 lookup into each environment and then combine
+the results.
 
-extendRuleBase :: RuleBase -> CoreRule -> RuleBase
-extendRuleBase rule_base rule
-  = extendNameEnv_Acc (:) Utils.singleton rule_base (ruleIdName rule) rule
+Essentially we use the identity:
 
-extendRuleEnv :: RuleEnv -> RuleBase -> RuleEnv
-extendRuleEnv (RuleEnv rules orphs) rb = (RuleEnv (rb:rules) orphs)
+> lookupNameEnv n (plusNameEnv_C (++) rb1 rb2)
+>   = lookupNameEnv n rb1 ++ lookupNameEnv n rb2
 
-pprRuleBase :: RuleBase -> SDoc
-pprRuleBase rules = pprUFM rules $ \rss ->
-  vcat [ pprRules (tidyRules emptyTidyEnv rs)
-       | rs <- rss ]
+The latter being more efficient as we don't construct an intermediate
+map.
+-}
 
 {-
 ************************************************************************
@@ -1575,7 +1666,7 @@ ruleCheckFun env fn args
   | otherwise             = unitBag (ruleAppCheck_help env fn args name_match_rules)
   where
     name_match_rules = filter match (rc_rules env fn)
-    match rule = (rc_pattern env) `isPrefixOf` unpackFS (ruleName rule)
+    match rule = rc_pattern env `isPrefixOf` unpackFS (ruleName rule)
 
 ruleAppCheck_help :: RuleCheckEnv -> Id -> [CoreExpr] -> [CoreRule] -> SDoc
 ruleAppCheck_help env fn args rules


=====================================
compiler/GHC/Driver/Config/Core/Opt/Simplify.hs
=====================================
@@ -6,7 +6,7 @@ module GHC.Driver.Config.Core.Opt.Simplify
 
 import GHC.Prelude
 
-import GHC.Core ( RuleBase )
+import GHC.Core.Rules ( RuleBase )
 import GHC.Core.Opt.Pipeline.Types ( CoreToDo(..) )
 import GHC.Core.Opt.Simplify ( SimplifyExprOpts(..), SimplifyOpts(..) )
 import GHC.Core.Opt.Simplify.Env ( FloatEnable(..), SimplMode(..) )
@@ -40,20 +40,19 @@ initSimplifyExprOpts dflags ic = SimplifyExprOpts
   }
 
 initSimplifyOpts :: DynFlags -> [Var] -> Int -> SimplMode -> RuleBase -> SimplifyOpts
-initSimplifyOpts dflags extra_vars iterations mode rule_base = let
+initSimplifyOpts dflags extra_vars iterations mode hpt_rule_base = let
   -- This is a particularly ugly construction, but we will get rid of it in !8341.
   opts = SimplifyOpts
     { so_dump_core_sizes = not $ gopt Opt_SuppressCoreSizes dflags
-    , so_iterations = iterations
-    , so_mode = mode
+    , so_iterations      = iterations
+    , so_mode            = mode
     , so_pass_result_cfg = if gopt Opt_DoCoreLinting dflags
-      then Just $ initLintPassResultConfig dflags extra_vars (CoreDoSimplify opts)
-      else Nothing
-    , so_rule_base = rule_base
-    , so_top_env_cfg = TopEnvConfig
-        { te_history_size = historySize dflags
-        , te_tick_factor = simplTickFactor dflags
-        }
+                           then Just $ initLintPassResultConfig dflags extra_vars
+                                                            (CoreDoSimplify opts)
+                           else Nothing
+    , so_hpt_rules       = hpt_rule_base
+    , so_top_env_cfg     = TopEnvConfig { te_history_size = historySize dflags
+                                        , te_tick_factor = simplTickFactor dflags }
     }
   in opts
 


=====================================
compiler/GHC/HsToCore/Errors/Ppr.hs
=====================================
@@ -86,7 +86,7 @@ instance Diagnostic DsMessage where
               hang (text "Top-level" <+> text desc <+> text "aren't allowed:") 2 (ppr bind)
     DsUselessSpecialiseForClassMethodSelector poly_id
       -> mkSimpleDecorated $
-           text "Ignoring useless SPECIALISE pragma for NOINLINE function:" <+> quotes (ppr poly_id)
+           text "Ignoring useless SPECIALISE pragma for class selector:" <+> quotes (ppr poly_id)
     DsUselessSpecialiseForNoInlineFunction poly_id
       -> mkSimpleDecorated $
           text "Ignoring useless SPECIALISE pragma for NOINLINE function:" <+> quotes (ppr poly_id)


=====================================
compiler/GHC/Types/Id/Make.hs
=====================================
@@ -585,6 +585,7 @@ mkDataConWorkId wkr_name data_con
                    `setInlinePragInfo`     wkr_inline_prag
                    `setUnfoldingInfo`      evaldUnfolding  -- Record that it's evaluated,
                                                            -- even if arity = 0
+          -- No strictness: see Note [Data-con worker strictness] in GHC.Core.DataCon
 
     wkr_inline_prag = defaultInlinePragma { inl_rule = ConLike }
     wkr_arity = dataConRepArity data_con


=====================================
compiler/GHC/Unit/External.hs
=====================================
@@ -21,11 +21,10 @@ import GHC.Prelude
 import GHC.Unit
 import GHC.Unit.Module.ModIface
 
-import GHC.Core         ( RuleBase )
 import GHC.Core.FamInstEnv
 import GHC.Core.InstEnv ( InstEnv, emptyInstEnv )
 import GHC.Core.Opt.ConstantFold
-import GHC.Core.Rules (mkRuleBase)
+import GHC.Core.Rules ( RuleBase, mkRuleBase)
 
 import GHC.Types.Annotations ( AnnEnv, emptyAnnEnv )
 import GHC.Types.CompleteMatch


=====================================
libraries/base/GHC/Ix.hs
=====================================
@@ -140,12 +140,30 @@ Note [Out-of-bounds error messages]
 The default method for 'index' generates hoplelessIndexError, because
 Ix doesn't have Show as a superclass.  For particular base types we
 can do better, so we override the default method for index.
--}
 
--- Abstract these errors from the relevant index functions so that
--- the guts of the function will be small enough to inline.
+Note [indexError]
+~~~~~~~~~~~~~~~~~
+We abstract the guts of constructing an out-of-bounds error into `indexError`.
+We give it a NOINLINE pragma, because we don't want to duplicate this
+cold-path code.
+
+We give it a SPECIALISE pragma because we really want it to take
+its arguments unboxed, to avoid reboxing code in the caller, and
+perhaps even some reboxing code in the hot path of a caller.
+See Note [Boxity for bottoming functions] in GHC.Core.Opt.DmdAnal.
+
+The SPECIALISE pragma means that at least the Int-indexed case
+of indexError /will/ unbox its arguments.
 
+The [2] phase is because if we don't give an activation we'll get
+the one from the inline pragama (i.e. never) which is a bit silly.
+See Note [Activation pragmas for SPECIALISE] in GHC.HsToCore.Binds.
+-}
+
+-- indexError: see Note [indexError]
 {-# NOINLINE indexError #-}
+{-# SPECIALISE [2] indexError :: (Int,Int) -> Int -> String -> b #-}
+
 indexError :: Show a => (a,a) -> a -> String -> b
 indexError rng i tp
   = errorWithoutStackTrace (showString "Ix{" . showString tp . showString "}.index: Index " .


=====================================
libraries/base/GHC/Real.hs
=====================================
@@ -701,11 +701,14 @@ half of y - 1 can be computed as y `quot` 2, optimising subtraction away.
 
 Note [Inlining (^)
 ~~~~~~~~~~~~~~~~~~
-The INLINABLE pragma allows (^) to be specialised at its call sites.
+The INLINABLE [1] pragma allows (^) to be specialised at its call sites.
 If it is called repeatedly at the same type, that can make a huge
 difference, because of those constants which can be repeatedly
 calculated.
 
+We don't inline until phase 1, to give a chance for the RULES
+"^2/Int" etc to fire first.
+
 Currently the fromInteger calls are not floated because we get
           \d1 d2 x y -> blah
 after the gentle round of simplification.


=====================================
testsuite/tests/simplCore/should_compile/T21851.stderr
=====================================
@@ -15,5 +15,3 @@ g' :: Int -> Int
 g'
   = \ (x :: Int) -> case T21851a.$w$sf x of { (# ww, ww1 #) -> ww }
 
-
-


=====================================
testsuite/tests/simplCore/should_compile/T21851_2.hs
=====================================
@@ -0,0 +1,15 @@
+{-# OPTIONS_GHC -ddump-simpl -dsuppress-uniques -dno-typeable-binds #-}
+
+module T21851_2 where
+
+import T21851_2a
+
+g :: forall a. (Ord a, Num a) => a -> (a,String)
+g n | n < 10    = (0, f n True)
+    | otherwise = g (n-2)
+-- The specialised version of g leads to a specialised
+-- call to (f @Int @Bool).  Then we want to fire f's RULE
+-- and specialise 'wombat'
+
+h = g (3::Int)
+


=====================================
testsuite/tests/simplCore/should_compile/T21851_2.stderr
=====================================
@@ -0,0 +1,120 @@
+[1 of 2] Compiling T21851_2a        ( T21851_2a.hs, T21851_2a.o )
+[2 of 2] Compiling T21851_2         ( T21851_2.hs, T21851_2.o )
+
+==================== Tidy Core ====================
+Result size of Tidy Core
+  = {terms: 107, types: 96, coercions: 0, joins: 0/0}
+
+-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
+lvl :: Integer
+[GblId, Unf=OtherCon []]
+lvl = GHC.Num.Integer.IS 2#
+
+-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
+lvl1 :: Integer
+[GblId, Unf=OtherCon []]
+lvl1 = GHC.Num.Integer.IS 0#
+
+-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
+lvl2 :: Integer
+[GblId, Unf=OtherCon []]
+lvl2 = GHC.Num.Integer.IS 10#
+
+Rec {
+-- RHS size: {terms: 25, types: 5, coercions: 0, joins: 0/0}
+T21851_2.$s$wwombat [InlPrag=[~], Occ=LoopBreaker]
+  :: GHC.Prim.Int# -> Bool -> [Char]
+[GblId, Arity=2, Str=<1L><ML>, Unf=OtherCon []]
+T21851_2.$s$wwombat
+  = \ (ww :: GHC.Prim.Int#) (y :: Bool) ->
+      case ww of ds {
+        __DEFAULT ->
+          case y of {
+            False ->
+              GHC.CString.unpackAppendCString#
+                GHC.Show.$fShowBool3
+                (T21851_2.$s$wwombat (GHC.Prim.-# ds 1#) GHC.Types.False);
+            True ->
+              GHC.CString.unpackAppendCString#
+                GHC.Show.$fShowBool2
+                (T21851_2.$s$wwombat (GHC.Prim.-# ds 1#) GHC.Types.True)
+          };
+        0# -> GHC.Types.[] @Char
+      }
+end Rec }
+
+Rec {
+-- RHS size: {terms: 16, types: 6, coercions: 0, joins: 0/0}
+T21851_2.$w$sg [InlPrag=[2], Occ=LoopBreaker]
+  :: GHC.Prim.Int# -> (# GHC.Prim.Int#, String #)
+[GblId, Arity=1, Str=<L>, Unf=OtherCon []]
+T21851_2.$w$sg
+  = \ (ww :: GHC.Prim.Int#) ->
+      case GHC.Prim.<# ww 10# of {
+        __DEFAULT -> T21851_2.$w$sg (GHC.Prim.-# ww 2#);
+        1# -> (# 0#, T21851_2.$s$wwombat ww GHC.Types.True #)
+      }
+end Rec }
+
+-- RHS size: {terms: 3, types: 3, coercions: 0, joins: 0/0}
+lvl3 :: forall {a}. [Char]
+[GblId]
+lvl3 = \ (@a) -> T21851_2a.$wf GHC.Prim.(##) @a @Bool
+
+Rec {
+-- RHS size: {terms: 27, types: 18, coercions: 0, joins: 0/0}
+T21851_2.$wg [InlPrag=[2], Occ=LoopBreaker]
+  :: forall {a}. (Ord a, Num a) => a -> (# a, String #)
+[GblId[StrictWorker([!])],
+ Arity=3,
+ Str=<SP(A,A,SC(S,C(1,L)),A,A,A,A,A)><LP(A,LC(L,C(1,L)),A,A,A,A,L)><L>,
+ Unf=OtherCon []]
+T21851_2.$wg
+  = \ (@a) ($dOrd :: Ord a) ($dNum :: Num a) (n :: a) ->
+      case < @a $dOrd n (fromInteger @a $dNum lvl2) of {
+        False ->
+          T21851_2.$wg
+            @a $dOrd $dNum (- @a $dNum n (fromInteger @a $dNum lvl));
+        True -> (# fromInteger @a $dNum lvl1, lvl3 @a #)
+      }
+end Rec }
+
+-- RHS size: {terms: 13, types: 16, coercions: 0, joins: 0/0}
+g [InlPrag=[2]] :: forall a. (Ord a, Num a) => a -> (a, String)
+[GblId,
+ Arity=3,
+ Str=<SP(A,A,SC(S,C(1,L)),A,A,A,A,A)><LP(A,LC(L,C(1,L)),A,A,A,A,L)><L>,
+ Cpr=1,
+ Unf=Unf{Src=StableSystem, TopLvl=True, Value=True, ConLike=True,
+         WorkFree=True, Expandable=True,
+         Guidance=ALWAYS_IF(arity=3,unsat_ok=True,boring_ok=False)
+         Tmpl= \ (@a)
+                 ($dOrd [Occ=Once1] :: Ord a)
+                 ($dNum [Occ=Once1] :: Num a)
+                 (n [Occ=Once1] :: a) ->
+                 case T21851_2.$wg @a $dOrd $dNum n of
+                 { (# ww [Occ=Once1], ww1 [Occ=Once1] #) ->
+                 (ww, ww1)
+                 }}]
+g = \ (@a) ($dOrd :: Ord a) ($dNum :: Num a) (n :: a) ->
+      case T21851_2.$wg @a $dOrd $dNum n of { (# ww, ww1 #) ->
+      (ww, ww1)
+      }
+
+-- RHS size: {terms: 8, types: 9, coercions: 0, joins: 0/0}
+h :: (Int, String)
+[GblId,
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
+         WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 50 10}]
+h = case T21851_2.$w$sg 3# of { (# ww, ww1 #) ->
+    (GHC.Types.I# ww, ww1)
+    }
+
+
+------ Local rules for imported ids --------
+"SPEC/T21851_2 $wwombat @Bool" [2]
+    forall ($dShow :: Show Bool).
+      T21851_2a.$wwombat @Bool $dShow
+      = T21851_2.$s$wwombat
+
+


=====================================
testsuite/tests/simplCore/should_compile/T21851_2a.hs
=====================================
@@ -0,0 +1,11 @@
+module T21851_2a where
+
+f :: (Num a, Show b) => a -> b -> String
+{-# NOINLINE f #-}
+f x y = "no"
+{-# RULES "wombat"  f = wombat #-}
+
+wombat :: Show b => Int -> b -> String
+{-# INLINEABLE wombat #-}
+wombat 0 y = ""
+wombat n y = show y ++ wombat (n-1) y


=====================================
testsuite/tests/simplCore/should_compile/all.T
=====================================
@@ -442,3 +442,7 @@ test('T22357',  normal, compile, ['-O'])
 #    Rule fired: SPEC/T17366 f @(Tagged tag) @_ (T17366)
 test('T17366',  normal, multimod_compile, ['T17366', '-O -v0 -ddump-rule-firings'])
 test('T17366_AR',  [grep_errmsg(r'SPEC')], multimod_compile, ['T17366_AR', '-O -v0 -ddump-rule-firings'])
+
+# One module, T21851_2.hs, has OPTIONS_GHC -ddump-simpl
+# Expecting to see $s$wwombat
+test('T21851_2', [grep_errmsg(r'wwombat') ], multimod_compile, ['T21851_2', '-O -dno-typeable-binds -dsuppress-uniques'])


=====================================
testsuite/tests/stranal/should_compile/T22388.hs
=====================================
@@ -0,0 +1,14 @@
+{-# LANGUAGE MagicHash, UnboxedTuples #-}
+
+-- See Note [Unboxing through unboxed tuples]
+module T22388 where
+
+-- Don't split, because neither the result not arg cancels away a box.
+boring :: (# Int, Int, Int #) -> (# Int, Int, Int #)
+boring (# x, y, z #) = (# y, z, x #)
+{-# NOINLINE boring #-}
+
+-- Do split, because we get to drop z and pass x and y unboxed
+interesting :: (# Int, Int, Int #) -> (# Int #)
+interesting (# x, y, z #) = let !t = x + y in (# t #)
+{-# NOINLINE interesting #-}


=====================================
testsuite/tests/stranal/should_compile/T22388.stderr
=====================================
@@ -0,0 +1,92 @@
+
+==================== Tidy Core ====================
+Result size of Tidy Core
+  = {terms: 48, types: 81, coercions: 0, joins: 0/0}
+
+-- RHS size: {terms: 8, types: 23, coercions: 0, joins: 0/0}
+boring [InlPrag=NOINLINE]
+  :: (# Int, Int, Int #) -> (# Int, Int, Int #)
+[GblId, Arity=1, Str=<1!P(L,L,L)>, Cpr=1, Unf=OtherCon []]
+boring
+  = \ (ds :: (# Int, Int, Int #)) ->
+      case ds of { (# x, y, z #) -> (# y, z, x #) }
+
+-- RHS size: {terms: 5, types: 2, coercions: 0, joins: 0/0}
+T22388.$winteresting [InlPrag=NOINLINE]
+  :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int#
+[GblId, Arity=2, Str=<L><L>, Unf=OtherCon []]
+T22388.$winteresting
+  = \ (ww :: GHC.Prim.Int#) (ww1 :: GHC.Prim.Int#) ->
+      GHC.Prim.+# ww ww1
+
+-- RHS size: {terms: 18, types: 24, coercions: 0, joins: 0/0}
+interesting [InlPrag=NOINLINE[final]]
+  :: (# Int, Int, Int #) -> (# Int #)
+[GblId,
+ Arity=1,
+ Str=<1!P(1!P(L),1!P(L),A)>,
+ Cpr=1(1),
+ Unf=Unf{Src=StableSystem, TopLvl=True, Value=True, ConLike=True,
+         WorkFree=True, Expandable=True,
+         Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
+         Tmpl= \ (ds [Occ=Once1!] :: (# Int, Int, Int #)) ->
+                 case ds of
+                 { (# ww [Occ=Once1!], ww1 [Occ=Once1!], _ [Occ=Dead] #) ->
+                 case ww of { GHC.Types.I# ww3 [Occ=Once1] ->
+                 case ww1 of { GHC.Types.I# ww4 [Occ=Once1] ->
+                 case T22388.$winteresting ww3 ww4 of ww5 [Occ=Once1] { __DEFAULT ->
+                 (# GHC.Types.I# ww5 #)
+                 }
+                 }
+                 }
+                 }}]
+interesting
+  = \ (ds :: (# Int, Int, Int #)) ->
+      case ds of { (# ww, ww1, ww2 #) ->
+      case ww of { GHC.Types.I# ww3 ->
+      case ww1 of { GHC.Types.I# ww4 ->
+      case T22388.$winteresting ww3 ww4 of ww5 { __DEFAULT ->
+      (# GHC.Types.I# ww5 #)
+      }
+      }
+      }
+      }
+
+-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
+T22388.$trModule4 :: GHC.Prim.Addr#
+[GblId,
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
+         WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 20 0}]
+T22388.$trModule4 = "main"#
+
+-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
+T22388.$trModule3 :: GHC.Types.TrName
+[GblId,
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
+         WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
+T22388.$trModule3 = GHC.Types.TrNameS T22388.$trModule4
+
+-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
+T22388.$trModule2 :: GHC.Prim.Addr#
+[GblId,
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
+         WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 30 0}]
+T22388.$trModule2 = "T22388"#
+
+-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
+T22388.$trModule1 :: GHC.Types.TrName
+[GblId,
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
+         WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
+T22388.$trModule1 = GHC.Types.TrNameS T22388.$trModule2
+
+-- RHS size: {terms: 3, types: 0, coercions: 0, joins: 0/0}
+T22388.$trModule :: GHC.Types.Module
+[GblId,
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
+         WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
+T22388.$trModule
+  = GHC.Types.Module T22388.$trModule3 T22388.$trModule1
+
+
+


=====================================
testsuite/tests/stranal/should_compile/all.T
=====================================
@@ -86,3 +86,5 @@ test('T21128', [ grep_errmsg(r'let { y = I\#') ], multimod_compile, ['T21128', '
 test('T21265', normal, compile, [''])
 test('EtaExpansion', normal, compile, [''])
 test('T22039', normal, compile, [''])
+# T22388: Should see $winteresting but not $wboring
+test('T22388', [ grep_errmsg(r'^\S+\$w\S+') ], compile, ['-dsuppress-uniques -ddump-simpl'])


=====================================
testsuite/tests/stranal/sigs/T21737.hs
=====================================
@@ -0,0 +1,47 @@
+{-# OPTIONS_GHC -fmax-worker-args=4 #-}
+
+{-# LANGUAGE MagicHash, UnboxedTuples #-}
+
+-- See Note [Worker argument budget]
+module T21737 where
+
+data T = MkT (# Int, Int, Int, Int #)
+
+-- NB: -fmax-worker-args=4 at the top of this file!
+-- We should unbox through the unboxed pair but not T
+{-# NOINLINE f #-}
+f :: Int -> (# Int, Int #) -> T -> Int
+f x (# y, z #) (MkT (# x1, x2, x3, x4 #)) = x + y + z + x1 + x2 + x3 + x4
+
+-- NB: -fmax-worker-args=4 at the top of this file!
+-- Do split the triple *even if* that gets us to 6 args,
+-- because the triple will take 3 registers anyway (not 1)
+-- and we get to unbox a b c.
+yes :: (# Int, Int, Int #) -> Int -> Int -> Int -> Int
+yes (# a, b, c #) d e f = a + b + c + d + e + f
+{-# NOINLINE yes #-}
+
+data U = MkU (# Int, Int, Int, Int, Int, Int #)
+
+-- NB: -fmax-worker-args=4 at the top of this file!
+-- Don't unbox U, because then we'll pass an unboxed 6-tuple, all in registers.
+no :: U -> Int
+no (MkU (# a, b, c, d, e, f #)) = a + b + c + d + e + f
+{-# NOINLINE no #-}
+
+-- NB: -fmax-worker-args=4 at the top of this file!
+-- Hence do not unbox the nested triple.
+boxed :: (Int, Int) -> (Int, (Int, Int, Int)) -> Int
+boxed (a,b) (c, (d,e,f)) = a + b + c + d + e + f
+{-# NOINLINE boxed #-}
+
+-- NB: -fmax-worker-args=4 at the top of this file!
+-- Do split the inner unboxed triple *even if* that gets us to 5 args, because
+-- the function will take 5 args anyway. But don't split the pair!
+unboxed :: (Int, Int) -> (# Int, (# Int, Int, Int #) #) -> Int
+unboxed (a,b) (# c, (# d, e, f #) #) = a + b + c + d + e + f
+{-# NOINLINE unboxed #-}
+
+-- Point: Demand on `x` is lazy and thus Unboxed
+app :: ((# Int, Int #) -> (# Int, Int #)) -> (# Int, Int #) -> (# Int, Int #)
+app g x = g x


=====================================
testsuite/tests/stranal/sigs/T21737.stderr
=====================================
@@ -0,0 +1,30 @@
+
+==================== Strictness signatures ====================
+T21737.app: <1C(1,L)><L>
+T21737.boxed: <1!P(1!P(L),1!P(L))><1!P(1!P(L),1P(1L,1L,1L))>
+T21737.f: <1!P(L)><1!P(1!P(L),1!P(L))><1P(1P(1L,1L,1L,1L))>
+T21737.no: <1P(1P(1L,1L,1L,1L,1L,1L))>
+T21737.unboxed: <1P(1L,1L)><1!P(1!P(L),1!P(1!P(L),1!P(L),1!P(L)))>
+T21737.yes: <1!P(1!P(L),1!P(L),1!P(L))><1!P(L)><1!P(L)><1!P(L)>
+
+
+
+==================== Cpr signatures ====================
+T21737.app:
+T21737.boxed: 1
+T21737.f: 1
+T21737.no: 1
+T21737.unboxed: 1
+T21737.yes: 1
+
+
+
+==================== Strictness signatures ====================
+T21737.app: <1C(1,L)><L>
+T21737.boxed: <1!P(1!P(L),1!P(L))><1!P(1!P(L),1P(1L,1L,1L))>
+T21737.f: <1!P(L)><1!P(1!P(L),1!P(L))><1P(1P(1L,1L,1L,1L))>
+T21737.no: <1P(1P(1L,1L,1L,1L,1L,1L))>
+T21737.unboxed: <1P(1L,1L)><1!P(1!P(L),1!P(1!P(L),1!P(L),1!P(L)))>
+T21737.yes: <1!P(1!P(L),1!P(L),1!P(L))><1!P(L)><1!P(L)><1!P(L)>
+
+


=====================================
testsuite/tests/stranal/sigs/all.T
=====================================
@@ -38,3 +38,4 @@ test('T21754', normal, compile, [''])
 test('T21888', normal, compile, [''])
 test('T21888a', normal, compile, [''])
 test('T22241', normal, compile, [''])
+test('T21737', normal, compile, [''])



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/299d21bbe03d4e054be7702b495f792f99898dfd...13b7c9ef64d7505116373a297a9074500d61764f

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/299d21bbe03d4e054be7702b495f792f99898dfd...13b7c9ef64d7505116373a297a9074500d61764f
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20221110/1fd54352/attachment-0001.html>


More information about the ghc-commits mailing list