[Git][ghc/ghc][wip/T21470] 4 commits: Make rewrite rules "win" over inlining

Simon Peyton Jones (@simonpj) gitlab at gitlab.haskell.org
Tue Oct 11 07:43:05 UTC 2022



Simon Peyton Jones pushed to branch wip/T21470 at Glasgow Haskell Compiler / GHC


Commits:
96d32ff2 by Simon Peyton Jones at 2022-10-10T22:30:21+01:00
Make rewrite rules "win" over inlining

If a rewrite rule and an inlining compete in the simplifier, this
patch makes sure that the rewrite rule "wins".  That is in general
a bit fragile, but it's a huge help when making specialisation work
reliably, as #21851 and #22097 showed.

The change is fairly straightforward, and is documented in
   Note [Rewrite rules and inlining]
in GHC.Core.Opt.Simplify.Iteration.
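
To make the new ordering concrete, here is a toy model of the little
state machine described in that Note.  It is our own sketch, not GHC
code: Rule, Outcome and rebuild are hypothetical names, arguments are
plain Ints that count as "already simplified", and the inliner's verdict
is passed in as a Bool.  The Rewrite type mirrors the three ai_rewrite
states in the diff; here the TryRules count is simply the number of
arguments still to simplify before the rules are tried.

```haskell
-- Toy model (not GHC's code) of "rules first, then inlining".

-- Hypothetical stand-in for a rewrite rule.
data Rule = Rule
  { ruleName    :: String
  , ruleArity   :: Int
  , ruleMatches :: [Int] -> Bool
  }

-- Mirrors the states of ai_rewrite in Note [Rewrite rules and inlining].
data Rewrite
  = TryRules Int [Rule]  -- args still to simplify before trying rules
  | TryInlining
  | TryNothing

data Outcome = FiredRule String | Inlined | Rebuilt
  deriving (Eq, Show)

initialState :: [Rule] -> Rewrite
initialState []    = TryInlining
initialState rules = TryRules (maximum (map ruleArity rules)) rules

-- Process a call f a1 .. an.  'canInline' says whether the inliner
-- would inline f in this context.
rebuild :: Bool -> [Rule] -> [Int] -> Outcome
rebuild canInline rules = go (initialState rules) []
  where
    -- Simplify just enough arguments for the highest-arity rule, then
    -- try the rules; only if none fires do we move on to inlining.
    go (TryRules wanted rs) done rest =
      case rest of
        (a : rest') | wanted > 0 -> go (TryRules (wanted - 1) rs) (done ++ [a]) rest'
        _ -> case firing of
               (n : _) -> FiredRule n
               []      -> go TryInlining done rest
      where
        firing = [ ruleName r | r <- rs
                              , ruleArity r <= length done
                              , ruleMatches r (take (ruleArity r) done) ]
    go TryInlining done rest
      | canInline = Inlined
      | otherwise = go TryNothing done rest
    go TryNothing _ _ = Rebuilt
```

For example, rebuild True [Rule "r" 1 (const True)] [1, 2] gives
FiredRule "r" even though inlining was also possible; with no matching
rule the same call falls through to Inlined.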

Compile times change up and down a bit, in some cases because
we get better specialisation.  But the payoff (more reliable
specialisation) is large.

Metrics: compile_time/bytes allocated
-----------------------------------------------
    T10421(normal)   +3.7% BAD
   T10421a(normal)   +5.5%
    T13253(normal)   +1.3%
      T14052(ghci)   +1.8%
    T15304(normal)   -1.4%
    T16577(normal)   +3.1% BAD
    T17516(normal)   +2.3%
    T17836(normal)   -1.9%
    T18223(normal)   -1.8%
     T8095(normal)   -1.3%
     T9961(normal)   +2.5% BAD

         geo. mean   +0.0%
         minimum     -1.9%
         maximum     +5.5%
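
The summary rows combine per-test changes.  Assuming the usual
geometric-mean definition (the commit does not spell it out), a change
of p% is the ratio 1 + p/100, and the summary is the geometric mean of
those ratios turned back into a percentage.  The eleven rows shown
average roughly +1.2%, so the +0.0% figure presumably covers more tests
than are listed; the sketch below (geoMeanChange is our own name) is not
expected to reproduce it from these rows alone.

```haskell
-- Geometric mean of percentage changes, e.g. [3.7, -1.4] for +3.7%, -1.4%.
-- Averages the logs of the ratios, then converts back to a percentage.
-- (An empty list gives NaN.)
geoMeanChange :: [Double] -> Double
geoMeanChange pcts = (exp meanLog - 1) * 100
  where
    logs    = [ log (1 + p / 100) | p <- pcts ]
    meanLog = sum logs / fromIntegral (length logs)
```

Note that +10% and -10% do not cancel: their geometric mean is about
-0.50%.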

Nofib results (bytes allocated):

+-------------------------------++----------+
|                               ||tsv (rel) |
+===============================++==========+
|           imaginary/paraffins ||   +0.27% |
|                imaginary/rfib ||   -0.04% |
|                     real/anna ||   +0.02% |
|                      real/fem ||   -0.04% |
|                    real/fluid ||   +1.68% |
|                   real/gamteb ||   -0.34% |
|                       real/gg ||   +1.54% |
|                   real/hidden ||   -0.01% |
|                      real/hpg ||   -0.03% |
|                    real/infer ||   -0.03% |
|                   real/prolog ||   +0.02% |
|                  real/veritas ||   -0.47% |
|       shootout/fannkuch-redux ||   -0.03% |
|         shootout/k-nucleotide ||   -0.02% |
|               shootout/n-body ||   -0.06% |
|        shootout/spectral-norm ||   -0.01% |
|         spectral/cryptarithm2 ||   +1.25% |
|             spectral/fibheaps ||  +18.33% |
|           spectral/last-piece ||   -0.34% |
+===============================++==========+
|                     geom mean ||   +0.17% |
+-------------------------------++----------+

There are extensive notes in !8897 about the regressions.
Briefly:

* fibheaps: there was a very delicately balanced inlining that
  tipped over the wrong way after this change.

* The cryptarithm2 and paraffins regressions are caused by #22274,
  which is really a separate issue.  (That is, the right fix is *not*
  to make inlining "win" over rules.)

So I'm accepting these changes.

Metric Increase:
    T10421
    T16577
    T9961

- - - - -
ed4b5885 by Joachim Breitner at 2022-10-10T23:16:11-04:00
Utils.JSON: do not escapeJsonString in ToJson String instance

as `escapeJsonString` is already applied in `renderJSON`; the `JSString`
constructor is meant to carry the unescaped string.
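
Escaping in the String instance as well would escape everything twice.
The sketch below models this with a tiny hypothetical JsonDoc type and
helpers of our own (stringToJsonOld, stringToJsonNew); it is not the
GHC.Utils.Json code, and its escapeJsonString handles only quotes,
backslashes and newlines.

```haskell
-- Hypothetical one-constructor JSON type; JSString holds the raw,
-- unescaped string.
newtype JsonDoc = JSString String

-- Minimal escaper: quotes, backslashes and newlines only.
escapeJsonString :: String -> String
escapeJsonString = concatMap esc
  where
    esc '"'  = "\\\""
    esc '\\' = "\\\\"
    esc '\n' = "\\n"
    esc c    = [c]

-- Rendering is the single place where escaping happens.
renderJSON :: JsonDoc -> String
renderJSON (JSString s) = "\"" ++ escapeJsonString s ++ "\""

-- Model of the old behaviour: escape when building the document too.
stringToJsonOld :: String -> JsonDoc
stringToJsonOld = JSString . escapeJsonString

-- Model of the fixed behaviour: store the raw string.
stringToJsonNew :: String -> JsonDoc
stringToJsonNew = JSString
```

With the input a"b, the fixed path renders "a\"b", while the old path
renders "a\\\"b": the backslash added by the first escape gets escaped
again.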

- - - - -
42112508 by Simon Peyton Jones at 2022-10-11T08:44:53+01:00
Make the specialiser handle polymorphic specialisation

Ticket #13873 unexpectedly showed that a SPECIALISE pragma made a
program run (a lot) slower, because less specialisation took place
overall. It turned out that the specialiser was missing opportunities
because of quantified type variables.

It was quite easy to fix. The story is given in
    Note [Specialising polymorphic dictionaries]
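
A source-level sketch of the kind of call involved follows; it is our
own example, not the program from #13873, and totalShowLength and
proxyLength are hypothetical names.  Inside proxyLength, totalShowLength
is called at type Proxy t.  The Show (Proxy t) dictionary needs no other
dictionary, but its type mentions the quantified variable t.  As we
read the commit message, the specialiser can now make a copy of
totalShowLength for Proxy t that stays polymorphic in t; we have not
checked the exact Core, which -ddump-spec would show.

```haskell
import Data.Proxy (Proxy (..))

-- An overloaded function: every call passes a Show dictionary.
totalShowLength :: Show a => [a] -> Int
totalShowLength = sum . map (length . show)

-- A polymorphic caller.  The dictionary passed to totalShowLength is
-- Show (Proxy t): closed except for the quantified type variable t.
proxyLength :: Int -> Proxy t -> Int
proxyLength n p = totalShowLength (replicate n p)
```

For instance, proxyLength 3 (Proxy :: Proxy Int) is 15, since each
Proxy shows as the five characters "Proxy".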

Two other minor fixes in the specialiser:

* There is no benefit in specialising data constructor /wrappers/.
  (They can appear overloaded because they are given a dictionary
  to store in the constructor.)  This is handled by a small guard in
  canSpecImport.

* There was a buglet in the UnspecArg case of specHeader, in the
  case where there is a dead binder. We need a LitRubbish filler
  for the specUnfolding stuff.  I expanded
  Note [Drop dead args from specialisations] to explain.
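
As a concrete case of the first point, a constructor that captures a
class constraint stores the dictionary inside the value, so its wrapper
has an overloaded-looking type.  The example below is our own
(Showable, render and items are hypothetical names).  As we read the
commit message, specialising MkShowable's wrapper to, say, the Show Int
dictionary would only yield a copy that stores that same dictionary,
so skipping it loses nothing.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- The wrapper behaves like  MkShowable :: forall a. Show a => a -> Showable
-- because the Show a dictionary is stored in the constructor.
data Showable = forall a. Show a => MkShowable a

-- Pattern matching recovers the stored dictionary for 'show'.
render :: Showable -> String
render (MkShowable x) = show x

items :: [Showable]
items = [MkShowable (1 :: Int), MkShowable True, MkShowable "hi"]
```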

There is a 4% increase in compile time for T15164, because we generate
more specialised code.  This seems OK.

Metric Increase:
    T15164

- - - - -
3c4220c9 by Simon Peyton Jones at 2022-10-11T08:44:53+01:00
Fix binder-swap bug

This patch fixes #21229 properly, by avoiding doing a
binder-swap on dictionary Ids.  This is pretty subtle, and explained
in Note [Care with binder-swap on dictionaries].
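
The example in that Note can be written in ordinary Haskell roughly as
below.  This is our own source-level approximation, not the T21229
test: the Note's h passes a hand-built dictionary in Core, which source
code cannot express, so both branches here pass the dictionary through.
Because Sing has one method and no superclasses, GHC represents a
Sing a dictionary as a newtype around Bool, which is where the_co in
the Note comes from.  Each a has exactly one Sing a dictionary, while
Bool has two values; the tests check that f receives the dictionary
for the instance at hand.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, AllowAmbiguousTypes #-}
{-# LANGUAGE TypeApplications, ScopedTypeVariables, FlexibleInstances #-}

-- Single-method class: its dictionary is a newtype over Bool.
class Sing (b :: Bool) where
  sing :: Bool

instance Sing 'True  where sing = True
instance Sing 'False where sing = False

-- Reports which instance's dictionary it was given.
f :: forall a. Sing a => String
f = if sing @a then "Sing 'True" else "Sing 'False"

-- Same shape as h in the Note: scrutinise the dictionary (as a Bool),
-- then call f in each branch.
h :: forall a. Sing a => String
h = case sing @a of
      True  -> f @a
      False -> f @a
```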

The test is already in simplCore/should_run/T21229.

This allows us to restore a feature to the specialiser that we had
to revert: see Note [Specialising polymorphic dictionaries].
(This is done in a separate patch.)

I also modularised things, using a new function scrutBinderSwap_maybe
in all the places where we are (effectively) doing a binder-swap,
notably:

* Simplify.Iteration.addAltUnfoldings
* SpecConstr.extendCaseBndrs
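
The helper's contract is small enough to model directly.  Below is a
self-contained miniature of what scrutBinderSwap_maybe (shown in the
OccurAnal diff) computes; Expr, Co, MCo and the Bool flag marking
dictionary variables are hypothetical stand-ins for GHC's Core types
and isDictId.  As in the diff, a bare variable gives MRefl, a cast
non-dictionary variable gives the symmetric coercion, ticks are looked
through, and a cast dictionary variable is refused.

```haskell
-- Hypothetical miniature of Core: just enough to model the function.
data Co = CoName String | Sym Co
  deriving (Eq, Show)

data MCo = MRefl | MCo Co
  deriving (Eq, Show)

data Expr
  = EVar String Bool   -- variable name; True if it is a dictionary Id
  | ECast Expr Co
  | ETick String Expr
  | ELit Int
  deriving (Eq, Show)

-- Symmetry, cancelling a double Sym.
mkSymCo :: Co -> Co
mkSymCo (Sym co) = co
mkSymCo co       = Sym co

-- If  scrutBinderSwap e == Just (v, mco)  then  v = e |> mco.
scrutBinderSwap :: Expr -> Maybe (String, MCo)
scrutBinderSwap (EVar v _) = Just (v, MRefl)
scrutBinderSwap (ECast (EVar v isDictId) co)
  | not isDictId = Just (v, MCo (mkSymCo co))  -- no cast swap for dictionaries
scrutBinderSwap (ETick _ e) = scrutBinderSwap e  -- drop ticks
scrutBinderSwap _           = Nothing
```

So a scrutinee (x |> co) yields Just ("x", MCo (Sym co)), while
(d |> co) with d a dictionary yields Nothing, and no binder-swap
happens.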

In Simplify.Iteration.addAltUnfoldings I also eliminated a guard
    Many <- idMult case_bndr
because we concluded, in #22123, that it was doing no good.

- - - - -


23 changed files:

- compiler/GHC/Core/Opt/OccurAnal.hs
- compiler/GHC/Core/Opt/SetLevels.hs
- compiler/GHC/Core/Opt/Simplify/Env.hs
- compiler/GHC/Core/Opt/Simplify/Iteration.hs
- compiler/GHC/Core/Opt/Simplify/Utils.hs
- compiler/GHC/Core/Opt/SpecConstr.hs
- compiler/GHC/Core/Opt/Specialise.hs
- compiler/GHC/Core/Subst.hs
- compiler/GHC/Core/Type.hs
- compiler/GHC/Utils/Json.hs
- compiler/GHC/Utils/Monad.hs
- testsuite/tests/lib/integer/Makefile
- testsuite/tests/linters/notes.stdout
- testsuite/tests/numeric/should_compile/T19641.stderr
- + testsuite/tests/simplCore/should_compile/T21851.hs
- + testsuite/tests/simplCore/should_compile/T21851.stderr
- + testsuite/tests/simplCore/should_compile/T21851a.hs
- + testsuite/tests/simplCore/should_compile/T22097.hs
- + testsuite/tests/simplCore/should_compile/T22097.stderr
- + testsuite/tests/simplCore/should_compile/T22097a.hs
- testsuite/tests/simplCore/should_compile/T6056.stderr
- testsuite/tests/simplCore/should_compile/T8331.stderr
- testsuite/tests/simplCore/should_compile/all.T


Changes:

=====================================
compiler/GHC/Core/Opt/OccurAnal.hs
=====================================
@@ -19,7 +19,7 @@ core expression with (hopefully) improved usage information.
 module GHC.Core.Opt.OccurAnal (
     occurAnalysePgm,
     occurAnalyseExpr,
-    zapLambdaBndrs
+    zapLambdaBndrs, scrutBinderSwap_maybe
   ) where
 
 import GHC.Prelude
@@ -27,11 +27,12 @@ import GHC.Prelude
 import GHC.Core
 import GHC.Core.FVs
 import GHC.Core.Utils   ( exprIsTrivial, isDefaultAlt, isExpandableApp,
-                          stripTicksTopE, mkTicks )
+                          mkCastMCo, mkTicks )
 import GHC.Core.Opt.Arity   ( joinRhsArity, isOneShotBndr )
 import GHC.Core.Coercion
+import GHC.Core.Predicate   ( isDictId )
 import GHC.Core.Type
-import GHC.Core.TyCo.FVs( tyCoVarsOfMCo )
+import GHC.Core.TyCo.FVs    ( tyCoVarsOfMCo )
 
 import GHC.Data.Maybe( isJust, orElse )
 import GHC.Data.Graph.Directed ( SCC(..), Node(..)
@@ -2462,8 +2463,8 @@ data OccEnv
 
            -- See Note [The binder-swap substitution]
            -- If  x :-> (y, co)  is in the env,
-           -- then please replace x by (y |> sym mco)
-           -- Invariant of course: idType x = exprType (y |> sym mco)
+           -- then please replace x by (y |> mco)
+           -- Invariant of course: idType x = exprType (y |> mco)
            , occ_bs_env  :: !(VarEnv (OutId, MCoercion))
            , occ_bs_rng  :: !VarSet   -- Vars free in the range of occ_bs_env
                    -- Domain is Global and Local Ids
@@ -2669,7 +2670,7 @@ The binder-swap is implemented by the occ_bs_env field of OccEnv.
 There are two main pieces:
 
 * Given    case x |> co of b { alts }
-  we add [x :-> (b, co)] to the occ_bs_env environment; this is
+  we add [x :-> (b, sym co)] to the occ_bs_env environment; this is
   done by addBndrSwap.
 
 * Then, at an occurrence of a variable, we look up in the occ_bs_env
@@ -2737,30 +2738,8 @@ Some tricky corners:
 (BS5) We have to apply the occ_bs_env substitution uniformly,
       including to (local) rules and unfoldings.
 
-Historical note
----------------
-We used to do the binder-swap transformation by introducing
-a proxy let-binding, thus;
-
-   case x of b { pi -> ri }
-      ==>
-   case x of b { pi -> let x = b in ri }
-
-But that had two problems:
-
-1. If 'x' is an imported GlobalId, we'd end up with a GlobalId
-   on the LHS of a let-binding which isn't allowed.  We worked
-   around this for a while by "localising" x, but it turned
-   out to be very painful #16296,
-
-2. In CorePrep we use the occurrence analyser to do dead-code
-   elimination (see Note [Dead code in CorePrep]).  But that
-   occasionally led to an unlifted let-binding
-       case x of b { DEFAULT -> let x::Int# = b in ... }
-   which disobeys one of CorePrep's output invariants (no unlifted
-   let-bindings) -- see #5433.
-
-Doing a substitution (via occ_bs_env) is much better.
+(BS6) We must be very careful with dictionaries.
+      See Note [Care with binder-swap on dictionaries]
 
 Note [Case of cast]
 ~~~~~~~~~~~~~~~~~~~
@@ -2770,6 +2749,54 @@ We'd like to eliminate the inner case.  That is the motivation for
 equation (2) in Note [Binder swap].  When we get to the inner case, we
 inline x, cancel the casts, and away we go.
 
+Note [Care with binder-swap on dictionaries]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+This Note explains why we need isDictId in scrutBinderSwap_maybe.
+Consider this tricky example (#21229, #21470):
+
+  class Sing (b :: Bool) where sing :: Bool
+  instance Sing 'True  where sing = True
+  instance Sing 'False where sing = False
+
+  f :: forall a. Sing a => blah
+
+  h = \ @(a :: Bool) ($dSing :: Sing a)
+      let the_co =  Main.N:Sing[0] <a> :: Sing a ~R# Bool
+      case ($dSing |> the_co) of wild
+        True  -> f @'True (True |> sym the_co)
+        False -> f @a     $dSing
+
+Now do a binder-swap on the case-expression:
+
+  h = \ @(a :: Bool) ($dSing :: Sing a)
+      let the_co =  Main.N:Sing[0] <a> :: Sing a ~R# Bool
+      case ($dSing |> the_co) of wild
+        True  -> f @'True (True |> sym the_co)
+        False -> f @a     (wild |> sym the_co)
+
+And now substitute `False` for `wild` (since wild=False in the False branch):
+
+  h = \ @(a :: Bool) ($dSing :: Sing a)
+      let the_co =  Main.N:Sing[0] <a> :: Sing a ~R# Bool
+      case ($dSing |> the_co) of wild
+        True  -> f @'True (True  |> sym the_co)
+        False -> f @a     (False |> sym the_co)
+
+And now we have a problem.  The specialiser will specialise (f @a d) (for all
+types a and dictionaries d!!) with the dictionary (False |> sym the_co), using
+Note [Specialising polymorphic dictionaries] in GHC.Core.Opt.Specialise.
+
+The real problem is the binder-swap.  It swaps a dictionary variable $dSing
+(of kind Constraint) for a term variable wild (of kind Type).  And that is
+dangerous: a dictionary is a /singleton/ type whereas a general term variable is
+not.  In this particular example, Bool is most certainly not a singleton type!
+
+Conclusion:
+  for a /dictionary variable/ do not perform
+  the clever cast version of the binder-swap
+
+Hence the subtle isDictId in scrutBinderSwap_maybe.
+
 Note [Zap case binders in proxy bindings]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 From the original
@@ -2784,8 +2811,87 @@ binding x = cb.  See #5028.
 NB: the OccInfo on /occurrences/ really doesn't matter much; the simplifier
 doesn't use it. So this is only to satisfy the perhaps-over-picky Lint.
 
+-}
+
+addBndrSwap :: OutExpr -> Id -> OccEnv -> OccEnv
+-- See Note [The binder-swap substitution]
+addBndrSwap scrut case_bndr
+            env@(OccEnv { occ_bs_env = swap_env, occ_bs_rng = rng_vars })
+  | Just (scrut_var, mco) <- scrutBinderSwap_maybe scrut
+  , scrut_var /= case_bndr
+      -- Consider: case x of x { ... }
+      -- Do not add [x :-> x] to occ_bs_env, else lookupBndrSwap will loop
+  = env { occ_bs_env = extendVarEnv swap_env scrut_var (case_bndr', mco)
+        , occ_bs_rng = rng_vars `extendVarSet` case_bndr'
+                       `unionVarSet` tyCoVarsOfMCo mco }
+
+  | otherwise
+  = env
+  where
+    case_bndr' = zapIdOccInfo case_bndr
+                 -- See Note [Zap case binders in proxy bindings]
+
+scrutBinderSwap_maybe :: OutExpr -> Maybe (OutVar, MCoercion)
+-- If (scrutBinderSwap_maybe e = Just (v, mco)), then
+--    v = e |> mco
+-- See Note [Case of cast]
+-- See Note [Care with binder-swap on dictionaries]
+--
+-- We use this same function in SpecConstr, and Simplify.Iteration,
+-- when something binder-swap-like is happening
+scrutBinderSwap_maybe (Var v)    = Just (v, MRefl)
+scrutBinderSwap_maybe (Cast (Var v) co)
+  | not (isDictId v)             = Just (v, MCo (mkSymCo co))
+        -- Cast: see Note [Case of cast]
+        -- isDictId: see Note [Care with binder-swap on dictionaries]
+        -- The isDictId rejects a Constraint/Constraint binder-swap, perhaps
+        -- over-conservatively. But I have never seen one, so I'm leaving
+        -- the code as simple as possible. Losing the binder-swap in a
+        -- rare case probably has very low impact.
+scrutBinderSwap_maybe (Tick _ e) = scrutBinderSwap_maybe e  -- Drop ticks
+scrutBinderSwap_maybe _          = Nothing
+
+lookupBndrSwap :: OccEnv -> Id -> (CoreExpr, Id)
+-- See Note [The binder-swap substitution]
+-- Returns an expression of the same type as Id
+lookupBndrSwap env@(OccEnv { occ_bs_env = bs_env })  bndr
+  = case lookupVarEnv bs_env bndr of {
+       Nothing           -> (Var bndr, bndr) ;
+       Just (bndr1, mco) ->
+
+    -- Why do we iterate here?
+    -- See (BS2) in Note [The binder-swap substitution]
+    case lookupBndrSwap env bndr1 of
+      (fun, fun_id) -> (mkCastMCo fun mco, fun_id) }
+
+
+{- Historical note [Proxy let-bindings]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We used to do the binder-swap transformation by introducing
+a proxy let-binding, thus;
+
+   case x of b { pi -> ri }
+      ==>
+   case x of b { pi -> let x = b in ri }
+
+But that had two problems:
+
+1. If 'x' is an imported GlobalId, we'd end up with a GlobalId
+   on the LHS of a let-binding which isn't allowed.  We worked
+   around this for a while by "localising" x, but it turned
+   out to be very painful #16296,
+
+2. In CorePrep we use the occurrence analyser to do dead-code
+   elimination (see Note [Dead code in CorePrep]).  But that
+   occasionally led to an unlifted let-binding
+       case x of b { DEFAULT -> let x::Int# = b in ... }
+   which disobeys one of CorePrep's output invariants (no unlifted
+   let-bindings) -- see #5433.
+
+Doing a substitution (via occ_bs_env) is much better.
+
 Historical Note [no-case-of-case]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 We *used* to suppress the binder-swap in case expressions when
 -fno-case-of-case is on.  Old remarks:
     "This happens in the first simplifier pass,
@@ -2844,53 +2950,8 @@ binder-swap in OccAnal:
 It's fixed by doing the binder-swap in OccAnal because we can do the
 binder-swap unconditionally and still get occurrence analysis
 information right.
--}
 
-addBndrSwap :: OutExpr -> Id -> OccEnv -> OccEnv
--- See Note [The binder-swap substitution]
-addBndrSwap scrut case_bndr
-            env@(OccEnv { occ_bs_env = swap_env, occ_bs_rng = rng_vars })
-  | Just (scrut_var, mco) <- get_scrut_var (stripTicksTopE (const True) scrut)
-  , scrut_var /= case_bndr
-      -- Consider: case x of x { ... }
-      -- Do not add [x :-> x] to occ_bs_env, else lookupBndrSwap will loop
-  = env { occ_bs_env = extendVarEnv swap_env scrut_var (case_bndr', mco)
-        , occ_bs_rng = rng_vars `extendVarSet` case_bndr'
-                       `unionVarSet` tyCoVarsOfMCo mco }
-
-  | otherwise
-  = env
-  where
-    get_scrut_var :: OutExpr -> Maybe (OutVar, MCoercion)
-    get_scrut_var (Var v)           = Just (v, MRefl)
-    get_scrut_var (Cast (Var v) co) = Just (v, MCo co) -- See Note [Case of cast]
-    get_scrut_var _                 = Nothing
-
-    case_bndr' = zapIdOccInfo case_bndr
-                 -- See Note [Zap case binders in proxy bindings]
 
-lookupBndrSwap :: OccEnv -> Id -> (CoreExpr, Id)
--- See Note [The binder-swap substitution]
--- Returns an expression of the same type as Id
-lookupBndrSwap env@(OccEnv { occ_bs_env = bs_env })  bndr
-  = case lookupVarEnv bs_env bndr of {
-       Nothing           -> (Var bndr, bndr) ;
-       Just (bndr1, mco) ->
-
-    -- Why do we iterate here?
-    -- See (BS2) in Note [The binder-swap substitution]
-    case lookupBndrSwap env bndr1 of
-      (fun, fun_id) -> (add_cast fun mco, fun_id) }
-
-  where
-    add_cast fun MRefl    = fun
-    add_cast fun (MCo co) = Cast fun (mkSymCo co)
-    -- We must switch that 'co' to 'sym co';
-    -- see the comment with occ_bs_env
-    -- No need to test for isReflCo, because 'co' came from
-    -- a (Cast e co) and hence is unlikely to be Refl
-
-{-
 ************************************************************************
 *                                                                      *
 \subsection[OccurAnal-types]{OccEnv}


=====================================
compiler/GHC/Core/Opt/SetLevels.hs
=====================================
@@ -51,17 +51,6 @@
   The simplifier tries to get rid of occurrences of x, in favour of wild,
   in the hope that there will only be one remaining occurrence of x, namely
   the scrutinee of the case, and we can inline it.
-
-  This can only work if @wild@ is an unrestricted binder. Indeed, even with the
-  extended typing rule (in the linter) for case expressions, if
-       case x of wild % 1 { p -> e}
-  is well-typed, then
-       case x of wild % 1 { p -> e[wild\x] }
-  is only well-typed if @e[wild\x] = e@ (that is, if @wild@ is not used in @e@
-  at all). In which case, it is, of course, pointless to do the substitution
-  anyway. So for a linear binder (and really anything which isn't unrestricted),
-  doing this substitution would either produce ill-typed terms or be the
-  identity.
 -}
 
 module GHC.Core.Opt.SetLevels (
@@ -1602,7 +1591,9 @@ extendCaseBndrEnv :: LevelEnv
                   -> LevelEnv
 extendCaseBndrEnv le@(LE { le_subst = subst, le_env = id_env })
                   case_bndr (Var scrut_var)
-    | Many <- varMult case_bndr
+  -- We could use OccurAnal.scrutBinderSwap_maybe here, and perhaps
+  -- get a bit more floating.  But we didn't in the past and it's
+  -- an unforced change, so I'm leaving it.
   = le { le_subst   = extendSubstWithVar subst case_bndr scrut_var
        , le_env     = add_id id_env (case_bndr, scrut_var) }
 extendCaseBndrEnv env _ _ = env


=====================================
compiler/GHC/Core/Opt/Simplify/Env.hs
=====================================
@@ -71,7 +71,8 @@ import GHC.Core.Make            ( mkWildValBinder, mkCoreLet )
 import GHC.Builtin.Types
 import GHC.Core.TyCo.Rep        ( TyCoBinder(..) )
 import qualified GHC.Core.Type as Type
-import GHC.Core.Type hiding     ( substTy, substTyVar, substTyVarBndr, extendTvSubst, extendCvSubst )
+import GHC.Core.Type hiding     ( substTy, substTyVar, substTyVarBndr, substCo
+                                , extendTvSubst, extendCvSubst )
 import qualified GHC.Core.Coercion as Coercion
 import GHC.Core.Coercion hiding ( substCo, substCoVar, substCoVarBndr )
 import GHC.Platform ( Platform )


=====================================
compiler/GHC/Core/Opt/Simplify/Iteration.hs
=====================================
@@ -22,7 +22,7 @@ import GHC.Core.Opt.Simplify.Monad
 import GHC.Core.Type hiding ( substTy, substTyVar, extendTvSubst, extendCvSubst )
 import GHC.Core.Opt.Simplify.Env
 import GHC.Core.Opt.Simplify.Utils
-import GHC.Core.Opt.OccurAnal ( occurAnalyseExpr, zapLambdaBndrs )
+import GHC.Core.Opt.OccurAnal ( occurAnalyseExpr, zapLambdaBndrs, scrutBinderSwap_maybe )
 import GHC.Core.Make       ( FloatBind, mkImpossibleExpr, castBottomExpr )
 import qualified GHC.Core.Make
 import GHC.Core.Coercion hiding ( substCo, substCoVar )
@@ -1919,7 +1919,9 @@ wrapJoinCont env cont thing_inside
 
 
 --------------------
-trimJoinCont :: Id -> Maybe JoinArity -> SimplCont -> SimplCont
+trimJoinCont :: Id         -- Used only in error message
+             -> Maybe JoinArity
+             -> SimplCont -> SimplCont
 -- Drop outer context from join point invocation (jump)
 -- See Note [Join points and case-of-case]
 
@@ -2017,6 +2019,17 @@ outside.  Surprisingly tricky!
                      Variables
 *                                                                      *
 ************************************************************************
+
+Note [zapSubstEnv]
+~~~~~~~~~~~~~~~~~~
+When simplifying something that has already been simplified, be sure to
+zap the SubstEnv.  This is VITAL.  Consider
+     let x = e in
+     let y = \z -> ...x... in
+     \ x -> ...y...
+
+We'll clone the inner \x, adding x->x' in the id_subst.  Then when we
+inline y, we must *not* replace x by x' in the inlined copy!!
 -}
 
 simplVar :: SimplEnv -> InVar -> SimplM OutExpr
@@ -2035,86 +2048,28 @@ simplVar env var
 simplIdF :: SimplEnv -> InId -> SimplCont -> SimplM (SimplFloats, OutExpr)
 simplIdF env var cont
   = case substId env var of
-      ContEx tvs cvs ids e ->
-          let env' = setSubstEnv env tvs cvs ids
-          in simplExprF env' e cont
-          -- Don't trim; haven't already simplified e,
-          -- so the cont is not embodied in e
-
-      DoneId var1 -> do
-          logger <- getLogger
-          let cont' = trimJoinCont var (isJoinId_maybe var1) cont
-          completeCall logger env var1 cont'
-
-      DoneEx e mb_join ->
-          let env' = zapSubstEnv env
-              cont' = trimJoinCont var mb_join cont
-          in simplExprF env' e cont'
-              -- Note [zapSubstEnv]
-              -- ~~~~~~~~~~~~~~~~~~
-              -- The template is already simplified, so don't re-substitute.
-              -- This is VITAL.  Consider
-              --      let x = e in
-              --      let y = \z -> ...x... in
-              --      \ x -> ...y...
-              -- We'll clone the inner \x, adding x->x' in the id_subst
-              -- Then when we inline y, we must *not* replace x by x' in
-              -- the inlined copy!!
-
----------------------------------------------------------
---      Dealing with a call site
-
-completeCall :: Logger -> SimplEnv -> OutId -> SimplCont -> SimplM (SimplFloats, OutExpr)
-completeCall logger env var cont
-  | Just expr <- callSiteInline logger uf_opts case_depth var active_unf
-                                lone_variable arg_infos interesting_cont
-  -- Inline the variable's RHS
-  = do { checkedTick (UnfoldingDone var)
-       ; dump_inline expr cont
-       ; let env1 = zapSubstEnv env
-       ; simplExprF env1 expr cont }
+      ContEx tvs cvs ids e -> simplExprF env' e cont
+        -- Don't trimJoinCont; haven't already simplified e,
+        -- so the cont is not embodied in e
+        where
+          env' = setSubstEnv env tvs cvs ids
 
-  | otherwise
-  -- Don't inline; instead rebuild the call
-  = do { rule_base <- getSimplRules
-       ; let rules = getRules rule_base var
-             info = mkArgInfo env var rules
-                              n_val_args call_cont
-       ; rebuildCall env info cont }
+      DoneId var1 ->
+        do { rule_base <- getSimplRules
+           ; let cont' = trimJoinCont var1 (isJoinId_maybe var1) cont
+                 info  = mkArgInfo env rule_base var1 cont'
+           ; rebuildCall env info cont' }
 
-  where
-    uf_opts    = seUnfoldingOpts env
-    case_depth = seCaseDepth env
-    (lone_variable, arg_infos, call_cont) = contArgs cont
-    n_val_args       = length arg_infos
-    interesting_cont = interestingCallContext env call_cont
-    active_unf       = activeUnfolding (seMode env) var
-
-    log_inlining doc
-      = liftIO $ logDumpFile logger (mkDumpStyle alwaysQualify)
-           Opt_D_dump_inlinings
-           "" FormatText doc
+      DoneEx e mb_join -> simplExprF env' e cont'
+        where
+          cont' = trimJoinCont var mb_join cont
+          env'  = zapSubstEnv env  -- See Note [zapSubstEnv]
 
-    dump_inline unfolding cont
-      | not (logHasDumpFlag logger Opt_D_dump_inlinings) = return ()
-      | not (logHasDumpFlag logger Opt_D_verbose_core2core)
-      = when (isExternalName (idName var)) $
-            log_inlining $
-                sep [text "Inlining done:", nest 4 (ppr var)]
-      | otherwise
-      = log_inlining $
-           sep [text "Inlining done: " <> ppr var,
-                nest 4 (vcat [text "Inlined fn: " <+> nest 2 (ppr unfolding),
-                              text "Cont:  " <+> ppr cont])]
+---------------------------------------------------------
+--      Dealing with a call site
 
-rebuildCall :: SimplEnv
-            -> ArgInfo
-            -> SimplCont
+rebuildCall :: SimplEnv -> ArgInfo -> SimplCont
             -> SimplM (SimplFloats, OutExpr)
--- We decided not to inline, so
---    - simplify the arguments
---    - try rewrite rules
---    - and rebuild
 
 ---------- Bottoming applications --------------
 rebuildCall env (ArgInfo { ai_fun = fun, ai_args = rev_args, ai_dmds = [] }) cont
@@ -2137,27 +2092,48 @@ rebuildCall env (ArgInfo { ai_fun = fun, ai_args = rev_args, ai_dmds = [] }) con
     res     = argInfoExpr fun rev_args
     cont_ty = contResultType cont
 
----------- Try rewrite RULES --------------
--- See Note [Trying rewrite rules]
+---------- Try inlining, if ai_rewrite = TryInlining --------
+-- In the TryInlining case we try inlining immediately, before simplifying
+-- any (more) arguments. Why?  See Note [Rewrite rules and inlining].
+--
+-- If there are rewrite rules we'll skip this case until we have
+-- simplified enough args to satisfy nr_wanted==0 in the TryRules case below.
+-- Then we'll try the rules, and if that fails, we'll do TryInlining
 rebuildCall env info@(ArgInfo { ai_fun = fun, ai_args = rev_args
-                              , ai_rules = Just (nr_wanted, rules) }) cont
+                              , ai_rewrite = TryInlining }) cont
+  = do { logger <- getLogger
+       ; let full_cont = pushSimplifiedRevArgs env rev_args cont
+       ; mb_inline <- tryInlining env logger fun full_cont
+       ; case mb_inline of
+            Just expr -> do { checkedTick (UnfoldingDone fun)
+                            ; let env1 = zapSubstEnv env
+                            ; simplExprF env1 expr full_cont }
+            Nothing -> rebuildCall env (info { ai_rewrite = TryNothing }) cont
+       }
+
+---------- Try rewrite RULES, if ai_rewrite = TryRules --------------
+-- See Note [Rewrite rules and inlining]
+-- See also Note [Trying rewrite rules]
+rebuildCall env info@(ArgInfo { ai_fun = fun, ai_args = rev_args
+                              , ai_rewrite = TryRules nr_wanted rules }) cont
   | nr_wanted == 0 || no_more_args
-  , let info' = info { ai_rules = Nothing }
   = -- We've accumulated a simplified call in <fun,rev_args>
     -- so try rewrite rules; see Note [RULES apply to simplified arguments]
     -- See also Note [Rules for recursive functions]
     do { mb_match <- tryRules env rules fun (reverse rev_args) cont
        ; case mb_match of
              Just (env', rhs, cont') -> simplExprF env' rhs cont'
-             Nothing                 -> rebuildCall env info' cont }
+             Nothing -> rebuildCall env (info { ai_rewrite = TryInlining }) cont }
   where
+    -- If we have run out of arguments, just try the rules; there might
+    -- be some with lower arity.  Casts get in the way -- they aren't
+    -- allowed on rule LHSs
     no_more_args = case cont of
                       ApplyToTy  {} -> False
                       ApplyToVal {} -> False
                       _             -> True
 
-
----------- Simplify applications and casts --------------
+---------- Simplify type applications and casts --------------
 rebuildCall env info (CastIt co cont)
   = rebuildCall env (addCastTo info co) cont
 
@@ -2202,6 +2178,7 @@ rebuildCall env (ArgInfo { ai_fun = fun_id, ai_args = rev_args })
              call' = mkApps (Var fun_id) [mkTyArg rr', mkTyArg ty', arg']
        ; return (emptyFloats env, call') }
 
+---------- Simplify value arguments --------------------
 rebuildCall env fun_info
             (ApplyToVal { sc_arg = arg, sc_env = arg_se
                         , sc_dup = dup_flag, sc_hole_ty = fun_ty
@@ -2237,6 +2214,42 @@ rebuildCall env fun_info
 rebuildCall env (ArgInfo { ai_fun = fun, ai_args = rev_args }) cont
   = rebuild env (argInfoExpr fun rev_args) cont
 
+-----------------------------------
+tryInlining :: SimplEnv -> Logger -> OutId -> SimplCont -> SimplM (Maybe OutExpr)
+tryInlining env logger var cont
+  | Just expr <- callSiteInline logger uf_opts case_depth var active_unf
+                                lone_variable arg_infos interesting_cont
+  = do { dump_inline expr cont
+       ; return (Just expr) }
+
+  | otherwise
+  = return Nothing
+
+  where
+    uf_opts    = seUnfoldingOpts env
+    case_depth = seCaseDepth env
+    (lone_variable, arg_infos, call_cont) = contArgs cont
+    interesting_cont = interestingCallContext env call_cont
+    active_unf       = activeUnfolding (seMode env) var
+
+    log_inlining doc
+      = liftIO $ logDumpFile logger (mkDumpStyle alwaysQualify)
+           Opt_D_dump_inlinings
+           "" FormatText doc
+
+    dump_inline unfolding cont
+      | not (logHasDumpFlag logger Opt_D_dump_inlinings) = return ()
+      | not (logHasDumpFlag logger Opt_D_verbose_core2core)
+      = when (isExternalName (idName var)) $
+            log_inlining $
+                sep [text "Inlining done:", nest 4 (ppr var)]
+      | otherwise
+      = log_inlining $
+           sep [text "Inlining done: " <> ppr var,
+                nest 4 (vcat [text "Inlined fn: " <+> nest 2 (ppr unfolding),
+                              text "Cont:  " <+> ppr cont])]
+
+
 {- Note [Trying rewrite rules]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Consider an application (f e1 e2 e3) where the e1,e2,e3 are not yet
@@ -2272,6 +2285,38 @@ makes a particularly big difference when superclass selectors are involved:
         op ($p1 ($p2 (df d)))
 We want all this to unravel in one sweep.
 
+Note [Rewrite rules and inlining]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In general we try to arrange that inlining is disabled (via a pragma) if
+a rewrite rule should apply, so that the rule has a decent chance to fire
+before we inline the function.
+
+But it turns out that (especially when type-class specialisation or
+SpecConstr is involved) it is very helpful for the rewrite rule to
+"win" over inlining when both are active at once: see #21851, #22097.
+
+The simplifier arranges to do this, as follows. In effect, the ai_rewrite
+field of the ArgInfo record is the state of a little state-machine:
+
+* mkArgInfo sets the ai_rewrite field to TryRules if there are any rewrite
+  rules available for that function.
+
+* rebuildCall simplifies arguments until enough are simplified to match the
+  rule with greatest arity.  See Note [RULES apply to simplified arguments]
+  and the first field of `TryRules`.
+
+  But no more! As soon as we have simplified enough arguments to satisfy the
+  maximum-arity rules, we try the rules; see Note [Trying rewrite rules].
+
+* Once we have tried rules (or immediately if there are no rules) set
+  ai_rewrite to TryInlining, and the Simplifier will try to inline the
+  function.  We want to try this immediately (before simplifying any (more)
+  arguments). Why? Consider
+      f BIG      where   f = \x{OneOcc}. ...x...
+  If we inline `f` before simplifying `BIG` we'll use preInlineUnconditionally,
+  and we'll simplify BIG once, at x's occurrence, rather than twice.
+
+
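The ai_rewrite state machine in Note [Rewrite rules and inlining] can be exercised in isolation. What follows is a minimal toy model, not GHC code. `Rule`, `Outcome` and `simulate` are hypothetical names. A rule is reduced to its arity plus a Boolean saying whether its LHS matches, and the inliner is reduced to a Boolean. The model shows the rule winning over inlining, and shows inlining being attempted before any further arguments are simplified.

```haskell
-- Hypothetical toy model of the ai_rewrite state machine; not GHC code.
data Rule = Rule
  { ruleName    :: String
  , ruleArity   :: Int    -- number of args the rule's LHS needs
  , ruleMatches :: Bool   -- toy stand-in for "the LHS matches this call"
  } deriving (Show, Eq)

-- Same shape as RewriteCall: TryRules carries how many more arguments
-- must be simplified before the highest-arity rule can be tried.
data RewriteCall
  = TryRules Int [Rule]
  | TryInlining
  | TryNothing
  deriving (Show, Eq)

-- As in mkTryRules: with no rules we go straight to TryInlining.
mkTryRules :: [Rule] -> RewriteCall
mkTryRules [] = TryInlining
mkTryRules rs = TryRules (maximum (map ruleArity rs)) rs

-- As in decArgCount: one more argument has been simplified.
decArgCount :: RewriteCall -> RewriteCall
decArgCount (TryRules n rs) = TryRules (n - 1) rs
decArgCount rew             = rew

data Outcome = Fired String | Inlined | Rebuilt
  deriving (Show, Eq)

-- Simulate one call with 'avail' arguments; 'inlinable' says whether the
-- inliner would succeed.  The Int result is the number of arguments
-- simplified before the decision was taken.
simulate :: [Rule] -> Bool -> Int -> (Outcome, Int)
simulate rules inlinable avail = go (mkTryRules rules) 0
  where
    go rew@(TryRules n rs) done
      | n > 0 && done < avail = go (decArgCount rew) (done + 1)
      | otherwise =
          case [ruleName r | r <- rs, ruleArity r <= done, ruleMatches r] of
            (name : _) -> (Fired name, done)
            []         -> go TryInlining done   -- rules tried; now inline
    go TryInlining done
      | inlinable = (Inlined, done)             -- remaining args left unsimplified
      | otherwise = go TryNothing done
    go TryNothing _ = (Rebuilt, avail)          -- simplify all remaining args
```

As an example, `simulate [Rule "r" 2 False] True 3` gives `(Inlined, 2)`. The rule needed two arguments and did not match, so inlining was tried next, and the third argument was never simplified.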
 Note [Avoid redundant simplification]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Because RULES apply to simplified arguments, there's a danger of repeatedly
@@ -2327,7 +2372,8 @@ See Note [No free join points in arityType] in GHC.Core.Opt.Arity
 -}
 
 tryRules :: SimplEnv -> [CoreRule]
-         -> Id -> [ArgSpec]
+         -> Id
+         -> [ArgSpec]   -- In /normal, forward/ order
          -> SimplCont
          -> SimplM (Maybe (SimplEnv, CoreExpr, SimplCont))
 
@@ -3240,19 +3286,21 @@ zapIdOccInfoAndSetEvald str v =
                          -- see Note [Case alternative occ info]
 
 addAltUnfoldings :: SimplEnv -> Maybe OutExpr -> OutId -> OutExpr -> SimplM SimplEnv
-addAltUnfoldings env scrut case_bndr con_app
+addAltUnfoldings env mb_scrut case_bndr con_app
   = do { let con_app_unf = mk_simple_unf con_app
              env1 = addBinderUnfolding env case_bndr con_app_unf
 
              -- See Note [Add unfolding for scrutinee]
-             env2 | Many <- idMult case_bndr = case scrut of
-                      Just (Var v)           -> addBinderUnfolding env1 v con_app_unf
-                      Just (Cast (Var v) co) -> addBinderUnfolding env1 v $
-                                                mk_simple_unf (Cast con_app (mkSymCo co))
-                      _                      -> env1
+             env2 | Just scrut <- mb_scrut
+                  , Just (v,mco) <- scrutBinderSwap_maybe scrut
+                  = addBinderUnfolding env1 v $
+                       if isReflMCo mco  -- isReflMCo: avoid calling mk_simple_unf
+                       then con_app_unf  --            twice in the common case
+                       else mk_simple_unf (mkCastMCo con_app mco)
+
                   | otherwise = env1
 
-       ; traceSmpl "addAltUnf" (vcat [ppr case_bndr <+> ppr scrut, ppr con_app])
+       ; traceSmpl "addAltUnf" (vcat [ppr case_bndr <+> ppr mb_scrut, ppr con_app])
        ; return env2 }
   where
     -- Force the opts, so that the whole SimplEnv isn't retained
@@ -3315,9 +3363,6 @@ it's also good for case-elimination -- suppose that 'f' was inlined
 and did multi-level case analysis, then we'd solve it in one
 simplifier sweep instead of two.
 
-Exactly the same issue arises in GHC.Core.Opt.SpecConstr;
-see Note [Add scrutinee to ValueEnv too] in GHC.Core.Opt.SpecConstr
-
 HOWEVER, given
   case x of y { Just a -> r1; Nothing -> r2 }
 we do not want to add the unfolding x -> y to 'x', which might seem cool,
@@ -3328,8 +3373,11 @@ piece of information.
 So instead we add the unfolding x -> Just a, and x -> Nothing in the
 respective RHSs.
 
-Since this transformation is tantamount to a binder swap, the same caveat as in
-Note [Suppressing binder-swaps on linear case] in OccurAnal apply.
+Since this transformation is tantamount to a binder swap, we use
+GHC.Core.Opt.OccurAnal.scrutBinderSwap_maybe to do the check.
+
+Exactly the same issue arises in GHC.Core.Opt.SpecConstr;
+see Note [Add scrutinee to ValueEnv too] in GHC.Core.Opt.SpecConstr
 
 
 ************************************************************************
@@ -3668,7 +3716,7 @@ mkDupableStrictBind env arg_bndr join_rhs res_ty
   | otherwise
   = do { join_bndr <- newJoinId [arg_bndr] res_ty
        ; let arg_info = ArgInfo { ai_fun   = join_bndr
-                                , ai_rules = Nothing, ai_args  = []
+                                , ai_rewrite = TryNothing, ai_args  = []
                                 , ai_encl  = False, ai_dmds  = repeat topDmd
                                 , ai_discs = repeat 0 }
        ; return ( addJoinFloats (emptyFloats env) $


=====================================
compiler/GHC/Core/Opt/Simplify/Utils.hs
=====================================
@@ -30,9 +30,10 @@ module GHC.Core.Opt.Simplify.Utils (
         interestingCallContext,
 
         -- ArgInfo
-        ArgInfo(..), ArgSpec(..), mkArgInfo,
+        ArgInfo(..), ArgSpec(..), RewriteCall(..), mkArgInfo,
         addValArgTo, addCastTo, addTyArgTo,
-        argInfoExpr, argInfoAppArgs, pushSimplifiedArgs,
+        argInfoExpr, argInfoAppArgs,
+        pushSimplifiedArgs, pushSimplifiedRevArgs,
         isStrictArgInfo, lazyArgContext,
 
         abstractFloats,
@@ -52,6 +53,7 @@ import GHC.Core.Ppr
 import GHC.Core.TyCo.Ppr ( pprParendType )
 import GHC.Core.FVs
 import GHC.Core.Utils
+import GHC.Core.Rules( getRules )
 import GHC.Core.Opt.Arity
 import GHC.Core.Unfold
 import GHC.Core.Unfold.Make
@@ -210,6 +212,7 @@ data SimplCont
 
 type StaticEnv = SimplEnv       -- Just the static part is relevant
 
+-- See Note [DupFlag invariants]
 data DupFlag = NoDup       -- Unsimplified, might be big
              | Simplified  -- Simplified
              | OkToDup     -- Simplified and small
@@ -226,8 +229,9 @@ perhapsSubstTy dup env ty
 {- Note [StaticEnv invariant]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 We pair up an InExpr or InAlts with a StaticEnv, which establishes the
-lexical scope for that InExpr.  When we simplify that InExpr/InAlts, we
-use
+lexical scope for that InExpr.
+
+When we simplify that InExpr/InAlts, we use
   - Its captured StaticEnv
   - Overriding its InScopeSet with the larger one at the
     simplification point.
@@ -244,13 +248,14 @@ isn't big enough.
 
 Note [DupFlag invariants]
 ~~~~~~~~~~~~~~~~~~~~~~~~~
-In both (ApplyToVal dup _ env k)
-   and  (Select dup _ _ env k)
+In both ApplyToVal { sc_dup = dup, sc_env = env, sc_cont = k }
+   and  Select { sc_dup = dup, sc_env = env, sc_cont = k }
 the following invariants hold
 
   (a) if dup = OkToDup, then continuation k is also ok-to-dup
-  (b) if dup = OkToDup or Simplified, the subst-env is empty
-      (and hence no need to re-simplify)
+  (b) if dup = OkToDup or Simplified, the subst-env is empty,
+               or at least is always ignored; the payload is
+               already an OutThing
 -}
 
 instance Outputable DupFlag where
@@ -309,7 +314,8 @@ data ArgInfo
         ai_fun   :: OutId,      -- The function
         ai_args  :: [ArgSpec],  -- ...applied to these args (which are in *reverse* order)
 
-        ai_rules :: FunRules,   -- Rules for this function
+        ai_rewrite :: RewriteCall,  -- What transformation to try next for this call
+             -- See Note [Rewrite rules and inlining] in GHC.Core.Opt.Simplify.Iteration
 
         ai_encl :: Bool,        -- Flag saying whether this function
                                 -- or an enclosing one has rules (recursively)
@@ -325,6 +331,12 @@ data ArgInfo
                                 --   Always infinite
     }
 
+data RewriteCall  -- What rewriting to try next for this call
+                  -- See Note [Rewrite rules and inlining] in GHC.Core.Opt.Simplify.Iteration
+  = TryRules FullArgCount [CoreRule]
+  | TryInlining
+  | TryNothing
+
 data ArgSpec
   = ValArg { as_dmd  :: Demand        -- Demand placed on this argument
            , as_arg  :: OutExpr       -- Apply to this (coercion or value); c.f. ApplyToVal
@@ -349,20 +361,20 @@ instance Outputable ArgSpec where
 
 addValArgTo :: ArgInfo ->  OutExpr -> OutType -> ArgInfo
 addValArgTo ai arg hole_ty
-  | ArgInfo { ai_dmds = dmd:dmds, ai_discs = _:discs, ai_rules = rules } <- ai
+  | ArgInfo { ai_dmds = dmd:dmds, ai_discs = _:discs, ai_rewrite = rew } <- ai
       -- Pop the top demand and and discounts off
   , let arg_spec = ValArg { as_arg = arg, as_hole_ty = hole_ty, as_dmd = dmd }
-  = ai { ai_args  = arg_spec : ai_args ai
-       , ai_dmds  = dmds
-       , ai_discs = discs
-       , ai_rules = decRules rules }
+  = ai { ai_args    = arg_spec : ai_args ai
+       , ai_dmds    = dmds
+       , ai_discs   = discs
+       , ai_rewrite = decArgCount rew }
   | otherwise
   = pprPanic "addValArgTo" (ppr ai $$ ppr arg)
     -- There should always be enough demands and discounts
 
 addTyArgTo :: ArgInfo -> OutType -> OutType -> ArgInfo
-addTyArgTo ai arg_ty hole_ty = ai { ai_args = arg_spec : ai_args ai
-                                  , ai_rules = decRules (ai_rules ai) }
+addTyArgTo ai arg_ty hole_ty = ai { ai_args    = arg_spec : ai_args ai
+                                  , ai_rewrite = decArgCount (ai_rewrite ai) }
   where
     arg_spec = TyArg { as_arg_ty = arg_ty, as_hole_ty = hole_ty }
 
@@ -381,19 +393,22 @@ argInfoAppArgs (CastBy {}                : _)  = []  -- Stop at a cast
 argInfoAppArgs (ValArg { as_arg = arg }  : as) = arg     : argInfoAppArgs as
 argInfoAppArgs (TyArg { as_arg_ty = ty } : as) = Type ty : argInfoAppArgs as
 
-pushSimplifiedArgs :: SimplEnv -> [ArgSpec] -> SimplCont -> SimplCont
-pushSimplifiedArgs _env []           k = k
-pushSimplifiedArgs env  (arg : args) k
-  = case arg of
-      TyArg { as_arg_ty = arg_ty, as_hole_ty = hole_ty }
-               -> ApplyToTy  { sc_arg_ty = arg_ty, sc_hole_ty = hole_ty, sc_cont = rest }
-      ValArg { as_arg = arg, as_hole_ty = hole_ty }
-             -> ApplyToVal { sc_arg = arg, sc_env = env, sc_dup = Simplified
-                           , sc_hole_ty = hole_ty, sc_cont = rest }
-      CastBy c -> CastIt c rest
-  where
-    rest = pushSimplifiedArgs env args k
-           -- The env has an empty SubstEnv
+pushSimplifiedArgs, pushSimplifiedRevArgs
+  :: SimplEnv
+  -> [ArgSpec]   -- In normal, forward order for pushSimplifiedArgs,
+                 -- in /reverse/ order for pushSimplifiedRevArgs
+  -> SimplCont -> SimplCont
+pushSimplifiedArgs    env args cont = foldr  (pushSimplifiedArg env)             cont args
+pushSimplifiedRevArgs env args cont = foldl' (\k a -> pushSimplifiedArg env a k) cont args
+
+pushSimplifiedArg :: SimplEnv -> ArgSpec -> SimplCont -> SimplCont
+pushSimplifiedArg _env (TyArg { as_arg_ty = arg_ty, as_hole_ty = hole_ty }) cont
+  = ApplyToTy  { sc_arg_ty = arg_ty, sc_hole_ty = hole_ty, sc_cont = cont }
+pushSimplifiedArg env (ValArg { as_arg = arg, as_hole_ty = hole_ty }) cont
+  = ApplyToVal { sc_arg = arg, sc_env = env, sc_dup = Simplified
+                 -- The SubstEnv will be ignored since sc_dup=Simplified
+               , sc_hole_ty = hole_ty, sc_cont = cont }
+pushSimplifiedArg _ (CastBy c) cont = CastIt c cont
 
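pushSimplifiedArgs takes arguments in forward order, while pushSimplifiedRevArgs takes the reversed list kept in ai_args. Because one folds from the right and the other from the left, both build the same continuation. Below is a cut-down sketch in which the hypothetical `Arg` and `Cont` types stand in for ArgSpec and SimplCont. There is no SimplEnv and there are no hole types.

```haskell
import Data.List (foldl')

-- Hypothetical cut-down types; GHC's ArgSpec and SimplCont carry much more.
data Arg = TyArg String | ValArg String | CastBy String
  deriving (Show, Eq)

data Cont
  = ApplyToTy  String Cont
  | ApplyToVal String Cont
  | CastIt     String Cont
  | Stop
  deriving (Show, Eq)

pushArg :: Arg -> Cont -> Cont
pushArg (TyArg t)  k = ApplyToTy t k
pushArg (ValArg e) k = ApplyToVal e k
pushArg (CastBy c) k = CastIt c k

-- Forward order: the first argument ends up outermost.
pushArgs :: [Arg] -> Cont -> Cont
pushArgs args k = foldr pushArg k args

-- Reverse order (as stored in ai_args): the head of the list is the last
-- argument, so it is wrapped first and ends up innermost.
pushRevArgs :: [Arg] -> Cont -> Cont
pushRevArgs args k = foldl' (flip pushArg) k args
```

Using the strict left fold on the reversed list avoids reversing it first, which matches the motivation for a separate pushSimplifiedRevArgs.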
 argInfoExpr :: OutId -> [ArgSpec] -> OutExpr
 -- NB: the [ArgSpec] is reversed so that the first arg
@@ -406,18 +421,14 @@ argInfoExpr fun rev_args
     go (TyArg { as_arg_ty = ty } : as) = go as `App` Type ty
     go (CastBy co                : as) = mkCast (go as) co
 
+decArgCount :: RewriteCall -> RewriteCall
+decArgCount (TryRules n rules) = TryRules (n-1) rules
+decArgCount rew                = rew
 
-type FunRules = Maybe (Int, [CoreRule]) -- Remaining rules for this function
-     -- Nothing => No rules
-     -- Just (n, rules) => some rules, requiring at least n more type/value args
-
-decRules :: FunRules -> FunRules
-decRules (Just (n, rules)) = Just (n-1, rules)
-decRules Nothing           = Nothing
-
-mkFunRules :: [CoreRule] -> FunRules
-mkFunRules [] = Nothing
-mkFunRules rs = Just (n_required, rs)
+mkTryRules :: [CoreRule] -> RewriteCall
+-- See Note [Rewrite rules and inlining] in GHC.Core.Opt.Simplify.Iteration
+mkTryRules [] = TryInlining
+mkTryRules rs = TryRules n_required rs
   where
     n_required = maximum (map ruleArity rs)
 
@@ -516,6 +527,7 @@ contHoleScaling (StrictArg { sc_fun_ty = fun_ty, sc_cont = k })
 contHoleScaling (ApplyToTy { sc_cont = k }) = contHoleScaling k
 contHoleScaling (ApplyToVal { sc_cont = k }) = contHoleScaling k
 contHoleScaling (TickIt _ k) = contHoleScaling k
+
 -------------------
 countArgs :: SimplCont -> Int
 -- Count all arguments, including types, coercions,
@@ -525,6 +537,14 @@ countArgs (ApplyToVal { sc_cont = cont }) = 1 + countArgs cont
 countArgs (CastIt _ cont)                 = countArgs cont
 countArgs _                               = 0
 
+countValArgs :: SimplCont -> Int
+-- Count value arguments only
+countValArgs (ApplyToTy  { sc_cont = cont }) = countValArgs cont
+countValArgs (ApplyToVal { sc_cont = cont }) = 1 + countValArgs cont
+countValArgs (CastIt _ cont)                 = countValArgs cont
+countValArgs _                               = 0
+
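The comment on countValArgs says it counts value arguments only. Its result is compared against idArity in mkArgInfo, whereas countArgs also counts type arguments. The toy illustration below uses a hypothetical three-constructor continuation. Coercion arguments, which GHC passes via ApplyToVal, are not modelled.

```haskell
-- Hypothetical cut-down continuation; GHC's SimplCont has many more forms.
data Cont
  = ApplyToTy  String Cont
  | ApplyToVal String Cont
  | CastIt     String Cont
  | Stop

-- Counts type and value arguments; a cast is not an argument.
countArgs :: Cont -> Int
countArgs (ApplyToTy _ k)  = 1 + countArgs k
countArgs (ApplyToVal _ k) = 1 + countArgs k
countArgs (CastIt _ k)     = countArgs k
countArgs Stop             = 0

-- Counts value arguments only, which is what an arity is measured in.
countValArgs :: Cont -> Int
countValArgs (ApplyToTy _ k)  = countValArgs k
countValArgs (ApplyToVal _ k) = 1 + countValArgs k
countValArgs (CastIt _ k)     = countValArgs k
countValArgs Stop             = 0
```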
+-------------------
 contArgs :: SimplCont -> (Bool, [ArgSummary], SimplCont)
 -- Summarises value args, discards type args and coercions
 -- The returned continuation of the call is only used to
@@ -579,29 +599,26 @@ contEvalContext k = case k of
     -- and case binder dmds, see addCaseBndrDmd. No priority right now.
 
 -------------------
-mkArgInfo :: SimplEnv
-          -> Id
-          -> [CoreRule] -- Rules for function
-          -> Int        -- Number of value args
-          -> SimplCont  -- Context of the call
-          -> ArgInfo
-
-mkArgInfo env fun rules n_val_args call_cont
+mkArgInfo :: SimplEnv -> RuleEnv -> Id -> SimplCont -> ArgInfo
+
+mkArgInfo env rule_base fun cont
   | n_val_args < idArity fun            -- Note [Unsaturated functions]
   = ArgInfo { ai_fun = fun, ai_args = []
-            , ai_rules = fun_rules
+            , ai_rewrite = fun_rules
             , ai_encl = False
             , ai_dmds = vanilla_dmds
             , ai_discs = vanilla_discounts }
   | otherwise
   = ArgInfo { ai_fun   = fun
             , ai_args  = []
-            , ai_rules = fun_rules
-            , ai_encl  = interestingArgContext rules call_cont
+            , ai_rewrite = fun_rules
+            , ai_encl  = notNull rules || contHasRules cont
             , ai_dmds  = add_type_strictness (idType fun) arg_dmds
             , ai_discs = arg_discounts }
   where
-    fun_rules = mkFunRules rules
+    rules      = getRules rule_base fun
+    fun_rules  = mkTryRules rules
+    n_val_args = countValArgs cont
 
     vanilla_discounts, arg_discounts :: [Int]
     vanilla_discounts = repeat 0
@@ -814,7 +831,7 @@ interestingCallContext env cont
         -- a build it's *great* to inline it here.  So we must ensure that
         -- the context for (f x) is not totally uninteresting.
 
-interestingArgContext :: [CoreRule] -> SimplCont -> Bool
+contHasRules :: SimplCont -> Bool
 -- If the argument has form (f x y), where x,y are boring,
 -- and f is marked INLINE, then we don't want to inline f.
 -- But if the context of the argument is
@@ -822,33 +839,29 @@ interestingArgContext :: [CoreRule] -> SimplCont -> Bool
 -- where g has rules, then we *do* want to inline f, in case it
 -- exposes a rule that might fire.  Similarly, if the context is
 --      h (g (f x x))
--- where h has rules, then we do want to inline f; hence the
--- call_cont argument to interestingArgContext
+-- where h has rules, then we do want to inline f.  So contHasRules
+-- tries to see if the context of the f-call is a call to a function
+-- with rules.
 --
--- The ai-rules flag makes this happen; if it's
+-- The ai_encl flag makes this happen; if it's
 -- set, the inliner gets just enough keener to inline f
 -- regardless of how boring f's arguments are, if it's marked INLINE
 --
 -- The alternative would be to *always* inline an INLINE function,
 -- regardless of how boring its context is; but that seems overkill
 -- For example, it'd mean that wrapper functions were always inlined
---
--- The call_cont passed to interestingArgContext is the context of
--- the call itself, e.g. g <hole> in the example above
-interestingArgContext rules call_cont
-  = notNull rules || enclosing_fn_has_rules
+contHasRules cont
+  = go cont
   where
-    enclosing_fn_has_rules = go call_cont
-
-    go (Select {})                  = False
-    go (ApplyToVal {})              = False  -- Shouldn't really happen
-    go (ApplyToTy  {})              = False  -- Ditto
-    go (StrictArg { sc_fun = fun }) = ai_encl fun
-    go (StrictBind {})              = False      -- ??
-    go (CastIt _ c)                 = go c
-    go (Stop _ RuleArgCtxt _)       = True
-    go (Stop _ _ _)                 = False
-    go (TickIt _ c)                 = go c
+    go (ApplyToVal { sc_cont = cont }) = go cont
+    go (ApplyToTy  { sc_cont = cont }) = go cont
+    go (CastIt _ cont)                 = go cont
+    go (StrictArg { sc_fun = fun })    = ai_encl fun
+    go (Stop _ RuleArgCtxt _)          = True
+    go (TickIt _ c)                    = go c
+    go (Select {})                     = False
+    go (StrictBind {})                 = False      -- ??
+    go (Stop _ _ _)                    = False
 
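contHasRules walks outwards through argument applications, casts and ticks until it reaches a continuation that decides the answer. The hypothetical toy version below makes two simplifications. The Bool in `StrictArg` stands in for the `ai_encl` flag of the enclosing call's ArgInfo, and the other constructors drop the payloads the traversal does not need.

```haskell
-- Hypothetical toy model of contHasRules; not GHC code.
data StopCtxt = RuleArgCtxt | BoringCtxt

data Cont
  = ApplyToVal String Cont
  | ApplyToTy  String Cont
  | CastIt     String Cont
  | TickIt     String Cont
  | StrictArg  Bool Cont   -- Bool: ai_encl of the enclosing function
  | Select                 -- case scrutinee
  | StrictBind
  | Stop StopCtxt

contHasRules :: Cont -> Bool
contHasRules (ApplyToVal _ k)   = contHasRules k
contHasRules (ApplyToTy  _ k)   = contHasRules k
contHasRules (CastIt _ k)       = contHasRules k
contHasRules (TickIt _ k)       = contHasRules k
contHasRules (StrictArg encl _) = encl
contHasRules (Stop RuleArgCtxt) = True
contHasRules (Stop BoringCtxt)  = False
contHasRules Select             = False
contHasRules StrictBind         = False
```

In the `g (f x y)` example from the comment, the continuation of the f-call is roughly `ApplyToVal x (ApplyToVal y (StrictArg True ...))` when g has rules. In that case the walk reaches the StrictArg and returns True.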
 {- Note [Interesting arguments]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


=====================================
compiler/GHC/Core/Opt/SpecConstr.hs
=====================================
@@ -35,6 +35,7 @@ import GHC.Core.Unfold
 import GHC.Core.FVs     ( exprsFreeVarsList, exprFreeVars )
 import GHC.Core.Opt.Monad
 import GHC.Core.Opt.WorkWrap.Utils
+import GHC.Core.Opt.OccurAnal( scrutBinderSwap_maybe )
 import GHC.Core.DataCon
 import GHC.Core.Class( classTyVars )
 import GHC.Core.Coercion hiding( substCo )
@@ -1073,8 +1074,8 @@ extendCaseBndrs env scrut case_bndr con alt_bndrs
    = (env2, alt_bndrs')
  where
    live_case_bndr = not (isDeadBinder case_bndr)
-   env1 | Var v <- stripTicksTopE (const True) scrut
-                         = extendValEnv env v cval
+   env1 | Just (v, mco) <- scrutBinderSwap_maybe scrut
+        , isReflMCo mco  = extendValEnv env v cval
         | otherwise      = env  -- See Note [Add scrutinee to ValueEnv too]
    env2 | live_case_bndr = extendValEnv env1 case_bndr cval
         | otherwise      = env1
@@ -1164,6 +1165,10 @@ though the simplifier has systematically replaced uses of 'x' with 'y'
 and 'b' with 'c' in the code.  The use of 'b' in the ValueEnv came
 from outside the case.  See #4908 for the live example.
 
+It's very like the binder-swap story, so we use scrutBinderSwap_maybe
+to identify suitable scrutinees -- but only if there is no cast
+(isReflMCo) because that's all that the ValueEnv allows.
+
 Note [Avoiding exponential blowup]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The sc_count field of the ScEnv says how many times we are prepared to


=====================================
compiler/GHC/Core/Opt/Specialise.hs
=====================================
@@ -15,9 +15,7 @@ import GHC.Driver.Config
 import GHC.Driver.Config.Diagnostic
 import GHC.Driver.Config.Core.Rules ( initRuleOpts )
 
-import GHC.Tc.Utils.TcType hiding( substTy )
-
-import GHC.Core.Type  hiding( substTy, extendTvSubstList, zapSubst )
+import GHC.Core.Type  hiding( substTy, substCo, extendTvSubstList, zapSubst )
 import GHC.Core.Multiplicity
 import GHC.Core.Predicate
 import GHC.Core.Coercion( Coercion )
@@ -25,12 +23,15 @@ import GHC.Core.Opt.Monad
 import qualified GHC.Core.Subst as Core
 import GHC.Core.Unfold.Make
 import GHC.Core
+import GHC.Core.Make      ( mkLitRubbish )
+import GHC.Core.Unify     ( tcMatchTy )
 import GHC.Core.Rules
 import GHC.Core.Utils     ( exprIsTrivial
                           , mkCast, exprType
                           , stripTicksTop )
 import GHC.Core.FVs
-import GHC.Core.TyCo.Rep (TyCoBinder (..))
+import GHC.Core.TyCo.Rep ( TyCoBinder (..) )
+import GHC.Core.TyCo.FVs ( tyCoVarsOfTypeList )
 import GHC.Core.Opt.Arity( collectBindersPushingCo )
 
 import GHC.Builtin.Types  ( unboxedUnitTy )
@@ -531,6 +532,48 @@ like
         f :: Eq [(a,b)] => ...
 
 
+Note [Specialisation and overlapping instances]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Here is a tricky case (see a comment in MR !8916):
+
+    module A where
+      class C a where
+        meth :: a -> String
+      instance {-# OVERLAPPABLE #-} C (Maybe a) where
+        meth _ = "Maybe"
+
+      {-# SPECIALISE f :: Maybe a -> Bool -> String #-}
+      f :: C a => a -> Bool -> String
+      f a True = f a False
+      f a _    = meth a
+
+    module B where
+      import A
+
+      instance C (Maybe Int) where
+        meth _ = "Int"
+
+      main = putStrLn $ f (Just 42 :: Maybe Int) True
+
+Running main without optimisations yields "Int", the correct answer.
+Activating optimisations yields "Maybe" due to a rewrite rule in module
+A generated by the SPECIALISE pragma:
+
+    RULE "USPEC f" forall a (d :: C a). f @a d = $sf
+
+In B we get the call (f @(Maybe Int) (d :: C (Maybe Int))), and
+that rewrites to $sf, but that isn't really right.
+
+Overlapping instances mean that `C (Maybe Int)` is not a singleton
+type: there are two distinct dictionaries that have this type.  And that
+spells trouble for specialisation, which really assumes singleton
+types.
+
+For now, we just accept this problem, but it may bite us one day.
+One solution would be to decline to expose any specialisation rules
+to an importing module -- but that seems a bit drastic.
+
+
 ************************************************************************
 *                                                                      *
 \subsubsection{The new specialiser}
@@ -802,8 +845,12 @@ spec_import top_env callers rb dict_binds cis@(CIS fn _)
 
 canSpecImport :: DynFlags -> Id -> Maybe CoreExpr
 canSpecImport dflags fn
+  | isDataConWrapId fn
+  = Nothing   -- Don't specialise data-con wrappers, even if they
+              -- have dict args; there is no benefit.
+
   | CoreUnfolding { uf_tmpl = rhs } <- unf
-    -- See Note [Specialising imported functions] point (1).
+    -- CoreUnfolding: see Note [Specialising imported functions] point (1).
   , isAnyInlinePragma (idInlinePragma fn)
     -- See Note [Specialising imported functions] point (2).
   = Just rhs
@@ -1506,12 +1553,12 @@ specCalls spec_imp env dict_binds existing_rules calls_for_me fn rhs
 
   | otherwise   -- No calls or RHS doesn't fit our preconceptions
   = warnPprTrace (not (exprIsTrivial rhs) && notNull calls_for_me)
-          "Missed specialisation opportunity" (ppr fn $$ _trace_doc) $
+          "Missed specialisation opportunity for" (ppr fn $$ trace_doc) $
           -- Note [Specialisation shape]
     -- pprTrace "specCalls: none" (ppr fn <+> ppr calls_for_me) $
     return ([], [], emptyUDs)
   where
-    _trace_doc = sep [ ppr rhs_bndrs, ppr (idInlineActivation fn) ]
+    trace_doc = sep [ ppr rhs_bndrs, ppr (idInlineActivation fn) ]
 
     fn_type   = idType fn
     fn_arity  = idArity fn
@@ -1575,8 +1622,16 @@ specCalls spec_imp env dict_binds existing_rules calls_for_me fn rhs
              else
         do { -- Run the specialiser on the specialised RHS
              -- The "1" suffix is before we maybe add the void arg
-           ; (spec_rhs1, rhs_uds) <- specLam rhs_env2 (spec_bndrs1 ++ leftover_bndrs) rhs_body
-           ; let spec_fn_ty1 = exprType spec_rhs1
+           ; (rhs_body', rhs_uds) <- specExpr rhs_env2 rhs_body
+                -- Add the { d1' = dx1; d2' = dx2 } usage stuff
+                -- to the rhs_uds; see Note [Specialising Calls]
+           ; let rhs_uds_w_dx = foldr consDictBind rhs_uds dx_binds
+                 spec_rhs_bndrs  = spec_bndrs1 ++ leftover_bndrs
+                 (spec_uds, dumped_dbs) = dumpUDs spec_rhs_bndrs rhs_uds_w_dx
+                 spec_rhs1 = mkLams spec_rhs_bndrs $
+                             wrapDictBindsE dumped_dbs rhs_body'
+
+                 spec_fn_ty1 = exprType spec_rhs1
 
                  -- Maybe add a void arg to the specialised function,
                  -- to avoid unlifted bindings
@@ -1610,10 +1665,6 @@ specCalls spec_imp env dict_binds existing_rules calls_for_me fn rhs
                                     herald fn rule_bndrs rule_lhs_args
                                     (mkVarApps (Var spec_fn) spec_bndrs)
 
-                -- Add the { d1' = dx1; d2' = dx2 } usage stuff
-                -- See Note [Specialising Calls]
-                spec_uds = foldr consDictBind rhs_uds dx_binds
-
                 simpl_opts = initSimpleOpts dflags
 
                 --------------------------------------
@@ -1796,11 +1847,23 @@ in the specialisation:
     {-# RULE "SPEC f @Int" forall x. f @Int x $dShow = $sf #-}
 
 This doesn’t save us much, since the arg would be removed later by
-worker/wrapper, anyway, but it’s easy to do. Note, however, that we
-only drop dead arguments if:
+worker/wrapper, anyway, but it’s easy to do.
 
-  1. We don’t specialise on them.
-  2. They come before an argument we do specialise on.
+Wrinkles
+
+* Note that we only drop dead arguments if:
+    1. We don’t specialise on them.
+    2. They come before an argument we do specialise on.
+  Relaxing condition 2 would require eta-expanding the RULE, which could
+  make it match less often, so it’s not worth it. Relaxing condition 1 could
+  be more useful --- it would stop us from generating pointless
+  specialisations --- but it’s more involved to implement and unclear if
+  it actually provides much benefit in practice.
+
+* If the function has a stable unfolding, specHeader has to come up with
+  arguments to pass to that stable unfolding, when building the stable
+  unfolding of the specialised function: this is the last field in specHeader's
+  big result tuple.
 
   The right thing to do is to produce a LitRubbish; it should rapidly
   disappear.  Rather like GHC.Core.Opt.WorkWrap.Utils.mk_absent_let.
@@ -2251,11 +2314,11 @@ instance Outputable SpecArg where
   ppr (SpecDict d) = text "SpecDict" <+> ppr d
   ppr UnspecArg    = text "UnspecArg"
 
-specArgFreeVars :: SpecArg -> VarSet
-specArgFreeVars (SpecType ty) = tyCoVarsOfType ty
-specArgFreeVars (SpecDict dx) = exprFreeVars dx
-specArgFreeVars UnspecType    = emptyVarSet
-specArgFreeVars UnspecArg     = emptyVarSet
+specArgFreeIds :: SpecArg -> IdSet
+specArgFreeIds (SpecType {}) = emptyVarSet
+specArgFreeIds (SpecDict dx) = exprFreeIds dx
+specArgFreeIds UnspecType    = emptyVarSet
+specArgFreeIds UnspecArg     = emptyVarSet
 
 isSpecDict :: SpecArg -> Bool
 isSpecDict (SpecDict {}) = True
@@ -2325,24 +2388,30 @@ specHeader
               , [OutBndr]    -- Binders for $sf
               , [DictBind]   -- Auxiliary dictionary bindings
               , [OutExpr]    -- Specialised arguments for unfolding
-                             -- Same length as "args for LHS of rule"
+                             -- Same length as "Args for LHS of rule"
               )
 
 -- We want to specialise on type 'T1', and so we must construct a substitution
 -- 'a->T1', as well as a LHS argument for the resulting RULE and unfolding
 -- details.
-specHeader env (bndr : bndrs) (SpecType t : args)
-  = do { let env' = extendTvSubstList env [(bndr, t)]
-       ; (useful, env'', leftover_bndrs, rule_bs, rule_es, bs', dx, spec_args)
-            <- specHeader env' bndrs args
+specHeader env (bndr : bndrs) (SpecType ty : args)
+  = do { let in_scope = Core.getSubstInScope (se_subst env)
+             qvars    = scopedSort $
+                        filterOut (`elemInScopeSet` in_scope) $
+                        tyCoVarsOfTypeList ty
+             (env1, qvars') = substBndrs env qvars
+             ty'            = substTy env1 ty
+             env2           = extendTvSubstList env1 [(bndr, ty')]
+       ; (useful, env3, leftover_bndrs, rule_bs, rule_es, bs', dx, spec_args)
+            <- specHeader env2 bndrs args
        ; pure ( useful
-              , env''
+              , env3
               , leftover_bndrs
-              , rule_bs
-              , Type t : rule_es
-              , bs'
+              , qvars' ++ rule_bs
+              , Type ty' : rule_es
+              , qvars' ++ bs'
               , dx
-              , Type t : spec_args
+              , Type ty' : spec_args
               )
        }
 
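In the SpecType case of specHeader, the variables to quantify over (qvars) are the free type variables of the specialising type that are not already in scope. The toy sketch below computes just that. Its hypothetical `Type` has no kinds, so the scopedSort step is omitted and variables simply appear in order of first occurrence.

```haskell
import Data.List (nub)

-- Hypothetical toy type language; GHC's Type also has kinds, foralls, etc.
data Type = TyVar String | TyCon String [Type]

-- Free type variables in order of first occurrence (no kind-respecting sort).
freeTyVars :: Type -> [String]
freeTyVars = nub . go
  where
    go (TyVar v)    = [v]
    go (TyCon _ ts) = concatMap go ts

-- Variables a specialisation at 'ty' must quantify over: those free in
-- 'ty' but not already in scope at the specialisation point.
qvarsFor :: [String] -> Type -> [String]
qvarsFor inScope ty = filter (`notElem` inScope) (freeTyVars ty)
```

For the call `wimwam @(ST s)` in Note [Specialising polymorphic dictionaries], `qvarsFor [] (TyCon "ST" [TyVar "s"])` is `["s"]`. That is why both the RULE and `$swimwam` quantify over `s`.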
@@ -2369,6 +2438,7 @@ specHeader env (bndr : bndrs) (UnspecType : args)
 -- a wildcard binder to match the dictionary (See Note [Specialising Calls] for
 -- the nitty-gritty), as a LHS rule and unfolding details.
 specHeader env (bndr : bndrs) (SpecDict d : args)
+  | not (isDeadBinder bndr)
   = do { (env1, bndr') <- newDictBndr env bndr -- See Note [Zap occ info in rule binders]
        ; let (env2, dx_bind, spec_dict) = bindAuxiliaryDict env1 bndr bndr' d
        ; (_, env3, leftover_bndrs, rule_bs, rule_es, bs', dx, spec_args)
@@ -2385,29 +2455,44 @@ specHeader env (bndr : bndrs) (SpecDict d : args)
               )
        }
 
--- Finally, we have the unspecialised argument 'i'. We need to produce
--- a binder, LHS and RHS argument for the RULE, and a binder for the
--- specialised body.
+-- Finally, we don't want to specialise on this argument 'i':
+--   - It's an UnspecArg, or
+--   - It's a dead dictionary
+-- We need to produce a binder, LHS and RHS argument for the RULE, and
+-- a binder for the specialised body.
 --
 -- NB: Calls to 'specHeader' will trim off any trailing 'UnspecArg's, which is
 -- why 'i' doesn't appear in our RULE above. But we have no guarantee that
 -- there aren't 'UnspecArg's which come /before/ all of the dictionaries, so
 -- this case must be here.
-specHeader env (bndr : bndrs) (UnspecArg : args)
+specHeader env (bndr : bndrs) (_ : args)
+    -- The "_" can be UnspecArg, or SpecDict where the bndr is dead
   = do { -- see Note [Zap occ info in rule binders]
          let (env', bndr') = substBndr env (zapIdOccInfo bndr)
        ; (useful, env'', leftover_bndrs, rule_bs, rule_es, bs', dx, spec_args)
              <- specHeader env' bndrs args
+
+       ; let bndr_ty = idType bndr'
+
+             -- See Note [Drop dead args from specialisations]
+             -- C.f. GHC.Core.Opt.WorkWrap.Utils.mk_absent_let
+             (mb_spec_bndr, spec_arg)
+                | isDeadBinder bndr
+                , Just lit_expr <- mkLitRubbish bndr_ty
+                = (Nothing, lit_expr)
+                | otherwise
+                = (Just bndr', varToCoreExpr bndr')
+
        ; pure ( useful
               , env''
               , leftover_bndrs
               , bndr' : rule_bs
               , varToCoreExpr bndr' : rule_es
-              , if isDeadBinder bndr
-                  then bs' -- see Note [Drop dead args from specialisations]
-                  else bndr' : bs'
+              , case mb_spec_bndr of
+                  Just b' -> b' : bs'
+                  Nothing -> bs'
               , dx
-              , varToCoreExpr bndr' : spec_args
+              , spec_arg : spec_args
               )
        }
 
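For each unspecialised argument, the dead-argument handling in specHeader makes two decisions. It decides whether the specialised function gets a binder, and what to pass to the stable unfolding. In the toy sketch below, `Binder`, `Expr` and `mkRubbish` are hypothetical. Whether a rubbish literal can be built is reduced to a Boolean flag rather than the type-based check that mkLitRubbish performs.

```haskell
-- Hypothetical toy model of the dead-argument case of specHeader; not GHC code.
data Binder = Binder
  { bndrName  :: String
  , bndrType  :: String
  , bndrDead  :: Bool
  , rubbishOk :: Bool   -- assumption: stands in for "mkLitRubbish succeeds"
  }

-- Rubbish carries the type it is a placeholder for.
data Expr = Var String | Rubbish String
  deriving (Show, Eq)

-- Hypothetical stand-in for mkLitRubbish.
mkRubbish :: Binder -> Maybe Expr
mkRubbish b
  | rubbishOk b = Just (Rubbish (bndrType b))
  | otherwise   = Nothing

-- For one unspecialised argument: the binder (if any) of the specialised
-- function, and the argument passed to the stable unfolding.
specArg :: Binder -> (Maybe String, Expr)
specArg b
  | bndrDead b, Just e <- mkRubbish b = (Nothing, e)
  | otherwise                         = (Just (bndrName b), Var (bndrName b))

specArgs :: [Binder] -> ([String], [Expr])
specArgs bs = ([n | (Just n, _) <- rs], map snd rs)
  where rs = map specArg bs
```

A dead argument whose rubbish literal cannot be built keeps its binder, mirroring the `otherwise` branch in the diff.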
@@ -2533,6 +2618,88 @@ successfully specialise 'f'.
 
 So the DictBinds in (ud_binds :: OrdList DictBind) may contain
 non-dictionary bindings too.
+
+Note [Specialising polymorphic dictionaries]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider
+    class M a where { foo :: a -> Int }
+
+    instance M (ST s) where ...
+    -- dMST :: forall s. M (ST s)
+
+    wimwam :: forall a. M a => a -> Int
+    wimwam = /\a \(d::M a). body
+
+    f :: ST s -> Int
+    f = /\s \(x::ST s). wimwam @(ST s) (dMST @s) x + 1
+
+We'd like to specialise wimwam at (ST s), thus
+    $swimwam :: forall s. ST s -> Int
+    $swimwam = /\s. body[ST s/a, (dMST @s)/d]
+
+    RULE forall s (d :: M (ST s)).
+         wimwam @(ST s) d = $swimwam @s
+
+Here are the moving parts:
+
+* We must /not/ dump the CallInfo
+    CIS wimwam (CI { ci_key = [@(ST s), dMST @s]
+                   , ci_fvs = {dMST} })
+  when we come to the /\s.  Instead, we simply let it continue to float
+  upwards. Hence ci_fvs is an IdSet, listing the /Ids/ that
+  are free in the call, but not the /TyVars/.  Hence using specArgFreeIds
+  in singleCall.
+
+  NB to be fully kosher we should explicitly quantify the CallInfo
+  over 's', but we don't bother.  This would matter if there was an
+  enclosing binding of the same 's', which I don't expect to happen.
+
+* When we come to specialise the call, we must remember to quantify
+  over 's'.  That is done in the SpecType case of specHeader, where
+  we add 's' (called qvars) to the binders of the RULE and the specialised
+  function.
+
+* If we have f :: forall m. Monoid m => blah, and two calls
+     (f @(Endo b)      (d :: Monoid (Endo b)))
+     (f @(Endo (c->c)) (d :: Monoid (Endo (c->c))))
+  we want to generate a specialisation only for the first.  The second
+  is just a substitution instance of the first, with no greater specialisation.
+  Hence the call to `remove_dups` in `filterCalls`.
+
+All this arose in #13873, in the unexpected form that a SPECIALISE
+pragma made the program slower!  The reason was that the specialised
+function $sinsertWith arising from the pragma looked rather like `f`
+above, and failed to specialise a wimwam-like call in its body.
+Without the pragma, the original call to `insertWith` was completely
+monomorphic, and specialised in one go.
+
+Wrinkles.
+
+* With -XOverlappingInstances you might worry about this:
+    class C a where ...
+    instance C (Maybe Int) where ...   -- $df1 :: C (Maybe Int)
+    instance C (Maybe a)   where ...   -- $df2 :: forall a. C (Maybe a)
+
+    f :: C a => blah
+    f = rhs
+
+    g = /\a.  ...(f @(Maybe a) ($df2 a))...
+    h = ...f @(Maybe Int) $df1
+
+  There are two calls to f, but with different evidence.  This patch will
+  combine them into one.  But it's OK: this code will never arise unless you
+  use -XIncoherentInstances.  Even with -XOverlappingInstances, GHC tries hard
+  to keep dictionaries as singleton types.  But that goes out of the window
+  with -XIncoherentInstances -- and that is true even with ordinary type-class
+  specialisation (at least if any inlining has taken place).
+
+  GHC makes very few guarantees when you use -XIncoherentInstances, and it's
+  not worth crippling the normal case for the incoherent corner.  (The best
+  thing might be to switch off specialisation altogether if incoherence is
+  involved... but incoherence is a property of an instance, not a class, so
+  it's a hard test to make.)
+
+  But see Note [Specialisation and overlapping instances].
 -}
 
 instance Outputable DictBind where
@@ -2571,8 +2738,9 @@ data CallInfoSet = CIS Id (Bag CallInfo)
 data CallInfo
   = CI { ci_key  :: [SpecArg]   -- All arguments
        , ci_fvs  :: IdSet       -- Free Ids of the ci_key call
-                                -- _not_ including the main id itself, of course
+                                -- /not/ including the main id itself, of course
                                 -- NB: excluding tyvars:
+                                --     See Note [Specialising polymorphic dictionaries]
     }
 
 type DictExpr = CoreExpr
@@ -2621,7 +2789,7 @@ singleCall id args
                      unitBag (CI { ci_key  = args -- used to be tys
                                  , ci_fvs  = call_fvs }) }
   where
-    call_fvs = foldr (unionVarSet . specArgFreeVars) emptyVarSet args
+    call_fvs = foldr (unionVarSet . specArgFreeIds) emptyVarSet args
         -- The type args (tys) are guaranteed to be part of the dictionary
         -- types, because they are just the constrained types,
         -- and the dictionary is therefore sure to be bound
@@ -2951,15 +3119,15 @@ callsForMe fn uds at MkUD { ud_binds = orig_dbs, ud_calls = orig_calls }
 
 ----------------------
 filterCalls :: CallInfoSet -> FloatedDictBinds -> [CallInfo]
--- Remove dominated calls
+-- Remove dominated calls (Note [Specialising polymorphic dictionaries])
 -- and loopy DFuns (Note [Avoiding loops (DFuns)])
 filterCalls (CIS fn call_bag) (FDB { fdb_binds = dbs })
   | isDFunId fn  -- Note [Avoiding loops (DFuns)] applies only to DFuns
-  = filter ok_call unfiltered_calls
+  = filter ok_call de_dupd_calls
   | otherwise         -- Do not apply it to non-DFuns
-  = unfiltered_calls  -- See Note [Avoiding loops (non-DFuns)]
+  = de_dupd_calls  -- See Note [Avoiding loops (non-DFuns)]
   where
-    unfiltered_calls = bagToList call_bag
+    de_dupd_calls = remove_dups call_bag
 
     dump_set = foldl' go (unitVarSet fn) dbs
       -- This dump-set could also be computed by splitDictBinds
@@ -2973,6 +3141,29 @@ filterCalls (CIS fn call_bag) (FDB { fdb_binds = dbs })
 
     ok_call (CI { ci_fvs = fvs }) = fvs `disjointVarSet` dump_set
 
+remove_dups :: Bag CallInfo -> [CallInfo]
+remove_dups calls = foldr add [] calls
+  where
+    add :: CallInfo -> [CallInfo] -> [CallInfo]
+    add ci [] = [ci]
+    add ci1 (ci2:cis) | ci2 `beats_or_same` ci1 = ci2:cis
+                      | ci1 `beats_or_same` ci2 = ci1:cis
+                      | otherwise               = ci2 : add ci1 cis
+
+beats_or_same :: CallInfo -> CallInfo -> Bool
+beats_or_same (CI { ci_key = args1 }) (CI { ci_key = args2 })
+  = go args1 args2
+  where
+    go [] _ = True
+    go (arg1:args1) (arg2:args2) = go_arg arg1 arg2 && go args1 args2
+    go (_:_)        []           = False
+
+    go_arg (SpecType ty1) (SpecType ty2) = isJust (tcMatchTy ty1 ty2)
+    go_arg UnspecType     UnspecType     = True
+    go_arg (SpecDict {})  (SpecDict {})  = True
+    go_arg UnspecArg      UnspecArg      = True
+    go_arg _              _              = False
+
 ----------------------
 splitDictBinds :: FloatedDictBinds -> IdSet -> (FloatedDictBinds, OrdList DictBind, IdSet)
 -- splitDictBinds dbs bndrs returns
@@ -3003,15 +3194,18 @@ splitDictBinds (FDB { fdb_binds = dbs, fdb_bndrs = bs }) bndr_set
 
 ----------------------
 deleteCallsMentioning :: VarSet -> CallDetails -> CallDetails
--- Remove calls *mentioning* bs in any way
-deleteCallsMentioning bs calls
+-- Remove calls mentioning any Id in bndrs
+-- NB: The call is allowed to mention TyVars in bndrs
+--     Note [Specialising polymorphic dictionaries]
+--     ci_fvs are just the free /Ids/
+deleteCallsMentioning bndrs calls
   = mapDVarEnv (ciSetFilter keep_call) calls
   where
-    keep_call (CI { ci_fvs = fvs }) = fvs `disjointVarSet` bs
+    keep_call (CI { ci_fvs = fvs }) = fvs `disjointVarSet` bndrs
 
 deleteCallsFor :: [Id] -> CallDetails -> CallDetails
--- Remove calls *for* bs
-deleteCallsFor bs calls = delDVarEnvList calls bs
+-- Remove calls *for* bndrs
+deleteCallsFor bndrs calls = delDVarEnvList calls bndrs
 
 {-
 ************************************************************************

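The de-duplication performed by `remove_dups` and `beats_or_same` above (the third bullet of Note [Specialising polymorphic dictionaries]) can be tried out in isolation. The sketch below is a hypothetical, simplified model, not GHC code: `Ty`, `SpecArg`, `CallInfo`, `beatsOrSame`, `removeDups` and `matchTy` are toy stand-ins, and `matchTy` is a naive one-way matcher playing the role of `tcMatchTy`. It mirrors the structure of the patch's code so the `Endo b` / `Endo (c->c)` example from the Note can be checked directly.

```haskell
-- Hypothetical toy model of the Specialiser's call de-duplication.
-- None of these types or functions are GHC's real definitions.
import qualified Data.Map.Strict as M
import Data.Maybe (isJust)

-- A toy type language: type variables and applied type constructors.
data Ty = TyVar String | TyCon String [Ty]
  deriving (Eq, Show)

-- One-way matching (stand-in for tcMatchTy): is there a substitution for
-- the template's variables that turns the template into the target?
matchTy :: Ty -> Ty -> Maybe (M.Map String Ty)
matchTy = go M.empty
  where
    go s (TyVar v) t = case M.lookup v s of
      Nothing -> Just (M.insert v t s)
      Just t'
        | t' == t   -> Just s
        | otherwise -> Nothing
    go s (TyCon c1 ts1) (TyCon c2 ts2)
      | c1 == c2 && length ts1 == length ts2 = goList s ts1 ts2
    go _ _ _ = Nothing

    goList s [] [] = Just s
    goList s (x:xs) (y:ys) = go s x y >>= \s' -> goList s' xs ys
    goList _ _ _ = Nothing

-- Toy version of SpecArg (the real SpecDict carries a dictionary expression).
data SpecArg = SpecType Ty | UnspecType | SpecDict | UnspecArg
  deriving Show

newtype CallInfo = CI { ciKey :: [SpecArg] }
  deriving Show

-- ci1 `beatsOrSame` ci2: ci2 is argument-wise a substitution instance of ci1.
beatsOrSame :: CallInfo -> CallInfo -> Bool
beatsOrSame (CI args1) (CI args2) = go args1 args2
  where
    go [] _ = True
    go (a1:as1) (a2:as2) = goArg a1 a2 && go as1 as2
    go (_:_) [] = False

    goArg (SpecType t1) (SpecType t2) = isJust (matchTy t1 t2)
    goArg UnspecType UnspecType = True
    goArg SpecDict SpecDict = True
    goArg UnspecArg UnspecArg = True
    goArg _ _ = False

-- Keep only the most general call of each comparable group.
-- Like the patch's `add`, it stops at the first comparable call.
removeDups :: [CallInfo] -> [CallInfo]
removeDups = foldr add []
  where
    add ci [] = [ci]
    add ci1 (ci2:cis)
      | ci2 `beatsOrSame` ci1 = ci2 : cis
      | ci1 `beatsOrSame` ci2 = ci1 : cis
      | otherwise             = ci2 : add ci1 cis
```

Loaded into GHCi with `endoB = CI [SpecType (TyCon "Endo" [TyVar "b"]), SpecDict]` and `endoCC = CI [SpecType (TyCon "Endo" [TyCon "->" [TyVar "c", TyVar "c"]]), SpecDict]`, `removeDups [endoB, endoCC]` keeps only `endoB`, because `Endo (c->c)` is a substitution instance of `Endo b`; calls at unrelated types such as `Int` and `Bool` are both kept.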

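The reason for switching `singleCall` from `specArgFreeVars` to `specArgFreeIds`, and for `deleteCallsMentioning` looking only at free Ids, can be shown with a toy model. It is hypothetical and not GHC's data types: `Var`, `Arg`, `argFreeVars`, `argFreeIds`, `keepCallWith` and `wimwamCall` are illustrative names. The wimwam call from the Note mentions the type variable `s` and the dictionary Id `dMST`; if only free Ids are recorded, crossing the /\s binder does not discard the call.

```haskell
-- Hypothetical toy model of which calls survive when floating past binders.
import qualified Data.Set as S

-- Toy variables: type variables versus term-level Ids.
data Var = TyV String | IdV String
  deriving (Eq, Ord, Show)

-- A toy call argument, described only by the variables it mentions.
newtype Arg = Arg [Var]

isId :: Var -> Bool
isId (IdV _) = True
isId (TyV _) = False

-- Analogue of specArgFreeVars: every free variable, tyvars included.
argFreeVars :: Arg -> S.Set Var
argFreeVars (Arg vs) = S.fromList vs

-- Analogue of specArgFreeIds: only the free Ids.
argFreeIds :: Arg -> S.Set Var
argFreeIds = S.filter isId . argFreeVars

-- Analogue of keep_call: keep the call unless its recorded free
-- variables (computed with `fvs`) overlap the binders being crossed.
keepCallWith :: (Arg -> S.Set Var) -> S.Set Var -> [Arg] -> Bool
keepCallWith fvs bndrs args = S.disjoint callFvs bndrs
  where
    callFvs = foldr (S.union . fvs) S.empty args

-- wimwam @(ST s) (dMST @s), from the Note.
wimwamCall :: [Arg]
wimwamCall = [Arg [TyV "s"], Arg [IdV "dMST", TyV "s"]]
```

Here `keepCallWith argFreeIds (S.singleton (TyV "s")) wimwamCall` is True, whereas the all-variables version with `argFreeVars` would drop the call at the /\s. Both versions drop it once the binding of `dMST` itself is crossed.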
=====================================
compiler/GHC/Core/Subst.hs
=====================================
@@ -26,7 +26,8 @@ module GHC.Core.Subst (
         extendIdSubstWithClone,
         extendSubst, extendSubstList, extendSubstWithVar,
         extendSubstInScope, extendSubstInScopeList, extendSubstInScopeSet,
-        isInScope, setInScope, extendTvSubst, extendCvSubst,
+        isInScope, setInScope, getSubstInScope,
+        extendTvSubst, extendCvSubst,
         delBndr, delBndrs, zapSubst,
 
         -- ** Substituting and cloning binders
@@ -41,7 +42,6 @@ import GHC.Core
 import GHC.Core.FVs
 import GHC.Core.Seq
 import GHC.Core.Utils
-import GHC.Core.TyCo.Subst ( substCo )
 
         -- We are defining local versions
 import GHC.Core.Type hiding ( substTy )


=====================================
compiler/GHC/Core/Type.hs
=====================================
@@ -219,7 +219,7 @@ module GHC.Core.Type (
         substTyAddInScope,
         substTyUnchecked, substTysUnchecked, substScaledTyUnchecked, substScaledTysUnchecked,
         substThetaUnchecked, substTyWithUnchecked,
-        substCoUnchecked, substCoWithUnchecked,
+        substCo, substCoUnchecked, substCoWithUnchecked,
         substTyVarBndr, substTyVarBndrs, substTyVar, substTyVars,
         substVarBndr, substVarBndrs,
         substTyCoBndr,


=====================================
compiler/GHC/Utils/Json.hs
=====================================
@@ -14,6 +14,7 @@ data JsonDoc where
   JSBool :: Bool -> JsonDoc
   JSInt  :: Int  -> JsonDoc
   JSString :: String -> JsonDoc
+    -- ^ The 'String' is unescaped
   JSArray :: [JsonDoc] -> JsonDoc
   JSObject :: [(String, JsonDoc)] -> JsonDoc
 
@@ -57,7 +58,7 @@ class ToJson a where
   json :: a -> JsonDoc
 
 instance ToJson String where
-  json = JSString . escapeJsonString
+  json = JSString
 
 instance ToJson Int where
   json = JSInt

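The diff stops escaping in `instance ToJson String` and documents the `JSString` payload as unescaped. Presumably escaping now happens once, when a `JsonDoc` is rendered, which avoids escaping the same string twice. That renderer is not part of this chunk, so the following is only a hypothetical, self-contained sketch of render-time escaping. The local `JsonDoc` copies just the constructors visible in the diff, and `escape` and `render` are illustrative names, not the API of GHC.Utils.Json.

```haskell
-- Hypothetical sketch: store strings raw, escape them only when rendering.
import Data.Char (ord)
import Data.List (intercalate)
import Numeric (showHex)

-- Local toy copy of the constructors shown in the diff.
data JsonDoc
  = JSBool Bool
  | JSInt Int
  | JSString String              -- the String is unescaped
  | JSArray [JsonDoc]
  | JSObject [(String, JsonDoc)]

-- Escape a raw string for use inside a JSON string literal.
-- Non-ASCII characters pass through unchanged (assumes UTF-8 output).
escape :: String -> String
escape = concatMap esc
  where
    esc '"'  = "\\\""
    esc '\\' = "\\\\"
    esc '\n' = "\\n"
    esc '\r' = "\\r"
    esc '\t' = "\\t"
    esc c
      | ord c < 0x20 = "\\u" ++ pad4 (showHex (ord c) "")
      | otherwise    = [c]
    pad4 s = replicate (4 - length s) '0' ++ s

-- Escaping happens exactly once, here.
render :: JsonDoc -> String
render (JSBool b)     = if b then "true" else "false"
render (JSInt n)      = show n
render (JSString s)   = quote s
render (JSArray xs)   = "[" ++ intercalate "," (map render xs) ++ "]"
render (JSObject kvs) =
  "{" ++ intercalate "," [ quote k ++ ":" ++ render v | (k, v) <- kvs ] ++ "}"

quote :: String -> String
quote s = "\"" ++ escape s ++ "\""
```

For example, `render (JSObject [("k", JSArray [JSInt 1, JSBool True])])` yields `{"k":[1,true]}`.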

=====================================
compiler/GHC/Utils/Monad.hs
=====================================
@@ -163,7 +163,10 @@ mapSndM = traverse . traverse
 -- | Monadic version of concatMap
 concatMapM :: (Monad m, Traversable f) => (a -> m [b]) -> f a -> m [b]
 concatMapM f xs = liftM concat (mapM f xs)
-{-# SPECIALIZE concatMapM :: (Monad m) => (a -> m [b]) -> [a] -> m [b] #-}
+{-# INLINE concatMapM #-}
+-- It's better to inline this than to specialise
+--     concatMapM :: (Monad m) => (a -> m [b]) -> [a] -> m [b]
+-- Inlining cuts compiler allocation by around 1%
 
 -- | Applicative version of mapMaybe
 mapMaybeM :: Applicative m => (a -> m (Maybe b)) -> [a] -> m [b]


=====================================
testsuite/tests/lib/integer/Makefile
=====================================
@@ -11,8 +11,9 @@ CHECK2 = grep -q -- '$1' folding.simpl || \
 
 .PHONY: integerConstantFolding
 integerConstantFolding:
-	'$(TEST_HC)' -Wall -v0 -O --make integerConstantFolding -fforce-recomp -ddump-simpl > folding.simpl
+	'$(TEST_HC)' -Wall -v0 -O --make integerConstantFolding -fforce-recomp -ddump-simpl -dno-debug-output > folding.simpl
 # All the 100nnn values should be constant-folded away
+# -dno-debug-output suppresses a "Glomming" message
 	! grep -q '\<100[0-9][0-9][0-9]\>' folding.simpl || { echo "Unfolded values found"; grep '\<100[0-9][0-9][0-9]\>' folding.simpl; }
 	$(call CHECK,\<200007\>,plusInteger)
 	$(call CHECK,\<683234160\>,timesInteger)
@@ -64,8 +65,9 @@ IntegerConversionRules:
 
 .PHONY: naturalConstantFolding
 naturalConstantFolding:
-	'$(TEST_HC)' -Wall -v0 -O --make naturalConstantFolding -fforce-recomp -ddump-simpl > folding.simpl
+	'$(TEST_HC)' -Wall -v0 -O --make naturalConstantFolding -fforce-recomp -ddump-simpl -dno-debug-output > folding.simpl
 # All the 100nnn values should be constant-folded away
+# -dno-debug-output suppresses a "Glomming" message
 	! grep -q '\<100[0-9][0-9][0-9]\>' folding.simpl || { echo "Unfolded values found"; grep '\<100[0-9][0-9][0-9]\>' folding.simpl; }
 	# Bit arithmetic
 	$(call CHECK,\<532\>,andNatural)


=====================================
testsuite/tests/linters/notes.stdout
=====================================
@@ -2,7 +2,6 @@ ref    compiler/GHC/Core/Coercion/Axiom.hs:461:2:     Note [RoughMap and rm_empt
 ref    compiler/GHC/Core/Opt/OccurAnal.hs:857:15:     Note [Loop breaking]
 ref    compiler/GHC/Core/Opt/SetLevels.hs:1580:30:     Note [Top level scope]
 ref    compiler/GHC/Core/Opt/Simplify/Iteration.hs:2675:13:     Note [Case binder next]
-ref    compiler/GHC/Core/Opt/Simplify/Iteration.hs:3303:0:     Note [Suppressing binder-swaps on linear case]
 ref    compiler/GHC/Core/Opt/Simplify/Iteration.hs:3854:8:     Note [Lambda-bound unfoldings]
 ref    compiler/GHC/Core/Opt/Simplify/Utils.hs:1257:37:     Note [Gentle mode]
 ref    compiler/GHC/Core/Opt/Specialise.hs:1623:28:     Note [Arity decrease]


=====================================
testsuite/tests/numeric/should_compile/T19641.stderr
=====================================
@@ -3,30 +3,30 @@
 Result size of Tidy Core
   = {terms: 22, types: 20, coercions: 0, joins: 0/0}
 
-integer_to_int
+natural_to_word
   = \ eta ->
       case eta of {
-        IS ipv -> Just (I# ipv);
-        IP x1 -> Nothing;
-        IN ds -> Nothing
+        NS x1 -> Just (W# x1);
+        NB ds -> Nothing
       }
 
-natural_to_word
+integer_to_int
   = \ eta ->
       case eta of {
-        NS x1 -> Just (W# x1);
-        NB ds -> Nothing
+        IS ipv -> Just (I# ipv);
+        IP x1 -> Nothing;
+        IN ds -> Nothing
       }
 
 
 ------ Local rules for imported ids --------
-"SPEC/Test toIntegralSized @Natural @Word"
-    forall $dIntegral $dIntegral1 $dBits $dBits1.
-      toIntegralSized $dIntegral $dIntegral1 $dBits $dBits1
-      = natural_to_word
 "SPEC/Test toIntegralSized @Integer @Int"
     forall $dIntegral $dIntegral1 $dBits $dBits1.
       toIntegralSized $dIntegral $dIntegral1 $dBits $dBits1
       = integer_to_int
+"SPEC/Test toIntegralSized @Natural @Word"
+    forall $dIntegral $dIntegral1 $dBits $dBits1.
+      toIntegralSized $dIntegral $dIntegral1 $dBits $dBits1
+      = natural_to_word
 
 


=====================================
testsuite/tests/simplCore/should_compile/T21851.hs
=====================================
@@ -0,0 +1,15 @@
+{-# OPTIONS_GHC -ddump-simpl #-}
+
+module T21851 (g') where
+import T21851a
+
+g :: Num a => a -> a
+g x = fst (f x)
+{-# NOINLINE[99] g #-}
+
+g' :: Int -> Int
+g' = g
+
+-- We should see a call to a /specialised/ version of `f`,
+-- something like
+-- g' = \ (x :: Int) -> case T21851a.$w$sf x of { (# ww, ww1 #) -> ww }


=====================================
testsuite/tests/simplCore/should_compile/T21851.stderr
=====================================
@@ -0,0 +1,19 @@
+[1 of 2] Compiling T21851a          ( T21851a.hs, T21851a.o )
+[2 of 2] Compiling T21851           ( T21851.hs, T21851.o )
+
+==================== Tidy Core ====================
+Result size of Tidy Core
+  = {terms: 7, types: 10, coercions: 0, joins: 0/0}
+
+-- RHS size: {terms: 6, types: 8, coercions: 0, joins: 0/0}
+g' :: Int -> Int
+[GblId,
+ Arity=1,
+ Str=<L>,
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
+         WorkFree=True, Expandable=True, Guidance=IF_ARGS [0] 30 0}]
+g'
+  = \ (x :: Int) -> case T21851a.$w$sf x of { (# ww, ww1 #) -> ww }
+
+
+


=====================================
testsuite/tests/simplCore/should_compile/T21851a.hs
=====================================
@@ -0,0 +1,5 @@
+module T21851a where
+
+f :: Num b => b -> (b, b) -- note: recursive to prevent inlining
+f x = (x + 1, snd (f x))  -- on such a small example
+{-# SPECIALIZE f :: Int -> (Int, Int) #-}


=====================================
testsuite/tests/simplCore/should_compile/T22097.hs
=====================================
@@ -0,0 +1,7 @@
+{-# OPTIONS_GHC -ddump-simpl #-}
+{-# LANGUAGE TypeApplications #-}
+module T22097 where
+import T22097a ( isEven )
+
+main :: IO ()
+main = print $ isEven @Int 10


=====================================
testsuite/tests/simplCore/should_compile/T22097.stderr
=====================================
@@ -0,0 +1,46 @@
+[1 of 2] Compiling T22097a          ( T22097a.hs, T22097a.o )
+[2 of 2] Compiling T22097           ( T22097.hs, T22097.o )
+
+==================== Tidy Core ====================
+Result size of Tidy Core
+  = {terms: 15, types: 14, coercions: 3, joins: 0/0}
+
+-- RHS size: {terms: 5, types: 1, coercions: 0, joins: 0/0}
+T22097.main2 :: String
+[GblId,
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
+         WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 30 0}]
+T22097.main2
+  = case T22097a.$wgoEven 10# of { (# #) -> GHC.Show.$fShowBool4 }
+
+-- RHS size: {terms: 6, types: 2, coercions: 0, joins: 0/0}
+T22097.main1
+  :: GHC.Prim.State# GHC.Prim.RealWorld
+     -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
+[GblId,
+ Arity=1,
+ Str=<L>,
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
+         WorkFree=True, Expandable=True, Guidance=IF_ARGS [0] 40 0}]
+T22097.main1
+  = \ (eta [OS=OneShot] :: GHC.Prim.State# GHC.Prim.RealWorld) ->
+      GHC.IO.Handle.Text.hPutStr2
+        GHC.IO.Handle.FD.stdout T22097.main2 GHC.Types.True eta
+
+-- RHS size: {terms: 1, types: 0, coercions: 3, joins: 0/0}
+main :: IO ()
+[GblId,
+ Arity=1,
+ Str=<L>,
+ Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
+         WorkFree=True, Expandable=True,
+         Guidance=ALWAYS_IF(arity=0,unsat_ok=True,boring_ok=True)}]
+main
+  = T22097.main1
+    `cast` (Sym (GHC.Types.N:IO[0] <()>_R)
+            :: (GHC.Prim.State# GHC.Prim.RealWorld
+                -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #))
+               ~R# IO ())
+
+
+


=====================================
testsuite/tests/simplCore/should_compile/T22097a.hs
=====================================
@@ -0,0 +1,23 @@
+module T22097a
+  ( isEven, isOdd )
+where
+
+{-# SPECIALIZE isEven :: Int -> Bool #-}
+isEven :: Integral a => a -> Bool
+isEven = fst evenOdd
+
+{-# SPECIALIZE isOdd :: Int -> Bool #-}
+isOdd :: Integral a => a -> Bool
+isOdd = snd evenOdd
+
+evenOdd :: Integral a => (a -> Bool, a -> Bool)
+evenOdd = (goEven, goOdd)
+  where
+    goEven n
+      | n < 0 = goEven (- n)
+      | n > 0 = goOdd (n - 1)
+      | otherwise = True
+
+    goOdd n
+      | n < 0 = goOdd n
+      | otherwise = goEven n


=====================================
testsuite/tests/simplCore/should_compile/T6056.stderr
=====================================
@@ -1,4 +1,4 @@
 Rule fired: SPEC/T6056 $wsmallerAndRest @Int (T6056)
 Rule fired: SPEC/T6056 $wsmallerAndRest @Int (T6056)
 Rule fired: SPEC/T6056 $wsmallerAndRest @Int (T6056)
-Rule fired: SPEC/T6056 $wsmallerAndRest @Int (T6056)
+Rule fired: SPEC/T6056 smallerAndRest @Int (T6056)


=====================================
testsuite/tests/simplCore/should_compile/T8331.stderr
=====================================
@@ -1,5 +1,60 @@
 
 ==================== Tidy Core rules ====================
+"SPEC $c*> @(ST s) @_"
+    forall (@s) (@r) ($dApplicative :: Applicative (ST s)).
+      $fApplicativeReaderT_$c*> @(ST s) @r $dApplicative
+      = ($fApplicativeReaderT3 @s @r)
+        `cast` (forall (a :: <*>_N) (b :: <*>_N).
+                <ReaderT r (ST s) a>_R
+                %<'Many>_N ->_R <ReaderT r (ST s) b>_R
+                %<'Many>_N ->_R <r>_R %<'Many>_N ->_R Sym (N:ST[0] <s>_N <b>_R)
+                                ; Sym (N:ReaderT[0] <*>_N <r>_R <ST s>_R <b>_N)
+                :: Coercible
+                     (forall {a} {b}.
+                      ReaderT r (ST s) a -> ReaderT r (ST s) b -> r -> STRep s b)
+                     (forall {a} {b}.
+                      ReaderT r (ST s) a -> ReaderT r (ST s) b -> ReaderT r (ST s) b))
+"SPEC $c>> @(ST s) @_"
+    forall (@s) (@r) ($dMonad :: Monad (ST s)).
+      $fMonadReaderT1 @(ST s) @r $dMonad
+      = $fMonadAbstractIOSTReaderT_$s$c>> @s @r
+"SPEC $cliftA2 @(ST s) @_"
+    forall (@s) (@r) ($dApplicative :: Applicative (ST s)).
+      $fApplicativeReaderT_$cliftA2 @(ST s) @r $dApplicative
+      = ($fApplicativeReaderT1 @s @r)
+        `cast` (forall (a :: <*>_N) (b :: <*>_N) (c :: <*>_N).
+                <a -> b -> c>_R
+                %<'Many>_N ->_R <ReaderT r (ST s) a>_R
+                %<'Many>_N ->_R <ReaderT r (ST s) b>_R
+                %<'Many>_N ->_R <r>_R %<'Many>_N ->_R Sym (N:ST[0] <s>_N <c>_R)
+                                ; Sym (N:ReaderT[0] <*>_N <r>_R <ST s>_R <c>_N)
+                :: Coercible
+                     (forall {a} {b} {c}.
+                      (a -> b -> c)
+                      -> ReaderT r (ST s) a -> ReaderT r (ST s) b -> r -> STRep s c)
+                     (forall {a} {b} {c}.
+                      (a -> b -> c)
+                      -> ReaderT r (ST s) a -> ReaderT r (ST s) b -> ReaderT r (ST s) c))
+"SPEC $cp1Applicative @(ST s) @_"
+    forall (@s) (@r) ($dApplicative :: Applicative (ST s)).
+      $fApplicativeReaderT_$cp1Applicative @(ST s) @r $dApplicative
+      = $fApplicativeReaderT_$s$fFunctorReaderT @s @r
+"SPEC $cp1Monad @(ST s) @_"
+    forall (@s) (@r) ($dMonad :: Monad (ST s)).
+      $fMonadReaderT_$cp1Monad @(ST s) @r $dMonad
+      = $fApplicativeReaderT_$s$fApplicativeReaderT @s @r
+"SPEC $fApplicativeReaderT @(ST s) @_"
+    forall (@s) (@r) ($dApplicative :: Applicative (ST s)).
+      $fApplicativeReaderT @(ST s) @r $dApplicative
+      = $fApplicativeReaderT_$s$fApplicativeReaderT @s @r
+"SPEC $fFunctorReaderT @(ST s) @_"
+    forall (@s) (@r) ($dFunctor :: Functor (ST s)).
+      $fFunctorReaderT @(ST s) @r $dFunctor
+      = $fApplicativeReaderT_$s$fFunctorReaderT @s @r
+"SPEC $fMonadReaderT @(ST s) @_"
+    forall (@s) (@r) ($dMonad :: Monad (ST s)).
+      $fMonadReaderT @(ST s) @r $dMonad
+      = $fMonadAbstractIOSTReaderT_$s$fMonadReaderT @s @r
 "USPEC useAbstractMonad @(ReaderT Int (ST s))"
     forall (@s)
            ($dMonadAbstractIOST :: MonadAbstractIOST (ReaderT Int (ST s))).


=====================================
testsuite/tests/simplCore/should_compile/all.T
=====================================
@@ -359,7 +359,6 @@ test('T19586', normal, compile, [''])
 
 test('T19599', normal, compile, ['-O -ddump-rules'])
 test('T19599a', normal, compile, ['-O -ddump-rules'])
-test('T13873',  [expect_broken(21229), grep_errmsg(r'SPEC') ], compile, ['-O -ddump-rules'])
 
 # Look for a specialisation rule for wimwam
 test('T19672', normal, compile, ['-O2 -ddump-rules'])
@@ -430,3 +429,10 @@ test('T22028', normal, compile, ['-O -ddump-rule-firings'])
 test('T22114', normal, compile, ['-O'])
 test('T21286',  normal, multimod_compile, ['T21286', '-O -ddump-rule-firings'])
 
+# One module, T21851.hs, has OPTIONS_GHC -ddump-simpl
+test('T21851', [grep_errmsg(r'case.*w\$sf') ], multimod_compile, ['T21851', '-O -dno-typeable-binds -dsuppress-uniques'])
+# One module, T22097.hs, has OPTIONS_GHC -ddump-simpl
+test('T22097', [grep_errmsg(r'case.*wgoEven') ], multimod_compile, ['T22097', '-O -dno-typeable-binds -dsuppress-uniques'])
+
+test('T13873',  [ grep_errmsg(r'SPEC') ], compile, ['-O -ddump-rules'])
+



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/06a937f784301f9522f20cc1a2d3c821e7f3175b...3c4220c96942ff003fe75949af7188ee0438b467
