[Git][ghc/ghc][wip/T22924] 4 commits: Refresh profiling docs

Mon Feb 13 21:37:31 UTC 2023

Simon Peyton Jones pushed to branch wip/T22924 at Glasgow Haskell Compiler / GHC


Commits:
1e9eac1c by Matthew Pickering at 2023-02-13T11:36:41+01:00
Refresh profiling docs

I went through the whole of the profiling docs and tried to amend them
to reflect current best practices and tooling. In particular I removed
some old references to tools such as hp2any and replaced them with
references to eventlog2html.

- - - - -
da208b9a by Matthew Pickering at 2023-02-13T11:36:41+01:00
docs: Add section about profiling and foreign calls

Previously there was no documentation for how foreign calls interacted
with the profiler. This can be quite confusing for users so getting it
into the user guide is the first step to a potentially better solution.
See the ticket for more insightful discussion.

Fixes #21764

- - - - -
081640f1 by Bodigrim at 2023-02-13T12:51:52-05:00
Document that -fproc-alignment was introduced only in GHC 8.6

- - - - -
2b0d4d28 by Simon Peyton Jones at 2023-02-13T22:37:19+01:00
Narrow the dont-decompose-newtype test

Following #22924 this patch narrows the test that stops
us decomposing newtypes.  The key change is the use of
noGivenNewtypeReprEqs in GHC.Tc.Solver.Canonical.canTyConApp.

We went to and fro on the solution, as you can see in #22924.
The result is carefully documented in
  Note [Decomoposing newtype equalities]

On the way I had revert most of
  commit 3e827c3f74ef76d90d79ab6c4e71aa954a1a6b90
  Author: Richard Eisenberg <rae at cs.brynmawr.edu>
  Date:   Mon Dec 5 10:14:02 2022 -0500

    Do newtype unwrapping in the canonicaliser and rewriter

    See Note [Unwrap newtypes first], which has the details.

It turns out that

(a) 3e827c3f makes GHC behave worse on some recursive newtypes
    (see one of the tests on this commit)
(b) the finer-grained test (namely noGivenNewtypeReprEqs) renders
    3e827c3f unnecessary

- - - - -


14 changed files:

- compiler/GHC/Core/TyCon.hs
- compiler/GHC/Tc/Solver/Canonical.hs
- compiler/GHC/Tc/Solver/InertSet.hs
- compiler/GHC/Tc/Solver/Rewrite.hs
- docs/users_guide/debugging.rst
- + docs/users_guide/images/eventlog_profile.png
- docs/users_guide/profiling.rst
- + testsuite/tests/typecheck/should_compile/T22924.hs
- testsuite/tests/typecheck/should_compile/all.T
- + testsuite/tests/typecheck/should_fail/T22924a.hs
- + testsuite/tests/typecheck/should_fail/T22924a.stderr
- + testsuite/tests/typecheck/should_fail/T22924b.hs
- + testsuite/tests/typecheck/should_fail/T22924b.stderr
- testsuite/tests/typecheck/should_fail/all.T


Changes:

=====================================
compiler/GHC/Core/TyCon.hs
=====================================
@@ -1974,7 +1974,7 @@ isInjectiveTyCon :: TyCon -> Role -> Bool
 isInjectiveTyCon (TyCon { tyConDetails = details }) role
   = go details role
   where
-    go _                             Phantom          = True -- Vacuously; (t1 ~P t2) holes for all t1, t2!
+    go _                             Phantom          = True -- Vacuously; (t1 ~P t2) holds for all t1, t2!
     go (AlgTyCon {})                 Nominal          = True
     go (AlgTyCon {algTcRhs = rhs})   Representational
       = isGenInjAlgRhs rhs


=====================================
compiler/GHC/Tc/Solver/Canonical.hs
=====================================
@@ -1084,7 +1084,7 @@ can_eq_nc' _rewritten _rdr_env _envs ev eq_rel
 
 -- Decompose type constructor applications
 -- NB: we have expanded type synonyms already
-can_eq_nc' rewritten _rdr_env _envs ev eq_rel ty1 _ ty2 _
+can_eq_nc' _rewritten _rdr_env _envs ev eq_rel ty1 _ ty2 _
   | Just (tc1, tys1) <- tcSplitTyConApp_maybe ty1
   , Just (tc2, tys2) <- tcSplitTyConApp_maybe ty2
    -- we want to catch e.g. Maybe Int ~ (Int -> Int) here for better
@@ -1092,7 +1092,7 @@ can_eq_nc' rewritten _rdr_env _envs ev eq_rel ty1 _ ty2 _
    -- hence no direct match on TyConApp
   , not (isTypeFamilyTyCon tc1)
   , not (isTypeFamilyTyCon tc2)
-  = canTyConApp rewritten ev eq_rel tc1 tys1 tc2 tys2
+  = canTyConApp ev eq_rel tc1 tys1 tc2 tys2
 
 can_eq_nc' _rewritten _rdr_env _envs ev eq_rel
            s1@(ForAllTy (Bndr _ vis1) _) _
@@ -1114,8 +1114,12 @@ can_eq_nc' True _rdr_env _envs ev NomEq ty1 _ ty2 _
 -------------------
 
 -- No similarity in type structure detected. Rewrite and try again.
-can_eq_nc' False _rdr_env _envs ev eq_rel _ ps_ty1 _ ps_ty2
-  = rewrite_and_try_again ev eq_rel ps_ty1 ps_ty2
+can_eq_nc' False rdr_env envs ev eq_rel _ ps_ty1 _ ps_ty2
+  = -- Rewrite the two types and try again
+    do { (redn1@(Reduction _ xi1), rewriters1) <- rewrite ev ps_ty1
+       ; (redn2@(Reduction _ xi2), rewriters2) <- rewrite ev ps_ty2
+       ; new_ev <- rewriteEqEvidence (rewriters1 S.<> rewriters2) ev NotSwapped redn1 redn2
+       ; can_eq_nc' True rdr_env envs new_ev eq_rel xi1 xi1 xi2 xi2 }
 
 ----------------------------
 -- Look for a canonical LHS. See Note [Canonical LHS].
@@ -1153,15 +1157,6 @@ can_eq_nc' True _rdr_env _envs ev eq_rel _ ps_ty1 _ ps_ty2
           -- No need to call canEqFailure/canEqHardFailure because they
           -- rewrite, and the types involved here are already rewritten

--- Rewrite the two types and try again
-rewrite_and_try_again :: CtEvidence -> EqRel -> TcType -> TcType -> TcS (StopOrContinue Ct)
-rewrite_and_try_again ev eq_rel ty1 ty2
-  = do { (redn1@(Reduction _ xi1), rewriters1) <- rewrite ev ty1
-       ; (redn2@(Reduction _ xi2), rewriters2) <- rewrite ev ty2
-       ; new_ev <- rewriteEqEvidence (rewriters1 S.<> rewriters2) ev NotSwapped redn1 redn2
-       ; rdr_env <- getGlobalRdrEnvTcS
-       ; envs <- getFamInstEnvs
-       ; can_eq_nc' True rdr_env envs new_ev eq_rel xi1 xi1 xi2 xi2 }
 
 {- Note [Unsolved equalities]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1407,62 +1402,41 @@ which is easier to satisfy.
 Conclusion: we must unwrap newtypes before decomposing them. This happens
 in `can_eq_newtype_nc`
 
-But even this is challenging. Here are two cases to consider:
-
-Case 1:
-
-  newtype Age = MkAge Int
-  [G] c
-  [W] w1 :: IO Age ~R# IO Int
-
-Case 2:
-
-  newtype A = MkA [A]
-  [W] A ~R# [A]
-
-For Case 1, recall that IO is an abstract newtype. Then read Note
-[Decomposing newtype equalities]. According to that Note, we should not
-decompose w1, because we have an Irred Given. Yet we still want to solve
-the wanted!  We can do so by unwrapping the (non-abstract) Age newtype
-underneath the IO, giving
-   [W] w2 :: IO Int ~R# IO Int
-   w1 = (IO unwrap-Age ; w2)
-where unwrap-Age :: Age ~R# Int. Now we case solve w2 by reflexivity;
-see Note [Eager reflexivity check].
-
-Conclusion: unwrap newtypes (deeply, inside types) in the rewriter:
-specifically in GHC.Tc.Solver.Rewrite.rewrite_newtype_app.
-
-Yet for Case 2, deep rewriting would be a disaster: we would loop.
-  [W] A ~R# [A] ---> {unwrap}
-                     [W] [A] ~R# [[A]]
-                ---> {decompose}
-                     [W] A ~R# [A]
-
-In this case, we just want to unwrap newtypes /at the top level/, allowing us
-to succeed via Note [Eager reflexivity check]:
-  [W] A ~R# [A] ---> {unwrap at top level only}
-                     [W] [A] ~R# [A]
-                ---> {reflexivity} success
-
-Conclusion: to satisfy Case 1 and Case 2, we unwrap
-* /both/ at top level, in can_eq_nc'
-* /and/ deeply, in the rewriter, rewrite_newtype_app
-
-The former unwraps outer newtypes (when the data constructor is in scope).
-The latter unwraps deeply -- but it won't be invoked in Case 2, when we can
-recognize an equality between the types [A] and [A] before rewriting
-deeply.
-
-This "before" business is delicate -- there is still a real risk of a loop
-in the type checker with recursive newtypes -- but I think we're doomed to do
-*something* delicate, as we're really trying to solve for equirecursive
-type equality. Bottom line for users: recursive newtypes are dangerous.
-See also Section 5.3.1 and 5.3.4 of
+We did flirt with making the /rewriter/ expand newtypes, rather than
+doing it in `can_eq_newtype_nc`.   But with recursive newtypes we want
+to be super-careful about expanding!
+
+   newtype A = MkA [A]   -- Recursive!
+
+   f :: A -> [A]
+   f = coerce
+
+We have [W] A ~R# [A].  If we rewrite [A], it'll expand to
+   [[[[[...]]]]]
+and blow the reduction stack.  See Note [Newtypes can blow the stack]
+in GHC.Tc.Solver.Rewrite.  But if we expand only the /top level/ of
+both sides, we get
+   [W] [A] ~R# [A]
+which we can, just, solve by reflexivity.
+
+So we simply unwrap, on-demand, at top level, in `can_eq_newtype_nc`.
+
+This is all very delicate. There is a real risk of a loop in the type checker
+with recursive newtypes -- but I think we're doomed to do *something*
+delicate, as we're really trying to solve for equirecursive type
+equality. Bottom line for users: recursive newtypes do not play well with type
+inference for representational equality.  See also Section 5.3.1 and 5.3.4 of
 "Safe Zero-cost Coercions for Haskell" (JFP 2016).
 
-Another approach -- which we ultimately decided against -- is described in
-Note [Decomposing newtypes a bit more aggressively].
+See also Note [Decomposing newtype equalities].
+
+--- Historical side note ---
+
+We flirted with doing /both/ unwrap-at-top-level /and/ rewrite-deeply;
+see #22519.  But that didn't work: see discussion in #22924. Specifically
+we got a loop with a minor variation:
+   f2 :: a -> [A]
+   f2 = coerce
 
 Note [Eager reflexivity check]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1492,6 +1466,24 @@ we do a reflexivity check.
 
 (This would be sound in the nominal case, but unnecessary, and I [Richard
 E.] am worried that it would slow down the common case.)
+
+ Note [Newtypes can blow the stack]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Suppose we have
+
+  newtype X = MkX (Int -> X)
+  newtype Y = MkY (Int -> Y)
+
+and now wish to prove
+
+  [W] X ~R Y
+
+This Wanted will loop, expanding out the newtypes ever deeper looking
+for a solid match or a solid discrepancy. Indeed, there is something
+appropriate to this looping, because X and Y *do* have the same representation,
+in the limit -- they're both (Fix ((->) Int)). However, no finitely-sized
+coercion will ever witness it. This loop won't actually cause GHC to hang,
+though, because we check our depth in `can_eq_newtype_nc`.
 -}
 
 ------------------------
@@ -1598,8 +1590,7 @@ canEqCast rewritten ev eq_rel swapped ty1 co1 ty2 ps_ty2
     role = eqRelRole eq_rel
 
 ------------------------
-canTyConApp :: Bool   -- True <=> the types have been rewritten
-            -> CtEvidence -> EqRel
+canTyConApp :: CtEvidence -> EqRel
             -> TyCon -> [TcType]
             -> TyCon -> [TcType]
             -> TcS (StopOrContinue Ct)
@@ -1607,17 +1598,13 @@ canTyConApp :: Bool   -- True <=> the types have been rewritten
 -- See Note [Decomposing Dependent TyCons and Processing Wanted Equalities]
 -- Neither tc1 nor tc2 is a saturated funTyCon, nor a type family
 -- But they can be data families.
-canTyConApp rewritten ev eq_rel tc1 tys1 tc2 tys2
+canTyConApp ev eq_rel tc1 tys1 tc2 tys2
   | tc1 == tc2
   , tys1 `equalLength` tys2
   = do { inerts <- getTcSInerts
        ; if can_decompose inerts
          then canDecomposableTyConAppOK ev eq_rel tc1 tys1 tys2
-         else if rewritten
-              then canEqFailure ev eq_rel ty1 ty2
-              else rewrite_and_try_again ev eq_rel ty1 ty2 }
-              -- Why rewrite and try again?  See Case 1
-              -- of Note [Unwrap newtypes first]
+         else canEqFailure ev eq_rel ty1 ty2 }
 
   -- See Note [Skolem abstract data] in GHC.Core.Tycon
   | tyConSkolem tc1 || tyConSkolem tc2
@@ -1641,7 +1628,7 @@ canTyConApp rewritten ev eq_rel tc1 tys1 tc2 tys2
     ty2 = mkTyConApp tc2 tys2
 
      -- See Note [Decomposing TyConApp equalities]
-     -- Note [Decomposing newtypes a bit more aggressively]
+     -- and Note [Decomposing newtype equalities]
     can_decompose inerts
       =  isInjectiveTyCon tc1 (eqRelRole eq_rel)
       || (assert (eq_rel == ReprEq) $
@@ -1650,7 +1637,8 @@ canTyConApp rewritten ev eq_rel tc1 tys1 tc2 tys2
           -- Moreover isInjectiveTyCon is True for Representational
           --   for algebraic data types.  So we are down to newtypes
           --   and data families.
-          ctEvFlavour ev == Wanted && noGivenIrreds inerts)
+          ctEvFlavour ev == Wanted && noGivenNewtypeReprEqs tc1 inerts)
+             -- See Note [Decomposing newtype equalities] (EX2)
 
 {-
 Note [Use canEqFailure in canDecomposableTyConApp]
@@ -1838,13 +1826,13 @@ Example is wrinkle {1} in Note [Decomposing TyConApp equalities].
 
 For a Wanted with r=R, since newtypes are not injective at representational
 role, decomposition is sound, but we may lose completeness.  Nevertheless,
-if the newtype is abstraction (so can't be unwrapped) we can only solve
+if the newtype is abstract (so can't be unwrapped) we can only solve
 the equality by (a) using a Given or (b) decomposition.  If (a) is impossible
-(e.g. no Givens) then (b) is safe.
+(e.g. no Givens) then (b) is safe albeit potentially incomplete.
 
-Conclusion: decompose newtypes (at role R) only if there are no usable Givens.
+There are two ways in which decomposing (N ty1) ~r (N ty2) could be incomplete:
 
-* Incompleteness example (EX1)
+* Incompleteness example (EX1): unwrap first
       newtype Nt a = MkNt (Id a)
       type family Id a where Id a = a
 
@@ -1856,39 +1844,68 @@ Conclusion: decompose newtypes (at role R) only if there are no usable Givens.
 
   Conclusion: always unwrap newtypes before attempting to decompose
   them.  This is done in can_eq_nc'.  Of course, we can't unwrap if the data
-  constructor isn't in scope.  See See Note [Unwrap newtypes first].
+  constructor isn't in scope.  See Note [Unwrap newtypes first].
 
-* Incompleteness example (EX2)
+* Incompleteness example (EX2): available Givens
       newtype Nt a = Mk Bool         -- NB: a is not used in the RHS,
       type role Nt representational  -- but the user gives it an R role anyway
 
-  If we have [W] Nt alpha ~R Nt beta, we *don't* want to decompose to
-  [W] alpha ~R beta, because it's possible that alpha and beta aren't
-  representationally equal.
+      [G] Nt t1 ~R Nt t2
+      [W] Nt alpha ~R Nt beta
 
-  and maybe there is a Given (Nt t1 ~R Nt t2), just waiting to be used, if we
-  figure out (elsewhere) that alpha:=t1 and beta:=t2.  This is somewhat
-  similar to the question of overlapping Givens for class constraints: see
-  Note [Instance and Given overlap] in GHC.Tc.Solver.Interact.
+  We *don't* want to decompose to [W] alpha ~R beta, because it's possible
+  that alpha and beta aren't representationally equal.  And if we figure
+  out (elsewhere) that alpha:=t1 and beta:=t2, we can solve the Wanted
+  from the Given.  This is somewhat similar to the question of overlapping
+  Givens for class constraints: see Note [Instance and Given overlap] in
+  GHC.Tc.Solver.Interact.
 
   Conclusion: don't decompose [W] N s ~R N t, if there are any Given
   equalities that could later solve it.
 
-  But what does "any Given equalities that could later solve it" mean, precisely?
-  It must be a Given constraint that could turn into N s ~ N t.  But that
-  could include [G] (a b) ~ (c d), or even just [G] c.  But it'll definitely
-  be an CIrredCan.  So we settle for having no CIrredCans at all, which is
-  conservative but safe. See noGivenIrreds and #22331.
+  But what precisely does it mean to say "any Given equalities that could
+  later solve it"?
+
+  In #22924 we had
+     [G] f a ~R# a     [W] Const (f a) a ~R# Const a a
+  where Const is an abstract newtype.  If we decomposed the newtype, we
+  could solve.  Not-decomposing on the grounds that (f a ~R# a) might turn
+  into (Const (f a) a ~R# Const a a) seems a bit silly.
+
+  In #22331 we had
+     [G] N a ~R# N b   [W] N b ~R# N a
+  (where N is abstract so we can't unwrap). Here we really /don't/ want to
+  decompose, because the /only/ way to solve the Wanted is from that Given
+  (with a Sym).
+
+  In #22519 we had
+     [G] a <= b     [W] IO Age ~R# IO Int
+
+  (where IO is abstract so we can't unwrap, and newtype Age = Int; and (<=)
+  is a type-level comparison on Nats).  Here we /must/ decompose, despite the
+  existence of an Irred Given, or we will simply be stuck.  (Side note: We
+  flirted with deep-rewriting of newtypes (see discussion on #22519 and
+  !9623) but that turned out not to solve #22924, and also makes type
+  inference loop more often on recursive newtypes.)
+
+  The currently-implemented compromise is this:
+
+    we decompose [W] N s ~R# N t unless there is a [G] N s' ~ N t'
+
+  that is, a Given Irred equality with both sides headed with N.
+  See the call to noGivenNewtypeReprEqs in canTyConApp.
+
+  This is not perfect.  In principle a Given like [G] (a b) ~ (c d), or
+  even just [G] c, could later turn into N s ~ N t.  But since the free
+  vars of a Given are skolems, or at least untouchable unification
+  variables, this is extremely unlikely to happen.
 
-  Well not 100.0% safe. There could be a CDictCan with some un-expanded
-  superclasses; but only in some very obscure recursive-superclass
-  situations.
+  Another worry: there could, just, be a CDictCan with some
+  un-expanded equality superclasses; but only in some very obscure
+  recursive-superclass situations.
 
-If there are no Irred Givens (which is quite common) then we will
-successfuly decompose [W] (IO Age) ~R (IO Int), and solve it.  But
-that won't happen and [W] (IO Age) ~R (IO Int) will be stuck.
-We /could/, however, be a bit more aggressive about decomposition;
-see Note [Decomposing newtypes a bit more aggressively].
+   Yet another approach (!) is desribed in
+   Note [Decomposing newtypes a bit more aggressively].
 
 Remember: decomposing Wanteds is always /sound/. This Note is
 only about /completeness/.
@@ -1896,7 +1913,8 @@ only about /completeness/.
 Note [Decomposing newtypes a bit more aggressively]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 IMPORTANT: the ideas in this Note are *not* implemented. Instead, the
-current approach is detailed in Note [Unwrap newtypes first].
+current approach is detailed in Note [Decomposing newtype equalities]
+and Note [Unwrap newtypes first].
 For more details about the ideas in this Note see
   * GHC propoosal: https://github.com/ghc-proposals/ghc-proposals/pull/549
   * issue #22441


=====================================
compiler/GHC/Tc/Solver/InertSet.hs
=====================================
@@ -21,7 +21,7 @@ module GHC.Tc.Solver.InertSet (
     addInertItem,
 
     noMatchableGivenDicts,
-    noGivenIrreds,
+    noGivenNewtypeReprEqs,
     mightEqualLater,
     prohibitedSuperClassSolve,
 
@@ -1537,9 +1537,22 @@ isOuterTyVar tclvl tv
     -- becomes "outer" even though its level numbers says it isn't.
   | otherwise  = False  -- Coercion variables; doesn't much matter
 
-noGivenIrreds :: InertSet -> Bool
-noGivenIrreds (IS { inert_cans = inert_cans })
-  = isEmptyBag (inert_irreds inert_cans)
+noGivenNewtypeReprEqs :: TyCon -> InertSet -> Bool
+-- True <=> there is no Irred looking like (N tys1 ~ N tys2)
+-- See Note [Decomposing newtype equalities] (EX2) in GHC.Tc.Solver.Canonical
+--     This is the only call site.
+noGivenNewtypeReprEqs tc inerts
+  = not (anyBag might_help (inert_irreds (inert_cans inerts)))
+  where
+    might_help ct
+      = case classifyPredType (ctPred ct) of
+          EqPred ReprEq t1 t2
+             | Just (tc1,_) <- tcSplitTyConApp_maybe t1
+             , tc == tc1
+             , Just (tc2,_) <- tcSplitTyConApp_maybe t2
+             , tc == tc2
+             -> True
+          _  -> False
 
 -- | Returns True iff there are no Given constraints that might,
 -- potentially, match the given class consraint. This is used when checking to see if a


=====================================
compiler/GHC/Tc/Solver/Rewrite.hs
=====================================
@@ -42,7 +42,6 @@ import GHC.Builtin.Types (tYPETyCon)
 import Data.List ( find )
 import GHC.Data.List.Infinite (Infinite)
 import qualified GHC.Data.List.Infinite as Inf
-import GHC.Tc.Instance.Family (tcTopNormaliseNewTypeTF_maybe)
 
 {-
 ************************************************************************
@@ -225,10 +224,10 @@ rewrite ev ty
        ; return result }
 
 -- | See Note [Rewriting]
--- This variant of 'rewrite' rewrites w.r.t. nominal equality only,
--- as this is better than full rewriting for error messages. Specifically,
--- we want to avoid unwrapping newtypes, as doing so can end up causing
--- an otherwise-unnecessary stack overflow.
+-- `rewriteForErrors` is a variant of 'rewrite' that rewrites
+-- w.r.t. nominal equality only, as this is better than full rewriting
+-- for error messages. (This was important when we flirted with rewriting
+-- newtypes but perhaps less so now.)
 rewriteForErrors :: CtEvidence -> TcType
                  -> TcS (Reduction, RewriterSet)
 rewriteForErrors ev ty
@@ -499,27 +498,14 @@ rewrite_one (TyVarTy tv)
 rewrite_one (AppTy ty1 ty2)
   = rewrite_app_tys ty1 [ty2]
 
-rewrite_one ty@(TyConApp tc tys)
+rewrite_one (TyConApp tc tys)
   -- If it's a type family application, try to reduce it
   | isTypeFamilyTyCon tc
   = rewrite_fam_app tc tys
 
-  | otherwise
-  = do { eq_rel <- getEqRel
-       ; if eq_rel == ReprEq
-
-         then -- Rewriting w.r.t. representational equality requires
-              --   unwrapping newtypes; see GHC.Tc.Solver.Canonical.
-              --   Note [Unwrap newtypes first]
-              -- NB: try rewrite_newtype_app even when tc isn't a newtype;
-              -- the allows the possibility of having a newtype buried under
-              -- a synonym. Needed for e.g. T12067.
-              rewrite_newtype_app ty
-
-         else -- For * a normal data type application
-              --     * data family application
-              -- we just recursively rewrite the arguments.
-              rewrite_ty_con_app tc tys }
+  | otherwise -- We just recursively rewrite the arguments.
+              -- See Note [Do not rewrite newtypes]
+  = rewrite_ty_con_app tc tys
 
 rewrite_one (FunTy { ft_af = vis, ft_mult = mult, ft_arg = ty1, ft_res = ty2 })
   = do { arg_redn <- rewrite_one ty1
@@ -678,42 +664,12 @@ rewrite_vector ki roles tys
     fvs                                = tyCoVarsOfType ki
 {-# INLINE rewrite_vector #-}
 
--- Rewrite a (potential) newtype application
--- Precondition: the ambient EqRel is ReprEq
--- Precondition: the type is a TyConApp
--- See Note [Newtypes can blow the stack]
-rewrite_newtype_app :: TcType -> RewriteM Reduction
-rewrite_newtype_app ty@(TyConApp tc tys)
-  = do { rdr_env <- liftTcS getGlobalRdrEnvTcS
-       ; tf_envs <- liftTcS getFamInstEnvs
-       ; case (tcTopNormaliseNewTypeTF_maybe tf_envs rdr_env ty) of
-           Nothing -> -- Non-newtype or abstract newtype
-                      rewrite_ty_con_app tc tys
-
-           Just ((used_ctors, co), ty')   -- co :: ty ~ ty'
-             -> do { liftTcS $ recordUsedGREs used_ctors
-                   ; checkStackDepth ty
-                   ; rewrite_reduction (Reduction co ty') } }
-
-rewrite_newtype_app other_ty = pprPanic "rewrite_newtype_app" (ppr other_ty)
-
-{- Note [Newtypes can blow the stack]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Suppose we have
-
-  newtype X = MkX (Int -> X)
-  newtype Y = MkY (Int -> Y)
-
-and now wish to prove
-
-  [W] X ~R Y
 
-This Wanted will loop, expanding out the newtypes ever deeper looking
-for a solid match or a solid discrepancy. Indeed, there is something
-appropriate to this looping, because X and Y *do* have the same representation,
-in the limit -- they're both (Fix ((->) Int)). However, no finitely-sized
-coercion will ever witness it. This loop won't actually cause GHC to hang,
-though, because we check our depth when unwrapping newtypes.
+{- Note [Do not rewrite newtypes]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We flirted with unwrapping newtypes in the rewriter -- see GHC.Tc.Solver.Canonical
+Note [Unwrap newtypes first]. But that turned out to be a bad idea because
+of recursive newtypes, as that Note says.  So be careful if you re-add it!
 
 Note [Rewriting synonyms]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~


=====================================
docs/users_guide/debugging.rst
=====================================
@@ -1046,6 +1046,8 @@ Checking for consistency
     :shortdesc: Align functions at given boundary.
     :type: dynamic
 
+    :since: 8.6.1
+
     Align functions to multiples of the given value. Only valid values are powers
     of two.
 


=====================================
docs/users_guide/images/eventlog_profile.png
=====================================
Binary files /dev/null and b/docs/users_guide/images/eventlog_profile.png differ


=====================================
docs/users_guide/profiling.rst
=====================================
@@ -10,17 +10,13 @@ Profiling
 
 GHC comes with a time and space profiling system, so that you can answer
 questions like "why is my program so slow?", or "why is my program using
-so much memory?".
+so much memory?". We'll start by describing how to do time profiling.
 
-Profiling a program is a three-step process:
+Time profiling a program is a three-step process:
 
 1. Re-compile your program for profiling with the :ghc-flag:`-prof` option, and
    probably one of the options for adding automatic annotations:
-   :ghc-flag:`-fprof-auto` is the most common [1]_.
-
-   If you are using external packages with :command:`cabal`, you may need to
-   reinstall these packages with profiling support; typically this is
-   done with ``cabal install -p package --reinstall``.
+   :ghc-flag:`-fprof-late` is the recommended option.
 
 2. Having compiled the program for profiling, you now need to run it to
    generate the profile. For example, a simple time profile can be
@@ -37,6 +33,9 @@ Profiling a program is a three-step process:
 3. Examine the generated profiling information, use the information to
    optimise your program, and repeat as necessary.
 
+The time profiler measures the CPU time taken by the Haskell code in your application.
+In particular time taken by safe foreign calls is not tracked by the profiler (see :ref:`prof-foreign-calls`).
+
 .. _cost-centres:
 
 Cost centres and cost-centre stacks
@@ -197,7 +196,10 @@ Inserting cost centres by hand
 Cost centres are just program annotations. When you say ``-fprof-auto``
 to the compiler, it automatically inserts a cost centre annotation
 around every binding not marked INLINE in your program, but you are
-entirely free to add cost centre annotations yourself.
+entirely free to add cost centre annotations yourself. Be careful adding too many
+cost-centre annotations as the optimiser is careful to not move them around or
+remove them, which can severly affect how your program is optimised and hence the
+runtime performance!
 
 The syntax of a cost centre annotation for expressions is ::
 
@@ -311,6 +313,39 @@ and become CAFs. You will probably need to consult the Core
 .. index::
    single: -fprof-cafs
 
+.. _prof-foreign-calls:
+
+Profiling and foreign calls
+---------------------------
+
+Simply put, the profiler includes time spent in unsafe foreign
+calls but ignores time taken in safe foreign calls. For example, time spent blocked on IO
+operations (e.g. ``getLine``) is not accounted for in the profile as ``getLine`` is implemented
+using a safe foreign call.
+
+The profiler estimates CPU time, for Haskell threads within the program only.
+In particular, time "taken" by the program in blocking safe foreign calls
+is not accounted for in time profiles. The runtime has the notion of a virtual
+processor which is known as a "capability". Haskell threads are run on capabilities,
+and the profiler samples the capabilities in order to determine what is being
+executed at a certain time. When a safe foreign call is executed, it's run outside
+the context of a capability; hence the sampling does not account for the time
+taken. Whilst the safe call is executed, other
+Haskell threads are free to run on the capability, and their cost will be attributed
+to the profiler. When the safe call is finished, the blocked, descheduled thread can
+be resumed and rescheduled.
+
+However, the time taken by blocking on unsafe foreign calls is accounted for in the profile.
+This happens because unsafe foreign calls are executed by the same capability
+their calling Haskell thread is running on. Therefore, an unsafe foreign call will
+block the entire capability whilst it is running, and any time the capability is
+sampled the "cost" of the foreign call will be attributed to the calling cost-centre stack.
+
+However, do note that you are not supposed to use unsafe foreign calls for any
+operations which do block! Do not be tempted to replace your safe foreign calls
+with unsafe calls just so they appear in the profile. This prevents GC from
+happening until the foreign call returns, which can be catastrophic for performance.
+
 .. _prof-compiler-options:
 
 Compiler options for profiling
@@ -356,7 +391,9 @@ Automatically placing cost-centres
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 GHC has a number of flags for automatically inserting cost-centres into the
-compiled program.
+compiled program. Use these options carefully because inserting too many cost-centres
+in the wrong places will mean the optimiser will be less effective and the runtime behaviour
+of your profiled program will be different to that of the unprofiled one.
 
 .. ghc-flag:: -fprof-callers=⟨name⟩
     :shortdesc: Auto-add ``SCC``\\ s to all call-sites of the named function.
@@ -618,8 +655,10 @@ enclosed between ``+RTS ... -RTS`` as usual):
 JSON profile format
 ~~~~~~~~~~~~~~~~~~~
 
-When invoked with the :rts-flag:`-pj` flag the runtime will emit the cost-centre
-profile in a machine-readable JSON format. The top-level object of this format
+profile in a machine-readable JSON format. The JSON file can be directly loaded
+into `speedscope.app <https://www.speedscope.app/>`_ to interactively view the profile.
+
+The top-level object of this format
 has the following properties,
 
 ``program`` (string)
@@ -758,8 +797,12 @@ For instance, a simple profile might look like this,
       }
     }
 
+Eventlog profile format
+~~~~~~~~~~~~~~~~~~~~~~~
 
-
+In addition to the ``.prof`` and ``.json`` formats the cost centre definitions
+and samples are also emitted to the :ref:`eventlog <rts-eventlog>`. The format
+of the events is specified in the :ref:`eventlog encodings <eventlog-encodings>` section.
 
 
 .. _prof-heap:
@@ -774,18 +817,35 @@ program holds on to more memory at run-time that it needs to. Space
 leaks lead to slower execution due to heavy garbage collector activity,
 and may even cause the program to run out of memory altogether.
 
+Heap profiling differs from time profiling in the fact that is not always
+necessary to use the profiling runtime to generate a heap profile. There
+are two heap profiling modes (:rts-flag:`-hT` and :rts-flag:`-hi` [1]_) which are always
+available.
+
 To generate a heap profile from your program:
 
-1. Compile the program for profiling (:ref:`prof-compiler-options`).
+1. Assuming you need the profiling runtime, compile the program for profiling (:ref:`prof-compiler-options`).
 
 2. Run it with one of the heap profiling options described below (eg.
-   :rts-flag:`-hc` for a basic producer profile). This generates the file
-   :file:`{prog}.hp`.
+   :rts-flag:`-hc` for a basic producer profile) and enable the eventlog using :rts-flag:`-l <-l ⟨flags⟩>`.
 
-   If the :ref:`event log <rts-eventlog>` is enabled (with the :rts-flag:`-l ⟨flags⟩`
-   runtime system flag) heap samples will additionally be emitted to the GHC
+   Heap samples will be emitted to the GHC
    event log (see :ref:`heap-profiler-events` for details about event format).
 
+3. Render the heap profile using `eventlog2html <https://hackage.haskell.org/package/eventlog2html>`_.
+   This produces an HTML file which contains the visualised profile.
+
+4. Open the rendered interactive profile in your web browser.
+
+For example, here is a heap profile produced of using eventlog profiling on GHC
+compiling the Cabal library. You can read a lot more about eventlog2html on the website.
+
+.. image:: images/eventlog_profile.*
+
+Note that there is the legacy :file:`{prog}.hp` format which has been deprecated
+in favour of eventlog based profiling. In order to render the legacy format, the
+steps are as follows.
+
 3. Run :command:`hp2ps` to produce a Postscript file, :file:`{prog}.ps`. The
    :command:`hp2ps` utility is described in detail in :ref:`hp2ps`.
 
@@ -797,10 +857,6 @@ from GHC's ``nofib`` benchmark suite,
 
 .. image:: images/prof_scc.*
 
-You might also want to take a look at
-`hp2any <https://www.haskell.org/haskellwiki/Hp2any>`__, a more advanced
-suite of tools (not distributed with GHC) for displaying heap profiles.
-
 Note that there might be a big difference between the OS reported memory usage
 of your program and the amount of live data as reported by heap profiling.
 The reasons for the difference are explained in :ref:`hints-os-memory`.
@@ -817,20 +873,14 @@ following RTS options select which break-down to use:
 
 .. rts-flag:: -hT
 
-    Breaks down the graph by heap closure type.
+    Breaks down the graph by heap closure type. This does not require the profiling
+    runtime.
 
 .. rts-flag:: -hc
-              -h
 
     *Requires* :ghc-flag:`-prof`. Breaks down the graph by the cost-centre stack
     which produced the data.
 
-    .. note:: The meaning of the shortened :rts-flag:`-h` is dependent on whether
-              your program was compiled for profiling. When compiled for profiling,
-              :rts-flag:`-h` is equivalent to :rts-flag:`-hc`, but otherwise is
-              equivalent to :rts-flag:`-hT` (see :ref:`rts-profiling`). The :rts-flag:`-h`
-              is deprecated and will be removed in a future release.
-
 .. rts-flag:: -hm
 
     *Requires* :ghc-flag:`-prof`. Break down the live heap by the module
@@ -863,7 +913,7 @@ following RTS options select which break-down to use:
 
     Break down the graph by the address of the info table of a closure. For this
     to produce useful output the program must have been compiled with
-    :ghc-flag:`-finfo-table-map`.
+    :ghc-flag:`-finfo-table-map` but it does not require the profiling runtime.
 
 .. rts-flag:: -l
     :noindex:
@@ -1041,6 +1091,14 @@ This trick isn't foolproof, because there might be other ``B`` closures in
 the heap which aren't the retainers we are interested in, but we've
 found this to be a useful technique in most cases.
 
+Precise Retainer Analysis
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you want to precisely answer questions about why a certain type of closure is
+retained then it is worthwhile using `ghc-debug <https://gitlab.haskell.org/ghc/ghc-debug>`_ which
+has a terminal interface which can be used to easily answer queries such as, what is retaining
+a certain closure.
+
 .. _biography-prof:
 
 Biographical Profiling
@@ -1120,6 +1178,9 @@ reasons for this:
    allocated by foreign libraries, and data allocated by the RTS), and
    ``mmap()``\'d memory are not counted in the heap profile.
 
+For more discussion about understanding how understanding process residency see
+:ref:`hints-os-memory`.
+
 .. _hp2ps:
 
 ``hp2ps`` -- Rendering heap profiles to PostScript
@@ -1242,123 +1303,6 @@ The flags are:
 
     Print out usage information.
 
-.. _manipulating-hp:
-
-Manipulating the ``hp`` file
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-(Notes kindly offered by Jan-Willem Maessen.)
-
-The ``FOO.hp`` file produced when you ask for the heap profile of a
-program ``FOO`` is a text file with a particularly simple structure.
-Here's a representative example, with much of the actual data omitted:
-
-.. code-block:: none
-
-    JOB "FOO -hC"
-    DATE "Thu Dec 26 18:17 2002"
-    SAMPLE_UNIT "seconds"
-    VALUE_UNIT "bytes"
-    BEGIN_SAMPLE 0.00
-    END_SAMPLE 0.00
-    BEGIN_SAMPLE 15.07
-      ... sample data ...
-    END_SAMPLE 15.07
-    BEGIN_SAMPLE 30.23
-      ... sample data ...
-    END_SAMPLE 30.23
-    ... etc.
-    BEGIN_SAMPLE 11695.47
-    END_SAMPLE 11695.47
-
-The first four lines (``JOB``, ``DATE``, ``SAMPLE_UNIT``,
-``VALUE_UNIT``) form a header. Each block of lines starting with
-``BEGIN_SAMPLE`` and ending with ``END_SAMPLE`` forms a single sample
-(you can think of this as a vertical slice of your heap profile). The
-hp2ps utility should accept any input with a properly-formatted header
-followed by a series of *complete* samples.
-
-Zooming in on regions of your profile
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-You can look at particular regions of your profile simply by loading a
-copy of the ``.hp`` file into a text editor and deleting the unwanted
-samples. The resulting ``.hp`` file can be run through ``hp2ps`` and
-viewed or printed.
-
-Viewing the heap profile of a running program
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The ``.hp`` file is generated incrementally as your program runs. In
-principle, running :command:`hp2ps` on the incomplete file should produce a
-snapshot of your program's heap usage. However, the last sample in the
-file may be incomplete, causing :command:`hp2ps` to fail. If you are using a
-machine with UNIX utilities installed, it's not too hard to work around
-this problem (though the resulting command line looks rather Byzantine):
-
-.. code-block:: sh
-
-    head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
-        | hp2ps > FOO.ps
-
-The command ``fgrep -n END_SAMPLE FOO.hp`` finds the end of every
-complete sample in ``FOO.hp``, and labels each sample with its ending
-line number. We then select the line number of the last complete sample
-using :command:`tail` and :command:`cut`. This is used as a parameter to :command:`head`; the
-result is as if we deleted the final incomplete sample from :file:`FOO.hp`.
-This results in a properly-formatted .hp file which we feed directly to
-:command:`hp2ps`.
-
-Viewing a heap profile in real time
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The :command:`gv` and :command:`ghostview` programs have a "watch file" option
-can be used to view an up-to-date heap profile of your program as it runs.
-Simply generate an incremental heap profile as described in the previous
-section. Run :command:`gv` on your profile:
-
-.. code-block:: sh
-
-      gv -watch -orientation=seascape FOO.ps
-
-If you forget the ``-watch`` flag you can still select "Watch file" from
-the "State" menu. Now each time you generate a new profile ``FOO.ps``
-the view will update automatically.
-
-This can all be encapsulated in a little script:
-
-.. code-block:: sh
-
-      #!/bin/sh
-      head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
-        | hp2ps > FOO.ps
-      gv -watch -orientation=seascape FOO.ps &
-      while [ 1 ] ; do
-        sleep 10 # We generate a new profile every 10 seconds.
-        head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
-          | hp2ps > FOO.ps
-      done
-
-Occasionally :command:`gv` will choke as it tries to read an incomplete copy of
-:file:`FOO.ps` (because :command:`hp2ps` is still running as an update occurs). A
-slightly more complicated script works around this problem, by using the
-fact that sending a SIGHUP to gv will cause it to re-read its input
-file:
-
-.. code-block:: sh
-
-      #!/bin/sh
-      head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
-        | hp2ps > FOO.ps
-      gv FOO.ps &
-      gvpsnum=$!
-      while [ 1 ] ; do
-        sleep 10
-        head -`fgrep -n END_SAMPLE FOO.hp | tail -1 | cut -d : -f 1` FOO.hp \
-          | hp2ps > FOO.ps
-        kill -HUP $gvpsnum
-      done
-
 .. _prof-threaded:
 
 Profiling Parallel and Concurrent Programs
@@ -1968,10 +1912,9 @@ Notes about ticky profiling
   in some columns. For this reason using an eventlog-based approach should be prefered if
   possible.
 
-
 .. [1]
-   :ghc-flag:`-fprof-auto` was known as ``-auto-all`` prior to
-   GHC 7.4.1.
+   :rts-flag:`-hi` profiling is avaible with the normal runtime but you will need to
+   compile with :ghc-flag:`-finfo-table-map` to interpret the results.
 
 .. [2]
    Note that this policy has changed slightly in GHC 7.4.1 relative to


=====================================
testsuite/tests/typecheck/should_compile/T22924.hs
=====================================
@@ -0,0 +1,9 @@
+{-# LANGUAGE FlexibleInstances #-}
+module G where
+
+import Data.Functor.Const( Const )
+import Data.Coerce
+
+f :: Coercible (f a) a => Const a () -> Const (f a) ()
+f = coerce
+


=====================================
testsuite/tests/typecheck/should_compile/all.T
=====================================
@@ -857,3 +857,4 @@ test('T22647', normal, compile, [''])
 test('T19577', normal, compile, [''])
 test('T22383', normal, compile, [''])
 test('T21501', normal, compile, [''])
+test('T22924', normal, compile, [''])


=====================================
testsuite/tests/typecheck/should_fail/T22924a.hs
=====================================
@@ -0,0 +1,9 @@
+module T22924a where
+
+import Data.Coerce
+
+newtype R = MkR [R]
+
+f :: a -> [R]
+-- Should give a civilised error
+f = coerce


=====================================
testsuite/tests/typecheck/should_fail/T22924a.stderr
=====================================
@@ -0,0 +1,11 @@
+
+T22924a.hs:9:5: error: [GHC-10283]
+    • Couldn't match representation of type ‘a’ with that of ‘[R]’
+        arising from a use of ‘coerce’
+      ‘a’ is a rigid type variable bound by
+        the type signature for:
+          f :: forall a. a -> [R]
+        at T22924a.hs:7:1-13
+    • In the expression: coerce
+      In an equation for ‘f’: f = coerce
+    • Relevant bindings include f :: a -> [R] (bound at T22924a.hs:9:1)


=====================================
testsuite/tests/typecheck/should_fail/T22924b.hs
=====================================
@@ -0,0 +1,10 @@
+module T22924b where
+
+import Data.Coerce
+
+newtype R = MkR [R]
+newtype S = MkS [S]
+
+f :: R -> S
+-- Blows the typechecker reduction stack
+f = coerce


=====================================
testsuite/tests/typecheck/should_fail/T22924b.stderr
=====================================
@@ -0,0 +1,10 @@
+
+T22924b.hs:10:5: error:
+    • Reduction stack overflow; size = 201
+      When simplifying the following type: R
+      Use -freduction-depth=0 to disable this check
+      (any upper bound you could choose might fail unpredictably with
+       minor updates to GHC, so disabling the check is recommended if
+       you're sure that type checking should terminate)
+    • In the expression: coerce
+      In an equation for ‘f’: f = coerce


=====================================
testsuite/tests/typecheck/should_fail/all.T
=====================================
@@ -667,3 +667,5 @@ test('T22570', normal, compile_fail, [''])
 test('T22645', normal, compile_fail, [''])
 test('T20666', normal, compile_fail, [''])
 test('T20666a', normal, compile_fail, [''])
+test('T22924a', normal, compile_fail, [''])
+test('T22924b', normal, compile_fail, [''])



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/bc08cdb67f416494b3c920a50fedb0b1942e0fda...2b0d4d28537988cc304ffb5c292455c8492c793e

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/bc08cdb67f416494b3c920a50fedb0b1942e0fda...2b0d4d28537988cc304ffb5c292455c8492c793e
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20230213/ae60a959/attachment-0001.html>