[Git][ghc/ghc][wip/DataToTagSmallOp] Fiddle with documentation to address review comments

Matthew Craven (@clyring) gitlab at gitlab.haskell.org
Fri Dec 8 21:57:36 UTC 2023



Matthew Craven pushed to branch wip/DataToTagSmallOp at Glasgow Haskell Compiler / GHC


Commits:
82795106 by Matthew Craven at 2023-12-08T16:56:37-05:00
Fiddle with documentation to address review comments

- - - - -


4 changed files:

- compiler/GHC/Core/Lint.hs
- compiler/GHC/Core/Opt/ConstantFold.hs
- compiler/GHC/StgToCmm/Expr.hs
- compiler/GHC/Tc/Instance/Class.hs


Changes:

=====================================
compiler/GHC/Core/Lint.hs
=====================================
@@ -1131,8 +1131,8 @@ checkTypeDataConOcc what dc
     (text "type data constructor found in a" <+> text what <> colon <+> ppr dc)
 
 {-
--- | Check that a use of dataToTagLarge# satisfies condition DTT2
--- from Note [DataToTag overview] in GHC.Tc.Instance.Class
+-- | Check that a use of dataToTagLarge# satisfies conditions DTT2
+-- and DTT3 from Note [DataToTag overview] in GHC.Tc.Instance.Class
 --
 -- Ignores applications not headed by dataToTagLarge#.
 
@@ -1148,7 +1148,11 @@ checkDataToTagPrimOpTyCon (Var fun_id) args
       Type _levity : Type dty : _rest
         | Just (tc, _) <- splitTyConApp_maybe dty
         , isValidDTT2TyCon tc
-          -> pure ()
+          -> do  platform <- getPlatform
+                 let  numConstrs = tyConFamilySize tc
+                      isSmallOp = op == DataToTagSmallOp
+                 checkL (isSmallFamily platform numConstrs == isSmallOp) $
+                   text "dataToTag# primop-size/tycon-family-size mismatch"
         | otherwise -> failWithL $ text "dataToTagLarge# used at non-ADT type:"
                                    <+> ppr dty
       _ -> failWithL $ text "dataToTagLarge# needs two type arguments but has args:"


=====================================
compiler/GHC/Core/Opt/ConstantFold.hs
=====================================
@@ -1986,7 +1986,9 @@ tagToEnumRule = do
 
 ------------------------------
 dataToTagRule :: RuleM CoreExpr
--- See Note [DataToTag overview] in GHC.Tc.Instance.Class.
+-- Used for both dataToTagSmall# and dataToTagLarge#.
+-- See Note [DataToTag overview] in GHC.Tc.Instance.Class,
+-- particularly wrinkle DTW5.
 dataToTagRule = a `mplus` b
   where
     -- dataToTag (tagToEnum x)   ==>   x
@@ -3556,7 +3558,7 @@ Note [caseRules for dataToTag]
 See also Note [DataToTag overview] in GHC.Tc.Instance.Class.
 
 We want to transform
-  case dataToTagLarge# x of
+  case dataToTagSmall# x of
     DEFAULT -> e1
     1# -> e2
 into
@@ -3571,12 +3573,17 @@ case-flattening and case-of-known-constructor and can be very
 important for code using derived Eq instances.
 
 We can apply this transformation only when we can easily get the
-constructors from the type at which dataToTagLarge# is used.  And we
+constructors from the type at which dataToTagSmall# is used.  And we
 cannot apply this transformation at "type data"-related types without
 breaking invariant I1 from Note [Type data declarations] in
 GHC.Rename.Module.  That leaves exactly the types satisfying condition
 DTT2 from Note [DataToTag overview] in GHC.Tc.Instance.Class.
 
+All of the above applies identically for `dataToTagLarge#`.  And
+thanks to wrinkle DTW5, there is no need to worry about large-tag
+arguments for `dataToTagSmall#`; those cause undefined behavior anyway.
+
+
 Note [Unreachable caseRules alternatives]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Take care if we see something like


=====================================
compiler/GHC/StgToCmm/Expr.hs
=====================================
@@ -74,7 +74,8 @@ cgExpr (StgOpApp (StgPrimOp SeqOp) [StgVarArg a, _] _res_ty) =
   cgIdApp a []
 
 -- dataToTagSmall# :: a_levpoly -> Int#
--- See Note [DataToTag overview] in GHC.Tc.Instance.Class
+-- See Note [DataToTag overview] in GHC.Tc.Instance.Class,
+-- particularly wrinkle DTW4
 cgExpr (StgOpApp (StgPrimOp DataToTagSmallOp) [StgVarArg a] _res_ty) = do
   platform <- getPlatform
   emitComment (mkFastString "dataToTagSmall#")
@@ -88,7 +89,8 @@ cgExpr (StgOpApp (StgPrimOp DataToTagSmallOp) [StgVarArg a] _res_ty) = do
   emitReturn [cmmSubWord platform tag1 (CmmLit $ mkWordCLit platform 1)]
 
 -- dataToTagLarge# :: a_levpoly -> Int#
--- See Note [DataToTag overview] in GHC.Tc.Instance.Class
+-- See Note [DataToTag overview] in GHC.Tc.Instance.Class,
+-- particularly wrinkle DTW4
 cgExpr (StgOpApp (StgPrimOp DataToTagLargeOp) [StgVarArg a] _res_ty) = do
   platform <- getPlatform
   emitComment (mkFastString "dataToTagLarge#")


=====================================
compiler/GHC/Tc/Instance/Class.hs
=====================================
@@ -676,7 +676,7 @@ GHC generates instances like this:
      dataToTag# = dataToTagSmall#
 
 using one of two dedicated primops: `dataToTagSmall#` and `dataToTagLarge#`.
-(These two primops differ only in code generation; see wrinkle DTW4 below.)
+(Why two primops? What's the difference? See wrinkles DTW4 and DTW5.)
 Both primops have the following over-polymorphic type:
 
   dataToTagLarge# :: forall {l::levity} (a::TYPE (BoxedRep l)). a -> Int#
@@ -707,13 +707,19 @@ these conditions:
    But with a little effort we can ensure that every primop
    call we generate in a DataToTag instance satisfies this condition.
 
+(DTT3) If the TyCon in wrinkle DTT2 is a "large data type" with more
+   constructors than fit in pointer tags on the target, then the
+   primop must be dataToTagLarge# and not dataToTagSmall#.
+   Otherwise, the primop must be dataToTagSmall# and not dataToTagLarge#.
+   (See wrinkles DTW4 and DTW5.)
+
 These two primops have special handling in several parts of
 the compiler:
 
 - They have a couple of built-in rewrite rules, implemented in
   GHC.Core.Opt.ConstantFold.dataToTagRule
 
-- The simplifier rewrites most case expressions scrutinizing their result.
+- The simplifier rewrites most case expressions scrutinizing their results.
   See Note [caseRules for dataToTag] in GHC.Core.Opt.ConstantFold.
 
 - Each evaluates its argument; this is implemented via special cases in
@@ -791,7 +797,18 @@ Wrinkles:
   primops in DataToTag instances depending on the number of data
   constructors the relevant TyCon has.
 
-(DTW5) We make no promises about the primops used to implement
+(DTW5) We consider a call `dataToTagSmall# x` to result in undefined
+  behavior whenever the target supports pointer tagging but the actual
+  constructor index for `x` is too large to fit in the pointer's tag
+  bits.  Otherwise, `dataToTagSmall#` behaves identically to
+  `dataToTagLarge#`.
+
+  This allows the rewrites performed in GHC.Core.Opt.ConstantFold to
+  safely treat `dataToTagSmall#` identically to `dataToTagLarge#`:
+  the allowed program behaviors for the former is always a superset of
+  the allowed program behaviors for the latter.
+
+(DTW6) We make no promises about the primops used to implement
   DataToTag instances.  Changes to GHC's representation of algebraic
   data types at runtime may force us to redesign these primops.
   Indeed, accommodating such changes without breaking users of the
@@ -804,6 +821,42 @@ Wrinkles:
   runtime and the index of that constructor is always exposed via
   pointer tagging and via the object's info table.
 
+(DTW7) Currently, the generated module GHC.PrimopWrappers in ghc-prim
+  contains the following non-sense definitions:
+
+    {-# NOINLINE dataToTagSmall# #-}
+    dataToTagSmall# :: a_levpoly -> Int#
+    dataToTagSmall# a1 = GHC.Prim.dataToTagSmall# a1
+    {-# NOINLINE dataToTagLarge# #-}
+    dataToTagLarge# :: a_levpoly -> Int#
+    dataToTagLarge# a1 = GHC.Prim.dataToTagLarge# a1
+
+  Why do these exist? GHCi uses these symbols for... something.  There
+  is on-going work to get rid of them.  See also #24169 and !6245.
+  Their continued existence makes it difficult to do several nice things:
+
+   * As explained in DTW6, the dataToTag# primops are very internal.
+     We would like to hide them from GHC.Prim entirely to prevent
+     their mis-use, but doing so would cause GHC.PrimopWrappers to
+     fail to compile.
+
+   * The primops are applied at the (confusingly monomorphic) type
+     variable `a_levpoly` in the above definitions.  In particular,
+     they do not satisfy conditions DTT2 and DTT3 above.  We would
+     very much like these conditions to be invariants, but while
+     GHC.PrimopWrappers breaks them we cannot do so.
+
+   * This in turn means that `GHC.Core.Opt.ConstantFold.caseRules`
+     must check for condition DTT2 before doing the work described in
+     Note [caseRules for dataToTag].
+
+   * Likewise, wrinkle DTW5 is only necessary because condition DTT3
+     is not an invariant.  Otherwise, invoking the currently-specified
+     undefined behavior of `dataToTagSmall# @ty` would require passing it
+     an argument which will not really have type `ty` at runtime.  And
+     evaluating such an expression is always undefined behavior anyway!
+
+
 
 Historical note:
 During its time as a primop, `dataToTag#` underwent several changes,



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/82795106afb05801c8220c6cc81e667d37303381

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/82795106afb05801c8220c6cc81e667d37303381
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20231208/d7d6d3c7/attachment-0001.html>


More information about the ghc-commits mailing list