[Git][ghc/ghc][wip/T23146] 4 commits: Update Note [Core letrec invariant]

Rodrigo Mesquita (@alt-romes) gitlab at gitlab.haskell.org
Mon May 8 09:28:49 UTC 2023



Rodrigo Mesquita pushed to branch wip/T23146 at Glasgow Haskell Compiler / GHC


Commits:
c7160662 by Rodrigo Mesquita at 2023-05-08T10:28:33+01:00
Update Note [Core letrec invariant]

Authored by @simonpj

- - - - -
ab848cfd by Rodrigo Mesquita at 2023-05-08T10:28:36+01:00
Rename mkLFImported to importedIdLFInfo

The name `mkLFImported` sounded too much like a constructor of sorts, when
really it got the `LFInfo` of an imported Id from its `lf_info` field
when this existed, and otherwise returned a conservative estimate of
that imported Id's LFInfo. This is in contrast to functions such as
`mkLFReEntrant`, which really are about constructing an `LFInfo`.
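
The naming distinction drawn above can be illustrated with a toy model. This is only a sketch: `ToyId`, `ToyLFInfo`, `mkToyLFReEntrant` and `toyImportedIdLFInfo` are hypothetical stand-ins rather than GHC's real definitions, and the fallback is deliberately simplified to a single "unknown" value.

```haskell
-- Toy model only: every name below is hypothetical, not GHC's real API.
data ToyLFInfo
  = ToyLFCon            -- known to be a constructor application
  | ToyLFReEntrant Int  -- known to be a function of the given arity
  | ToyLFUnknown        -- conservative estimate: nothing is known
  deriving (Eq, Show)

data ToyId = ToyId
  { toyName   :: String
  , toyLFInfo :: Maybe ToyLFInfo  -- present only if it was recorded
  }

-- A "mk"-style function really constructs a value from its parts.
mkToyLFReEntrant :: Int -> ToyLFInfo
mkToyLFReEntrant arity = ToyLFReEntrant arity

-- An accessor-style function returns the recorded info when it exists,
-- and otherwise falls back to the conservative estimate.
toyImportedIdLFInfo :: ToyId -> ToyLFInfo
toyImportedIdLFInfo ident = case toyLFInfo ident of
  Just info -> info
  Nothing   -> ToyLFUnknown
```

In this sketch the second function never builds anything new from its argument's parts, which is why a `mk` prefix would be misleading for it.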

- - - - -
a4f4b294 by Rodrigo Mesquita at 2023-05-08T10:28:36+01:00
Enforce invariant on typePrimRepArgs in the types

As part of the documentation effort in !10165 I came across this
invariant on 'typePrimRepArgs' which is easily expressed at the
type-level through a NonEmpty list.

It allowed us to remove one panic.
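
A minimal sketch of the idea, assuming a hypothetical `ToyRep` type in place of GHC's `PrimRep`; the function names are illustrative only. It contrasts a list-returning function, whose callers still need a branch for the empty list, with a `NonEmpty`-returning one, where that branch cannot be written.

```haskell
import Data.List.NonEmpty (NonEmpty(..))

-- Hypothetical stand-in for GHC's PrimRep; only the shape matters here.
data ToyRep = ToyVoidRep | ToyIntRep | ToyPtrRep
  deriving (Eq, Show)

-- Before: the "never empty" invariant lives only in a comment.
repsAsList :: [ToyRep] -> [ToyRep]
repsAsList []   = [ToyVoidRep]
repsAsList reps = reps

-- Callers must still handle [], typically with a panic.
singleRepList :: [ToyRep] -> Maybe ToyRep
singleRepList reps = case repsAsList reps of
  []           -> error "impossible: empty reps"  -- unreachable, but required
  [ToyVoidRep] -> Nothing
  [rep]        -> Just rep
  _            -> Nothing

-- After: the result type carries the invariant.
repsAsNonEmpty :: [ToyRep] -> NonEmpty ToyRep
repsAsNonEmpty []     = ToyVoidRep :| []
repsAsNonEmpty (x:xs) = x :| xs

-- No empty case exists to match on, so the panic branch disappears.
singleRepNE :: [ToyRep] -> Maybe ToyRep
singleRepNE reps = case repsAsNonEmpty reps of
  ToyVoidRep :| [] -> Nothing
  rep        :| [] -> Just rep
  _          :| _  -> Nothing
```

The two `singleRep` functions agree on every input; the difference is that the second is total without an unreachable error branch.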

- - - - -
14358f34 by Rodrigo Mesquita at 2023-05-08T10:28:36+01:00
Merge outdated Note [Data con representation] into Note [Data constructor representation]

Introduce new Note [Constructor applications in STG] to better support
the merge, and reference it from the relevant bits in the STG syntax.

- - - - -


10 changed files:

- compiler/GHC/Core.hs
- compiler/GHC/Core/DataCon.hs
- compiler/GHC/Runtime/Heap/Inspect.hs
- compiler/GHC/Stg/InferTags/Rewrite.hs
- compiler/GHC/Stg/Syntax.hs
- compiler/GHC/StgToByteCode.hs
- compiler/GHC/StgToCmm/Closure.hs
- compiler/GHC/StgToCmm/Env.hs
- compiler/GHC/StgToCmm/Types.hs
- compiler/GHC/Types/RepType.hs


Changes:

=====================================
compiler/GHC/Core.hs
=====================================
@@ -368,18 +368,37 @@ Note [Core letrec invariant]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The Core letrec invariant:
 
-    The right hand sides of all
-      /top-level/ or /recursive/
-    bindings must be of lifted type
-
-    There is one exception to this rule, top-level @let@s are
-    allowed to bind primitive string literals: see
-    Note [Core top-level string literals].
+  The right hand sides of all /top-level/ or /recursive/
+  bindings must be of lifted type
 
 See "Type#type_classification" in GHC.Core.Type
-for the meaning of "lifted" vs. "unlifted").
-
-For the non-top-level, non-recursive case see Note [Core let-can-float invariant].
+for the meaning of "lifted" vs. "unlifted".
+
+For the non-top-level, non-recursive case see
+Note [Core let-can-float invariant].
+
+At top level, however, there are two exceptions to this rule:
+
+(TL1) A top-level binding is allowed to bind a primitive string literal
+      (which is unlifted).  See Note [Core top-level string literals].
+
+(TL2) In Core, we generate a top-level binding for every non-newtype data
+      constructor worker or wrapper
+      e.g.   data T = MkT Int
+      we generate
+             MkT :: Int -> T
+             MkT = \x. MkT x
+      (This binding looks recursive, but isn't; it defines a top-level, curried
+      function whose body just allocates and returns the data constructor.)
+
+      But if (a) the data constructor is nullary and (b) the data type is unlifted,
+      this binding is unlifted.
+      e.g.   data S :: UnliftedType where { S1 :: S, S2 :: S -> S }
+      we generate
+             S1 :: S   -- A top-level unlifted binding
+             S1 = S1
+      We allow this top-level unlifted binding to exist, after CorePrep
+      only.
 
 Note [Core let-can-float invariant]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


=====================================
compiler/GHC/Core/DataCon.hs
=====================================
@@ -141,7 +141,19 @@ becomes
         case e of { T a' b -> let a = I# a' in ... }
 
 To keep ourselves sane, we name the different versions of the data constructor
-differently, as follows.
+differently, as described in Note [Data Constructor Naming].
+
+The `dcRepType` field of a `DataCon` contains the type of the representation of
+the constructor /worker/, also called the Core representation.
+
+The Core representation may differ from the type of the constructor /wrapper/
+(built by `mkDataConRep`). Besides unpacking (as seen in the example above),
+dictionaries and coercions become explicit arguments in the Core representation
+of a constructor.
+
+Note that this representation is still *different* from runtime
+representation. (Which is what STG uses after unarise).
+See Note [Constructor applications in STG] in GHC.Stg.Syntax.
 
 
 Note [Data Constructor Naming]
@@ -209,7 +221,8 @@ Note [Data constructor workers and wrappers]
 * See Note [Data Constructor Naming] for how the worker and wrapper
   are named
 
-* Neither_ the worker _nor_ the wrapper take the dcStupidTheta dicts as arguments
+* The workers don't take the dcStupidTheta dicts as arguments, while the
+  wrappers currently do
 
 * The wrapper (if it exists) takes dcOrigArgTys as its arguments.
   The worker takes dataConRepArgTys as its arguments
@@ -528,7 +541,7 @@ data DataCon
                                 --      forall a x y. (a~(x,y), x~y, Ord x) =>
                                 --        x -> y -> T a
                                 -- (this is *not* of the constructor wrapper Id:
-                                --  see Note [Data con representation] below)
+                                --  see Note [Data constructor representation])
         -- Notice that the existential type parameters come *second*.
         -- Reason: in a case expression we may find:
         --      case (e :: T t) of
@@ -988,51 +1001,6 @@ we consult HsImplBang:
 The boolean flag is used only for this warning.
 See #11270 for motivation.
 
-Note [Data con representation]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The dcRepType field contains the type of the representation of a constructor
-This may differ from the type of the constructor *Id* (built
-by MkId.mkDataConId) for two reasons:
-        a) the constructor Id may be overloaded, but the dictionary isn't stored
-           e.g.    data Eq a => T a = MkT a a
-
-        b) the constructor may store an unboxed version of a strict field.
-
-So whenever this module talks about the representation of a data constructor
-what it means is the DataCon with all Unpacking having been applied.
-We can think of this as the Core representation.
-
-Here's an example illustrating the Core representation:
-        data Ord a => T a = MkT Int! a Void#
-Here
-        T :: Ord a => Int -> a -> Void# -> T a
-but the rep type is
-        Trep :: Int# -> a -> Void# -> T a
-Actually, the unboxed part isn't implemented yet!
-
-Note that this representation is still *different* from runtime
-representation. (Which is what STG uses after unarise).
-
-This is how T would end up being used in STG post-unarise:
-
-  let x = T 1# y
-  in ...
-      case x of
-        T int a -> ...
-
-The Void# argument is dropped and the boxed int is replaced by an unboxed
-one. In essence we only generate binders for runtime relevant values.
-
-We also flatten out unboxed tuples in this process. See the unarise
-pass for details on how this is done. But as an example consider
-`data S = MkS Bool (# Bool | Char #)` which when matched on would
-result in an alternative with three binders like this
-
-    MkS bool tag tpl_field ->
-
-See Note [Translating unboxed sums to unboxed tuples] and Note [Unarisation]
-for the details of this transformation.
-
 
 ************************************************************************
 *                                                                      *


=====================================
compiler/GHC/Runtime/Heap/Inspect.hs
=====================================
@@ -889,12 +889,12 @@ extractSubTerms recurse clos = liftM thdOf3 . go 0 0
            return (ptr_i, arr_i, unboxedTupleTerm ty terms0 : terms1)
       | otherwise
       = case typePrimRepArgs ty of
-          [rep_ty] ->  do
+          rep_ty :| [] ->  do
             (ptr_i, arr_i, term0)  <- go_rep ptr_i arr_i ty rep_ty
             (ptr_i, arr_i, terms1) <- go ptr_i arr_i tys
             return (ptr_i, arr_i, term0 : terms1)
-          rep_tys -> do
-           (ptr_i, arr_i, terms0) <- go_unary_types ptr_i arr_i rep_tys
+          rep_ty :| rep_tys -> do
+           (ptr_i, arr_i, terms0) <- go_unary_types ptr_i arr_i (rep_ty:rep_tys)
            (ptr_i, arr_i, terms1) <- go ptr_i arr_i tys
            return (ptr_i, arr_i, unboxedTupleTerm ty terms0 : terms1)
 


=====================================
compiler/GHC/Stg/InferTags/Rewrite.hs
=====================================
@@ -36,7 +36,7 @@ import GHC.Core            ( AltCon(..) )
 import GHC.Core.Type
 
 import GHC.StgToCmm.Types
-import GHC.StgToCmm.Closure (mkLFImported)
+import GHC.StgToCmm.Closure (importedIdLFInfo)
 
 import GHC.Stg.Utils
 import GHC.Stg.Syntax as StgSyn
@@ -275,7 +275,7 @@ isTagged v = do
         False -> return $!
                 -- Determine whether it is tagged from the LFInfo of the imported id.
                 -- See Note [The LFInfo of Imported Ids]
-                case mkLFImported v of
+                case importedIdLFInfo v of
                     -- Function, applied not entered.
                     LFReEntrant {}
                         -> True


=====================================
compiler/GHC/Stg/Syntax.hs
=====================================
@@ -237,6 +237,52 @@ StgConApp and StgPrimApp --- saturated applications
 
 There are specialised forms of application, for constructors, primitives, and
 literals.
+
+Note [Constructor applications in STG]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+After the unarisation pass:
+* In `StgConApp` and `StgRhsCon` and `StgAlt` we filter out the void arguments,
+  leaving only non-void ones.
+* In `StgApp` and `StgOpApp` we retain void arguments.
+
+We can do this because we know that `StgConApp` and `StgRhsCon` are saturated applications,
+so we lose no information by dropping those void args.  In contrast, in `StgApp` we need the
+ void argument to compare the number of args in the call with the arity of the function.
+
+This is an open design choice.  We could instead choose to treat all these applications
+consistently (keeping the void args).  But for some reason we don't, and this Note simply
+documents that design choice.
+
+As an example, consider:
+
+        data T a = MkT Int! a Void#
+
+The wrapper's representation and the worker's representation (i.e. the
+datacon's Core representation) are respectively:
+
+        $WT :: Int  -> a -> Void# -> T a
+        T   :: Int# -> a -> Void# -> T a
+
+T would end up being used in STG post-unarise as:
+
+  let x = T 1# y
+  in ...
+      case x of
+        T int a -> ...
+
+The Void# argument is dropped. In essence we only generate binders for runtime
+relevant values.
+
+We also flatten out unboxed tuples in this process. See the unarise
+pass for details on how this is done. But as an example consider
+`data S = MkS Bool (# Bool | Char #)` which when matched on would
+result in an alternative with three binders like this
+
+    MkS bool tag tpl_field ->
+
+See Note [Translating unboxed sums to unboxed tuples] and Note [Unarisation]
+for the details of this transformation.
+
 -}
 
   | StgLit      Literal
@@ -245,7 +291,7 @@ literals.
         -- which can't be let-bound
   | StgConApp   DataCon
                 ConstructorNumber
-                [StgArg] -- Saturated. (After Unarisation, [NonVoid StgArg])
+                [StgArg] -- Saturated. See Note [Constructor applications in STG]
                 [Type]   -- See Note [Types in StgConApp] in GHC.Stg.Unarise
 
   | StgOpApp    StgOp    -- Primitive op or foreign call
@@ -422,7 +468,7 @@ important):
                         -- are not allocated.
         ConstructorNumber
         [StgTickish]
-        [StgArg]        -- Args
+        [StgArg]        -- Saturated Args. See Note [Constructor applications in STG]
         Type            -- Type, for rewriting to an StgRhsClosure
 
 -- | Like 'GHC.Hs.Extension.NoExtField', but with an 'Outputable' instance that


=====================================
compiler/GHC/StgToByteCode.hs
=====================================
@@ -81,8 +81,10 @@ import Data.Coerce (coerce)
 import Data.ByteString (ByteString)
 import Data.Map (Map)
 import Data.IntMap (IntMap)
+import Data.List.NonEmpty (NonEmpty(..))
 import qualified Data.Map as Map
 import qualified Data.IntMap as IntMap
+import qualified Data.List.NonEmpty as NE
 import qualified GHC.Data.FiniteMap as Map
 import Data.Ord
 import GHC.Stack.CCS
@@ -296,8 +298,8 @@ argBits platform (rep : args)
   | isFollowableArg rep  = False : argBits platform args
   | otherwise = replicate (argRepSizeW platform rep) True ++ argBits platform args
 
-non_void :: [ArgRep] -> [ArgRep]
-non_void = filter nv
+non_void :: NonEmpty ArgRep -> [ArgRep]
+non_void = NE.filter nv
   where nv V = False
         nv _ = True
 
@@ -464,7 +466,7 @@ returnUnliftedAtom d s p e = do
                  StgLitArg lit -> typePrimRepArgs (literalType lit)
                  StgVarArg i   -> bcIdPrimReps i
     (push, szb) <- pushAtom d p e
-    ret <- returnUnliftedReps d s szb reps
+    ret <- returnUnliftedReps d s szb (NE.toList $! reps)
     return (push `appOL` ret)
 
 -- return an unlifted value from the top of the stack
@@ -867,7 +869,7 @@ doCase d s p scrut bndr alts
         (bndr_size, call_info, args_offsets)
            | ubx_tuple_frame =
                let bndr_ty = primRepCmmType platform
-                   bndr_reps = filter (not.isVoidRep) (bcIdPrimReps bndr)
+                   bndr_reps = NE.filter (not.isVoidRep) (bcIdPrimReps bndr)
                    (call_info, args_offsets) =
                        layoutNativeCall profile NativeTupleReturn 0 bndr_ty bndr_reps
                in ( wordsToBytes platform (nativeCallSize call_info)
@@ -1660,9 +1662,8 @@ maybe_getCCallReturnRep fn_ty
                          (pprType fn_ty)
      in
        case r_reps of
-         []            -> panic "empty typePrimRepArgs"
-         [VoidRep]     -> Nothing
-         [rep]         -> Just rep
+         VoidRep :| [] -> Nothing
+         rep     :| [] -> Just rep
 
                  -- if it was, it would be impossible to create a
                  -- valid return value placeholder on the stack
@@ -2117,7 +2118,7 @@ idSizeCon platform var
     isUnboxedSumType (idType var) =
     wordsToBytes platform .
     WordOff . sum . map (argRepSizeW platform . toArgRep platform) .
-    bcIdPrimReps $ var
+    NE.toList . bcIdPrimReps $ var
   | otherwise = ByteOff (primRepSizeB platform (bcIdPrimRep var))
 
 bcIdArgRep :: Platform -> Id -> ArgRep
@@ -2125,13 +2126,13 @@ bcIdArgRep platform = toArgRep platform . bcIdPrimRep
 
 bcIdPrimRep :: Id -> PrimRep
 bcIdPrimRep id
-  | [rep] <- typePrimRepArgs (idType id)
+  | rep :| [] <- typePrimRepArgs (idType id)
   = rep
   | otherwise
   = pprPanic "bcIdPrimRep" (ppr id <+> dcolon <+> ppr (idType id))
 
 
-bcIdPrimReps :: Id -> [PrimRep]
+bcIdPrimReps :: Id -> NonEmpty PrimRep
 bcIdPrimReps id = typePrimRepArgs (idType id)
 
 repSizeWords :: Platform -> PrimRep -> WordOff
@@ -2189,8 +2190,8 @@ atomRep platform e = toArgRep platform (atomPrimRep e)
 mkStackOffsets :: ByteOff -> [ByteOff] -> [ByteOff]
 mkStackOffsets original_depth szsb = tail (scanl' (+) original_depth szsb)
 
-typeArgReps :: Platform -> Type -> [ArgRep]
-typeArgReps platform = map (toArgRep platform) . typePrimRepArgs
+typeArgReps :: Platform -> Type -> NonEmpty ArgRep
+typeArgReps platform = NE.map (toArgRep platform) . typePrimRepArgs
 
 -- -----------------------------------------------------------------------------
 -- The bytecode generator's monad


=====================================
compiler/GHC/StgToCmm/Closure.hs
=====================================
@@ -28,7 +28,7 @@ module GHC.StgToCmm.Closure (
         LambdaFormInfo,         -- Abstract
         StandardFormInfo,        -- ...ditto...
         mkLFThunk, mkLFReEntrant, mkConLFInfo, mkSelectorLFInfo,
-        mkApLFInfo, mkLFImported, mkLFArgument, mkLFLetNoEscape,
+        mkApLFInfo, importedIdLFInfo, mkLFArgument, mkLFLetNoEscape,
         mkLFStringLit,
         lfDynTag,
         isLFThunk, isLFReEntrant, lfUpdatable,
@@ -256,10 +256,10 @@ mkApLFInfo id upd_flag arity
         (mightBeFunTy (idType id))
 
 -------------
--- | Make a 'LambdaFormInfo' for an imported Id.
+-- | The 'LambdaFormInfo' of an imported Id.
 --   See Note [The LFInfo of Imported Ids]
-mkLFImported :: Id -> LambdaFormInfo
-mkLFImported id =
+importedIdLFInfo :: Id -> LambdaFormInfo
+importedIdLFInfo id =
     -- See Note [Conveying CAF-info and LFInfo between modules] in
     -- GHC.StgToCmm.Types
     case idLFInfo_maybe id of
@@ -305,7 +305,7 @@ In particular, saturated data constructor applications *must* be unambiguously
 given `LFCon`, and if the LFInfo says LFCon, then it really is a static data
 constructor, and similar for LFReEntrant.
 
-In `mkLFImported`, we construct a LambdaFormInfo for imported Ids as follows:
+In `importedIdLFInfo`, we construct a LambdaFormInfo for imported Ids as follows:
 
 (1) If the `lfInfo` field contains an LFInfo, we use that LFInfo which is
 correct by construction (the invariant being that if it exists, it is correct):


=====================================
compiler/GHC/StgToCmm/Env.hs
=====================================
@@ -149,7 +149,7 @@ getCgIdInfo id
                       | otherwise
                       = pprPanic "GHC.StgToCmm.Env: label not found" (ppr id <+> dcolon <+> ppr (idType id))
               in return $
-                  litIdInfo platform id (mkLFImported id) (CmmLabel ext_lbl)
+                  litIdInfo platform id (importedIdLFInfo id) (CmmLabel ext_lbl)
           else
               cgLookupPanic id -- Bug, id is neither in local binds nor is external
         }}}


=====================================
compiler/GHC/StgToCmm/Types.hs
=====================================
@@ -53,7 +53,7 @@ make a conservative assumption, but that is bad: e.g.
   #16559, #15155, and wiki: commentary/rts/haskell-execution/pointer-tagging
 
   Conservative assumption here is made when we import an Id without a
-  LambdaFormInfo in the interface, in GHC.StgToCmm.Closure.mkLFImported.
+  LambdaFormInfo in the interface, in GHC.StgToCmm.Closure.importedIdLFInfo.
 
 So we arrange to always serialise this information into the interface file.  The
 moving parts are:


=====================================
compiler/GHC/Types/RepType.hs
=====================================
@@ -84,12 +84,11 @@ isNvUnaryType ty
   = False
 
 -- INVARIANT: the result list is never empty.
-typePrimRepArgs :: HasDebugCallStack => Type -> [PrimRep]
+typePrimRepArgs :: HasDebugCallStack => Type -> NonEmpty PrimRep
 typePrimRepArgs ty
-  | [] <- reps
-  = [VoidRep]
-  | otherwise
-  = reps
+  = case reps of
+      [] -> VoidRep :| []
+      (x:xs) ->   x :| xs
   where
     reps = typePrimRep ty
 



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/43b828178afb8eea69db223181fe2cc70f72a642...14358f34c34c27d20cc5b676e0b62a069a45d40f
