[Git][ghc/ghc][wip/T23146] 4 commits: Update Note [Core letrec invariant]
Rodrigo Mesquita (@alt-romes)
gitlab at gitlab.haskell.org
Mon May 8 09:28:49 UTC 2023
Rodrigo Mesquita pushed to branch wip/T23146 at Glasgow Haskell Compiler / GHC
Commits:
c7160662 by Rodrigo Mesquita at 2023-05-08T10:28:33+01:00
Update Note [Core letrec invariant]
Authored by @simonpj
- - - - -
ab848cfd by Rodrigo Mesquita at 2023-05-08T10:28:36+01:00
Rename mkLFImported to importedIdLFInfo
The `mkLFImported` sounded too much like a constructor of sorts, when
really it got the `LFInfo` of an imported Id from its `lf_info` field
when this existed, and otherwise returned a conservative estimate of
that imported Id's LFInfo. This in contrast to functions such as
`mkLFReEntrant` which really are about constructing an `LFInfo`.
- - - - -
a4f4b294 by Rodrigo Mesquita at 2023-05-08T10:28:36+01:00
Enforce invariant on typePrimRepArgs in the types
As part of the documentation effort in !10165 I came across this
invariant on 'typePrimRepArgs' which is easily expressed at the
type-level through a NonEmpty list.
It allowed us to remove one panic.
- - - - -
14358f34 by Rodrigo Mesquita at 2023-05-08T10:28:36+01:00
Merge outdated Note [Data con representation] into Note [Data constructor representation]
Introduce new Note [Constructor applications in STG] to better support
the merge, and reference it from the relevant bits in the STG syntax.
- - - - -
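The distinction drawn in the rename commit above (a constructor-style `mk...` function versus one that reads a recorded value and falls back to a conservative estimate) can be sketched in isolation. This is only an illustration: `LFInfo`, `ImportedId`, `lfField`, `mkReEntrant` and `importedLFInfo` are hypothetical, heavily simplified stand-ins rather than GHC's real definitions, and the fallback is reduced to a single `LFUnknown` case.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical, simplified stand-in for GHC's LambdaFormInfo.
data LFInfo
  = LFReEntrant Int   -- a function closure with the given arity
  | LFUnknown         -- conservative: nothing is known about the closure
  deriving (Eq, Show)

-- Hypothetical stand-in for an imported Id.
data ImportedId = ImportedId
  { idName  :: String
  , lfField :: Maybe LFInfo  -- the recorded LFInfo, if the interface had one
  }

-- Constructor-like: it really does build a fresh LFInfo.
mkReEntrant :: Int -> LFInfo
mkReEntrant = LFReEntrant

-- Accessor-like: use the recorded LFInfo when it exists, and otherwise
-- return a conservative estimate (assumed here to be LFUnknown).
importedLFInfo :: ImportedId -> LFInfo
importedLFInfo = fromMaybe LFUnknown . lfField
```

Under these assumptions, `importedLFInfo` never invents precise information; it either passes through what was recorded or admits it knows nothing, which is why a `mk` prefix would be misleading for it.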
10 changed files:
- compiler/GHC/Core.hs
- compiler/GHC/Core/DataCon.hs
- compiler/GHC/Runtime/Heap/Inspect.hs
- compiler/GHC/Stg/InferTags/Rewrite.hs
- compiler/GHC/Stg/Syntax.hs
- compiler/GHC/StgToByteCode.hs
- compiler/GHC/StgToCmm/Closure.hs
- compiler/GHC/StgToCmm/Env.hs
- compiler/GHC/StgToCmm/Types.hs
- compiler/GHC/Types/RepType.hs
Changes:
=====================================
compiler/GHC/Core.hs
=====================================
@@ -368,18 +368,37 @@ Note [Core letrec invariant]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Core letrec invariant:
- The right hand sides of all
- /top-level/ or /recursive/
- bindings must be of lifted type
-
- There is one exception to this rule, top-level @let@s are
- allowed to bind primitive string literals: see
- Note [Core top-level string literals].
+ The right hand sides of all /top-level/ or /recursive/
+ bindings must be of lifted type
See "Type#type_classification" in GHC.Core.Type
-for the meaning of "lifted" vs. "unlifted").
-
-For the non-top-level, non-recursive case see Note [Core let-can-float invariant].
+for the meaning of "lifted" vs. "unlifted".
+
+For the non-top-level, non-recursive case see
+Note [Core let-can-float invariant].
+
+At top level, however, there are two exceptions to this rule:
+
+(TL1) A top-level binding is allowed to bind a primitive string literal
+ (which is unlifted). See Note [Core top-level string literals].
+
+(TL2) In Core, we generate a top-level binding for every non-newtype data
+constructor worker or wrapper
+ e.g. data T = MkT Int
+ we generate
+ MkT :: Int -> T
+ MkT = \x. MkT x
+ (This binding looks recursive, but isn't; it defines a top-level, curried
+ function whose body just allocates and returns the data constructor.)
+
+ But if (a) the data constructor is nullary and (b) the data type is unlifted,
+ this binding is unlifted.
+ e.g. data S :: UnliftedType where { S1 :: S, S2 :: S -> S }
+ we generate
+ S1 :: S -- A top-level unlifted binding
+ S1 = S1
+ We allow this top-level unlifted binding to exist, after CorePrep
+ only.
Note [Core let-can-float invariant]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
=====================================
compiler/GHC/Core/DataCon.hs
=====================================
@@ -141,7 +141,19 @@ becomes
case e of { T a' b -> let a = I# a' in ... }
To keep ourselves sane, we name the different versions of the data constructor
-differently, as follows.
+differently, as follows in Note [Data Constructor Naming].
+
+The `dcRepType` field of a `DataCon` contains the type of the representation of
+the constructor /worker/, also called the Core representation.
+
+The Core representation may differ from the type of the constructor /wrapper/
+(built by `mkDataConRep`). Besides unpacking (as seen in the example above),
+dictionaries and coercions become explicit arguments in the Core representation
+of a constructor.
+
+Note that this representation is still *different* from runtime
+representation. (Which is what STG uses after unarise).
+See Note [Constructor applications in STG] in GHC.Stg.Syntax.
Note [Data Constructor Naming]
@@ -209,7 +221,8 @@ Note [Data constructor workers and wrappers]
* See Note [Data Constructor Naming] for how the worker and wrapper
are named
-* Neither_ the worker _nor_ the wrapper take the dcStupidTheta dicts as arguments
+* The workers don't take the dcStupidTheta dicts as arguments, while the
+ wrappers currently do
* The wrapper (if it exists) takes dcOrigArgTys as its arguments.
The worker takes dataConRepArgTys as its arguments
@@ -528,7 +541,7 @@ data DataCon
-- forall a x y. (a~(x,y), x~y, Ord x) =>
-- x -> y -> T a
-- (this is *not* of the constructor wrapper Id:
- -- see Note [Data con representation] below)
+ -- see Note [Data constructor representation])
-- Notice that the existential type parameters come *second*.
-- Reason: in a case expression we may find:
-- case (e :: T t) of
@@ -988,51 +1001,6 @@ we consult HsImplBang:
The boolean flag is used only for this warning.
See #11270 for motivation.
-Note [Data con representation]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The dcRepType field contains the type of the representation of a constructor
-This may differ from the type of the constructor *Id* (built
-by MkId.mkDataConId) for two reasons:
- a) the constructor Id may be overloaded, but the dictionary isn't stored
- e.g. data Eq a => T a = MkT a a
-
- b) the constructor may store an unboxed version of a strict field.
-
-So whenever this module talks about the representation of a data constructor
-what it means is the DataCon with all Unpacking having been applied.
-We can think of this as the Core representation.
-
-Here's an example illustrating the Core representation:
- data Ord a => T a = MkT Int! a Void#
-Here
- T :: Ord a => Int -> a -> Void# -> T a
-but the rep type is
- Trep :: Int# -> a -> Void# -> T a
-Actually, the unboxed part isn't implemented yet!
-
-Note that this representation is still *different* from runtime
-representation. (Which is what STG uses after unarise).
-
-This is how T would end up being used in STG post-unarise:
-
- let x = T 1# y
- in ...
- case x of
- T int a -> ...
-
-The Void# argument is dropped and the boxed int is replaced by an unboxed
-one. In essence we only generate binders for runtime relevant values.
-
-We also flatten out unboxed tuples in this process. See the unarise
-pass for details on how this is done. But as an example consider
-`data S = MkS Bool (# Bool | Char #)` which when matched on would
-result in an alternative with three binders like this
-
- MkS bool tag tpl_field ->
-
-See Note [Translating unboxed sums to unboxed tuples] and Note [Unarisation]
-for the details of this transformation.
-
************************************************************************
* *
=====================================
compiler/GHC/Runtime/Heap/Inspect.hs
=====================================
@@ -889,12 +889,12 @@ extractSubTerms recurse clos = liftM thdOf3 . go 0 0
return (ptr_i, arr_i, unboxedTupleTerm ty terms0 : terms1)
| otherwise
= case typePrimRepArgs ty of
- [rep_ty] -> do
+ rep_ty :| [] -> do
(ptr_i, arr_i, term0) <- go_rep ptr_i arr_i ty rep_ty
(ptr_i, arr_i, terms1) <- go ptr_i arr_i tys
return (ptr_i, arr_i, term0 : terms1)
- rep_tys -> do
- (ptr_i, arr_i, terms0) <- go_unary_types ptr_i arr_i rep_tys
+ rep_ty :| rep_tys -> do
+ (ptr_i, arr_i, terms0) <- go_unary_types ptr_i arr_i (rep_ty:rep_tys)
(ptr_i, arr_i, terms1) <- go ptr_i arr_i tys
return (ptr_i, arr_i, unboxedTupleTerm ty terms0 : terms1)
=====================================
compiler/GHC/Stg/InferTags/Rewrite.hs
=====================================
@@ -36,7 +36,7 @@ import GHC.Core ( AltCon(..) )
import GHC.Core.Type
import GHC.StgToCmm.Types
-import GHC.StgToCmm.Closure (mkLFImported)
+import GHC.StgToCmm.Closure (importedIdLFInfo)
import GHC.Stg.Utils
import GHC.Stg.Syntax as StgSyn
@@ -275,7 +275,7 @@ isTagged v = do
False -> return $!
-- Determine whether it is tagged from the LFInfo of the imported id.
-- See Note [The LFInfo of Imported Ids]
- case mkLFImported v of
+ case importedIdLFInfo v of
-- Function, applied not entered.
LFReEntrant {}
-> True
=====================================
compiler/GHC/Stg/Syntax.hs
=====================================
@@ -237,6 +237,52 @@ StgConApp and StgPrimApp --- saturated applications
There are specialised forms of application, for constructors, primitives, and
literals.
+
+Note [Constructor applications in STG]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+After the unarisation pass:
+* In `StgConApp` and `StgRhsCon` and `StgAlt` we filter out the void arguments,
+ leaving only non-void ones.
+* In `StgApp` and `StgOpApp` we retain void arguments.
+
+We can do this because we know that `StgConApp` and `StgRhsCon` are saturated applications,
+so we lose no information by dropping those void args. In contrast, in `StgApp` we need the
+ void argument to compare the number of args in the call with the arity of the function.
+
+This is an open design choice. We could instead choose to treat all these applications
+consistently (keeping the void args). But for some reason we don't, and this Note simply
+documents that design choice.
+
+As an example, consider:
+
+ data T a = MkT Int! a Void#
+
+The wrapper's representation and the worker's representation (i.e. the
+datacon's Core representation) are respectively:
+
+ $WT :: Int -> a -> Void# -> T a
+ T :: Int# -> a -> Void# -> T a
+
+T would end up being used in STG post-unarise as:
+
+ let x = T 1# y
+ in ...
+ case x of
+ T int a -> ...
+
+The Void# argument is dropped. In essence we only generate binders for runtime
+relevant values.
+
+We also flatten out unboxed tuples in this process. See the unarise
+pass for details on how this is done. But as an example consider
+`data S = MkS Bool (# Bool | Char #)` which when matched on would
+result in an alternative with three binders like this
+
+ MkS bool tag tpl_field ->
+
+See Note [Translating unboxed sums to unboxed tuples] and Note [Unarisation]
+for the details of this transformation.
+
-}
| StgLit Literal
@@ -245,7 +291,7 @@ literals.
-- which can't be let-bound
| StgConApp DataCon
ConstructorNumber
- [StgArg] -- Saturated. (After Unarisation, [NonVoid StgArg])
+ [StgArg] -- Saturated. See Note [Constructor applications in STG]
[Type] -- See Note [Types in StgConApp] in GHC.Stg.Unarise
| StgOpApp StgOp -- Primitive op or foreign call
@@ -422,7 +468,7 @@ important):
-- are not allocated.
ConstructorNumber
[StgTickish]
- [StgArg] -- Args
+ [StgArg] -- Saturated Args. See Note [Constructor applications in STG]
Type -- Type, for rewriting to an StgRhsClosure
-- | Like 'GHC.Hs.Extension.NoExtField', but with an 'Outputable' instance that
=====================================
compiler/GHC/StgToByteCode.hs
=====================================
@@ -81,8 +81,10 @@ import Data.Coerce (coerce)
import Data.ByteString (ByteString)
import Data.Map (Map)
import Data.IntMap (IntMap)
+import Data.List.NonEmpty (NonEmpty(..))
import qualified Data.Map as Map
import qualified Data.IntMap as IntMap
+import qualified Data.List.NonEmpty as NE
import qualified GHC.Data.FiniteMap as Map
import Data.Ord
import GHC.Stack.CCS
@@ -296,8 +298,8 @@ argBits platform (rep : args)
| isFollowableArg rep = False : argBits platform args
| otherwise = replicate (argRepSizeW platform rep) True ++ argBits platform args
-non_void :: [ArgRep] -> [ArgRep]
-non_void = filter nv
+non_void :: NonEmpty ArgRep -> [ArgRep]
+non_void = NE.filter nv
where nv V = False
nv _ = True
@@ -464,7 +466,7 @@ returnUnliftedAtom d s p e = do
StgLitArg lit -> typePrimRepArgs (literalType lit)
StgVarArg i -> bcIdPrimReps i
(push, szb) <- pushAtom d p e
- ret <- returnUnliftedReps d s szb reps
+ ret <- returnUnliftedReps d s szb (NE.toList $! reps)
return (push `appOL` ret)
-- return an unlifted value from the top of the stack
@@ -867,7 +869,7 @@ doCase d s p scrut bndr alts
(bndr_size, call_info, args_offsets)
| ubx_tuple_frame =
let bndr_ty = primRepCmmType platform
- bndr_reps = filter (not.isVoidRep) (bcIdPrimReps bndr)
+ bndr_reps = NE.filter (not.isVoidRep) (bcIdPrimReps bndr)
(call_info, args_offsets) =
layoutNativeCall profile NativeTupleReturn 0 bndr_ty bndr_reps
in ( wordsToBytes platform (nativeCallSize call_info)
@@ -1660,9 +1662,8 @@ maybe_getCCallReturnRep fn_ty
(pprType fn_ty)
in
case r_reps of
- [] -> panic "empty typePrimRepArgs"
- [VoidRep] -> Nothing
- [rep] -> Just rep
+ VoidRep :| [] -> Nothing
+ rep :| [] -> Just rep
-- if it was, it would be impossible to create a
-- valid return value placeholder on the stack
@@ -2117,7 +2118,7 @@ idSizeCon platform var
isUnboxedSumType (idType var) =
wordsToBytes platform .
WordOff . sum . map (argRepSizeW platform . toArgRep platform) .
- bcIdPrimReps $ var
+ NE.toList . bcIdPrimReps $ var
| otherwise = ByteOff (primRepSizeB platform (bcIdPrimRep var))
bcIdArgRep :: Platform -> Id -> ArgRep
@@ -2125,13 +2126,13 @@ bcIdArgRep platform = toArgRep platform . bcIdPrimRep
bcIdPrimRep :: Id -> PrimRep
bcIdPrimRep id
- | [rep] <- typePrimRepArgs (idType id)
+ | rep :| [] <- typePrimRepArgs (idType id)
= rep
| otherwise
= pprPanic "bcIdPrimRep" (ppr id <+> dcolon <+> ppr (idType id))
-bcIdPrimReps :: Id -> [PrimRep]
+bcIdPrimReps :: Id -> NonEmpty PrimRep
bcIdPrimReps id = typePrimRepArgs (idType id)
repSizeWords :: Platform -> PrimRep -> WordOff
@@ -2189,8 +2190,8 @@ atomRep platform e = toArgRep platform (atomPrimRep e)
mkStackOffsets :: ByteOff -> [ByteOff] -> [ByteOff]
mkStackOffsets original_depth szsb = tail (scanl' (+) original_depth szsb)
-typeArgReps :: Platform -> Type -> [ArgRep]
-typeArgReps platform = map (toArgRep platform) . typePrimRepArgs
+typeArgReps :: Platform -> Type -> NonEmpty ArgRep
+typeArgReps platform = NE.map (toArgRep platform) . typePrimRepArgs
-- -----------------------------------------------------------------------------
-- The bytecode generator's monad
=====================================
compiler/GHC/StgToCmm/Closure.hs
=====================================
@@ -28,7 +28,7 @@ module GHC.StgToCmm.Closure (
LambdaFormInfo, -- Abstract
StandardFormInfo, -- ...ditto...
mkLFThunk, mkLFReEntrant, mkConLFInfo, mkSelectorLFInfo,
- mkApLFInfo, mkLFImported, mkLFArgument, mkLFLetNoEscape,
+ mkApLFInfo, importedIdLFInfo, mkLFArgument, mkLFLetNoEscape,
mkLFStringLit,
lfDynTag,
isLFThunk, isLFReEntrant, lfUpdatable,
@@ -256,10 +256,10 @@ mkApLFInfo id upd_flag arity
(mightBeFunTy (idType id))
-------------
--- | Make a 'LambdaFormInfo' for an imported Id.
+-- | The 'LambdaFormInfo' of an imported Id.
-- See Note [The LFInfo of Imported Ids]
-mkLFImported :: Id -> LambdaFormInfo
-mkLFImported id =
+importedIdLFInfo :: Id -> LambdaFormInfo
+importedIdLFInfo id =
-- See Note [Conveying CAF-info and LFInfo between modules] in
-- GHC.StgToCmm.Types
case idLFInfo_maybe id of
@@ -305,7 +305,7 @@ In particular, saturated data constructor applications *must* be unambiguously
given `LFCon`, and if the LFInfo says LFCon, then it really is a static data
constructor, and similar for LFReEntrant.
-In `mkLFImported`, we construct a LambdaFormInfo for imported Ids as follows:
+In `importedIdLFInfo`, we construct a LambdaFormInfo for imported Ids as follows:
(1) If the `lfInfo` field contains an LFInfo, we use that LFInfo which is
correct by construction (the invariant being that if it exists, it is correct):
=====================================
compiler/GHC/StgToCmm/Env.hs
=====================================
@@ -149,7 +149,7 @@ getCgIdInfo id
| otherwise
= pprPanic "GHC.StgToCmm.Env: label not found" (ppr id <+> dcolon <+> ppr (idType id))
in return $
- litIdInfo platform id (mkLFImported id) (CmmLabel ext_lbl)
+ litIdInfo platform id (importedIdLFInfo id) (CmmLabel ext_lbl)
else
cgLookupPanic id -- Bug, id is neither in local binds nor is external
}}}
=====================================
compiler/GHC/StgToCmm/Types.hs
=====================================
@@ -53,7 +53,7 @@ make a conservative assumption, but that is bad: e.g.
#16559, #15155, and wiki: commentary/rts/haskell-execution/pointer-tagging
Conservative assumption here is made when we import an Id without a
- LambdaFormInfo in the interface, in GHC.StgToCmm.Closure.mkLFImported.
+ LambdaFormInfo in the interface, in GHC.StgToCmm.Closure.importedIdLFInfo.
So we arrange to always serialise this information into the interface file. The
moving parts are:
=====================================
compiler/GHC/Types/RepType.hs
=====================================
@@ -84,12 +84,11 @@ isNvUnaryType ty
= False
-- INVARIANT: the result list is never empty.
-typePrimRepArgs :: HasDebugCallStack => Type -> [PrimRep]
+typePrimRepArgs :: HasDebugCallStack => Type -> NonEmpty PrimRep
typePrimRepArgs ty
- | [] <- reps
- = [VoidRep]
- | otherwise
- = reps
+ = case reps of
+ [] -> VoidRep :| []
+ (x:xs) -> x :| xs
where
reps = typePrimRep ty
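The effect of moving the "never empty" invariant into the type can be sketched outside GHC. The following is a minimal illustration, not GHC code: `Rep` is a hypothetical stand-in for `PrimRep` with only three constructors, and `repArgs`, `unaryReturnRep` and `nonVoid` are illustrative names that only mirror the shape of `typePrimRepArgs`, the case in `maybe_getCCallReturnRep`, and `non_void` from the diff above.

```haskell
import Data.List.NonEmpty (NonEmpty(..))
import qualified Data.List.NonEmpty as NE

-- Hypothetical stand-in for GHC's PrimRep.
data Rep = VoidRep | IntRep | PtrRep
  deriving (Eq, Show)

-- Same shape as the new typePrimRepArgs: an empty list of
-- representations becomes a singleton VoidRep, so the result
-- is non-empty by construction.
repArgs :: [Rep] -> NonEmpty Rep
repArgs reps = case reps of
  []     -> VoidRep :| []
  (x:xs) -> x :| xs

-- Consumers no longer need a branch (or a panic) for the empty case.
unaryReturnRep :: NonEmpty Rep -> Maybe Rep
unaryReturnRep (VoidRep :| []) = Nothing
unaryReturnRep (rep :| [])     = Just rep
unaryReturnRep _               = Nothing  -- multi-rep results: outside this sketch

-- NE.filter yields an ordinary list, because filtering may remove
-- every element and so cannot promise a non-empty result.
nonVoid :: NonEmpty Rep -> [Rep]
nonVoid = NE.filter (/= VoidRep)
```

Note the asymmetry in the sketch: producers return `NonEmpty`, but anything that filters has to drop back to a plain list, which matches the `NonEmpty ArgRep -> [ArgRep]` signature of `non_void` in the diff.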
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/43b828178afb8eea69db223181fe2cc70f72a642...14358f34c34c27d20cc5b676e0b62a069a45d40f