[Git][ghc/ghc][wip/T23146] 5 commits: Make LFInfos for DataCons on construction
Rodrigo Mesquita (@alt-romes)
gitlab at gitlab.haskell.org
Tue May 16 13:57:53 UTC 2023
Rodrigo Mesquita pushed to branch wip/T23146 at Glasgow Haskell Compiler / GHC
Commits:
831beea7 by Rodrigo Mesquita at 2023-05-16T14:57:36+01:00
Make LFInfos for DataCons on construction
As a result of the discussion in !10165, we decided to amend the
previous commit which fixed the logic of `mkLFImported` with regard to
datacon workers and wrappers.
Instead of having the logic for the LFInfo of datacons be in
`mkLFImported`, we now construct an LFInfo for all data constructors on
GHC.Types.Id.Make and store it in the `lfInfo` field.
See the new Note [LFInfo of DataCon workers and wrappers] and
ammendments to Note [The LFInfo of Imported Ids]
- - - - -
f4233bf1 by Rodrigo Mesquita at 2023-05-16T14:57:39+01:00
Update Note [Core letrec invariant]
Authored by @simonpj
- - - - -
a3528fad by Rodrigo Mesquita at 2023-05-16T14:57:39+01:00
Rename mkLFImported to importedIdLFInfo
The `mkLFImported` sounded too much like a constructor of sorts, when
really it got the `LFInfo` of an imported Id from its `lf_info` field
when this existed, and otherwise returned a conservative estimate of
that imported Id's LFInfo. This in contrast to functions such as
`mkLFReEntrant` which really are about constructing an `LFInfo`.
- - - - -
aeba0b7a by Rodrigo Mesquita at 2023-05-16T14:57:39+01:00
Enforce invariant on typePrimRepArgs in the types
As part of the documentation effort in !10165 I came across this
invariant on 'typePrimRepArgs' which is easily expressed at the
type-level through a NonEmpty list.
It allowed us to remove one panic.
- - - - -
efacf82b by Rodrigo Mesquita at 2023-05-16T14:57:39+01:00
Merge outdated Note [Data con representation] into Note [Data constructor representation]
Introduce new Note [Constructor applications in STG] to better support
the merge, and reference it from the relevant bits in the STG syntax.
- - - - -
12 changed files:
- compiler/GHC/Core.hs
- compiler/GHC/Core/DataCon.hs
- compiler/GHC/Runtime/Heap/Inspect.hs
- compiler/GHC/Stg/InferTags/Rewrite.hs
- compiler/GHC/Stg/Syntax.hs
- compiler/GHC/StgToByteCode.hs
- compiler/GHC/StgToCmm/Closure.hs
- compiler/GHC/StgToCmm/Env.hs
- compiler/GHC/StgToCmm/Types.hs
- compiler/GHC/Types/Id/Info.hs
- compiler/GHC/Types/Id/Make.hs
- compiler/GHC/Types/RepType.hs
Changes:
=====================================
compiler/GHC/Core.hs
=====================================
@@ -368,18 +368,37 @@ Note [Core letrec invariant]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Core letrec invariant:
- The right hand sides of all
- /top-level/ or /recursive/
- bindings must be of lifted type
-
- There is one exception to this rule, top-level @let at s are
- allowed to bind primitive string literals: see
- Note [Core top-level string literals].
+ The right hand sides of all /top-level/ or /recursive/
+ bindings must be of lifted type
See "Type#type_classification" in GHC.Core.Type
-for the meaning of "lifted" vs. "unlifted").
-
-For the non-top-level, non-recursive case see Note [Core let-can-float invariant].
+for the meaning of "lifted" vs. "unlifted".
+
+For the non-top-level, non-recursive case see
+Note [Core let-can-float invariant].
+
+At top level, however, there are two exceptions to this rule:
+
+(TL1) A top-level binding is allowed to bind primitive string literal,
+ (which is unlifted). See Note [Core top-level string literals].
+
+(TL2) In Core, we generate a top-level binding for every non-newtype data
+constructor worker or wrapper
+ e.g. data T = MkT Int
+ we generate
+ MkT :: Int -> T
+ MkT = \x. MkT x
+ (This binding looks recursive, but isn't; it defines a top-level, curried
+ function whose body just allocates and returns the data constructor.)
+
+ But if (a) the data contructor is nullary and (b) the data type is unlifted,
+ this binding is unlifted.
+ e.g. data S :: UnliftedType where { S1 :: S, S2 :: S -> S }
+ we generate
+ S1 :: S -- A top-level unlifted binding
+ S1 = S1
+ We allow this top-level unlifted binding to exist, after CorePrep
+ only.
Note [Core let-can-float invariant]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
=====================================
compiler/GHC/Core/DataCon.hs
=====================================
@@ -141,7 +141,19 @@ becomes
case e of { T a' b -> let a = I# a' in ... }
To keep ourselves sane, we name the different versions of the data constructor
-differently, as follows.
+differently, as follows in Note [Data Constructor Naming].
+
+The `dcRepType` field of a `DataCon` contains the type of the representation of
+the constructor /worker/, also called the Core representation.
+
+The Core representation may differ from the type of the constructor /wrapper/
+(built by `mkDataConRep`). Besides unpacking (as seen in the example above),
+dictionaries and coercions become explict arguments in the Core representation
+of a constructor.
+
+Note that this representation is still *different* from runtime
+representation. (Which is what STG uses after unarise).
+See Note [Constructor applications in STG] in GHC.Stg.Syntax.
Note [Data Constructor Naming]
@@ -209,7 +221,8 @@ Note [Data constructor workers and wrappers]
* See Note [Data Constructor Naming] for how the worker and wrapper
are named
-* Neither_ the worker _nor_ the wrapper take the dcStupidTheta dicts as arguments
+* The workers don't take the dcStupidTheta dicts as arguments, while the
+ wrappers currently do
* The wrapper (if it exists) takes dcOrigArgTys as its arguments.
The worker takes dataConRepArgTys as its arguments
@@ -528,7 +541,7 @@ data DataCon
-- forall a x y. (a~(x,y), x~y, Ord x) =>
-- x -> y -> T a
-- (this is *not* of the constructor wrapper Id:
- -- see Note [Data con representation] below)
+ -- see Note [Data constructor representation])
-- Notice that the existential type parameters come *second*.
-- Reason: in a case expression we may find:
-- case (e :: T t) of
@@ -586,12 +599,22 @@ Function call 'dataConKindEqSpec' returns [k'~k]
Note [DataCon arities]
~~~~~~~~~~~~~~~~~~~~~~
-A `DataCon`'s source arity and core representation arity may differ:
-`dcSourceArity` does not take constraints into account, but `dcRepArity` does.
-
-The additional arguments taken into account by `dcRepArity` include quantified
-dictionaries and coercion arguments, lifted and unlifted (despite the unlifted
-coercion arguments having a zero-width runtime representation).
+A `DataCon`'s source and core representation may differ, meaning the source
+arity (`dcSourceArity`) and the core representation arity (`dcRepArity`) may
+differ too.
+
+Note that the source arity isn't exactly the number of arguments the data con
+/wrapper/ has, since `dcSourceArity` doesn't count constraints -- which may
+appear in the wrapper through `DatatypeContexts`, or if the constructor stores a
+dictionary. In this sense, the source arity counts the number of non-constraint
+arguments that appear at the source level.
+ On the other hand, the Core representation arity is the number of arguments
+of the data constructor in its Core representation, which is also the number
+of arguments of the data con /worker/.
+
+The arity might differ since `dcRepArity` takes into account arguments such as
+quantified dictionaries and coercion arguments, lifted and unlifted (despite
+the unlifted coercion arguments having a zero-width runtime representation).
For example:
MkT :: Ord a => a -> T a
dcSourceArity = 1
@@ -601,6 +624,15 @@ For example:
dcSourceArity = 0
dcRepArity = 1
+The arity might also differ due to unpacking, for example, consider the
+following datatype and its wrapper and worker's type:
+ data V = MkV !() !Int
+ $WV :: () -> Int -> V
+ V :: Int# -> V
+As you see, because of unpacking we have both dropped the unit argument and
+unboxed the Int. In this case, the source arity (which is the arity of the
+wrapper) is 2, while the Core representation arity (the arity of the worker) is 1.
+
Note [DataCon user type variable binders]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -969,51 +1001,6 @@ we consult HsImplBang:
The boolean flag is used only for this warning.
See #11270 for motivation.
-Note [Data con representation]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The dcRepType field contains the type of the representation of a constructor
-This may differ from the type of the constructor *Id* (built
-by MkId.mkDataConId) for two reasons:
- a) the constructor Id may be overloaded, but the dictionary isn't stored
- e.g. data Eq a => T a = MkT a a
-
- b) the constructor may store an unboxed version of a strict field.
-
-So whenever this module talks about the representation of a data constructor
-what it means is the DataCon with all Unpacking having been applied.
-We can think of this as the Core representation.
-
-Here's an example illustrating the Core representation:
- data Ord a => T a = MkT Int! a Void#
-Here
- T :: Ord a => Int -> a -> Void# -> T a
-but the rep type is
- Trep :: Int# -> a -> Void# -> T a
-Actually, the unboxed part isn't implemented yet!
-
-Note that this representation is still *different* from runtime
-representation. (Which is what STG uses after unarise).
-
-This is how T would end up being used in STG post-unarise:
-
- let x = T 1# y
- in ...
- case x of
- T int a -> ...
-
-The Void# argument is dropped and the boxed int is replaced by an unboxed
-one. In essence we only generate binders for runtime relevant values.
-
-We also flatten out unboxed tuples in this process. See the unarise
-pass for details on how this is done. But as an example consider
-`data S = MkS Bool (# Bool | Char #)` which when matched on would
-result in an alternative with three binders like this
-
- MkS bool tag tpl_field ->
-
-See Note [Translating unboxed sums to unboxed tuples] and Note [Unarisation]
-for the details of this transformation.
-
************************************************************************
* *
=====================================
compiler/GHC/Runtime/Heap/Inspect.hs
=====================================
@@ -889,12 +889,12 @@ extractSubTerms recurse clos = liftM thdOf3 . go 0 0
return (ptr_i, arr_i, unboxedTupleTerm ty terms0 : terms1)
| otherwise
= case typePrimRepArgs ty of
- [rep_ty] -> do
+ rep_ty :| [] -> do
(ptr_i, arr_i, term0) <- go_rep ptr_i arr_i ty rep_ty
(ptr_i, arr_i, terms1) <- go ptr_i arr_i tys
return (ptr_i, arr_i, term0 : terms1)
- rep_tys -> do
- (ptr_i, arr_i, terms0) <- go_unary_types ptr_i arr_i rep_tys
+ rep_ty :| rep_tys -> do
+ (ptr_i, arr_i, terms0) <- go_unary_types ptr_i arr_i (rep_ty:rep_tys)
(ptr_i, arr_i, terms1) <- go ptr_i arr_i tys
return (ptr_i, arr_i, unboxedTupleTerm ty terms0 : terms1)
=====================================
compiler/GHC/Stg/InferTags/Rewrite.hs
=====================================
@@ -36,7 +36,7 @@ import GHC.Core ( AltCon(..) )
import GHC.Core.Type
import GHC.StgToCmm.Types
-import GHC.StgToCmm.Closure (mkLFImported)
+import GHC.StgToCmm.Closure (importedIdLFInfo)
import GHC.Stg.Utils
import GHC.Stg.Syntax as StgSyn
@@ -271,11 +271,11 @@ isTagged v = do
TagProper -> True
TagTagged -> True
TagTuple _ -> True -- Consider unboxed tuples tagged.
- False -- Imported
- -> return $!
+ -- Imported
+ False -> return $!
-- Determine whether it is tagged from the LFInfo of the imported id.
-- See Note [The LFInfo of Imported Ids]
- case mkLFImported v of
+ case importedIdLFInfo v of
-- Function, applied not entered.
LFReEntrant {}
-> True
=====================================
compiler/GHC/Stg/Syntax.hs
=====================================
@@ -237,6 +237,52 @@ StgConApp and StgPrimApp --- saturated applications
There are specialised forms of application, for constructors, primitives, and
literals.
+
+Note [Constructor applications in STG]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+After the unarisation pass:
+* In `StgConApp` and `StgRhsCon` and `StgAlt` we filter out the void arguments,
+ leaving only non-void ones.
+* In `StgApp` and `StgOpApp` we retain void arguments.
+
+We can do this because we know that `StgConApp` and `StgRhsCon` are saturated applications,
+so we lose no information by dropping those void args. In contrast, in `StgApp` we need the
+ void argument to compare the number of args in the call with the arity of the function.
+
+This is an open design choice. We could instead choose to treat all these applications
+consistently (keeping the void args). But for some reason we don't, and this Note simply
+documents that design choice.
+
+As an example, consider:
+
+ data T a = MkT Int! a Void#
+
+The wrapper's representation and the worker's representation (i.e. the
+datacon's Core representation) are respectively:
+
+ $WT :: Int -> a -> Void# -> T a
+ T :: Int# -> a -> Void# -> T a
+
+T would end up being used in STG post-unarise as:
+
+ let x = T 1# y
+ in ...
+ case x of
+ T int a -> ...
+
+The Void# argument is dropped. In essence we only generate binders for runtime
+relevant values.
+
+We also flatten out unboxed tuples in this process. See the unarise
+pass for details on how this is done. But as an example consider
+`data S = MkS Bool (# Bool | Char #)` which when matched on would
+result in an alternative with three binders like this
+
+ MkS bool tag tpl_field ->
+
+See Note [Translating unboxed sums to unboxed tuples] and Note [Unarisation]
+for the details of this transformation.
+
-}
| StgLit Literal
@@ -245,7 +291,7 @@ literals.
-- which can't be let-bound
| StgConApp DataCon
ConstructorNumber
- [StgArg] -- Saturated. (After Unarisation, [NonVoid StgArg])
+ [StgArg] -- Saturated. See Note [Constructor applications in STG]
[Type] -- See Note [Types in StgConApp] in GHC.Stg.Unarise
| StgOpApp StgOp -- Primitive op or foreign call
@@ -422,7 +468,7 @@ important):
-- are not allocated.
ConstructorNumber
[StgTickish]
- [StgArg] -- Args
+ [StgArg] -- Saturated Args. See Note [Constructor applications in STG]
Type -- Type, for rewriting to an StgRhsClosure
-- | Like 'GHC.Hs.Extension.NoExtField', but with an 'Outputable' instance that
=====================================
compiler/GHC/StgToByteCode.hs
=====================================
@@ -81,8 +81,10 @@ import Data.Coerce (coerce)
import Data.ByteString (ByteString)
import Data.Map (Map)
import Data.IntMap (IntMap)
+import Data.List.NonEmpty (NonEmpty(..))
import qualified Data.Map as Map
import qualified Data.IntMap as IntMap
+import qualified Data.List.NonEmpty as NE
import qualified GHC.Data.FiniteMap as Map
import Data.Ord
import GHC.Stack.CCS
@@ -296,8 +298,8 @@ argBits platform (rep : args)
| isFollowableArg rep = False : argBits platform args
| otherwise = replicate (argRepSizeW platform rep) True ++ argBits platform args
-non_void :: [ArgRep] -> [ArgRep]
-non_void = filter nv
+non_void :: NonEmpty ArgRep -> [ArgRep]
+non_void = NE.filter nv
where nv V = False
nv _ = True
@@ -464,7 +466,7 @@ returnUnliftedAtom d s p e = do
StgLitArg lit -> typePrimRepArgs (literalType lit)
StgVarArg i -> bcIdPrimReps i
(push, szb) <- pushAtom d p e
- ret <- returnUnliftedReps d s szb reps
+ ret <- returnUnliftedReps d s szb (NE.toList $! reps)
return (push `appOL` ret)
-- return an unlifted value from the top of the stack
@@ -867,7 +869,7 @@ doCase d s p scrut bndr alts
(bndr_size, call_info, args_offsets)
| ubx_tuple_frame =
let bndr_ty = primRepCmmType platform
- bndr_reps = filter (not.isVoidRep) (bcIdPrimReps bndr)
+ bndr_reps = NE.filter (not.isVoidRep) (bcIdPrimReps bndr)
(call_info, args_offsets) =
layoutNativeCall profile NativeTupleReturn 0 bndr_ty bndr_reps
in ( wordsToBytes platform (nativeCallSize call_info)
@@ -1660,9 +1662,8 @@ maybe_getCCallReturnRep fn_ty
(pprType fn_ty)
in
case r_reps of
- [] -> panic "empty typePrimRepArgs"
- [VoidRep] -> Nothing
- [rep] -> Just rep
+ VoidRep :| [] -> Nothing
+ rep :| [] -> Just rep
-- if it was, it would be impossible to create a
-- valid return value placeholder on the stack
@@ -2117,7 +2118,7 @@ idSizeCon platform var
isUnboxedSumType (idType var) =
wordsToBytes platform .
WordOff . sum . map (argRepSizeW platform . toArgRep platform) .
- bcIdPrimReps $ var
+ NE.toList . bcIdPrimReps $ var
| otherwise = ByteOff (primRepSizeB platform (bcIdPrimRep var))
bcIdArgRep :: Platform -> Id -> ArgRep
@@ -2125,13 +2126,13 @@ bcIdArgRep platform = toArgRep platform . bcIdPrimRep
bcIdPrimRep :: Id -> PrimRep
bcIdPrimRep id
- | [rep] <- typePrimRepArgs (idType id)
+ | rep :| [] <- typePrimRepArgs (idType id)
= rep
| otherwise
= pprPanic "bcIdPrimRep" (ppr id <+> dcolon <+> ppr (idType id))
-bcIdPrimReps :: Id -> [PrimRep]
+bcIdPrimReps :: Id -> NonEmpty PrimRep
bcIdPrimReps id = typePrimRepArgs (idType id)
repSizeWords :: Platform -> PrimRep -> WordOff
@@ -2189,8 +2190,8 @@ atomRep platform e = toArgRep platform (atomPrimRep e)
mkStackOffsets :: ByteOff -> [ByteOff] -> [ByteOff]
mkStackOffsets original_depth szsb = tail (scanl' (+) original_depth szsb)
-typeArgReps :: Platform -> Type -> [ArgRep]
-typeArgReps platform = map (toArgRep platform) . typePrimRepArgs
+typeArgReps :: Platform -> Type -> NonEmpty ArgRep
+typeArgReps platform = NE.map (toArgRep platform) . typePrimRepArgs
-- -----------------------------------------------------------------------------
-- The bytecode generator's monad
=====================================
compiler/GHC/StgToCmm/Closure.hs
=====================================
@@ -28,7 +28,7 @@ module GHC.StgToCmm.Closure (
LambdaFormInfo, -- Abstract
StandardFormInfo, -- ...ditto...
mkLFThunk, mkLFReEntrant, mkConLFInfo, mkSelectorLFInfo,
- mkApLFInfo, mkLFImported, mkLFArgument, mkLFLetNoEscape,
+ mkApLFInfo, importedIdLFInfo, mkLFArgument, mkLFLetNoEscape,
mkLFStringLit,
lfDynTag,
isLFThunk, isLFReEntrant, lfUpdatable,
@@ -96,6 +96,7 @@ import GHC.Utils.Outputable
import GHC.Utils.Panic
import GHC.Utils.Panic.Plain
import GHC.Utils.Misc
+import GHC.Data.Maybe (isNothing)
import Data.Coerce (coerce)
import qualified Data.ByteString.Char8 as BS8
@@ -255,130 +256,67 @@ mkApLFInfo id upd_flag arity
(mightBeFunTy (idType id))
-------------
-mkLFImported :: Id -> LambdaFormInfo
-mkLFImported id =
+-- | The 'LambdaFormInfo' of an imported Id.
+-- See Note [The LFInfo of Imported Ids]
+importedIdLFInfo :: Id -> LambdaFormInfo
+importedIdLFInfo id =
-- See Note [Conveying CAF-info and LFInfo between modules] in
-- GHC.StgToCmm.Types
case idLFInfo_maybe id of
Just lf_info ->
- -- Use the LambdaFormInfo from the interface
+ -- Use the existing LambdaFormInfo
lf_info
Nothing
- -- Interface doesn't have a LambdaFormInfo, so make a conservative one from the type.
- -- See Note [The LFInfo of Imported Ids]; The order of the guards musn't be changed!
+ -- Doesn't have a LambdaFormInfo, but we know it must be 'LFReEntrant' from its arity
| arity > 0
-> LFReEntrant TopLevel arity True ArgUnknown
- | Just con <- isDataConId_maybe id
- -- See Note [Imported unlifted nullary datacon wrappers must have correct LFInfo] in GHC.StgToCmm.Types
- -- and Note [The LFInfo of Imported Ids] below
- -> assert (hasNoNonZeroWidthArgs con) $
- LFCon con -- An imported nullary constructor
- -- We assume that the constructor is evaluated so that
- -- the id really does point directly to the constructor
-
+ -- We can't be sure of the LambdaFormInfo of this imported Id,
+ -- so make a conservative one from the type.
| otherwise
- -> mkLFArgument id -- Not sure of exact arity
+ -> assert (isNothing (isDataConId_maybe id)) $ -- See Note [LFInfo of DataCon workers and wrappers] in GHC.Types.Id.Make
+ mkLFArgument id -- Not sure of exact arity
where
arity = idFunRepArity id
- hasNoNonZeroWidthArgs = all (isZeroBitTy . scaledThing) . dataConRepArgTys
{-
Note [The LFInfo of Imported Ids]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-As explained in Note [Conveying CAF-info and LFInfo between modules] and
-Note [Imported unlifted nullary datacon wrappers must have correct LFInfo], the
-LambdaFormInfo records the details of a closure representation and is often,
-when optimisations are enabled, serialized to the interface of a module.
-
-In particular, the `lfInfo` field of the `IdInfo` field of an `Id`
-* For Ids defined in this module: is `Nothing`
-* For imported Ids:
+As explained in Note [Conveying CAF-info and LFInfo between modules]
+the LambdaFormInfo records the details of a closure representation and is
+often, when optimisations are enabled, serialized to the interface of a module.
+
+In particular, the `lfInfo` field of the `IdInfo` field of an `Id`:
+* For DataCon workers and wrappers is populated as described in
+Note [LFInfo of DataCon workers and wrappers] in GHC.Types.Id.Make
+* For other Ids defined in the module being compiled: is `Nothing`
+* For other imported Ids:
* is (Just lf_info) if the LFInfo was serialised into the interface file
(typically, when the exporting module was compiled with -O)
* is Nothing if it wasn't serialised
-However, when an interface doesn't have a LambdaFormInfo for some imported Id
-(so that its `lfInfo` field is `Nothing`), we can conservatively create one
-using `mkLFImported`.
-
The LambdaFormInfo we give an Id is used in determining how to tag its pointer
-(see `litIdInfo`). Therefore, it's crucial we re-construct a LambdaFormInfo as
-faithfully as possible or otherwise risk having pointers incorrectly tagged,
-which can lead to performance issues and even segmentation faults (see #23231
-and #23146). In particular, saturated data constructor applications *must* be
-unambiguously given `LFCon`, and the invariant
-
- If the LFInfo (serialised or built with mkLFImported) says LFCon, then it
- really is a static data constructor, and similar for LFReEntrant
-
-must be upheld.
-
-In `mkLFImported`, we make a conservative approximation to the real
-LambdaFormInfo as follows:
-
-(1) Ids with an `idFunRepArity > 0` are `LFReEntrant` and pointers to them are
-tagged (by `litIdInfo`) with the corresponding arity.
- - This is also true of data con wrappers and workers with arity > 0,
- regardless of the runtime relevance of the arguments
- - For example, `Just :: a -> Maybe a` is given `LFReEntrant`
- and `HNil :: (a ~# '[]) -> HList a` is given `LFReEntrant` too
-
-(2) Data constructors with `idFunRepArity == 0` should be given `LFCon` because
-they are fully saturated data constructor applications and pointers to them
-should be tagged with the constructor index.
-
-(2.1) A datacon *wrapper* with zero arity must be a fully saturated application
-of the worker to zero-width arguments only (which are dropped after unarisation)
-
-(2.2) A datacon *worker* with zero arity is trivially fully saturated, it takes
-no arguments whatsoever (not even zero-width args)
-
-To ensure we properly give `LFReEntrant` to data constructors with some arity,
-and `LFCon` only to data constructors with zero arity, we must first check for
-`arity > 0` and only afterwards `isDataConId` -- the order of the guards in
-`mkLFImported` is quite important.
-
-As an example, consider the following data constructors:
-
- data T1 a where
- TCon1 :: {-# UNPACK #-} !(a :~: True) -> T1 a
-
- data T2 a where
- TCon2 :: {-# UNPACK #-} !() -> T2 a
-
- data T3 a where
- TCon3 :: T3 '[]
-
-`TCon1`'s wrapper has a lifted equality argument, which is non-zero-width, while
-the worker has an unlifted equality argument, which is zero-width.
-
-`TCon2`'s wrapper has a lifted equality argument, which is non-zero-width,
-while the worker has no arguments.
-
-`TCon3`'s wrapper has no arguments, and the worker has 1 zero-width argument;
-their Core representation:
-
- $WTCon3 :: T3 '[]
- $WTCon3 = TCon3 @[] <Refl>
-
- TCon3 :: forall (a :: * -> *). (a ~# []) => T a
- TCon3 = /\a. \(co :: a~#[]). TCon3 co
-
-For `TCon1`, both the wrapper and worker will be given `LFReEntrant` since they
-both have arity == 1.
-
-For `TCon2`, the wrapper will be given `LFReEntrant` since it has arity == 1
-while the worker is `LFCon` since its arity == 0
-
-For `TCon3`, the wrapper will be given `LFCon` since its arity == 0 and the
-worker `LFReEntrant` since its arity == 1
-
-One might think we could give *workers* with only zero-width-args the `LFCon`
-LambdaFormInfo, e.g. give `LFCon` to the worker of `TCon1` and `TCon3`.
-However, these workers, albeit rarely used, are unambiguously functions
--- which makes `LFReEntrant`, the LambdaFormInfo we give them, correct.
-See also the discussion in #23158.
+(see `litIdInfo` and `lfDynTag`). Therefore, it's crucial we attribute a correct
+LambdaFormInfo to imported Ids, or otherwise risk having pointers incorrectly
+tagged which can lead to performance issues and even segmentation faults (see
+#23231 and Note [Imported unlifted nullary datacon wrappers must have correct LFInfo]).
+
+In particular, saturated data constructor applications *must* be unambiguously
+given `LFCon`, and if the LFInfo says LFCon, then it really is a static data
+constructor, and similar for LFReEntrant.
+
+In `importedIdLFInfo`, we construct a LambdaFormInfo for imported Ids as follows:
+
+(1) If the `lfInfo` field contains an LFInfo, we use that LFInfo which is
+correct by construction (the invariant being that if it exists, it is correct):
+ (1.1) Either it was serialised to the interface we're importing the Id from,
+ (1.2) Or it's a DataCon worker or wrapper and its LFInfo was constructed
+ according to Note [LFInfo of DataCon workers and wrappers]
+(2) When the `lfInfo` field is `Nothing`
+ (2.1) If the `idFunRepArity` of the Id is known and is greater than 0, then
+ the Id is unambiguously a function and is given `LFReEntrant`, and pointers
+ to this Id will be tagged (by `litIdInfo`) with the corresponding arity.
+ (2.2) Otherwise, we can make a conservative estimate from the type.
-}
=====================================
compiler/GHC/StgToCmm/Env.hs
=====================================
@@ -149,7 +149,7 @@ getCgIdInfo id
| otherwise
= pprPanic "GHC.StgToCmm.Env: label not found" (ppr id <+> dcolon <+> ppr (idType id))
in return $
- litIdInfo platform id (mkLFImported id) (CmmLabel ext_lbl)
+ litIdInfo platform id (importedIdLFInfo id) (CmmLabel ext_lbl)
else
cgLookupPanic id -- Bug, id is neither in local binds nor is external
}}}
=====================================
compiler/GHC/StgToCmm/Types.hs
=====================================
@@ -53,7 +53,7 @@ make a conservative assumption, but that is bad: e.g.
#16559, #15155, and wiki: commentary/rts/haskell-execution/pointer-tagging
Conservative assumption here is made when we import an Id without a
- LambdaFormInfo in the interface, in GHC.StgToCmm.Closure.mkLFImported.
+ LambdaFormInfo in the interface, in GHC.StgToCmm.Closure.importedIdLFInfo.
So we arrange to always serialise this information into the interface file. The
moving parts are:
@@ -75,9 +75,25 @@ Note [Imported unlifted nullary datacon wrappers must have correct LFInfo]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As described in `Note [Conveying CAF-info and LFInfo between modules]`,
imported unlifted nullary datacons must have their LambdaFormInfo set to
-reflect the fact that they are evaluated . This is necessary as otherwise
+reflect the fact that they are evaluated. This is necessary as otherwise
references to them may be passed untagged to code that expects tagged
-references.
+references because of the unlifted nature of the argument.
+
+For example, in
+
+ type T :: UnliftedType
+ data T = T1
+ | T2
+
+ f :: T -> Int
+ f x = case x of T1 -> 1; T2 -> 2
+
+`f` expects `x` to be evaluated and properly tagged due to its unliftedness.
+We can guarantee all occurrences of `T1` and `T2` are considered evaluated and
+are properly tagged by giving them the `LFCon` LambdaFormInfo which indicates
+they are fully saturated constructor applications.
+(The LambdaFormInfo is used to tag the pointer with the tag of the
+constructor, in `litIdInfo`)
What may be less obvious is that this must be done for not only datacon
workers but also *wrappers*. The reason is found in this program
@@ -109,11 +125,9 @@ pointer to `fieldsSam`. This is problematic as `fieldsSam` may take advantage
of the unlifted nature of its arguments by omitting handling of the zero
tag when scrutinising them.
-The fix is straightforward: extend the logic in `mkLFImported` to cover
-(nullary) datacon wrappers as well as workers. This is safe because we
-know that the wrapper of a nullary datacon will be in WHNF, even if it
-includes equalities evidence (since such equalities are not runtime
-relevant). This fixed #23146.
+The fix is straightforward: ensure we always construct a /correct/ LFInfo for
+datacon workers and wrappers, and populate the `lfInfo` with it. See
+Note [LFInfo of DataCon workers and wrappers]. This fixed #23146.
See also Note [The LFInfo of Imported Ids]
-}
=====================================
compiler/GHC/Types/Id/Info.hs
=====================================
@@ -123,7 +123,8 @@ infixl 1 `setRuleInfo`,
`setCafInfo`,
`setDmdSigInfo`,
`setCprSigInfo`,
- `setDemandInfo`
+ `setDemandInfo`,
+ `setLFInfo`
{-
************************************************************************
* *
@@ -401,7 +402,12 @@ data IdInfo
--
-- See documentation of the getters for what these packed fields mean.
lfInfo :: !(Maybe LambdaFormInfo),
- -- ^ See Note [The LFInfo of Imported Ids] in GHC.StgToCmm.Closure
+ -- ^ If lfInfo = Just info, then the `info` is guaranteed /correct/.
+ -- If lfInfo = Nothing, then we do not have a `LambdaFormInfo` for this Id,
+ -- so (for imported Ids) we make a conservative version.
+ -- See Note [The LFInfo of Imported Ids] in GHC.StgToCmm.Closure
+ -- For locally-defined Ids other than DataCons, the `lfInfo` field is always Nothing.
+ -- See also Note [LFInfo of DataCon workers and wrappers]
-- See documentation of the getters for what these packed fields mean.
tagSig :: !(Maybe TagSig)
=====================================
compiler/GHC/Types/Id/Make.hs
=====================================
@@ -65,6 +65,7 @@ import GHC.Core.DataCon
import GHC.Types.Literal
import GHC.Types.SourceText
+import GHC.Types.RepType ( countFunRepArgs )
import GHC.Types.Name.Set
import GHC.Types.Name
import GHC.Types.ForeignCall
@@ -87,6 +88,10 @@ import GHC.Data.FastString
import GHC.Data.List.SetOps
import Data.List ( zipWith4 )
+-- A bit of a shame we must import these here
+import GHC.StgToCmm.Types (LambdaFormInfo(..))
+import GHC.Runtime.Heap.Layout (ArgDescr(ArgUnknown))
+
{-
************************************************************************
* *
@@ -595,11 +600,18 @@ mkDataConWorkId wkr_name data_con
`setInlinePragInfo` wkr_inline_prag
`setUnfoldingInfo` evaldUnfolding -- Record that it's evaluated,
-- even if arity = 0
+ `setLFInfo` wkr_lf_info
-- No strictness: see Note [Data-con worker strictness] in GHC.Core.DataCon
wkr_inline_prag = defaultInlinePragma { inl_rule = ConLike }
wkr_arity = dataConRepArity data_con
+ -- See Note [LFInfo of DataCon workers and wrappers]
+ wkr_lf_info
+ | wkr_arity == 0 = LFCon data_con
+ | otherwise = LFReEntrant TopLevel (countFunRepArgs wkr_arity wkr_ty) True ArgUnknown
+ -- LFInfo stores post-unarisation arity
+
----------- Workers for newtypes --------------
univ_tvs = dataConUnivTyVars data_con
ex_tcvs = dataConExTyCoVars data_con
@@ -608,6 +620,7 @@ mkDataConWorkId wkr_name data_con
`setArityInfo` 1 -- Arity 1
`setInlinePragInfo` dataConWrapperInlinePragma
`setUnfoldingInfo` newtype_unf
+ `setLFInfo` (panic "mkDataConWorkId: we shouldn't look at LFInfo for newtype worker ids")
id_arg1 = mkScaledTemplateLocal 1 (head arg_tys)
res_ty_args = mkTyCoVarTys univ_tvs
newtype_unf = assertPpr (null ex_tcvs && isSingleton arg_tys)
@@ -618,6 +631,82 @@ mkDataConWorkId wkr_name data_con
wrapNewTypeBody tycon res_ty_args (Var id_arg1)
{-
+Note [LFInfo of DataCon workers and wrappers]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+As noted in Note [The LFInfo of Imported Ids] in GHC.StgToCmm.Closure, it's
+crucial that saturated data con applications are given an LFInfo of `LFCon`.
+
+Since for data constructors we never serialise the worker and the wrapper (only
+the data type declaration), we never serialise their lambda form info either.
+
+Therefore, when making data constructors workers and wrappers, we construct a
+correct `LFInfo` for them right away, and put it it in the `lfInfo` field of the
+worker/wrapper Id, ensuring that:
+
+ The `lfInfo` field of a DataCon worker or wrapper is always populated with the correct LFInfo.
+
+How do we construct a /correct/ LFInfo for workers and wrappers?
+(Remember: `LFCon` means "a saturated constructor application")
+
+(1) Data constructor workers and wrappers with arity > 0 are unambiguously
+functions and should be given `LFReEntrant`, regardless of the runtime
+relevance of the arguments.
+ - For example, `Just :: a -> Maybe a` is given `LFReEntrant`,
+ and `HNil :: (a ~# '[]) -> HList a` is given `LFReEntrant` too.
+
+(2) A datacon /worker/ with zero arity is trivially fully saturated -- it takes
+no arguments whatsoever (not even zero-width args), so it is given `LFCon`.
+
+(3) Perhaps surprisingly, a datacon /wrapper/ can be an `LFCon`. See Wrinkle (W1) below.
+A datacon /wrapper/ with zero arity must be a fully saturated application of
+the worker to zero-width arguments only (which are dropped after unarisation),
+and therefore is also given `LFCon`.
+
+For example, consider the following data constructors:
+
+ data T1 a where
+ TCon1 :: {-# UNPACK #-} !(a :~: True) -> T1 a
+
+ data T2 a where
+ TCon2 :: {-# UNPACK #-} !() -> T2 a
+
+ data T3 a where
+ TCon3 :: T3 '[]
+
+`TCon1`'s wrapper has a lifted argument, which is non-zero-width, while the
+worker has an unlifted equality argument, which is zero-width.
+
+`TCon2`'s wrapper has a lifted argument, which is non-zero-width, while the
+worker has no arguments.
+
+Wrinkle (W1). Perhaps surprisingly, it is possible for the /wrapper/ to be an
+`LFCon` even though the /worker/ is not. Consider `T3` above. Here is the
+Core representation of the worker and wrapper:
+
+ $WTCon3 :: T3 '[] -- Wrapper
+ $WTCon3 = TCon3 @[] <Refl> -- A saturated constructor application: LFCon
+
+ TCon3 :: forall (a :: * -> *). (a ~# []) => T a -- Worker
+ TCon3 = /\a. \(co :: a~#[]). TCon3 co -- A function: LFReEntrant
+
+For `TCon1`, both the wrapper and worker will be given `LFReEntrant` since they
+both have arity == 1.
+
+For `TCon2`, the wrapper will be given `LFReEntrant` since it has arity == 1
+while the worker is `LFCon` since its arity == 0
+
+For `TCon3`, the wrapper will be given `LFCon` since its arity == 0 and the
+worker `LFReEntrant` since its arity == 1
+
+One might think we could give *workers* with only zero-width-args the `LFCon`
+LambdaFormInfo, e.g. give `LFCon` to the worker of `TCon1` and `TCon3`.
+However, these workers are unambiguously functions
+-- which makes `LFReEntrant`, the LambdaFormInfo we give them, correct.
+See also the discussion in #23158.
+
+See also the Note [Imported unlifted nullary datacon wrappers must have correct LFInfo]
+in GHC.StgToCmm.Types.
+
-------------------------------------------------
-- Data constructor representation
--
@@ -709,11 +798,18 @@ mkDataConRep dc_bang_opts fam_envs wrap_name data_con
-- We need to get the CAF info right here because GHC.Iface.Tidy
-- does not tidy the IdInfo of implicit bindings (like the wrapper)
-- so it not make sure that the CAF info is sane
+ `setLFInfo` wrap_lf_info
-- The signature is purely for passes like the Simplifier, not for
-- DmdAnal itself; see Note [DmdAnal for DataCon wrappers].
wrap_sig = mkClosedDmdSig wrap_arg_dmds topDiv
+ -- See Note [LFInfo of DataCon workers and wrappers]
+ wrap_lf_info
+ | wrap_arity == 0 = LFCon data_con
+ | otherwise = LFReEntrant TopLevel (countFunRepArgs wrap_arity wrap_ty) True ArgUnknown
+ -- LFInfo stores post-unarisation arity
+
wrap_arg_dmds =
replicate (length theta) topDmd ++ map mk_dmd arg_ibangs
-- Don't forget the dictionary arguments when building
=====================================
compiler/GHC/Types/RepType.hs
=====================================
@@ -84,12 +84,11 @@ isNvUnaryType ty
= False
-- INVARIANT: the result list is never empty.
-typePrimRepArgs :: HasDebugCallStack => Type -> [PrimRep]
+typePrimRepArgs :: HasDebugCallStack => Type -> NonEmpty PrimRep
typePrimRepArgs ty
- | [] <- reps
- = [VoidRep]
- | otherwise
- = reps
+ = case reps of
+ [] -> VoidRep :| []
+ (x:xs) -> x :| xs
where
reps = typePrimRep ty
@@ -124,6 +123,10 @@ unwrapType ty
| otherwise
= NS_Done
+-- | Count the arity of a function post-unarisation, including zero-width arguments.
+--
+-- The post-unarisation arity may be larger than the arity of the original
+-- function type. See Note [Unarisation].
countFunRepArgs :: Arity -> Type -> RepArity
countFunRepArgs 0 _
= 0
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/fa447a6321521c9813678601334d84ce486adb2a...efacf82b6f8802124f3014aaf01fdaece5126181
--
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/fa447a6321521c9813678601334d84ce486adb2a...efacf82b6f8802124f3014aaf01fdaece5126181
You're receiving this email because of your account on gitlab.haskell.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20230516/66970743/attachment-0001.html>
More information about the ghc-commits
mailing list