[Git][ghc/ghc][wip/romes/isNullaryRepDataCon] codeGen: Fix LFInfo of imported datacon wrappers
Rodrigo Mesquita (@alt-romes)
gitlab at gitlab.haskell.org
Wed Apr 12 16:54:18 UTC 2023
Rodrigo Mesquita pushed to branch wip/romes/isNullaryRepDataCon at Glasgow Haskell Compiler / GHC
Commits:
7f362914 by Rodrigo Mesquita at 2023-04-12T17:54:06+01:00
codeGen: Fix LFInfo of imported datacon wrappers
As noted in #23231 and in the previous commit, we were failing to give a
an LFInfo of LFCon to a nullary (no non-zero-width arguments) datacon
wrapper from another module, failing to properly tag pointers which
ultimately led to the segmentation fault in #23146.
First, we fix the nullary datacon predicate to one which considers
constructors with no non-zero-width arguments to be nullary, unlike
`isNullaryRepDataCon` which doesn't consider constructors with just
zero-width to be nullary (since it is a predicate over the Core data con
representation arity, for which zero-width arguments are relevant)
Secondly, we failed to account for data con wrappers that received real
arguments but whose worker received non-zero-width arguments. An example
of such a data type was pointed out by @clyring:
data T a where
TCon :: {-# UNPACK #-} !(a :~: True) -> T a
In this case, we were incorrectly tagging the TCon wrapper with `LFCon`,
which is incorrect since it takes a lifted argument.
The solution is simple: change the order of the guards so that we check
for the arity of the binding before we check whether it is a constructor
with a nullary worker.
The creation of LFInfos for imported Ids and this detail are explained
in Note [The LFInfo of Imported Ids].
Closes #23231, #23146
(I've additionally batched some fixes to documentation I found while
investigating this issue)
- - - - -
9 changed files:
- compiler/GHC/Cmm/CLabel.hs
- compiler/GHC/Core/DataCon.hs
- compiler/GHC/Core/Tidy.hs
- compiler/GHC/Stg/Syntax.hs
- compiler/GHC/Stg/Unarise.hs
- compiler/GHC/StgToCmm/Closure.hs
- compiler/GHC/StgToCmm/Env.hs
- compiler/GHC/StgToCmm/Types.hs
- compiler/GHC/Types/Id/Info.hs
Changes:
=====================================
compiler/GHC/Cmm/CLabel.hs
=====================================
@@ -1384,6 +1384,7 @@ For a data constructor (such as Just or Nothing), we have:
ordinary Haskell function of arity 1 that
allocates a (Just x) box:
Just = \x -> Just x
+ Just_entry: The entry code for the worker function
Just_closure: The closure for this worker
Nothing_closure: a statically allocated closure for Nothing
=====================================
compiler/GHC/Core/DataCon.hs
=====================================
@@ -111,8 +111,8 @@ import Data.List( find )
import Language.Haskell.Syntax.Module.Name
{-
-Data constructor representation
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Note [Data constructor representation]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider the following Haskell data type declaration
data T = T !Int ![Int]
@@ -213,7 +213,7 @@ Note [Data constructor workers and wrappers]
* The wrapper (if it exists) takes dcOrigArgTys as its arguments.
The worker takes dataConRepArgTys as its arguments
- If the worker is absent, dataConRepArgTys is the same as dcOrigArgTys
+ If the wrapper is absent, dataConRepArgTys is the same as dcOrigArgTys
* The 'NoDataConRep' case of DataConRep is important. Not only is it
efficient, but it also ensures that the wrapper is replaced by the
@@ -981,7 +981,7 @@ but the rep type is
Trep :: Int# -> a -> Void# -> T a
Actually, the unboxed part isn't implemented yet!
-Not that this representation is still *different* from runtime
+Note that this representation is still *different* from runtime
representation. (Which is what STG uses after unarise).
This is how T would end up being used in STG post-unarise:
@@ -1395,9 +1395,10 @@ dataConSrcBangs = dcSrcBangs
dataConSourceArity :: DataCon -> Arity
dataConSourceArity (MkData { dcSourceArity = arity }) = arity
--- | Gives the number of actual fields in the /representation/ of the
--- data constructor. This may be more than appear in the source code;
--- the extra ones are the existentially quantified dictionaries
+-- | Gives the number of actual fields in the Core /representation/ of the data
+-- constructor. This may be more than appear in the source code; the extra ones
+-- are existentially quantified dictionaries and coercion arguments, lifted and
+-- unlifted! (despite the unlifted coercion arguments being zero-width).
dataConRepArity :: DataCon -> Arity
dataConRepArity (MkData { dcRepArity = arity }) = arity
@@ -1406,8 +1407,12 @@ dataConRepArity (MkData { dcRepArity = arity }) = arity
isNullarySrcDataCon :: DataCon -> Bool
isNullarySrcDataCon dc = dataConSourceArity dc == 0
--- | Return whether there are any argument types for this 'DataCon's runtime representation type
--- See Note [DataCon arities]
+-- | Return whether there are any argument types for this 'DataCon's Core
+-- representation type. See Note [DataCon arities].
+--
+-- In particular, Core's representation type considers coercion arguments to be
+-- arguments -- both lifted and unlifted coercions, despite the latter having
+-- zero-width runtime representation.
isNullaryRepDataCon :: DataCon -> Bool
isNullaryRepDataCon dc = dataConRepArity dc == 0
=====================================
compiler/GHC/Core/Tidy.hs
=====================================
@@ -82,7 +82,7 @@ tidyBind env (Rec prs)
-- This means the code generator can get the full calling convention by only looking at the function
-- itself without having to inspect the RHS.
--
--- The actual logic is in tidyCbvInfo and takes:
+-- The actual logic is in computeCbvInfo and takes:
-- * The function id
-- * The functions rhs
-- And gives us back the function annotated with the marks.
@@ -169,7 +169,7 @@ computeCbvInfo fun_id rhs
-- seqList: avoid retaining the original rhs
| otherwise
- = -- pprTraceDebug "tidyCbvInfo: Worker seems to take unboxed tuple/sum types!"
+ = -- pprTraceDebug "computeCbvInfo: Worker seems to take unboxed tuple/sum types!"
-- (ppr fun_id <+> ppr rhs)
asNonWorkerLikeId fun_id
=====================================
compiler/GHC/Stg/Syntax.hs
=====================================
@@ -245,7 +245,7 @@ literals.
-- which can't be let-bound
| StgConApp DataCon
ConstructorNumber
- [StgArg] -- Saturated
+ [StgArg] -- Saturated. (After Unarisation, [NonVoid StgArg])
[Type] -- See Note [Types in StgConApp] in GHC.Stg.Unarise
| StgOpApp StgOp -- Primitive op or foreign call
=====================================
compiler/GHC/Stg/Unarise.hs
=====================================
@@ -956,6 +956,8 @@ ubxSumRubbishArg (VecSlot n e) = StgLitArg (LitRubbish TypeLike vec_rep)
--------------------------------------------------------------------------------
{-
+Note [Unarisation of Void binders and arguments]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For arguments (StgArg) and binders (Id) we have two kind of unarisation:
- When unarising function arg binders and arguments, we don't want to remove
=====================================
compiler/GHC/StgToCmm/Closure.hs
=====================================
@@ -264,23 +264,79 @@ mkLFImported id =
-- Use the LambdaFormInfo from the interface
lf_info
Nothing
- -- Interface doesn't have a LambdaFormInfo, make a conservative one from
- -- the type.
+ -- Interface doesn't have a LambdaFormInfo, so make a conservative one from the type.
+ -- See Note [The LFInfo of Imported Ids]; The order of the guards musn't be changed!
+ | arity > 0
+ -> LFReEntrant TopLevel arity True ArgUnknown
+
| Just con <- isDataConId_maybe id
- , isNullaryRepDataCon con
- -- See Note [Imported nullary datacon wrappers must have correct LFInfo]
- -- in GHC.StgToCmm.Types
+ , hasNoNonZeroWidthArgs con
+ -- See Note [Imported nullary datacon wrappers must have correct LFInfo] in GHC.StgToCmm.Types
+ -- and Note [The LFInfo of Imported Ids] below
-> LFCon con -- An imported nullary constructor
-- We assume that the constructor is evaluated so that
-- the id really does point directly to the constructor
- | arity > 0
- -> LFReEntrant TopLevel arity True ArgUnknown
-
| otherwise
-> mkLFArgument id -- Not sure of exact arity
where
arity = idFunRepArity id
+ hasNoNonZeroWidthArgs = all (isZeroBitTy . scaledThing) . dataConRepArgTys
+
+{-
+Note [The LFInfo of Imported Ids]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As explained in Note [Conveying CAF-info and LFInfo between modules] and
+Note [Imported nullary datacon wrappers must have correct LFInfo], the
+LambdaFormInfo records the details of a closure representation and is often,
+when optimisations are enabled, serialized to the interface of a module.
+
+However, when an interface doesn't have a LambdaFormInfo for some Id, we
+conservatively create one (in `mkLFImported`).
+
+The LambdaFormInfo we give an Id is used in determining how to tag its pointer
+(see `litIdInfo`). Therefore, its crucial we re-construct a LambdaFormInfo as
+faithfully as possible or otherwise risk having pointers incorrectly tagged,
+which can lead to performance issues and even segmentation faults (see #23231
+and #23146)
+
+In general,
+
+(1) Ids with a `RepArity > 0` are `LFReEntrant` and pointers to them are tagged
+with the corresponding arity
+
+(2) Nullary data constructors should be given `LFCon`, since they are indeed
+fully saturated data constructor applications; pointers to them should be
+tagged with the constructor index.
+
+However, we must be very careful about what we consider to be nullary data
+constructors. In particular, nullary data con *wrappers* must also be given
+`LFCon` (see #23231 and #23146). That is:
+ - Data con *workers* with no non-zero-width arguments have LFCon
+ - Data con *wrappers* with no non-zero-width arguments also have LFCon
+
+(Note that before the patch to those issues we would already consider these
+nullary wrappers to have `LFCon` lambda form info; but failed to re-construct
+that information in `mkLFImported`)
+
+Finally, we must consider the case of a data constructor whose worker has no
+non-zero-width arity whereas the wrapper has an argument. For example,
+
+ data T a where
+ TCon :: {-# UNPACK #-} !(a :~: True) -> T a
+
+`TCon` will have a wrapper with a lifted equality argument, which is
+non-zero-width, while the worker will have an unlifted equality argument, which
+is zero-width.
+
+We can (trivially) handle this situation in `mkLFImported` by *first* checking
+whether `idFunRepArity id > 0` (which will be 'True' of the wrapper and 'False'
+of the worker) and only then for `isDataConId_maybe && hasNoNonZeroWidthArgs`
+(which is true of both). This makes the order of the guards in `mkLFImported`
+crucial to the correctness!
+
+-}
-------------
mkLFStringLit :: LambdaFormInfo
=====================================
compiler/GHC/StgToCmm/Env.hs
=====================================
@@ -151,7 +151,7 @@ getCgIdInfo id
in return $
litIdInfo platform id (mkLFImported id) (CmmLabel ext_lbl)
else
- cgLookupPanic id -- Bug
+ cgLookupPanic id -- Bug, id is neither in local binds nor is external
}}}
-- | Retrieve cg info for a name if it already exists.
=====================================
compiler/GHC/StgToCmm/Types.hs
=====================================
@@ -113,6 +113,8 @@ The fix is straightforward: extend the logic in `mkLFImported` to cover
know that the wrapper of a nullary datacon will be in WHNF, even if it
includes equalities evidence (since such equalities are not runtime
relevant). This fixed #23146.
+
+See also Note [The LFInfo of Imported Ids]
-}
-- | Codegen-generated Id infos, to be passed to downstream via interfaces.
@@ -161,7 +163,7 @@ data LambdaFormInfo
!StandardFormInfo
!Bool -- True <=> *might* be a function type
- | LFCon -- A saturated constructor application
+ | LFCon -- A saturated data constructor application
!DataCon -- The constructor
| LFUnknown -- Used for function arguments and imported things.
=====================================
compiler/GHC/Types/Id/Info.hs
=====================================
@@ -439,7 +439,7 @@ oneShotInfo :: IdInfo -> OneShotInfo
oneShotInfo = bitfieldGetOneShotInfo . bitfield
-- | 'Id' arity, as computed by "GHC.Core.Opt.Arity". Specifies how many arguments
--- this 'Id' has to be applied to before it doesn any meaningful work.
+-- this 'Id' has to be applied to before it does any meaningful work.
arityInfo :: IdInfo -> ArityInfo
arityInfo = bitfieldGetArityInfo . bitfield
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/7f362914b95558aedd3108fd147183d438b8ae61
--
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/7f362914b95558aedd3108fd147183d438b8ae61
You're receiving this email because of your account on gitlab.haskell.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20230412/34739ef1/attachment-0001.html>
More information about the ghc-commits
mailing list