[Git][ghc/ghc][wip/T23146] codeGen: Fix LFInfo of imported datacon wrappers
Rodrigo Mesquita (@alt-romes)
gitlab at gitlab.haskell.org
Fri Apr 14 13:30:40 UTC 2023
Rodrigo Mesquita pushed to branch wip/T23146 at Glasgow Haskell Compiler / GHC
Commits:
36794d13 by Rodrigo Mesquita at 2023-04-14T14:29:34+01:00
codeGen: Fix LFInfo of imported datacon wrappers
As noted in #23231 and in the previous commit, we were failing to give a
an LFInfo of LFCon to a nullary datacon wrapper from another module,
failing to properly tag pointers which ultimately led to the
segmentation fault in #23146.
On top of the previous commit which now considers wrappers where we
previously only considered workers, we change the order of the guards so
that we check for the arity of the binding before we check whether it is
a constructor. This allows us to
(1) Correctly assign `LFReEntrant` to imported wrappers whose worker was
nullary, which we previously would fail to do
(2) Remove the `isNullaryRepDataCon` predicate:
(a) which was previously wrong, since it considered wrappers whose
workers had zero-width arguments to be non-nullary and would fail to
give `LFCon` to them
(b) is now unnecessary, since arity == 0 guarantees
- that the worker takes no arguments at all
- and the wrapper takes no arguments and its RHS must be an
application of the worker to zero-width-args only.
- we lint these two items with an assertion that the datacon
`hasNoNonZeroWidthArgs`
The creation of LFInfos for imported Ids and this detail are explained
in Note [The LFInfo of Imported Ids].
Note that before the patch to those issues we would already consider these
nullary wrappers to have `LFCon` lambda form info; but failed to re-construct
that information in `mkLFImported`
Closes #23231, #23146
(I've additionally batched some fixes to documentation I found while
investigating this issue)
- - - - -
14 changed files:
- compiler/GHC/Cmm/CLabel.hs
- compiler/GHC/Core/DataCon.hs
- compiler/GHC/Core/Tidy.hs
- compiler/GHC/Stg/Syntax.hs
- compiler/GHC/Stg/Unarise.hs
- compiler/GHC/StgToCmm/Closure.hs
- compiler/GHC/StgToCmm/Env.hs
- compiler/GHC/StgToCmm/Types.hs
- compiler/GHC/Types/Id.hs
- compiler/GHC/Types/Id/Info.hs
- + testsuite/tests/codeGen/should_run/T23146/T23146_lifted_unlifted.hs
- + testsuite/tests/codeGen/should_run/T23146/T23146_lifted_unlifted.stdout
- + testsuite/tests/codeGen/should_run/T23146/T23146_lifted_unliftedA.hs
- testsuite/tests/codeGen/should_run/T23146/all.T
Changes:
=====================================
compiler/GHC/Cmm/CLabel.hs
=====================================
@@ -1384,6 +1384,7 @@ For a data constructor (such as Just or Nothing), we have:
ordinary Haskell function of arity 1 that
allocates a (Just x) box:
Just = \x -> Just x
+ Just_entry: The entry code for the worker function
Just_closure: The closure for this worker
Nothing_closure: a statically allocated closure for Nothing
=====================================
compiler/GHC/Core/DataCon.hs
=====================================
@@ -111,8 +111,8 @@ import Data.List( find )
import Language.Haskell.Syntax.Module.Name
{-
-Data constructor representation
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Note [Data constructor representation]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider the following Haskell data type declaration
data T = T !Int ![Int]
@@ -213,7 +213,7 @@ Note [Data constructor workers and wrappers]
* The wrapper (if it exists) takes dcOrigArgTys as its arguments.
The worker takes dataConRepArgTys as its arguments
- If the worker is absent, dataConRepArgTys is the same as dcOrigArgTys
+ If the wrapper is absent, dataConRepArgTys is the same as dcOrigArgTys
* The 'NoDataConRep' case of DataConRep is important. Not only is it
efficient, but it also ensures that the wrapper is replaced by the
@@ -981,7 +981,7 @@ but the rep type is
Trep :: Int# -> a -> Void# -> T a
Actually, the unboxed part isn't implemented yet!
-Not that this representation is still *different* from runtime
+Note that this representation is still *different* from runtime
representation. (Which is what STG uses after unarise).
This is how T would end up being used in STG post-unarise:
@@ -1395,9 +1395,10 @@ dataConSrcBangs = dcSrcBangs
dataConSourceArity :: DataCon -> Arity
dataConSourceArity (MkData { dcSourceArity = arity }) = arity
--- | Gives the number of actual fields in the /representation/ of the
--- data constructor. This may be more than appear in the source code;
--- the extra ones are the existentially quantified dictionaries
+-- | Gives the number of actual fields in the Core /representation/ of the data
+-- constructor. This may be more than appear in the source code; the extra ones
+-- are existentially quantified dictionaries and coercion arguments, lifted and
+-- unlifted (despite the unlifted coercion arguments being zero-width).
dataConRepArity :: DataCon -> Arity
dataConRepArity (MkData { dcRepArity = arity }) = arity
@@ -1406,8 +1407,12 @@ dataConRepArity (MkData { dcRepArity = arity }) = arity
isNullarySrcDataCon :: DataCon -> Bool
isNullarySrcDataCon dc = dataConSourceArity dc == 0
--- | Return whether there are any argument types for this 'DataCon's runtime representation type
--- See Note [DataCon arities]
+-- | Return whether there are any argument types for this 'DataCon's Core
+-- representation type. See Note [DataCon arities].
+--
+-- In particular, Core's representation type considers coercion arguments to be
+-- arguments -- both lifted and unlifted coercions, despite the latter having
+-- zero-width runtime representation.
isNullaryRepDataCon :: DataCon -> Bool
isNullaryRepDataCon dc = dataConRepArity dc == 0
=====================================
compiler/GHC/Core/Tidy.hs
=====================================
@@ -82,7 +82,7 @@ tidyBind env (Rec prs)
-- This means the code generator can get the full calling convention by only looking at the function
-- itself without having to inspect the RHS.
--
--- The actual logic is in tidyCbvInfo and takes:
+-- The actual logic is in computeCbvInfo and takes:
-- * The function id
-- * The functions rhs
-- And gives us back the function annotated with the marks.
@@ -169,7 +169,7 @@ computeCbvInfo fun_id rhs
-- seqList: avoid retaining the original rhs
| otherwise
- = -- pprTraceDebug "tidyCbvInfo: Worker seems to take unboxed tuple/sum types!"
+ = -- pprTraceDebug "computeCbvInfo: Worker seems to take unboxed tuple/sum types!"
-- (ppr fun_id <+> ppr rhs)
asNonWorkerLikeId fun_id
=====================================
compiler/GHC/Stg/Syntax.hs
=====================================
@@ -245,7 +245,7 @@ literals.
-- which can't be let-bound
| StgConApp DataCon
ConstructorNumber
- [StgArg] -- Saturated
+ [StgArg] -- Saturated. (After Unarisation, [NonVoid StgArg])
[Type] -- See Note [Types in StgConApp] in GHC.Stg.Unarise
| StgOpApp StgOp -- Primitive op or foreign call
=====================================
compiler/GHC/Stg/Unarise.hs
=====================================
@@ -956,6 +956,8 @@ ubxSumRubbishArg (VecSlot n e) = StgLitArg (LitRubbish TypeLike vec_rep)
--------------------------------------------------------------------------------
{-
+Note [Unarisation of Void binders and arguments]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For arguments (StgArg) and binders (Id) we have two kind of unarisation:
- When unarising function arg binders and arguments, we don't want to remove
=====================================
compiler/GHC/StgToCmm/Closure.hs
=====================================
@@ -264,23 +264,97 @@ mkLFImported id =
-- Use the LambdaFormInfo from the interface
lf_info
Nothing
- -- Interface doesn't have a LambdaFormInfo, make a conservative one from
- -- the type.
+ -- Interface doesn't have a LambdaFormInfo, so make a conservative one from the type.
+ -- See Note [The LFInfo of Imported Ids]; The order of the guards musn't be changed!
+ | arity > 0
+ -> LFReEntrant TopLevel arity True ArgUnknown
+
| Just con <- isDataConId_maybe id
- , isNullaryRepDataCon con
- -- See Note [Imported nullary datacon wrappers must have correct LFInfo]
- -- in GHC.StgToCmm.Types
- -> LFCon con -- An imported nullary constructor
+ -- See Note [Imported nullary datacon wrappers must have correct LFInfo] in GHC.StgToCmm.Types
+ -- and Note [The LFInfo of Imported Ids] below
+ -> assert (hasNoNonZeroWidthArgs con) $
+ LFCon con -- An imported nullary constructor
-- We assume that the constructor is evaluated so that
-- the id really does point directly to the constructor
- | arity > 0
- -> LFReEntrant TopLevel arity True ArgUnknown
-
| otherwise
-> mkLFArgument id -- Not sure of exact arity
where
arity = idFunRepArity id
+ hasNoNonZeroWidthArgs = all (isZeroBitTy . scaledThing) . dataConRepArgTys
+
+{-
+Note [The LFInfo of Imported Ids]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As explained in Note [Conveying CAF-info and LFInfo between modules] and
+Note [Imported nullary datacon wrappers must have correct LFInfo], the
+LambdaFormInfo records the details of a closure representation and is often,
+when optimisations are enabled, serialized to the interface of a module.
+
+However, when an interface doesn't have a LambdaFormInfo for some Id, we
+conservatively create one (in `mkLFImported`).
+
+The LambdaFormInfo we give an Id is used in determining how to tag its pointer
+(see `litIdInfo`). Therefore, its crucial we re-construct a LambdaFormInfo as
+faithfully as possible or otherwise risk having pointers incorrectly tagged,
+which can lead to performance issues and even segmentation faults (see #23231
+and #23146)
+
+In general,
+
+(1) Ids with an `idFunRepArity > 0` are `LFReEntrant` and pointers to them are
+tagged with the corresponding arity.
+ - This is also true of data con wrappers and workers with arity > 0,
+ regardless of the runtime relevance of the arguments
+ - For example, `Just :: a -> T a` is given `LFReEntrant`
+ and `Nil :: (a ~# '[]) -> T a` is given `LFReEntrant` too
+
+(2) Data constructors with `idFunRepArity == 0` should be given `LFCon` because
+they are fully saturated data constructor applications and pointers to them
+should be tagged with the constructor index.
+
+(2.1) A datacon *wrapper* with zero arity must be a fully saturated application
+of the worker to zero-width arguments only (which are dropped after unarisation)
+
+(2.2) A datacon *worker* with zero arity is trivially fully saturated, it takes
+no arguments whatsoever (not even zero-width args)
+
+To ensure we properly give `LFReEntrant` to data constructors with some arity,
+and `LFCon` only to data constructors with zero arity, we must first check for
+`arity > 0` and only afterwards `isDataConId` -- the order of the guards in
+`mkLFImported` is quite important.
+
+As an example, consider the following data constructors:
+
+ data T a where
+ TCon :: {-# UNPACK #-} !(a :~: True) -> T a
+
+ data T2 a where
+ TCon2 :: {-# UNPACK #-} !() -> T2 a
+
+ data T3 a where
+ TCon3 :: T3 '[]
+
+`TCon`'s wrapper has a lifted equality argument, which is non-zero-width, while
+the worker has an unlifted equality argument, which is zero-width.
+
+`TCon2`'s wrapper has a lifted equality argument, which is non-zero-width,
+while the worker has no arguments.
+
+`TCon3`'s wrapper has no arguments, and the worker has 1 zero-width argument
+
+For `TCon`, both the wrapper and worker will be given `LFReEntrant` since they both have arity == 1.
+
+For `TCon2`, the wrapper will be given `LFReEntrant` since it has arity == 1 while the worker is `LFCon` since its arity == 0
+
+For `TCon3`, the wrapper will be given `LFCon` since its arity == 0 and the worker `LFReEntrant` since its arity == 1
+
+One thing worth considering is giving *workers* with only zero-width-args the
+`LFCon` LambdaFormInfo, e.g. giving `LFCon` to the worker of `TCon` and
+`TCon3`. Currently, they are being marked `LFReEntrant`.
+
+-}
-------------
mkLFStringLit :: LambdaFormInfo
=====================================
compiler/GHC/StgToCmm/Env.hs
=====================================
@@ -151,7 +151,7 @@ getCgIdInfo id
in return $
litIdInfo platform id (mkLFImported id) (CmmLabel ext_lbl)
else
- cgLookupPanic id -- Bug
+ cgLookupPanic id -- Bug, id is neither in local binds nor is external
}}}
-- | Retrieve cg info for a name if it already exists.
=====================================
compiler/GHC/StgToCmm/Types.hs
=====================================
@@ -113,6 +113,8 @@ The fix is straightforward: extend the logic in `mkLFImported` to cover
know that the wrapper of a nullary datacon will be in WHNF, even if it
includes equalities evidence (since such equalities are not runtime
relevant). This fixed #23146.
+
+See also Note [The LFInfo of Imported Ids]
-}
-- | Codegen-generated Id infos, to be passed to downstream via interfaces.
@@ -161,7 +163,7 @@ data LambdaFormInfo
!StandardFormInfo
!Bool -- True <=> *might* be a function type
- | LFCon -- A saturated constructor application
+ | LFCon -- A saturated data constructor application
!DataCon -- The constructor
| LFUnknown -- Used for function arguments and imported things.
=====================================
compiler/GHC/Types/Id.hs
=====================================
@@ -697,6 +697,10 @@ idCallArity id = callArityInfo (idInfo id)
setIdCallArity :: Id -> Arity -> Id
setIdCallArity id arity = modifyIdInfo (`setCallArityInfo` arity) id
+-- | The RepArity of an Id.
+--
+-- This arity counts all arguments relevant in Core, which includes arguments
+-- with no runtime representation
idFunRepArity :: Id -> RepArity
idFunRepArity x = countFunRepArgs (idArity x) (idType x)
=====================================
compiler/GHC/Types/Id/Info.hs
=====================================
@@ -439,7 +439,7 @@ oneShotInfo :: IdInfo -> OneShotInfo
oneShotInfo = bitfieldGetOneShotInfo . bitfield
-- | 'Id' arity, as computed by "GHC.Core.Opt.Arity". Specifies how many arguments
--- this 'Id' has to be applied to before it doesn any meaningful work.
+-- this 'Id' has to be applied to before it does any meaningful work.
arityInfo :: IdInfo -> ArityInfo
arityInfo = bitfieldGetArityInfo . bitfield
=====================================
testsuite/tests/codeGen/should_run/T23146/T23146_lifted_unlifted.hs
=====================================
@@ -0,0 +1,14 @@
+{-# LANGUAGE GADTs, DataKinds #-}
+
+import T23146_lifted_unliftedA
+
+import Data.Type.Equality
+
+fieldsSam :: NP True -> NP True -> Bool
+fieldsSam (x' ::* xs) (y' ::* ys) = fieldsSam xs ys
+fieldsSam (UNil Refl) (UNil Refl) = True
+
+main :: IO ()
+main = print (fieldsSam (UNil Refl) (UNil Refl))
+
+
=====================================
testsuite/tests/codeGen/should_run/T23146/T23146_lifted_unlifted.stdout
=====================================
@@ -0,0 +1 @@
+True
=====================================
testsuite/tests/codeGen/should_run/T23146/T23146_lifted_unliftedA.hs
=====================================
@@ -0,0 +1,13 @@
+{-# OPTIONS_GHC -O1 #-}
+{-# LANGUAGE DataKinds #-}
+
+module T23146_lifted_unliftedA where
+
+import Data.Kind
+import Data.Type.Equality
+
+data NP a where
+ UNil :: {-# UNPACK #-} !(a :~: True) -> NP a
+ (::*) :: Bool -> NP True -> NP True
+
+
=====================================
testsuite/tests/codeGen/should_run/T23146/all.T
=====================================
@@ -1,4 +1,4 @@
test('T23146', normal, compile_and_run, [''])
test('T23146_lifted', normal, compile_and_run, [''])
test('T23146_liftedeq', normal, compile_and_run, [''])
-
+test('T23146_lifted_unlifted', normal, compile_and_run, [''])
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/36794d13ea55cbdac6f107bb1691551525ada313
--
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/36794d13ea55cbdac6f107bb1691551525ada313
You're receiving this email because of your account on gitlab.haskell.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20230414/0fd1eab8/attachment-0001.html>
More information about the ghc-commits
mailing list