[Git][ghc/ghc][wip/romes/isNullaryRepDataCon] codeGen: Fix LFInfo of imported datacon wrappers

Rodrigo Mesquita (@alt-romes) gitlab at gitlab.haskell.org
Wed Apr 12 16:54:18 UTC 2023



Rodrigo Mesquita pushed to branch wip/romes/isNullaryRepDataCon at Glasgow Haskell Compiler / GHC


Commits:
7f362914 by Rodrigo Mesquita at 2023-04-12T17:54:06+01:00
codeGen: Fix LFInfo of imported datacon wrappers

As noted in #23231 and in the previous commit, we were failing to give a
an LFInfo of LFCon to a nullary (no non-zero-width arguments) datacon
wrapper from another module, failing to properly tag pointers which
ultimately led to the segmentation fault in #23146.

First, we fix the nullary datacon predicate to one which considers
constructors with no non-zero-width arguments to be nullary, unlike
`isNullaryRepDataCon` which doesn't consider constructors with just
zero-width to be nullary (since it is a predicate over the Core data con
representation arity, for which zero-width arguments are relevant)

Secondly, we failed to account for data con wrappers that received real
arguments but whose worker received non-zero-width arguments. An example
of such a data type was pointed out by @clyring:

  data T a where
    TCon :: {-# UNPACK #-} !(a :~: True) -> T a

In this case, we were incorrectly tagging the TCon wrapper with `LFCon`,
which is incorrect since it takes a lifted argument.

The solution is simple: change the order of the guards so that we check
for the arity of the binding before we check whether it is a constructor
with a nullary worker.

The creation of LFInfos for imported Ids and this detail are explained
in Note [The LFInfo of Imported Ids].

Closes #23231, #23146

(I've additionally batched some fixes to documentation I found while
investigating this issue)

- - - - -


9 changed files:

- compiler/GHC/Cmm/CLabel.hs
- compiler/GHC/Core/DataCon.hs
- compiler/GHC/Core/Tidy.hs
- compiler/GHC/Stg/Syntax.hs
- compiler/GHC/Stg/Unarise.hs
- compiler/GHC/StgToCmm/Closure.hs
- compiler/GHC/StgToCmm/Env.hs
- compiler/GHC/StgToCmm/Types.hs
- compiler/GHC/Types/Id/Info.hs


Changes:

=====================================
compiler/GHC/Cmm/CLabel.hs
=====================================
@@ -1384,6 +1384,7 @@ For a data constructor (such as Just or Nothing), we have:
                    ordinary Haskell function of arity 1 that
                    allocates a (Just x) box:
                       Just = \x -> Just x
+    Just_entry:    The entry code for the worker function
     Just_closure:  The closure for this worker
 
     Nothing_closure: a statically allocated closure for Nothing


=====================================
compiler/GHC/Core/DataCon.hs
=====================================
@@ -111,8 +111,8 @@ import Data.List( find )
 import Language.Haskell.Syntax.Module.Name
 
 {-
-Data constructor representation
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Note [Data constructor representation]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Consider the following Haskell data type declaration
 
         data T = T !Int ![Int]
@@ -213,7 +213,7 @@ Note [Data constructor workers and wrappers]
 
 * The wrapper (if it exists) takes dcOrigArgTys as its arguments.
   The worker takes dataConRepArgTys as its arguments
-  If the worker is absent, dataConRepArgTys is the same as dcOrigArgTys
+  If the wrapper is absent, dataConRepArgTys is the same as dcOrigArgTys
 
 * The 'NoDataConRep' case of DataConRep is important. Not only is it
   efficient, but it also ensures that the wrapper is replaced by the
@@ -981,7 +981,7 @@ but the rep type is
         Trep :: Int# -> a -> Void# -> T a
 Actually, the unboxed part isn't implemented yet!
 
-Not that this representation is still *different* from runtime
+Note that this representation is still *different* from runtime
 representation. (Which is what STG uses after unarise).
 
 This is how T would end up being used in STG post-unarise:
@@ -1395,9 +1395,10 @@ dataConSrcBangs = dcSrcBangs
 dataConSourceArity :: DataCon -> Arity
 dataConSourceArity (MkData { dcSourceArity = arity }) = arity
 
--- | Gives the number of actual fields in the /representation/ of the
--- data constructor. This may be more than appear in the source code;
--- the extra ones are the existentially quantified dictionaries
+-- | Gives the number of actual fields in the Core /representation/ of the data
+-- constructor. This may be more than appear in the source code; the extra ones
+-- are existentially quantified dictionaries and coercion arguments, lifted and
+-- unlifted! (despite the unlifted coercion arguments being zero-width).
 dataConRepArity :: DataCon -> Arity
 dataConRepArity (MkData { dcRepArity = arity }) = arity
 
@@ -1406,8 +1407,12 @@ dataConRepArity (MkData { dcRepArity = arity }) = arity
 isNullarySrcDataCon :: DataCon -> Bool
 isNullarySrcDataCon dc = dataConSourceArity dc == 0
 
--- | Return whether there are any argument types for this 'DataCon's runtime representation type
--- See Note [DataCon arities]
+-- | Return whether there are any argument types for this 'DataCon's Core
+-- representation type. See Note [DataCon arities].
+--
+-- In particular, Core's representation type considers coercion arguments to be
+-- arguments -- both lifted and unlifted coercions, despite the latter having
+-- zero-width runtime representation.
 isNullaryRepDataCon :: DataCon -> Bool
 isNullaryRepDataCon dc = dataConRepArity dc == 0
 


=====================================
compiler/GHC/Core/Tidy.hs
=====================================
@@ -82,7 +82,7 @@ tidyBind env (Rec prs)
 -- This means the code generator can get the full calling convention by only looking at the function
 -- itself without having to inspect the RHS.
 --
--- The actual logic is in tidyCbvInfo and takes:
+-- The actual logic is in computeCbvInfo and takes:
 -- * The function id
 -- * The functions rhs
 -- And gives us back the function annotated with the marks.
@@ -169,7 +169,7 @@ computeCbvInfo fun_id rhs
                -- seqList: avoid retaining the original rhs
 
              | otherwise
-             = -- pprTraceDebug "tidyCbvInfo: Worker seems to take unboxed tuple/sum types!"
+             = -- pprTraceDebug "computeCbvInfo: Worker seems to take unboxed tuple/sum types!"
                --    (ppr fun_id <+> ppr rhs)
                asNonWorkerLikeId fun_id
 


=====================================
compiler/GHC/Stg/Syntax.hs
=====================================
@@ -245,7 +245,7 @@ literals.
         -- which can't be let-bound
   | StgConApp   DataCon
                 ConstructorNumber
-                [StgArg] -- Saturated
+                [StgArg] -- Saturated. (After Unarisation, [NonVoid StgArg])
                 [Type]   -- See Note [Types in StgConApp] in GHC.Stg.Unarise
 
   | StgOpApp    StgOp    -- Primitive op or foreign call


=====================================
compiler/GHC/Stg/Unarise.hs
=====================================
@@ -956,6 +956,8 @@ ubxSumRubbishArg (VecSlot n e)   = StgLitArg (LitRubbish TypeLike vec_rep)
 --------------------------------------------------------------------------------
 
 {-
+Note [Unarisation of Void binders and arguments]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 For arguments (StgArg) and binders (Id) we have two kind of unarisation:
 
   - When unarising function arg binders and arguments, we don't want to remove


=====================================
compiler/GHC/StgToCmm/Closure.hs
=====================================
@@ -264,23 +264,79 @@ mkLFImported id =
         -- Use the LambdaFormInfo from the interface
         lf_info
       Nothing
-        -- Interface doesn't have a LambdaFormInfo, make a conservative one from
-        -- the type.
+        -- Interface doesn't have a LambdaFormInfo, so make a conservative one from the type.
+        -- See Note [The LFInfo of Imported Ids]; The order of the guards musn't be changed!
+        | arity > 0
+        -> LFReEntrant TopLevel arity True ArgUnknown
+
         | Just con <- isDataConId_maybe id
-        , isNullaryRepDataCon con
-            -- See Note [Imported nullary datacon wrappers must have correct LFInfo]
-            -- in GHC.StgToCmm.Types
+        , hasNoNonZeroWidthArgs con
+          -- See Note [Imported nullary datacon wrappers must have correct LFInfo] in GHC.StgToCmm.Types
+          -- and Note [The LFInfo of Imported Ids] below
         -> LFCon con   -- An imported nullary constructor
                        -- We assume that the constructor is evaluated so that
                        -- the id really does point directly to the constructor
 
-        | arity > 0
-        -> LFReEntrant TopLevel arity True ArgUnknown
-
         | otherwise
         -> mkLFArgument id -- Not sure of exact arity
   where
     arity = idFunRepArity id
+    hasNoNonZeroWidthArgs = all (isZeroBitTy . scaledThing) . dataConRepArgTys
+
+{-
+Note [The LFInfo of Imported Ids]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As explained in Note [Conveying CAF-info and LFInfo between modules] and
+Note [Imported nullary datacon wrappers must have correct LFInfo], the
+LambdaFormInfo records the details of a closure representation and is often,
+when optimisations are enabled, serialized to the interface of a module.
+
+However, when an interface doesn't have a LambdaFormInfo for some Id, we
+conservatively create one (in `mkLFImported`).
+
+The LambdaFormInfo we give an Id is used in determining how to tag its pointer
+(see `litIdInfo`). Therefore, its crucial we re-construct a LambdaFormInfo as
+faithfully as possible or otherwise risk having pointers incorrectly tagged,
+which can lead to performance issues and even segmentation faults (see #23231
+and #23146)
+
+In general,
+
+(1) Ids with a `RepArity > 0` are `LFReEntrant` and pointers to them are tagged
+with the corresponding arity
+
+(2) Nullary data constructors should be given `LFCon`, since they are indeed
+fully saturated data constructor applications; pointers to them should be
+tagged with the constructor index.
+
+However, we must be very careful about what we consider to be nullary data
+constructors. In particular, nullary data con *wrappers* must also be given
+`LFCon` (see #23231 and #23146). That is:
+  - Data con *workers* with no non-zero-width arguments have LFCon
+  - Data con *wrappers* with no non-zero-width arguments also have LFCon
+
+(Note that before the patch to those issues we would already consider these
+nullary wrappers to have `LFCon` lambda form info; but failed to re-construct
+that information in `mkLFImported`)
+
+Finally, we must consider the case of a data constructor whose worker has no
+non-zero-width arity whereas the wrapper has an argument. For example,
+
+  data T a where
+    TCon :: {-# UNPACK #-} !(a :~: True) -> T a
+
+`TCon` will have a wrapper with a lifted equality argument, which is
+non-zero-width, while the worker will have an unlifted equality argument, which
+is zero-width.
+
+We can (trivially) handle this situation in `mkLFImported` by *first* checking
+whether `idFunRepArity id > 0` (which will be 'True' of the wrapper and 'False'
+of the worker) and only then for `isDataConId_maybe && hasNoNonZeroWidthArgs`
+(which is true of both). This makes the order of the guards in `mkLFImported`
+crucial to the correctness!
+
+-}
 
 -------------
 mkLFStringLit :: LambdaFormInfo


=====================================
compiler/GHC/StgToCmm/Env.hs
=====================================
@@ -151,7 +151,7 @@ getCgIdInfo id
               in return $
                   litIdInfo platform id (mkLFImported id) (CmmLabel ext_lbl)
           else
-              cgLookupPanic id -- Bug
+              cgLookupPanic id -- Bug, id is neither in local binds nor is external
         }}}
 
 -- | Retrieve cg info for a name if it already exists.


=====================================
compiler/GHC/StgToCmm/Types.hs
=====================================
@@ -113,6 +113,8 @@ The fix is straightforward: extend the logic in `mkLFImported` to cover
 know that the wrapper of a nullary datacon will be in WHNF, even if it
 includes equalities evidence (since such equalities are not runtime
 relevant). This fixed #23146.
+
+See also Note [The LFInfo of Imported Ids]
 -}
 
 -- | Codegen-generated Id infos, to be passed to downstream via interfaces.
@@ -161,7 +163,7 @@ data LambdaFormInfo
         !StandardFormInfo
         !Bool           -- True <=> *might* be a function type
 
-  | LFCon               -- A saturated constructor application
+  | LFCon               -- A saturated data constructor application
         !DataCon        -- The constructor
 
   | LFUnknown           -- Used for function arguments and imported things.


=====================================
compiler/GHC/Types/Id/Info.hs
=====================================
@@ -439,7 +439,7 @@ oneShotInfo :: IdInfo -> OneShotInfo
 oneShotInfo = bitfieldGetOneShotInfo . bitfield
 
 -- | 'Id' arity, as computed by "GHC.Core.Opt.Arity". Specifies how many arguments
--- this 'Id' has to be applied to before it doesn any meaningful work.
+-- this 'Id' has to be applied to before it does any meaningful work.
 arityInfo :: IdInfo -> ArityInfo
 arityInfo = bitfieldGetArityInfo . bitfield
 



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/7f362914b95558aedd3108fd147183d438b8ae61

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/7f362914b95558aedd3108fd147183d438b8ae61
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20230412/34739ef1/attachment-0001.html>


More information about the ghc-commits mailing list