[Git][ghc/ghc][wip/T23146] codeGen: Fix LFInfo of imported datacon wrappers

Rodrigo Mesquita (@alt-romes) gitlab at gitlab.haskell.org
Wed Apr 26 12:34:54 UTC 2023



Rodrigo Mesquita pushed to branch wip/T23146 at Glasgow Haskell Compiler / GHC


Commits:
9ed45119 by Rodrigo Mesquita at 2023-04-26T13:34:41+01:00
codeGen: Fix LFInfo of imported datacon wrappers

As noted in #23231 and in the previous commit, we were failing to give a
an LFInfo of LFCon to a nullary datacon wrapper from another module,
failing to properly tag pointers which ultimately led to the
segmentation fault in #23146.

On top of the previous commit which now considers wrappers where we
previously only considered workers, we change the order of the guards so
that we check for the arity of the binding before we check whether it is
a constructor. This allows us to
(1) Correctly assign `LFReEntrant` to imported wrappers whose worker was
nullary, which we previously would fail to do
(2) Remove the `isNullaryRepDataCon` predicate:
    (a) which was previously wrong, since it considered wrappers whose
    workers had zero-width arguments to be non-nullary and would fail to
    give `LFCon` to them
    (b) is now unnecessary, since arity == 0 guarantees
        - that the worker takes no arguments at all
        - and the wrapper takes no arguments and its RHS must be an
          application of the worker to zero-width-args only.
        - we lint these two items with an assertion that the datacon
          `hasNoNonZeroWidthArgs`

We also update `isTagged` to use the new logic in determining the
LFInfos of imported Ids.

The creation of LFInfos for imported Ids and this detail are explained
in Note [The LFInfo of Imported Ids].

Note that before the patch to those issues we would already consider these
nullary wrappers to have `LFCon` lambda form info; but failed to re-construct
that information in `mkLFImported`

Closes #23231, #23146

(I've additionally batched some fixes to documentation I found while
investigating this issue)

- - - - -


16 changed files:

- compiler/GHC/Cmm/CLabel.hs
- compiler/GHC/Core/DataCon.hs
- compiler/GHC/Core/Tidy.hs
- compiler/GHC/Stg/InferTags/Rewrite.hs
- compiler/GHC/Stg/Syntax.hs
- compiler/GHC/Stg/Unarise.hs
- compiler/GHC/StgToCmm/Closure.hs
- compiler/GHC/StgToCmm/Env.hs
- compiler/GHC/StgToCmm/Types.hs
- compiler/GHC/Types/Id.hs
- compiler/GHC/Types/Id/Info.hs
- compiler/GHC/Types/Id/Make.hs
- + testsuite/tests/codeGen/should_run/T23146/T23146_lifted_unlifted.hs
- + testsuite/tests/codeGen/should_run/T23146/T23146_lifted_unlifted.stdout
- + testsuite/tests/codeGen/should_run/T23146/T23146_lifted_unliftedA.hs
- testsuite/tests/codeGen/should_run/T23146/all.T


Changes:

=====================================
compiler/GHC/Cmm/CLabel.hs
=====================================
@@ -1384,6 +1384,7 @@ For a data constructor (such as Just or Nothing), we have:
                    ordinary Haskell function of arity 1 that
                    allocates a (Just x) box:
                       Just = \x -> Just x
+    Just_entry:    The entry code for the worker function
     Just_closure:  The closure for this worker
 
     Nothing_closure: a statically allocated closure for Nothing


=====================================
compiler/GHC/Core/DataCon.hs
=====================================
@@ -111,8 +111,8 @@ import Data.List( find )
 import Language.Haskell.Syntax.Module.Name
 
 {-
-Data constructor representation
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Note [Data constructor representation]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Consider the following Haskell data type declaration
 
         data T = T !Int ![Int]
@@ -213,7 +213,7 @@ Note [Data constructor workers and wrappers]
 
 * The wrapper (if it exists) takes dcOrigArgTys as its arguments.
   The worker takes dataConRepArgTys as its arguments
-  If the worker is absent, dataConRepArgTys is the same as dcOrigArgTys
+  If the wrapper is absent, dataConRepArgTys is the same as dcOrigArgTys
 
 * The 'NoDataConRep' case of DataConRep is important. Not only is it
   efficient, but it also ensures that the wrapper is replaced by the
@@ -586,12 +586,22 @@ Function call 'dataConKindEqSpec' returns [k'~k]
 
 Note [DataCon arities]
 ~~~~~~~~~~~~~~~~~~~~~~
-dcSourceArity does not take constraints into account,
-but dcRepArity does.  For example:
+A `DataCon`'s source arity and core representation arity may differ:
+`dcSourceArity` does not take constraints into account, but `dcRepArity` does.
+
+The additional arguments taken into account by `dcRepArity` include quantified
+dictionaries and coercion arguments, lifted and unlifted (despite the unlifted
+coercion arguments having a zero-width runtime representation).
+For example:
    MkT :: Ord a => a -> T a
     dcSourceArity = 1
     dcRepArity    = 2
 
+   MkU :: (b ~ '[]) => U b
+    dcSourceArity = 0
+    dcRepArity    = 1
+
+
 Note [DataCon user type variable binders]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 A DataCon has two different sets of type variables:
@@ -981,7 +991,7 @@ but the rep type is
         Trep :: Int# -> a -> Void# -> T a
 Actually, the unboxed part isn't implemented yet!
 
-Not that this representation is still *different* from runtime
+Note that this representation is still *different* from runtime
 representation. (Which is what STG uses after unarise).
 
 This is how T would end up being used in STG post-unarise:
@@ -1395,9 +1405,10 @@ dataConSrcBangs = dcSrcBangs
 dataConSourceArity :: DataCon -> Arity
 dataConSourceArity (MkData { dcSourceArity = arity }) = arity
 
--- | Gives the number of actual fields in the /representation/ of the
--- data constructor. This may be more than appear in the source code;
--- the extra ones are the existentially quantified dictionaries
+-- | Gives the number of value arguments (including zero-width coercions)
+-- stored by the given `DataCon`'s worker in its Core representation. This may
+-- differ from the number of arguments that appear in the source code; see also
+-- Note [DataCon arities]
 dataConRepArity :: DataCon -> Arity
 dataConRepArity (MkData { dcRepArity = arity }) = arity
 
@@ -1406,8 +1417,14 @@ dataConRepArity (MkData { dcRepArity = arity }) = arity
 isNullarySrcDataCon :: DataCon -> Bool
 isNullarySrcDataCon dc = dataConSourceArity dc == 0
 
--- | Return whether there are any argument types for this 'DataCon's runtime representation type
--- See Note [DataCon arities]
+-- | Return whether this `DataCon`'s worker, in its Core representation, takes
+-- any value arguments.
+--
+-- In particular, remember that we include coercion arguments in the arity of
+-- the Core representation of the `DataCon` -- both lifted and unlifted
+-- coercions, despite the latter having zero-width runtime representation.
+--
+-- See also Note [DataCon arities].
 isNullaryRepDataCon :: DataCon -> Bool
 isNullaryRepDataCon dc = dataConRepArity dc == 0
 


=====================================
compiler/GHC/Core/Tidy.hs
=====================================
@@ -82,7 +82,7 @@ tidyBind env (Rec prs)
 -- This means the code generator can get the full calling convention by only looking at the function
 -- itself without having to inspect the RHS.
 --
--- The actual logic is in tidyCbvInfo and takes:
+-- The actual logic is in computeCbvInfo and takes:
 -- * The function id
 -- * The functions rhs
 -- And gives us back the function annotated with the marks.
@@ -169,7 +169,7 @@ computeCbvInfo fun_id rhs
                -- seqList: avoid retaining the original rhs
 
              | otherwise
-             = -- pprTraceDebug "tidyCbvInfo: Worker seems to take unboxed tuple/sum types!"
+             = -- pprTraceDebug "computeCbvInfo: Worker seems to take unboxed tuple/sum types!"
                --    (ppr fun_id <+> ppr rhs)
                asNonWorkerLikeId fun_id
 


=====================================
compiler/GHC/Stg/InferTags/Rewrite.hs
=====================================
@@ -36,6 +36,7 @@ import GHC.Core            ( AltCon(..) )
 import GHC.Core.Type
 
 import GHC.StgToCmm.Types
+import GHC.StgToCmm.Closure (mkLFImported)
 
 import GHC.Stg.Utils
 import GHC.Stg.Syntax as StgSyn
@@ -271,13 +272,10 @@ isTagged v = do
                             TagTagged -> True
                             TagTuple _ -> True -- Consider unboxed tuples tagged.
         False -- Imported
-            | Just con <- (isDataConWorkId_maybe v)
-            , isNullaryRepDataCon con
-            -> return True
-            | Just lf_info <- idLFInfo_maybe v
             -> return $!
-                -- Can we treat the thing as tagged based on it's LFInfo?
-                case lf_info of
+                -- Determine whether it is tagged from the LFInfo of the imported id.
+                -- See Note [The LFInfo of Imported Ids]
+                case mkLFImported v of
                     -- Function, applied not entered.
                     LFReEntrant {}
                         -> True
@@ -295,9 +293,6 @@ isTagged v = do
                     -- Shouldn't be possible. I don't think we can export letNoEscapes
                         -> True
 
-            | otherwise
-            -> return False
-
 
 isArgTagged :: StgArg -> RM Bool
 isArgTagged (StgLitArg _) = return True


=====================================
compiler/GHC/Stg/Syntax.hs
=====================================
@@ -245,7 +245,7 @@ literals.
         -- which can't be let-bound
   | StgConApp   DataCon
                 ConstructorNumber
-                [StgArg] -- Saturated
+                [StgArg] -- Saturated. (After Unarisation, [NonVoid StgArg])
                 [Type]   -- See Note [Types in StgConApp] in GHC.Stg.Unarise
 
   | StgOpApp    StgOp    -- Primitive op or foreign call


=====================================
compiler/GHC/Stg/Unarise.hs
=====================================
@@ -956,6 +956,8 @@ ubxSumRubbishArg (VecSlot n e)   = StgLitArg (LitRubbish TypeLike vec_rep)
 --------------------------------------------------------------------------------
 
 {-
+Note [Unarisation of Void binders and arguments]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 For arguments (StgArg) and binders (Id) we have two kind of unarisation:
 
   - When unarising function arg binders and arguments, we don't want to remove


=====================================
compiler/GHC/StgToCmm/Closure.hs
=====================================
@@ -264,23 +264,123 @@ mkLFImported id =
         -- Use the LambdaFormInfo from the interface
         lf_info
       Nothing
-        -- Interface doesn't have a LambdaFormInfo, make a conservative one from
-        -- the type.
+        -- Interface doesn't have a LambdaFormInfo, so make a conservative one from the type.
+        -- See Note [The LFInfo of Imported Ids]; The order of the guards musn't be changed!
+        | arity > 0
+        -> LFReEntrant TopLevel arity True ArgUnknown
+
         | Just con <- isDataConId_maybe id
-        , isNullaryRepDataCon con
-            -- See Note [Imported nullary datacon wrappers must have correct LFInfo]
-            -- in GHC.StgToCmm.Types
-        -> LFCon con   -- An imported nullary constructor
+          -- See Note [Imported unlifted nullary datacon wrappers must have correct LFInfo] in GHC.StgToCmm.Types
+          -- and Note [The LFInfo of Imported Ids] below
+        -> assert (hasNoNonZeroWidthArgs con) $
+           LFCon con   -- An imported nullary constructor
                        -- We assume that the constructor is evaluated so that
                        -- the id really does point directly to the constructor
 
-        | arity > 0
-        -> LFReEntrant TopLevel arity True ArgUnknown
-
         | otherwise
         -> mkLFArgument id -- Not sure of exact arity
   where
     arity = idFunRepArity id
+    hasNoNonZeroWidthArgs = all (isZeroBitTy . scaledThing) . dataConRepArgTys
+
+{-
+Note [The LFInfo of Imported Ids]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+As explained in Note [Conveying CAF-info and LFInfo between modules] and
+Note [Imported unlifted nullary datacon wrappers must have correct LFInfo], the
+LambdaFormInfo records the details of a closure representation and is often,
+when optimisations are enabled, serialized to the interface of a module.
+
+In particular, the `lfInfo` field of the `IdInfo` field of an `Id`
+* For Ids defined in this module: is `Nothing`
+* For imported Ids:
+  * is (Just lf_info) if the LFInfo was serialised into the interface file
+    (typically, when the exporting module was compiled with -O)
+  * is Nothing if it wasn't serialised
+
+However, when an interface doesn't have a LambdaFormInfo for some imported Id
+(so that its `lfInfo` field is `Nothing`), we can conservatively create one
+using `mkLFImported`.
+
+The LambdaFormInfo we give an Id is used in determining how to tag its pointer
+(see `litIdInfo`). Therefore, it's crucial we re-construct a LambdaFormInfo as
+faithfully as possible or otherwise risk having pointers incorrectly tagged,
+which can lead to performance issues and even segmentation faults (see #23231
+and #23146). In particular, saturated data constructor applications *must* be
+unambiguously given `LFCon`, and the invariant
+
+  If the LFInfo (serialised or built with mkLFImported) says LFCon, then it
+  really is a static data constructor, and similar for LFReEntrant
+
+must be upheld.
+
+In `mkLFImported`, we make a conservative approximation to the real
+LambdaFormInfo as follows:
+
+(1) Ids with an `idFunRepArity > 0` are `LFReEntrant` and pointers to them are
+tagged (by `litIdInfo`) with the corresponding arity.
+    - This is also true of data con wrappers and workers with arity > 0,
+    regardless of the runtime relevance of the arguments
+    - For example, `Just :: a -> Maybe a` is given `LFReEntrant`
+               and `HNil :: (a ~# '[]) -> HList a` is given `LFReEntrant` too
+
+(2) Data constructors with `idFunRepArity == 0` should be given `LFCon` because
+they are fully saturated data constructor applications and pointers to them
+should be tagged with the constructor index.
+
+(2.1) A datacon *wrapper* with zero arity must be a fully saturated application
+of the worker to zero-width arguments only (which are dropped after unarisation)
+
+(2.2) A datacon *worker* with zero arity is trivially fully saturated, it takes
+no arguments whatsoever (not even zero-width args)
+
+To ensure we properly give `LFReEntrant` to data constructors with some arity,
+and `LFCon` only to data constructors with zero arity, we must first check for
+`arity > 0` and only afterwards `isDataConId` -- the order of the guards in
+`mkLFImported` is quite important.
+
+As an example, consider the following data constructors:
+
+  data T1 a where
+    TCon1 :: {-# UNPACK #-} !(a :~: True) -> T1 a
+
+  data T2 a where
+    TCon2 :: {-# UNPACK #-} !() -> T2 a
+
+  data T3 a where
+    TCon3 :: T3 '[]
+
+`TCon1`'s wrapper has a lifted equality argument, which is non-zero-width, while
+the worker has an unlifted equality argument, which is zero-width.
+
+`TCon2`'s wrapper has a lifted equality argument, which is non-zero-width,
+while the worker has no arguments.
+
+`TCon3`'s wrapper has no arguments, and the worker has 1 zero-width argument;
+their Core representation:
+
+  $WTCon3 :: T3 '[]
+  $WTCon3 = TCon3 @[] <Refl>
+
+  TCon3 :: forall (a :: * -> *). (a ~# []) => T a
+  TCon3 = /\a. \(co :: a~#[]). TCon3 co
+
+For `TCon1`, both the wrapper and worker will be given `LFReEntrant` since they
+both have arity == 1.
+
+For `TCon2`, the wrapper will be given `LFReEntrant` since it has arity == 1
+while the worker is `LFCon` since its arity == 0
+
+For `TCon3`, the wrapper will be given `LFCon` since its arity == 0 and the
+worker `LFReEntrant` since its arity == 1
+
+One might think we could give *workers* with only zero-width-args the `LFCon`
+LambdaFormInfo, e.g. give `LFCon` to the worker of `TCon1` and `TCon3`.
+However, these workers, albeit rarely used, are unambiguously functions
+-- which makes `LFReEntrant`, the LambdaFormInfo we give them, correct.
+See also the discussion in #23158.
+
+-}
 
 -------------
 mkLFStringLit :: LambdaFormInfo


=====================================
compiler/GHC/StgToCmm/Env.hs
=====================================
@@ -151,7 +151,7 @@ getCgIdInfo id
               in return $
                   litIdInfo platform id (mkLFImported id) (CmmLabel ext_lbl)
           else
-              cgLookupPanic id -- Bug
+              cgLookupPanic id -- Bug, id is neither in local binds nor is external
         }}}
 
 -- | Retrieve cg info for a name if it already exists.


=====================================
compiler/GHC/StgToCmm/Types.hs
=====================================
@@ -71,12 +71,13 @@ moving parts are:
   -fomit-interface-pragmas or -fno-code; and we won't read it in if you have
   -fignore-interface-pragmas.  (We could revisit this decision.)
 
-Note [Imported nullary datacon wrappers must have correct LFInfo]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Note [Imported unlifted nullary datacon wrappers must have correct LFInfo]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 As described in `Note [Conveying CAF-info and LFInfo between modules]`,
-imported nullary datacons must have their LambdaFormInfo set to reflect the
-fact that they are evaluated . This is necessary are otherwise references
-to them may be passed untagged to code that expects tagged references.
+imported unlifted nullary datacons must have their LambdaFormInfo set to
+reflect the fact that they are evaluated . This is necessary as otherwise
+references to them may be passed untagged to code that expects tagged
+references.
 
 What may be less obvious is that this must be done for not only datacon
 workers but also *wrappers*. The reason is found in this program
@@ -113,6 +114,8 @@ The fix is straightforward: extend the logic in `mkLFImported` to cover
 know that the wrapper of a nullary datacon will be in WHNF, even if it
 includes equalities evidence (since such equalities are not runtime
 relevant). This fixed #23146.
+
+See also Note [The LFInfo of Imported Ids]
 -}
 
 -- | Codegen-generated Id infos, to be passed to downstream via interfaces.
@@ -161,7 +164,7 @@ data LambdaFormInfo
         !StandardFormInfo
         !Bool           -- True <=> *might* be a function type
 
-  | LFCon               -- A saturated constructor application
+  | LFCon               -- A saturated data constructor application
         !DataCon        -- The constructor
 
   | LFUnknown           -- Used for function arguments and imported things.


=====================================
compiler/GHC/Types/Id.hs
=====================================
@@ -697,6 +697,8 @@ idCallArity id = callArityInfo (idInfo id)
 setIdCallArity :: Id -> Arity -> Id
 setIdCallArity id arity = modifyIdInfo (`setCallArityInfo` arity) id
 
+-- | This function counts all arguments post-unarisation, which includes
+-- arguments with no runtime representation -- see Note [Unarisation and arity]
 idFunRepArity :: Id -> RepArity
 idFunRepArity x = countFunRepArgs (idArity x) (idType x)
 


=====================================
compiler/GHC/Types/Id/Info.hs
=====================================
@@ -374,6 +374,7 @@ data IdInfo
         --
         -- See documentation of the getters for what these packed fields mean.
         lfInfo          :: !(Maybe LambdaFormInfo),
+        -- ^ See Note [The LFInfo of Imported Ids] in GHC.StgToCmm.Closure
 
         -- See documentation of the getters for what these packed fields mean.
         tagSig          :: !(Maybe TagSig)
@@ -439,7 +440,7 @@ oneShotInfo :: IdInfo -> OneShotInfo
 oneShotInfo = bitfieldGetOneShotInfo . bitfield
 
 -- | 'Id' arity, as computed by "GHC.Core.Opt.Arity". Specifies how many arguments
--- this 'Id' has to be applied to before it doesn any meaningful work.
+-- this 'Id' has to be applied to before it does any meaningful work.
 arityInfo :: IdInfo -> ArityInfo
 arityInfo = bitfieldGetArityInfo . bitfield
 


=====================================
compiler/GHC/Types/Id/Make.hs
=====================================
@@ -595,10 +595,12 @@ mkDataConWorkId wkr_name data_con
                    `setInlinePragInfo`     wkr_inline_prag
                    `setUnfoldingInfo`      evaldUnfolding  -- Record that it's evaluated,
                                                            -- even if arity = 0
+                   `setLFInfo`             wkr_lf_info
           -- No strictness: see Note [Data-con worker strictness] in GHC.Core.DataCon
 
     wkr_inline_prag = defaultInlinePragma { inl_rule = ConLike }
     wkr_arity = dataConRepArity data_con
+    wkr_lf_info = 
 
     ----------- Workers for newtypes --------------
     univ_tvs = dataConUnivTyVars data_con


=====================================
testsuite/tests/codeGen/should_run/T23146/T23146_lifted_unlifted.hs
=====================================
@@ -0,0 +1,14 @@
+{-# LANGUAGE GADTs, DataKinds #-}
+
+import T23146_lifted_unliftedA
+
+import Data.Type.Equality
+
+fieldsSam :: NP True -> NP True -> Bool
+fieldsSam (x' ::* xs) (y' ::* ys) = fieldsSam xs ys
+fieldsSam (UNil Refl) (UNil Refl) = True
+
+main :: IO ()
+main = print (fieldsSam (UNil Refl) (UNil Refl))
+
+


=====================================
testsuite/tests/codeGen/should_run/T23146/T23146_lifted_unlifted.stdout
=====================================
@@ -0,0 +1 @@
+True


=====================================
testsuite/tests/codeGen/should_run/T23146/T23146_lifted_unliftedA.hs
=====================================
@@ -0,0 +1,13 @@
+{-# OPTIONS_GHC -O1 #-}
+{-# LANGUAGE DataKinds #-}
+
+module T23146_lifted_unliftedA where
+
+import Data.Kind
+import Data.Type.Equality
+
+data NP a where
+  UNil :: {-# UNPACK #-} !(a :~: True) -> NP a
+  (::*) :: Bool -> NP True -> NP True
+
+


=====================================
testsuite/tests/codeGen/should_run/T23146/all.T
=====================================
@@ -1,4 +1,4 @@
 test('T23146', normal, compile_and_run, [''])
 test('T23146_lifted', normal, compile_and_run, [''])
 test('T23146_liftedeq', normal, compile_and_run, [''])
-
+test('T23146_lifted_unlifted', normal, compile_and_run, [''])



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/9ed451199b0c955ee9a46d8e667eb88a4d0d2aa6

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/9ed451199b0c955ee9a46d8e667eb88a4d0d2aa6
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20230426/9380cde5/attachment-0001.html>


More information about the ghc-commits mailing list