[Git][ghc/ghc][wip/T23146] Make LFInfos for DataCons on construction

Rodrigo Mesquita (@alt-romes) gitlab at gitlab.haskell.org
Fri Apr 28 11:38:36 UTC 2023



Rodrigo Mesquita pushed to branch wip/T23146 at Glasgow Haskell Compiler / GHC


Commits:
d0fd4454 by Rodrigo Mesquita at 2023-04-28T12:38:25+01:00
Make LFInfos for DataCons on construction

As a result of the discussion in !10165, we decided to amend the
previous commit which fixed the logic of `mkLFImported` with regard to
datacon workers and wrappers.

Instead of having the logic for the LFInfo of datacons be in
`mkLFImported`, we now construct an LFInfo for all data constructors on
GHC.Types.Id.Make and store it in the `lfInfo` field.

See the new Note [LFInfo of DataCon workers and wrappers] and
ammendments to Note [The LFInfo of Imported Ids]

- - - - -


3 changed files:

- compiler/GHC/StgToCmm/Closure.hs
- compiler/GHC/Types/Id/Info.hs
- compiler/GHC/Types/Id/Make.hs


Changes:

=====================================
compiler/GHC/StgToCmm/Closure.hs
=====================================
@@ -96,6 +96,7 @@ import GHC.Utils.Outputable
 import GHC.Utils.Panic
 import GHC.Utils.Panic.Plain
 import GHC.Utils.Misc
+import GHC.Data.Maybe (isNothing)
 
 import Data.Coerce (coerce)
 import qualified Data.ByteString.Char8 as BS8
@@ -255,130 +256,65 @@ mkApLFInfo id upd_flag arity
         (mightBeFunTy (idType id))
 
 -------------
+-- | Make a 'LambdaFormInfo' for an imported Id.
+--   See Note [The LFInfo of Imported Ids]
 mkLFImported :: Id -> LambdaFormInfo
 mkLFImported id =
     -- See Note [Conveying CAF-info and LFInfo between modules] in
     -- GHC.StgToCmm.Types
     case idLFInfo_maybe id of
       Just lf_info ->
-        -- Use the LambdaFormInfo from the interface
+        -- Use the existing LambdaFormInfo
         lf_info
       Nothing
-        -- Interface doesn't have a LambdaFormInfo, so make a conservative one from the type.
-        -- See Note [The LFInfo of Imported Ids]; The order of the guards musn't be changed!
+        -- Doesn't have a LambdaFormInfo, but we know it must be 'LFReEntrant' from its arity
         | arity > 0
         -> LFReEntrant TopLevel arity True ArgUnknown
 
-        | Just con <- isDataConId_maybe id
-          -- See Note [Imported unlifted nullary datacon wrappers must have correct LFInfo] in GHC.StgToCmm.Types
-          -- and Note [The LFInfo of Imported Ids] below
-        -> assert (hasNoNonZeroWidthArgs con) $
-           LFCon con   -- An imported nullary constructor
-                       -- We assume that the constructor is evaluated so that
-                       -- the id really does point directly to the constructor
-
+        -- We can't be sure of the LambdaFormInfo of this imported Id,
+        -- so make a conservative one from the type.
         | otherwise
-        -> mkLFArgument id -- Not sure of exact arity
+        -> assert (isNothing (isDataConId_maybe id)) $ -- See Note [LFInfo of DataCon workers and wrappers] in GHC.Types.Id.Make
+           mkLFArgument id -- Not sure of exact arity
   where
     arity = idFunRepArity id
-    hasNoNonZeroWidthArgs = all (isZeroBitTy . scaledThing) . dataConRepArgTys
 
 {-
 Note [The LFInfo of Imported Ids]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-As explained in Note [Conveying CAF-info and LFInfo between modules] and
-Note [Imported unlifted nullary datacon wrappers must have correct LFInfo], the
-LambdaFormInfo records the details of a closure representation and is often,
-when optimisations are enabled, serialized to the interface of a module.
-
-In particular, the `lfInfo` field of the `IdInfo` field of an `Id`
-* For Ids defined in this module: is `Nothing`
-* For imported Ids:
+As explained in Note [Conveying CAF-info and LFInfo between modules]
+the LambdaFormInfo records the details of a closure representation and is
+often, when optimisations are enabled, serialized to the interface of a module.
+
+In particular, the `lfInfo` field of the `IdInfo` field of an `Id`:
+* For DataCon workers and wrappers is populated as described in
+Note [LFInfo of DataCon workers and wrappers] in GHC.Types.Id.Make
+* For other Ids defined in this module: is `Nothing`
+* For other imported Ids:
   * is (Just lf_info) if the LFInfo was serialised into the interface file
     (typically, when the exporting module was compiled with -O)
   * is Nothing if it wasn't serialised
 
-However, when an interface doesn't have a LambdaFormInfo for some imported Id
-(so that its `lfInfo` field is `Nothing`), we can conservatively create one
-using `mkLFImported`.
-
 The LambdaFormInfo we give an Id is used in determining how to tag its pointer
-(see `litIdInfo`). Therefore, it's crucial we re-construct a LambdaFormInfo as
-faithfully as possible or otherwise risk having pointers incorrectly tagged,
-which can lead to performance issues and even segmentation faults (see #23231
-and #23146). In particular, saturated data constructor applications *must* be
-unambiguously given `LFCon`, and the invariant
-
-  If the LFInfo (serialised or built with mkLFImported) says LFCon, then it
-  really is a static data constructor, and similar for LFReEntrant
-
-must be upheld.
-
-In `mkLFImported`, we make a conservative approximation to the real
-LambdaFormInfo as follows:
-
-(1) Ids with an `idFunRepArity > 0` are `LFReEntrant` and pointers to them are
-tagged (by `litIdInfo`) with the corresponding arity.
-    - This is also true of data con wrappers and workers with arity > 0,
-    regardless of the runtime relevance of the arguments
-    - For example, `Just :: a -> Maybe a` is given `LFReEntrant`
-               and `HNil :: (a ~# '[]) -> HList a` is given `LFReEntrant` too
-
-(2) Data constructors with `idFunRepArity == 0` should be given `LFCon` because
-they are fully saturated data constructor applications and pointers to them
-should be tagged with the constructor index.
-
-(2.1) A datacon *wrapper* with zero arity must be a fully saturated application
-of the worker to zero-width arguments only (which are dropped after unarisation)
-
-(2.2) A datacon *worker* with zero arity is trivially fully saturated, it takes
-no arguments whatsoever (not even zero-width args)
-
-To ensure we properly give `LFReEntrant` to data constructors with some arity,
-and `LFCon` only to data constructors with zero arity, we must first check for
-`arity > 0` and only afterwards `isDataConId` -- the order of the guards in
-`mkLFImported` is quite important.
-
-As an example, consider the following data constructors:
-
-  data T1 a where
-    TCon1 :: {-# UNPACK #-} !(a :~: True) -> T1 a
-
-  data T2 a where
-    TCon2 :: {-# UNPACK #-} !() -> T2 a
-
-  data T3 a where
-    TCon3 :: T3 '[]
-
-`TCon1`'s wrapper has a lifted equality argument, which is non-zero-width, while
-the worker has an unlifted equality argument, which is zero-width.
-
-`TCon2`'s wrapper has a lifted equality argument, which is non-zero-width,
-while the worker has no arguments.
-
-`TCon3`'s wrapper has no arguments, and the worker has 1 zero-width argument;
-their Core representation:
-
-  $WTCon3 :: T3 '[]
-  $WTCon3 = TCon3 @[] <Refl>
-
-  TCon3 :: forall (a :: * -> *). (a ~# []) => T a
-  TCon3 = /\a. \(co :: a~#[]). TCon3 co
-
-For `TCon1`, both the wrapper and worker will be given `LFReEntrant` since they
-both have arity == 1.
-
-For `TCon2`, the wrapper will be given `LFReEntrant` since it has arity == 1
-while the worker is `LFCon` since its arity == 0
-
-For `TCon3`, the wrapper will be given `LFCon` since its arity == 0 and the
-worker `LFReEntrant` since its arity == 1
-
-One might think we could give *workers* with only zero-width-args the `LFCon`
-LambdaFormInfo, e.g. give `LFCon` to the worker of `TCon1` and `TCon3`.
-However, these workers, albeit rarely used, are unambiguously functions
--- which makes `LFReEntrant`, the LambdaFormInfo we give them, correct.
-See also the discussion in #23158.
+(see `litIdInfo`). Therefore, it's crucial we attribute a correct
+LambdaFormInfo to imported Ids, or otherwise risk having pointers incorrectly
+tagged which can lead to performance issues and even segmentation faults (see
+#23231 and #23146). In particular, saturated data constructor applications
+*must* be unambiguously given `LFCon`, and if the LFInfo says LFCon, then it
+really is a static data constructor, and similar for LFReEntrant.
+
+In `mkLFImported`, we construct a LambdaFormInfo for imported Ids as follows:
+
+(1) If the `lfInfo` field contains an LFInfo, we use that LFInfo which is
+correct by construction (the invariant being that if it exists, it is correct):
+  (1.1) Either it was serialised to the interface we're importing the Id from,
+  (1.2) Or it's a DataCon worker or wrapper and its LFInfo was constructed
+        according to Note [LFInfo of DataCon workers and wrappers]
+(2) When the `lfInfo` field is `Nothing`
+  (2.1) If the `idFunRepArity` of the Id is known and is greater than 0, then
+  the Id is unambiguously a function and is given `LFReEntrant`, and pointers
+  to this Id will be tagged (by `litIdInfo`) with the corresponding arity.
+  (2.2) Otherwise, we can make a conservative estimate from the type.
 
 -}
 


=====================================
compiler/GHC/Types/Id/Info.hs
=====================================
@@ -120,7 +120,8 @@ infixl  1 `setRuleInfo`,
           `setCafInfo`,
           `setDmdSigInfo`,
           `setCprSigInfo`,
-          `setDemandInfo`
+          `setDemandInfo`,
+          `setLFInfo`
 {-
 ************************************************************************
 *                                                                      *


=====================================
compiler/GHC/Types/Id/Make.hs
=====================================
@@ -87,6 +87,10 @@ import GHC.Data.FastString
 import GHC.Data.List.SetOps
 import Data.List        ( zipWith4 )
 
+-- A bit of a shame we must import these here
+import GHC.StgToCmm.Types (LambdaFormInfo(..))
+import GHC.Runtime.Heap.Layout (ArgDescr(ArgUnknown))
+
 {-
 ************************************************************************
 *                                                                      *
@@ -595,11 +599,17 @@ mkDataConWorkId wkr_name data_con
                    `setInlinePragInfo`     wkr_inline_prag
                    `setUnfoldingInfo`      evaldUnfolding  -- Record that it's evaluated,
                                                            -- even if arity = 0
+                   `setLFInfo`             wkr_lf_info
           -- No strictness: see Note [Data-con worker strictness] in GHC.Core.DataCon
 
     wkr_inline_prag = defaultInlinePragma { inl_rule = ConLike }
     wkr_arity = dataConRepArity data_con
 
+    -- See Note [LFInfo of DataCon workers and wrappers]
+    wkr_lf_info
+      | wkr_arity == 0 = LFCon data_con
+      | otherwise      = LFReEntrant TopLevel wkr_arity True ArgUnknown
+
     ----------- Workers for newtypes --------------
     univ_tvs = dataConUnivTyVars data_con
     ex_tcvs  = dataConExTyCoVars data_con
@@ -608,6 +618,7 @@ mkDataConWorkId wkr_name data_con
                   `setArityInfo` 1      -- Arity 1
                   `setInlinePragInfo`     dataConWrapperInlinePragma
                   `setUnfoldingInfo`      newtype_unf
+                  `setLFInfo`            (LFReEntrant TopLevel 1 True ArgUnknown)
     id_arg1      = mkScaledTemplateLocal 1 (head arg_tys)
     res_ty_args  = mkTyCoVarTys univ_tvs
     newtype_unf  = assertPpr (null ex_tcvs && isSingleton arg_tys)
@@ -618,6 +629,85 @@ mkDataConWorkId wkr_name data_con
                    wrapNewTypeBody tycon res_ty_args (Var id_arg1)
 
 {-
+Note [LFInfo of DataCon workers and wrappers]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+As noted in Note [The LFInfo of Imported Ids] in GHC.StgToCmm.Closure, it's
+crucial saturated data con applications are given an LFInfo of `LFCon`.
+
+Since for data constructors we never serialise the worker and the wrapper (only
+the data type declaration), we never serialise their lambda form info either.
+
+Therefore, when making data constructors workers and wrappers, we construct a
+correct LFInfo for them right away. This ensures the critical logic of creating
+the correct LFInfo for a DataCon is done once on creation and we assertain that:
+
+  The `lfInfo` field of a DataCon worker or wrapper is always populated with the correct LFInfo.
+
+which is expected by `mkLFImported`.
+NB: The greater invariant being that if an `lfInfo` field is populated, the
+    LFInfo in it contained is correct
+
+How do we construct a /correct/ LFInfo for workers and wrappers?
+
+(1) Data constructors with arity > 0 are unambiguously functions and should be
+given `LFReEntrant`, regardless of the runtime relevance of the arguments:
+  - For example, `Just :: a -> Maybe a` is given `LFReEntrant`,
+             and `HNil :: (a ~# '[]) -> HList a` is given `LFReEntrant` too.
+
+(2) Data constructors with arity == 0 should be given `LFCon` because
+they are fully saturated data constructor applications (and pointers to them
+should be tagged with the constructor index).
+
+(2.1) A datacon *wrapper* with zero arity must be a fully saturated application
+of the worker to zero-width arguments only (which are dropped after unarisation)
+
+(2.2) A datacon *worker* with zero arity is trivially fully saturated, it takes
+no arguments whatsoever (not even zero-width args)
+
+For example, consider the following data constructors:
+
+  data T1 a where
+    TCon1 :: {-# UNPACK #-} !(a :~: True) -> T1 a
+
+  data T2 a where
+    TCon2 :: {-# UNPACK #-} !() -> T2 a
+
+  data T3 a where
+    TCon3 :: T3 '[]
+
+`TCon1`'s wrapper has a lifted argument, which is non-zero-width, while
+the worker has an unlifted equality argument, which is zero-width.
+
+`TCon2`'s wrapper has a lifted equality argument, which is non-zero-width,
+while the worker has no arguments.
+
+`TCon3`'s wrapper has no arguments, and the worker has 1 zero-width argument;
+their Core representation:
+
+  $WTCon3 :: T3 '[]
+  $WTCon3 = TCon3 @[] <Refl>
+
+  TCon3 :: forall (a :: * -> *). (a ~# []) => T a
+  TCon3 = /\a. \(co :: a~#[]). TCon3 co
+
+For `TCon1`, both the wrapper and worker will be given `LFReEntrant` since they
+both have arity == 1.
+
+For `TCon2`, the wrapper will be given `LFReEntrant` since it has arity == 1
+while the worker is `LFCon` since its arity == 0
+
+For `TCon3`, the wrapper will be given `LFCon` since its arity == 0 and the
+worker `LFReEntrant` since its arity == 1
+
+One might think we could give *workers* with only zero-width-args the `LFCon`
+LambdaFormInfo, e.g. give `LFCon` to the worker of `TCon1` and `TCon3`.
+However, these workers, albeit rarely used, are unambiguously functions
+-- which makes `LFReEntrant`, the LambdaFormInfo we give them, correct.
+See also the discussion in #23158.
+
+See also the Note [Imported unlifted nullary datacon wrappers must have correct LFInfo]
+in GHC.StgToCmm.Types.
+
 -------------------------------------------------
 --         Data constructor representation
 --
@@ -709,11 +799,17 @@ mkDataConRep dc_bang_opts fam_envs wrap_name data_con
                              -- We need to get the CAF info right here because GHC.Iface.Tidy
                              -- does not tidy the IdInfo of implicit bindings (like the wrapper)
                              -- so it not make sure that the CAF info is sane
+                         `setLFInfo`            wrap_lf_info
 
              -- The signature is purely for passes like the Simplifier, not for
              -- DmdAnal itself; see Note [DmdAnal for DataCon wrappers].
              wrap_sig = mkClosedDmdSig wrap_arg_dmds topDiv
 
+             -- See Note [LFInfo of DataCon workers and wrappers]
+             wrap_lf_info
+               | wrap_arity == 0 = LFCon data_con
+               | otherwise       = LFReEntrant TopLevel wrap_arity True ArgUnknown
+
              wrap_arg_dmds =
                replicate (length theta) topDmd ++ map mk_dmd arg_ibangs
                -- Don't forget the dictionary arguments when building



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/d0fd4454d93aff0d193c37593cb37d4872f4b81c

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/d0fd4454d93aff0d193c37593cb37d4872f4b81c
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20230428/ac2771d7/attachment-0001.html>


More information about the ghc-commits mailing list