[Git][ghc/ghc][wip/T23146] 5 commits: Make LFInfos for DataCons on construction

Rodrigo Mesquita (@alt-romes) gitlab at gitlab.haskell.org
Fri May 5 14:33:55 UTC 2023



Rodrigo Mesquita pushed to branch wip/T23146 at Glasgow Haskell Compiler / GHC


Commits:
b3137a58 by Rodrigo Mesquita at 2023-05-05T15:32:58+01:00
Make LFInfos for DataCons on construction

As a result of the discussion in !10165, we decided to amend the
previous commit which fixed the logic of `mkLFImported` with regard to
datacon workers and wrappers.

Instead of having the logic for the LFInfo of datacons be in
`mkLFImported`, we now construct an LFInfo for all data constructors on
GHC.Types.Id.Make and store it in the `lfInfo` field.

See the new Note [LFInfo of DataCon workers and wrappers] and
ammendments to Note [The LFInfo of Imported Ids]

- - - - -
cdb20d1b by Rodrigo Mesquita at 2023-05-05T15:33:02+01:00
Update Note [Core letrec invariant]

Authored by @simonpj

- - - - -
3831cdf5 by Rodrigo Mesquita at 2023-05-05T15:33:02+01:00
Rename mkLFImported to importedIdLFInfo

The `mkLFImported` sounded too much like a constructor of sorts, when
really it got the `LFInfo` of an imported Id from its `lf_info` field
when this existed, and otherwise returned a conservative estimate of
that imported Id's LFInfo. This in contrast to functions such as
`mkLFReEntrant` which really are about constructing an `LFInfo`.

- - - - -
0c4f6796 by Rodrigo Mesquita at 2023-05-05T15:33:02+01:00
Enforce invariant on typePrimRepArgs in the types

As part of the documentation effort in !10165 I came across this
invariant on 'typePrimRepArgs' which is easily expressed at the
type-level through a NonEmpty list.

It allowed us to remove one panic.

- - - - -
f801b269 by Rodrigo Mesquita at 2023-05-05T15:33:02+01:00
Merge outdated Note [Data con representation] into Note [Data constructor representation]

Introduce new Note [Constructor applications in STG] to better support
the merge, and reference it from the relevant bits in the STG syntax.

- - - - -


12 changed files:

- compiler/GHC/Core.hs
- compiler/GHC/Core/DataCon.hs
- compiler/GHC/Runtime/Heap/Inspect.hs
- compiler/GHC/Stg/InferTags/Rewrite.hs
- compiler/GHC/Stg/Syntax.hs
- compiler/GHC/StgToByteCode.hs
- compiler/GHC/StgToCmm/Closure.hs
- compiler/GHC/StgToCmm/Env.hs
- compiler/GHC/StgToCmm/Types.hs
- compiler/GHC/Types/Id/Info.hs
- compiler/GHC/Types/Id/Make.hs
- compiler/GHC/Types/RepType.hs


Changes:

=====================================
compiler/GHC/Core.hs
=====================================
@@ -368,18 +368,37 @@ Note [Core letrec invariant]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The Core letrec invariant:
 
-    The right hand sides of all
-      /top-level/ or /recursive/
-    bindings must be of lifted type
-
-    There is one exception to this rule, top-level @let at s are
-    allowed to bind primitive string literals: see
-    Note [Core top-level string literals].
+  The right hand sides of all /top-level/ or /recursive/
+  bindings must be of lifted type
 
 See "Type#type_classification" in GHC.Core.Type
-for the meaning of "lifted" vs. "unlifted").
-
-For the non-top-level, non-recursive case see Note [Core let-can-float invariant].
+for the meaning of "lifted" vs. "unlifted".
+
+For the non-top-level, non-recursive case see
+Note [Core let-can-float invariant].
+
+At top level, however, there are two exceptions to this rule:
+
+(TL1) A top-level binding is allowed to bind primitive string literal,
+      (which is unlifted).  See Note [Core top-level string literals].
+
+(TL2) In CorePrep, we generate a top-level binding for every non-newtype data
+constructor worker or wrapper
+      e.g.   data T = MkT Int
+      we generate
+             MkT :: Int -> T
+             MkT = \x. MkT x
+      (This binding looks recursive, but isn't; it defines a top-level, curried
+      function whose body just allocates and returns the data constructor.)
+
+      But if (a) the data contructor is nullary and (b) the data type is unlifted,
+      this binding is unlifted.
+      e.g.   data S :: UnliftedType where { S1 :: S, S2 :: S -> S }
+      we generate
+             S1 :: S   -- A top-level unlifted binding
+             S1 = S1
+      We allow this top-level unlifted binding to exist, after CorePrep
+      only.
 
 Note [Core let-can-float invariant]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


=====================================
compiler/GHC/Core/DataCon.hs
=====================================
@@ -141,7 +141,19 @@ becomes
         case e of { T a' b -> let a = I# a' in ... }
 
 To keep ourselves sane, we name the different versions of the data constructor
-differently, as follows.
+differently, as follows in Note [Data Constructor Naming].
+
+The `dcRepType` field of a `DataCon` contains the type of the representation of
+the constructor /worker/, also called the Core representation.
+
+The Core representation may differ from the type of the constructor /wrapper/
+(built by `mkDataConId`). Besides unpacking (as seen in the example above),
+dictionaries and coercions become explict arguments in the Core representation
+of a constructor.
+
+Note that this representation is still *different* from runtime
+representation. (Which is what STG uses after unarise).
+See Note [Constructor applications in STG] in GHC.Stg.Syntax.
 
 
 Note [Data Constructor Naming]
@@ -528,7 +540,7 @@ data DataCon
                                 --      forall a x y. (a~(x,y), x~y, Ord x) =>
                                 --        x -> y -> T a
                                 -- (this is *not* of the constructor wrapper Id:
-                                --  see Note [Data con representation] below)
+                                --  see Note [Data constructor representation])
         -- Notice that the existential type parameters come *second*.
         -- Reason: in a case expression we may find:
         --      case (e :: T t) of
@@ -586,12 +598,19 @@ Function call 'dataConKindEqSpec' returns [k'~k]
 
 Note [DataCon arities]
 ~~~~~~~~~~~~~~~~~~~~~~
-A `DataCon`'s source arity and core representation arity may differ:
-`dcSourceArity` does not take constraints into account, but `dcRepArity` does.
-
-The additional arguments taken into account by `dcRepArity` include quantified
-dictionaries and coercion arguments, lifted and unlifted (despite the unlifted
-coercion arguments having a zero-width runtime representation).
+A `DataCon`'s source and core representation may differ, meaning the source
+arity (`dcSourceArity`) and the core representation arity (`dcRepArity`) may
+differ too.
+
+Note that the source arity is the number of arguments the data constructor has
+at its source level, which is also the number of arguments the data con
+/wrapper/ has. On the other hand, the Core representation arity is the number
+of arguments of the data constructor in its Core representation, which is also
+the number of arguments of the data con /worker/.
+
+The arity might differ since `dcRepArity` takes into account arguments such as
+quantified dictionaries and coercion arguments, lifted and unlifted (despite
+the unlifted coercion arguments having a zero-width runtime representation).
 For example:
    MkT :: Ord a => a -> T a
     dcSourceArity = 1
@@ -601,6 +620,15 @@ For example:
     dcSourceArity = 0
     dcRepArity    = 1
 
+The arity might also differ due to unpacking, for example, consider the
+following datatype and its wrapper and worker's type:
+   data V = MkV !() !Int
+   $WV :: () -> Int -> V
+     V :: Int# -> V
+As you see, because of unpacking we have both dropped the unit argument and
+unboxed the Int. In this case, the source arity (which is the arity of the
+wrapper) is 2, while the Core representation arity (the arity of the worker) is 1.
+
 
 Note [DataCon user type variable binders]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -969,51 +997,6 @@ we consult HsImplBang:
 The boolean flag is used only for this warning.
 See #11270 for motivation.
 
-Note [Data con representation]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The dcRepType field contains the type of the representation of a constructor
-This may differ from the type of the constructor *Id* (built
-by MkId.mkDataConId) for two reasons:
-        a) the constructor Id may be overloaded, but the dictionary isn't stored
-           e.g.    data Eq a => T a = MkT a a
-
-        b) the constructor may store an unboxed version of a strict field.
-
-So whenever this module talks about the representation of a data constructor
-what it means is the DataCon with all Unpacking having been applied.
-We can think of this as the Core representation.
-
-Here's an example illustrating the Core representation:
-        data Ord a => T a = MkT Int! a Void#
-Here
-        T :: Ord a => Int -> a -> Void# -> T a
-but the rep type is
-        Trep :: Int# -> a -> Void# -> T a
-Actually, the unboxed part isn't implemented yet!
-
-Note that this representation is still *different* from runtime
-representation. (Which is what STG uses after unarise).
-
-This is how T would end up being used in STG post-unarise:
-
-  let x = T 1# y
-  in ...
-      case x of
-        T int a -> ...
-
-The Void# argument is dropped and the boxed int is replaced by an unboxed
-one. In essence we only generate binders for runtime relevant values.
-
-We also flatten out unboxed tuples in this process. See the unarise
-pass for details on how this is done. But as an example consider
-`data S = MkS Bool (# Bool | Char #)` which when matched on would
-result in an alternative with three binders like this
-
-    MkS bool tag tpl_field ->
-
-See Note [Translating unboxed sums to unboxed tuples] and Note [Unarisation]
-for the details of this transformation.
-
 
 ************************************************************************
 *                                                                      *


=====================================
compiler/GHC/Runtime/Heap/Inspect.hs
=====================================
@@ -889,12 +889,12 @@ extractSubTerms recurse clos = liftM thdOf3 . go 0 0
            return (ptr_i, arr_i, unboxedTupleTerm ty terms0 : terms1)
       | otherwise
       = case typePrimRepArgs ty of
-          [rep_ty] ->  do
+          rep_ty :| [] ->  do
             (ptr_i, arr_i, term0)  <- go_rep ptr_i arr_i ty rep_ty
             (ptr_i, arr_i, terms1) <- go ptr_i arr_i tys
             return (ptr_i, arr_i, term0 : terms1)
-          rep_tys -> do
-           (ptr_i, arr_i, terms0) <- go_unary_types ptr_i arr_i rep_tys
+          rep_ty :| rep_tys -> do
+           (ptr_i, arr_i, terms0) <- go_unary_types ptr_i arr_i (rep_ty:rep_tys)
            (ptr_i, arr_i, terms1) <- go ptr_i arr_i tys
            return (ptr_i, arr_i, unboxedTupleTerm ty terms0 : terms1)
 


=====================================
compiler/GHC/Stg/InferTags/Rewrite.hs
=====================================
@@ -36,7 +36,7 @@ import GHC.Core            ( AltCon(..) )
 import GHC.Core.Type
 
 import GHC.StgToCmm.Types
-import GHC.StgToCmm.Closure (mkLFImported)
+import GHC.StgToCmm.Closure (importedIdLFInfo)
 
 import GHC.Stg.Utils
 import GHC.Stg.Syntax as StgSyn
@@ -271,11 +271,11 @@ isTagged v = do
                             TagProper -> True
                             TagTagged -> True
                             TagTuple _ -> True -- Consider unboxed tuples tagged.
-        False -- Imported
-            -> return $!
+        -- Imported
+        False -> return $!
                 -- Determine whether it is tagged from the LFInfo of the imported id.
                 -- See Note [The LFInfo of Imported Ids]
-                case mkLFImported v of
+                case importedIdLFInfo v of
                     -- Function, applied not entered.
                     LFReEntrant {}
                         -> True


=====================================
compiler/GHC/Stg/Syntax.hs
=====================================
@@ -237,6 +237,50 @@ StgConApp and StgPrimApp --- saturated applications
 
 There are specialised forms of application, for constructors, primitives, and
 literals.
+
+Note [Constructor applications in STG]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In STG, constructor applications `StgConApp` and `StgRhsCon` are always /fully saturated/. 
+However, the number of arguments saturating the application may differ
+post-unarisation because we drop void arguments from `StgConApp` and `StgRhsCon`
+constructor applications during unarisation.
+
+Therefore, in `StgConApp` and `StgRhsCon`:
+* Before unarisation, we have the saturated list of the arguments [StgArg]
+* Post unarisation, we have the saturated list of non-void arguments, the type
+of which should really be [NonVoid StgArg]
+
+As an example, consider:
+
+        data T a = MkT Int! a Void#
+
+The wrapper's representation and the worker's representation (i.e. the
+datacon's Core representation) are respectively:
+
+        $WT :: Int  -> a -> Void# -> T a
+        T   :: Int# -> a -> Void# -> T a
+
+T would end up being used in STG post-unarise as:
+
+  let x = T 1# y
+  in ...
+      case x of
+        T int a -> ...
+
+The Void# argument is dropped. In essence we only generate binders for runtime
+relevant values.
+
+We also flatten out unboxed tuples in this process. See the unarise
+pass for details on how this is done. But as an example consider
+`data S = MkS Bool (# Bool | Char #)` which when matched on would
+result in an alternative with three binders like this
+
+    MkS bool tag tpl_field ->
+
+See Note [Translating unboxed sums to unboxed tuples] and Note [Unarisation]
+for the details of this transformation.
+
 -}
 
   | StgLit      Literal
@@ -245,7 +289,7 @@ literals.
         -- which can't be let-bound
   | StgConApp   DataCon
                 ConstructorNumber
-                [StgArg] -- Saturated. (After Unarisation, [NonVoid StgArg])
+                [StgArg] -- Saturated. See Note [Constructor applications in STG]
                 [Type]   -- See Note [Types in StgConApp] in GHC.Stg.Unarise
 
   | StgOpApp    StgOp    -- Primitive op or foreign call
@@ -421,7 +465,7 @@ important):
                         -- are not allocated.
         ConstructorNumber
         [StgTickish]
-        [StgArg]        -- Args
+        [StgArg]        -- Saturated Args. See Note [Constructor applications in STG]
 
 -- | Like 'GHC.Hs.Extension.NoExtField', but with an 'Outputable' instance that
 -- returns 'empty'.


=====================================
compiler/GHC/StgToByteCode.hs
=====================================
@@ -81,8 +81,10 @@ import Data.Coerce (coerce)
 import Data.ByteString (ByteString)
 import Data.Map (Map)
 import Data.IntMap (IntMap)
+import Data.List.NonEmpty (NonEmpty(..))
 import qualified Data.Map as Map
 import qualified Data.IntMap as IntMap
+import qualified Data.List.NonEmpty as NE
 import qualified GHC.Data.FiniteMap as Map
 import Data.Ord
 import GHC.Stack.CCS
@@ -296,8 +298,8 @@ argBits platform (rep : args)
   | isFollowableArg rep  = False : argBits platform args
   | otherwise = replicate (argRepSizeW platform rep) True ++ argBits platform args
 
-non_void :: [ArgRep] -> [ArgRep]
-non_void = filter nv
+non_void :: NonEmpty ArgRep -> [ArgRep]
+non_void = NE.filter nv
   where nv V = False
         nv _ = True
 
@@ -464,7 +466,7 @@ returnUnliftedAtom d s p e = do
                  StgLitArg lit -> typePrimRepArgs (literalType lit)
                  StgVarArg i   -> bcIdPrimReps i
     (push, szb) <- pushAtom d p e
-    ret <- returnUnliftedReps d s szb reps
+    ret <- returnUnliftedReps d s szb (NE.toList reps)
     return (push `appOL` ret)
 
 -- return an unlifted value from the top of the stack
@@ -867,7 +869,7 @@ doCase d s p scrut bndr alts
         (bndr_size, call_info, args_offsets)
            | ubx_tuple_frame =
                let bndr_ty = primRepCmmType platform
-                   bndr_reps = filter (not.isVoidRep) (bcIdPrimReps bndr)
+                   bndr_reps = NE.filter (not.isVoidRep) (bcIdPrimReps bndr)
                    (call_info, args_offsets) =
                        layoutNativeCall profile NativeTupleReturn 0 bndr_ty bndr_reps
                in ( wordsToBytes platform (nativeCallSize call_info)
@@ -1660,9 +1662,8 @@ maybe_getCCallReturnRep fn_ty
                          (pprType fn_ty)
      in
        case r_reps of
-         []            -> panic "empty typePrimRepArgs"
-         [VoidRep]     -> Nothing
-         [rep]         -> Just rep
+         VoidRep :| [] -> Nothing
+         rep     :| [] -> Just rep
 
                  -- if it was, it would be impossible to create a
                  -- valid return value placeholder on the stack
@@ -2117,7 +2118,7 @@ idSizeCon platform var
     isUnboxedSumType (idType var) =
     wordsToBytes platform .
     WordOff . sum . map (argRepSizeW platform . toArgRep platform) .
-    bcIdPrimReps $ var
+    NE.toList . bcIdPrimReps $ var
   | otherwise = ByteOff (primRepSizeB platform (bcIdPrimRep var))
 
 bcIdArgRep :: Platform -> Id -> ArgRep
@@ -2125,13 +2126,13 @@ bcIdArgRep platform = toArgRep platform . bcIdPrimRep
 
 bcIdPrimRep :: Id -> PrimRep
 bcIdPrimRep id
-  | [rep] <- typePrimRepArgs (idType id)
+  | rep :| [] <- typePrimRepArgs (idType id)
   = rep
   | otherwise
   = pprPanic "bcIdPrimRep" (ppr id <+> dcolon <+> ppr (idType id))
 
 
-bcIdPrimReps :: Id -> [PrimRep]
+bcIdPrimReps :: Id -> NonEmpty PrimRep
 bcIdPrimReps id = typePrimRepArgs (idType id)
 
 repSizeWords :: Platform -> PrimRep -> WordOff
@@ -2189,8 +2190,8 @@ atomRep platform e = toArgRep platform (atomPrimRep e)
 mkStackOffsets :: ByteOff -> [ByteOff] -> [ByteOff]
 mkStackOffsets original_depth szsb = tail (scanl' (+) original_depth szsb)
 
-typeArgReps :: Platform -> Type -> [ArgRep]
-typeArgReps platform = map (toArgRep platform) . typePrimRepArgs
+typeArgReps :: Platform -> Type -> NonEmpty ArgRep
+typeArgReps platform = NE.map (toArgRep platform) . typePrimRepArgs
 
 -- -----------------------------------------------------------------------------
 -- The bytecode generator's monad


=====================================
compiler/GHC/StgToCmm/Closure.hs
=====================================
@@ -28,7 +28,7 @@ module GHC.StgToCmm.Closure (
         LambdaFormInfo,         -- Abstract
         StandardFormInfo,        -- ...ditto...
         mkLFThunk, mkLFReEntrant, mkConLFInfo, mkSelectorLFInfo,
-        mkApLFInfo, mkLFImported, mkLFArgument, mkLFLetNoEscape,
+        mkApLFInfo, importedIdLFInfo, mkLFArgument, mkLFLetNoEscape,
         mkLFStringLit,
         lfDynTag,
         isLFThunk, isLFReEntrant, lfUpdatable,
@@ -96,6 +96,7 @@ import GHC.Utils.Outputable
 import GHC.Utils.Panic
 import GHC.Utils.Panic.Plain
 import GHC.Utils.Misc
+import GHC.Data.Maybe (isNothing)
 
 import Data.Coerce (coerce)
 import qualified Data.ByteString.Char8 as BS8
@@ -255,130 +256,67 @@ mkApLFInfo id upd_flag arity
         (mightBeFunTy (idType id))
 
 -------------
-mkLFImported :: Id -> LambdaFormInfo
-mkLFImported id =
+-- | The 'LambdaFormInfo' of an imported Id.
+--   See Note [The LFInfo of Imported Ids]
+importedIdLFInfo :: Id -> LambdaFormInfo
+importedIdLFInfo id =
     -- See Note [Conveying CAF-info and LFInfo between modules] in
     -- GHC.StgToCmm.Types
     case idLFInfo_maybe id of
       Just lf_info ->
-        -- Use the LambdaFormInfo from the interface
+        -- Use the existing LambdaFormInfo
         lf_info
       Nothing
-        -- Interface doesn't have a LambdaFormInfo, so make a conservative one from the type.
-        -- See Note [The LFInfo of Imported Ids]; The order of the guards musn't be changed!
+        -- Doesn't have a LambdaFormInfo, but we know it must be 'LFReEntrant' from its arity
         | arity > 0
         -> LFReEntrant TopLevel arity True ArgUnknown
 
-        | Just con <- isDataConId_maybe id
-          -- See Note [Imported unlifted nullary datacon wrappers must have correct LFInfo] in GHC.StgToCmm.Types
-          -- and Note [The LFInfo of Imported Ids] below
-        -> assert (hasNoNonZeroWidthArgs con) $
-           LFCon con   -- An imported nullary constructor
-                       -- We assume that the constructor is evaluated so that
-                       -- the id really does point directly to the constructor
-
+        -- We can't be sure of the LambdaFormInfo of this imported Id,
+        -- so make a conservative one from the type.
         | otherwise
-        -> mkLFArgument id -- Not sure of exact arity
+        -> assert (isNothing (isDataConId_maybe id)) $ -- See Note [LFInfo of DataCon workers and wrappers] in GHC.Types.Id.Make
+           mkLFArgument id -- Not sure of exact arity
   where
     arity = idFunRepArity id
-    hasNoNonZeroWidthArgs = all (isZeroBitTy . scaledThing) . dataConRepArgTys
 
 {-
 Note [The LFInfo of Imported Ids]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-As explained in Note [Conveying CAF-info and LFInfo between modules] and
-Note [Imported unlifted nullary datacon wrappers must have correct LFInfo], the
-LambdaFormInfo records the details of a closure representation and is often,
-when optimisations are enabled, serialized to the interface of a module.
-
-In particular, the `lfInfo` field of the `IdInfo` field of an `Id`
-* For Ids defined in this module: is `Nothing`
-* For imported Ids:
+As explained in Note [Conveying CAF-info and LFInfo between modules]
+the LambdaFormInfo records the details of a closure representation and is
+often, when optimisations are enabled, serialized to the interface of a module.
+
+In particular, the `lfInfo` field of the `IdInfo` field of an `Id`:
+* For DataCon workers and wrappers is populated as described in
+Note [LFInfo of DataCon workers and wrappers] in GHC.Types.Id.Make
+* For other Ids defined in the module being compiled: is `Nothing`
+* For other imported Ids:
   * is (Just lf_info) if the LFInfo was serialised into the interface file
     (typically, when the exporting module was compiled with -O)
   * is Nothing if it wasn't serialised
 
-However, when an interface doesn't have a LambdaFormInfo for some imported Id
-(so that its `lfInfo` field is `Nothing`), we can conservatively create one
-using `mkLFImported`.
-
 The LambdaFormInfo we give an Id is used in determining how to tag its pointer
-(see `litIdInfo`). Therefore, it's crucial we re-construct a LambdaFormInfo as
-faithfully as possible or otherwise risk having pointers incorrectly tagged,
-which can lead to performance issues and even segmentation faults (see #23231
-and #23146). In particular, saturated data constructor applications *must* be
-unambiguously given `LFCon`, and the invariant
-
-  If the LFInfo (serialised or built with mkLFImported) says LFCon, then it
-  really is a static data constructor, and similar for LFReEntrant
-
-must be upheld.
-
-In `mkLFImported`, we make a conservative approximation to the real
-LambdaFormInfo as follows:
-
-(1) Ids with an `idFunRepArity > 0` are `LFReEntrant` and pointers to them are
-tagged (by `litIdInfo`) with the corresponding arity.
-    - This is also true of data con wrappers and workers with arity > 0,
-    regardless of the runtime relevance of the arguments
-    - For example, `Just :: a -> Maybe a` is given `LFReEntrant`
-               and `HNil :: (a ~# '[]) -> HList a` is given `LFReEntrant` too
-
-(2) Data constructors with `idFunRepArity == 0` should be given `LFCon` because
-they are fully saturated data constructor applications and pointers to them
-should be tagged with the constructor index.
-
-(2.1) A datacon *wrapper* with zero arity must be a fully saturated application
-of the worker to zero-width arguments only (which are dropped after unarisation)
-
-(2.2) A datacon *worker* with zero arity is trivially fully saturated, it takes
-no arguments whatsoever (not even zero-width args)
-
-To ensure we properly give `LFReEntrant` to data constructors with some arity,
-and `LFCon` only to data constructors with zero arity, we must first check for
-`arity > 0` and only afterwards `isDataConId` -- the order of the guards in
-`mkLFImported` is quite important.
-
-As an example, consider the following data constructors:
-
-  data T1 a where
-    TCon1 :: {-# UNPACK #-} !(a :~: True) -> T1 a
-
-  data T2 a where
-    TCon2 :: {-# UNPACK #-} !() -> T2 a
-
-  data T3 a where
-    TCon3 :: T3 '[]
-
-`TCon1`'s wrapper has a lifted equality argument, which is non-zero-width, while
-the worker has an unlifted equality argument, which is zero-width.
-
-`TCon2`'s wrapper has a lifted equality argument, which is non-zero-width,
-while the worker has no arguments.
-
-`TCon3`'s wrapper has no arguments, and the worker has 1 zero-width argument;
-their Core representation:
-
-  $WTCon3 :: T3 '[]
-  $WTCon3 = TCon3 @[] <Refl>
-
-  TCon3 :: forall (a :: * -> *). (a ~# []) => T a
-  TCon3 = /\a. \(co :: a~#[]). TCon3 co
-
-For `TCon1`, both the wrapper and worker will be given `LFReEntrant` since they
-both have arity == 1.
-
-For `TCon2`, the wrapper will be given `LFReEntrant` since it has arity == 1
-while the worker is `LFCon` since its arity == 0
-
-For `TCon3`, the wrapper will be given `LFCon` since its arity == 0 and the
-worker `LFReEntrant` since its arity == 1
-
-One might think we could give *workers* with only zero-width-args the `LFCon`
-LambdaFormInfo, e.g. give `LFCon` to the worker of `TCon1` and `TCon3`.
-However, these workers, albeit rarely used, are unambiguously functions
--- which makes `LFReEntrant`, the LambdaFormInfo we give them, correct.
-See also the discussion in #23158.
+(see `litIdInfo` and `lfDynTag`). Therefore, it's crucial we attribute a correct
+LambdaFormInfo to imported Ids, or otherwise risk having pointers incorrectly
+tagged which can lead to performance issues and even segmentation faults (see
+#23231 and Note [Imported unlifted nullary datacon wrappers must have correct LFInfo]).
+
+In particular, saturated data constructor applications *must* be unambiguously
+given `LFCon`, and if the LFInfo says LFCon, then it really is a static data
+constructor, and similar for LFReEntrant.
+
+In `importedIdLFInfo`, we construct a LambdaFormInfo for imported Ids as follows:
+
+(1) If the `lfInfo` field contains an LFInfo, we use that LFInfo which is
+correct by construction (the invariant being that if it exists, it is correct):
+  (1.1) Either it was serialised to the interface we're importing the Id from,
+  (1.2) Or it's a DataCon worker or wrapper and its LFInfo was constructed
+        according to Note [LFInfo of DataCon workers and wrappers]
+(2) When the `lfInfo` field is `Nothing`
+  (2.1) If the `idFunRepArity` of the Id is known and is greater than 0, then
+  the Id is unambiguously a function and is given `LFReEntrant`, and pointers
+  to this Id will be tagged (by `litIdInfo`) with the corresponding arity.
+  (2.2) Otherwise, we can make a conservative estimate from the type.
 
 -}
 


=====================================
compiler/GHC/StgToCmm/Env.hs
=====================================
@@ -149,7 +149,7 @@ getCgIdInfo id
                       | otherwise
                       = pprPanic "GHC.StgToCmm.Env: label not found" (ppr id <+> dcolon <+> ppr (idType id))
               in return $
-                  litIdInfo platform id (mkLFImported id) (CmmLabel ext_lbl)
+                  litIdInfo platform id (importedIdLFInfo id) (CmmLabel ext_lbl)
           else
               cgLookupPanic id -- Bug, id is neither in local binds nor is external
         }}}


=====================================
compiler/GHC/StgToCmm/Types.hs
=====================================
@@ -53,7 +53,7 @@ make a conservative assumption, but that is bad: e.g.
   #16559, #15155, and wiki: commentary/rts/haskell-execution/pointer-tagging
 
   Conservative assumption here is made when we import an Id without a
-  LambdaFormInfo in the interface, in GHC.StgToCmm.Closure.mkLFImported.
+  LambdaFormInfo in the interface, in GHC.StgToCmm.Closure.importedIdLFInfo.
 
 So we arrange to always serialise this information into the interface file.  The
 moving parts are:
@@ -75,9 +75,25 @@ Note [Imported unlifted nullary datacon wrappers must have correct LFInfo]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 As described in `Note [Conveying CAF-info and LFInfo between modules]`,
 imported unlifted nullary datacons must have their LambdaFormInfo set to
-reflect the fact that they are evaluated . This is necessary as otherwise
+reflect the fact that they are evaluated. This is necessary as otherwise
 references to them may be passed untagged to code that expects tagged
-references.
+references because of the unlifted nature of the argument.
+
+For example, in
+
+   type T :: UnliftedType
+   data T = T1
+          | T2
+
+   f :: T -> Int
+   f x = case x of T1 -> 1; T2 -> 2
+
+`f` expects `x` to be evaluated and properly tagged due to its unliftedness.
+We can guarantee all occurrences of `T1` and `T2` are considered evaluated and
+are properly tagged by giving them the `LFCon` LambdaFormInfo which indicates
+they are fully saturated constructor applications.
+(The LambdaFormInfo is used to tag the pointer with the tag of the
+constructor, in `litIdInfo`)
 
 What may be less obvious is that this must be done for not only datacon
 workers but also *wrappers*. The reason is found in this program
@@ -109,11 +125,9 @@ pointer to `fieldsSam`. This is problematic as `fieldsSam` may take advantage
 of the unlifted nature of its arguments by omitting handling of the zero
 tag when scrutinising them.
 
-The fix is straightforward: extend the logic in `mkLFImported` to cover
-(nullary) datacon wrappers as well as workers. This is safe because we
-know that the wrapper of a nullary datacon will be in WHNF, even if it
-includes equalities evidence (since such equalities are not runtime
-relevant). This fixed #23146.
+The fix is straightforward: ensure we always construct a /correct/ LFInfo for
+datacon workers and wrappers, and populate the `lfInfo` with it. See
+Note [LFInfo of DataCon workers and wrappers]. This fixed #23146.
 
 See also Note [The LFInfo of Imported Ids]
 -}


=====================================
compiler/GHC/Types/Id/Info.hs
=====================================
@@ -120,7 +120,8 @@ infixl  1 `setRuleInfo`,
           `setCafInfo`,
           `setDmdSigInfo`,
           `setCprSigInfo`,
-          `setDemandInfo`
+          `setDemandInfo`,
+          `setLFInfo`
 {-
 ************************************************************************
 *                                                                      *
@@ -374,7 +375,12 @@ data IdInfo
         --
         -- See documentation of the getters for what these packed fields mean.
         lfInfo          :: !(Maybe LambdaFormInfo),
-        -- ^ See Note [The LFInfo of Imported Ids] in GHC.StgToCmm.Closure
+        -- ^ If lfInfo = Just info, then the `info` is guaranteed /correct/.
+        --   If lfInfo = Nothing, then we do not have a `LambdaFormInfo` for this Id,
+        --                so (for imported Ids) we make a conservative version.
+        --                See Note [The LFInfo of Imported Ids] in GHC.StgToCmm.Closure
+        -- For locally-defined Ids other than DataCons, the `lfInfo` field is always Nothing.
+        -- See also Note [LFInfo of DataCon workers and wrappers]
 
         -- See documentation of the getters for what these packed fields mean.
         tagSig          :: !(Maybe TagSig)


=====================================
compiler/GHC/Types/Id/Make.hs
=====================================
@@ -65,6 +65,7 @@ import GHC.Core.DataCon
 
 import GHC.Types.Literal
 import GHC.Types.SourceText
+import GHC.Types.RepType ( countFunRepArgs )
 import GHC.Types.Name.Set
 import GHC.Types.Name
 import GHC.Types.ForeignCall
@@ -87,6 +88,10 @@ import GHC.Data.FastString
 import GHC.Data.List.SetOps
 import Data.List        ( zipWith4 )
 
+-- A bit of a shame we must import these here
+import GHC.StgToCmm.Types (LambdaFormInfo(..))
+import GHC.Runtime.Heap.Layout (ArgDescr(ArgUnknown))
+
 {-
 ************************************************************************
 *                                                                      *
@@ -595,11 +600,18 @@ mkDataConWorkId wkr_name data_con
                    `setInlinePragInfo`     wkr_inline_prag
                    `setUnfoldingInfo`      evaldUnfolding  -- Record that it's evaluated,
                                                            -- even if arity = 0
+                   `setLFInfo`             wkr_lf_info
           -- No strictness: see Note [Data-con worker strictness] in GHC.Core.DataCon
 
     wkr_inline_prag = defaultInlinePragma { inl_rule = ConLike }
     wkr_arity = dataConRepArity data_con
 
+    -- See Note [LFInfo of DataCon workers and wrappers]
+    wkr_lf_info
+      | wkr_arity == 0 = LFCon data_con
+      | otherwise      = LFReEntrant TopLevel (countFunRepArgs wkr_arity wkr_ty) True ArgUnknown
+                                            -- LFInfo stores post-unarisation arity
+
     ----------- Workers for newtypes --------------
     univ_tvs = dataConUnivTyVars data_con
     ex_tcvs  = dataConExTyCoVars data_con
@@ -608,6 +620,7 @@ mkDataConWorkId wkr_name data_con
                   `setArityInfo` 1      -- Arity 1
                   `setInlinePragInfo`     dataConWrapperInlinePragma
                   `setUnfoldingInfo`      newtype_unf
+                  `setLFInfo`            (LFReEntrant TopLevel 1 True ArgUnknown)
     id_arg1      = mkScaledTemplateLocal 1 (head arg_tys)
     res_ty_args  = mkTyCoVarTys univ_tvs
     newtype_unf  = assertPpr (null ex_tcvs && isSingleton arg_tys)
@@ -618,6 +631,82 @@ mkDataConWorkId wkr_name data_con
                    wrapNewTypeBody tycon res_ty_args (Var id_arg1)
 
 {-
+Note [LFInfo of DataCon workers and wrappers]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+As noted in Note [The LFInfo of Imported Ids] in GHC.StgToCmm.Closure, it's
+crucial that saturated data con applications are given an LFInfo of `LFCon`.
+
+Since for data constructors we never serialise the worker and the wrapper (only
+the data type declaration), we never serialise their lambda form info either.
+
+Therefore, when making data constructors workers and wrappers, we construct a
+correct `LFInfo` for them right away, and put it it in the `lfInfo` field of the
+worker/wrapper Id, ensuring that:
+
+  The `lfInfo` field of a DataCon worker or wrapper is always populated with the correct LFInfo.
+
+How do we construct a /correct/ LFInfo for workers and wrappers?
+(Remember: `LFCon` means "a saturated constructor application")
+
+(1) Data constructor workers and wrappers with arity > 0 are unambiguously
+functions and should be given `LFReEntrant`, regardless of the runtime
+relevance of the arguments.
+  - For example, `Just :: a -> Maybe a` is given `LFReEntrant`,
+             and `HNil :: (a ~# '[]) -> HList a` is given `LFReEntrant` too.
+
+(2) A datacon /worker/ with zero arity is trivially fully saturated -- it takes
+no arguments whatsoever (not even zero-width args), so it is given `LFCon`.
+
+(3) Perhaps surprisingly, a datacon /wrapper/ can be an `LFCon`. See Wrinkle (W1) below.
+A datacon /wrapper/ with zero arity must be a fully saturated application of
+the worker to zero-width arguments only (which are dropped after unarisation),
+and therefore is also given `LFCon`.
+
+For example, consider the following data constructors:
+
+  data T1 a where
+    TCon1 :: {-# UNPACK #-} !(a :~: True) -> T1 a
+
+  data T2 a where
+    TCon2 :: {-# UNPACK #-} !() -> T2 a
+
+  data T3 a where
+    TCon3 :: T3 '[]
+
+`TCon1`'s wrapper has a lifted argument, which is non-zero-width, while the
+worker has an unlifted equality argument, which is zero-width.
+
+`TCon2`'s wrapper has a lifted argument, which is non-zero-width, while the
+worker has no arguments.
+
+Wrinkle (W1). Perhaps surprisingly, it is possible for the /wrapper/ to be an
+`LFCon` even though the /worker/ is not. Consider `T3` above. Here is the
+Core representation of the worker and wrapper:
+
+  $WTCon3 :: T3 '[]             -- Wrapper
+  $WTCon3 = TCon3 @[] <Refl>    -- A saturated constructor application: LFCon
+
+  TCon3 :: forall (a :: * -> *). (a ~# []) => T a   -- Worker
+  TCon3 = /\a. \(co :: a~#[]). TCon3 co             -- A function: LFReEntrant
+
+For `TCon1`, both the wrapper and worker will be given `LFReEntrant` since they
+both have arity == 1.
+
+For `TCon2`, the wrapper will be given `LFReEntrant` since it has arity == 1
+while the worker is `LFCon` since its arity == 0
+
+For `TCon3`, the wrapper will be given `LFCon` since its arity == 0 and the
+worker `LFReEntrant` since its arity == 1
+
+One might think we could give *workers* with only zero-width-args the `LFCon`
+LambdaFormInfo, e.g. give `LFCon` to the worker of `TCon1` and `TCon3`.
+However, these workers are unambiguously functions
+-- which makes `LFReEntrant`, the LambdaFormInfo we give them, correct.
+See also the discussion in #23158.
+
+See also the Note [Imported unlifted nullary datacon wrappers must have correct LFInfo]
+in GHC.StgToCmm.Types.
+
 -------------------------------------------------
 --         Data constructor representation
 --
@@ -709,11 +798,18 @@ mkDataConRep dc_bang_opts fam_envs wrap_name data_con
                              -- We need to get the CAF info right here because GHC.Iface.Tidy
                              -- does not tidy the IdInfo of implicit bindings (like the wrapper)
                              -- so it not make sure that the CAF info is sane
+                         `setLFInfo`            wrap_lf_info
 
              -- The signature is purely for passes like the Simplifier, not for
              -- DmdAnal itself; see Note [DmdAnal for DataCon wrappers].
              wrap_sig = mkClosedDmdSig wrap_arg_dmds topDiv
 
+             -- See Note [LFInfo of DataCon workers and wrappers]
+             wrap_lf_info
+               | wrap_arity == 0 = LFCon data_con
+               | otherwise       = LFReEntrant TopLevel (countFunRepArgs wrap_arity wrap_ty) True ArgUnknown
+                                                      -- LFInfo stores post-unarisation arity
+
              wrap_arg_dmds =
                replicate (length theta) topDmd ++ map mk_dmd arg_ibangs
                -- Don't forget the dictionary arguments when building


=====================================
compiler/GHC/Types/RepType.hs
=====================================
@@ -84,12 +84,11 @@ isNvUnaryType ty
   = False
 
 -- INVARIANT: the result list is never empty.
-typePrimRepArgs :: HasDebugCallStack => Type -> [PrimRep]
+typePrimRepArgs :: HasDebugCallStack => Type -> NonEmpty PrimRep
 typePrimRepArgs ty
-  | [] <- reps
-  = [VoidRep]
-  | otherwise
-  = reps
+  = case reps of
+      [] -> VoidRep :| []
+      (x:xs) ->   x :| xs
   where
     reps = typePrimRep ty
 
@@ -124,6 +123,10 @@ unwrapType ty
       | otherwise
       = NS_Done
 
+-- | Count the arity of a function post-unarisation, including zero-width arguments.
+--
+-- The post-unarisation arity may be larger than the arity of the original
+-- function type. See Note [Unarisation].
 countFunRepArgs :: Arity -> Type -> RepArity
 countFunRepArgs 0 _
   = 0



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/94db08351659f012361c44ff1e9d5224f1407f06...f801b269a9475f86cc003a431b1045f3cf0db897

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/94db08351659f012361c44ff1e9d5224f1407f06...f801b269a9475f86cc003a431b1045f3cf0db897
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20230505/c0c7ef41/attachment-0001.html>


More information about the ghc-commits mailing list