[Git][ghc/ghc][wip/marge_bot_batch_merge_job] 3 commits: Implement UNPACK support for sum types.

Marge Bot (@marge-bot) gitlab at gitlab.haskell.org
Tue Nov 15 01:20:01 UTC 2022



Marge Bot pushed to branch wip/marge_bot_batch_merge_job at Glasgow Haskell Compiler / GHC


Commits:
2b7d5ccc by Madeline Haraj at 2022-11-14T22:44:17+00:00
Implement UNPACK support for sum types.

This is based on osa's unpack_sums PR from ages past.

The meat of the patch is implemented in dataConArgUnpackSum
and described in Note [UNPACK for sum types].

- - - - -
0da5234c by Andreas Klebinger at 2022-11-14T20:19:48-05:00
Expand on the need to clone local binders.

Fixes #22402.

- - - - -
612cc569 by Krzysztof Gogolewski at 2022-11-14T20:19:49-05:00
Fix :i Constraint printing "type Constraint = Constraint"

Since Constraint became a synonym for CONSTRAINT 'LiftedRep,
we need the same code for handling printing as for the synonym
Type = TYPE 'LiftedRep.
This addresses the same bug as #18594, so I'm reusing the test.

- - - - -


28 changed files:

- compiler/GHC/Core/Make.hs
- compiler/GHC/Core/Opt/ConstantFold.hs
- compiler/GHC/CoreToStg/Prep.hs
- compiler/GHC/HsToCore/Expr.hs
- compiler/GHC/Iface/Syntax.hs
- compiler/GHC/Iface/Type.hs
- compiler/GHC/Stg/Subst.hs
- compiler/GHC/Types/Id/Make.hs
- docs/users_guide/exts/pragmas.rst
- testsuite/tests/ghci/should_run/T18594.script
- testsuite/tests/ghci/should_run/T18594.stdout
- + testsuite/tests/unboxedsums/Makefile
- testsuite/tests/unboxedsums/all.T
- + testsuite/tests/unboxedsums/unpack_sums_1.hs
- + testsuite/tests/unboxedsums/unpack_sums_1.stdout
- + testsuite/tests/unboxedsums/unpack_sums_2.hs
- + testsuite/tests/unboxedsums/unpack_sums_3.hs
- + testsuite/tests/unboxedsums/unpack_sums_4.hs
- + testsuite/tests/unboxedsums/unpack_sums_4.stdout
- + testsuite/tests/unboxedsums/unpack_sums_5.hs
- + testsuite/tests/unboxedsums/unpack_sums_5.stderr
- + testsuite/tests/unboxedsums/unpack_sums_6.hs
- + testsuite/tests/unboxedsums/unpack_sums_6.stdout
- + testsuite/tests/unboxedsums/unpack_sums_7.hs
- + testsuite/tests/unboxedsums/unpack_sums_7.stderr
- + testsuite/tests/unboxedsums/unpack_sums_8.hs
- + testsuite/tests/unboxedsums/unpack_sums_8.stdout
- + testsuite/tests/unboxedsums/unpack_sums_9.hs


Changes:

=====================================
compiler/GHC/Core/Make.hs
=====================================
@@ -23,7 +23,7 @@ module GHC.Core.Make (
         FloatBind(..), wrapFloat, wrapFloats, floatBindings,
 
         -- * Constructing small tuples
-        mkCoreVarTupTy, mkCoreTup, mkCoreUnboxedTuple, mkCoreUbxSum,
+        mkCoreVarTupTy, mkCoreTup, mkCoreUnboxedTuple, mkCoreUnboxedSum,
         mkCoreTupBoxity, unitExpr,
 
         -- * Constructing big tuples
@@ -405,8 +405,8 @@ mkCoreTup cs  = mkCoreBoxedTuple cs   -- non-1-tuples are uniform
 -- | Build an unboxed sum.
 --
 -- Alternative number ("alt") starts from 1.
-mkCoreUbxSum :: Int -> Int -> [Type] -> CoreExpr -> CoreExpr
-mkCoreUbxSum arity alt tys exp
+mkCoreUnboxedSum :: Int -> Int -> [Type] -> CoreExpr -> CoreExpr
+mkCoreUnboxedSum arity alt tys exp
   = assert (length tys == arity) $
     assert (alt <= arity) $
     mkCoreConApps (sumDataCon alt arity)


=====================================
compiler/GHC/Core/Opt/ConstantFold.hs
=====================================
@@ -2097,7 +2097,7 @@ builtinBignumRules =
         x <- isNaturalLiteral a0
         y <- isNaturalLiteral a1
         -- return an unboxed sum: (# (# #) | Natural #)
-        let ret n v = pure $ mkCoreUbxSum 2 n [unboxedUnitTy,naturalTy] v
+        let ret n v = pure $ mkCoreUnboxedSum 2 n [unboxedUnitTy,naturalTy] v
         platform <- getPlatform
         if x < y
             then ret 1 unboxedUnitExpr


=====================================
compiler/GHC/CoreToStg/Prep.hs
=====================================
@@ -112,6 +112,17 @@ The goal of this pass is to prepare for code generation.
     and doing so would be tiresome because then we'd need
     to substitute in types and coercions.
 
+    We need to clone ids for two reasons:
+    + Things associated with labels in the final code must be truly unique in
+      order to avoid labels being shadowed in the final output.
+    + Even binders without info tables like function arguments or alternative
+      bound binders must be unique at least in their type/unique combination.
+      We only emit a single declaration for each binder when compiling to C
+      so if binders are not unique we would either get duplicate declarations
+      or misstyped variables. The later happend in #22402.
+    + We heavily use unique-keyed maps in the backend which can go wrong when
+      ids with the same unique are meant to represent the same variable.
+
 7.  Give each dynamic CCall occurrence a fresh unique; this is
     rather like the cloning step above.
 


=====================================
compiler/GHC/HsToCore/Expr.hs
=====================================
@@ -369,7 +369,7 @@ dsExpr (ExplicitTuple _ tup_args boxity)
                         -- See Note [Don't flatten tuples from HsSyn] in GHC.Core.Make
 
 dsExpr (ExplicitSum types alt arity expr)
-  = mkCoreUbxSum arity alt types <$> dsLExpr expr
+  = mkCoreUnboxedSum arity alt types <$> dsLExpr expr
 
 dsExpr (HsPragE _ prag expr) =
   ds_prag_expr prag expr


=====================================
compiler/GHC/Iface/Syntax.hs
=====================================
@@ -44,7 +44,8 @@ module GHC.Iface.Syntax (
 
 import GHC.Prelude
 
-import GHC.Builtin.Names ( unrestrictedFunTyConKey, liftedTypeKindTyConKey )
+import GHC.Builtin.Names ( unrestrictedFunTyConKey, liftedTypeKindTyConKey,
+                           constraintKindTyConKey )
 import GHC.Types.Unique ( hasKey )
 import GHC.Iface.Type
 import GHC.Iface.Recomp.Binary
@@ -988,7 +989,8 @@ pprIfaceDecl ss (IfaceSynonym { ifName    = tc
 
     -- See Note [Printing type abbreviations] in GHC.Iface.Type
     ppr_tau | tc `hasKey` liftedTypeKindTyConKey ||
-              tc `hasKey` unrestrictedFunTyConKey
+              tc `hasKey` unrestrictedFunTyConKey ||
+              tc `hasKey` constraintKindTyConKey
             = updSDocContext (\ctx -> ctx { sdocPrintTypeAbbreviations = False }) $ ppr tau
             | otherwise = ppr tau
 


=====================================
compiler/GHC/Iface/Type.hs
=====================================
@@ -846,7 +846,7 @@ Note [Printing type abbreviations]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Normally, we pretty-print
    `TYPE       'LiftedRep` as `Type` (or `*`)
-   `CONSTRAINT 'LiftedRep` as `Constraint` (or `*`)
+   `CONSTRAINT 'LiftedRep` as `Constraint`
    `FUN 'Many`             as `(->)`.
 This way, error messages don't refer to representation polymorphism
 or linearity if it is not necessary.  Normally we'd would represent
@@ -856,14 +856,16 @@ command we specifically expand synonyms (see GHC.Tc.Module.tcRnExpr).
 So here in the pretty-printing we effectively collapse back Type
 and Constraint to their synonym forms.  A bit confusing!
 
-However, when printing the definition of Type or (->) with :info,
+However, when printing the definition of Type, Constraint or (->) with :info,
 this would give confusing output: `type (->) = (->)` (#18594).
 Solution: detect when we are in :info and disable displaying the synonym
 with the SDoc option sdocPrintTypeAbbreviations.
+If you are creating a similar synonym, make sure it is listed in pprIfaceDecl,
+see reference to this Note.
 
 If there will be a need, in the future we could expose it as a flag
--fprint-type-abbreviations or even two separate flags controlling
-TYPE 'LiftedRep and FUN 'Many.
+-fprint-type-abbreviations or even three separate flags controlling
+TYPE 'LiftedRep, CONSTRAINT 'LiftedRep and FUN 'Many.
 -}
 
 -- | Do we want to suppress kind annotations on binders?


=====================================
compiler/GHC/Stg/Subst.hs
=====================================
@@ -12,6 +12,13 @@ import GHC.Utils.Outputable
 import GHC.Utils.Misc
 import GHC.Utils.Panic
 
+-- TODO: This code might make folly of the work done in CorePrep where
+-- we clone local ids in order to ensure *all* local binders are unique.
+-- It's my understanding that here we use "the rapier"/uniqAway which makes up
+-- uniques based on the ids in scope. Which can give the same unique to different
+-- binders as long as they are in different scopes. A guarantee which isn't
+-- strong enough for code generation in general. See Note [CorePrep Overview].
+
 -- | A renaming substitution from 'Id's to 'Id's. Like 'RnEnv2', but not
 -- maintaining pairs of substitutions. Like 'GHC.Core.Subst.Subst', but
 -- with the domain being 'Id's instead of entire 'CoreExpr'.


=====================================
compiler/GHC/Types/Id/Make.hs
=====================================
@@ -56,7 +56,7 @@ import GHC.Core.Coercion
 import GHC.Core.Reduction
 import GHC.Core.Make
 import GHC.Core.FVs     ( mkRuleInfo )
-import GHC.Core.Utils   ( exprType, mkCast, mkDefaultCase )
+import GHC.Core.Utils   ( exprType, mkCast, mkDefaultCase, coreAltsType )
 import GHC.Core.Unfold.Make
 import GHC.Core.SimpleOpt
 import GHC.Core.TyCon
@@ -85,6 +85,7 @@ import GHC.Utils.Panic.Plain
 
 import GHC.Data.FastString
 import GHC.Data.List.SetOps
+import Data.List        ( zipWith4 )
 
 {-
 ************************************************************************
@@ -1028,14 +1029,8 @@ dataConSrcToImplBang bang_opts fam_envs arg_ty
         arg_ty' = case mb_co of
                     { Just redn -> scaledSet arg_ty (reductionReducedType redn)
                     ; Nothing   -> arg_ty }
-  , isUnpackableType bang_opts fam_envs (scaledThing arg_ty')
-  , (rep_tys, _) <- dataConArgUnpack arg_ty'
-  , case unpk_prag of
-      NoSrcUnpack ->
-        bang_opt_unbox_strict bang_opts
-            || (bang_opt_unbox_small bang_opts
-                && rep_tys `lengthAtMost` 1) -- See Note [Unpack one-wide fields]
-      srcUnpack -> isSrcUnpacked srcUnpack
+  , all (not . isNewTyCon . fst) (splitTyConApp_maybe $ scaledThing arg_ty')
+  , shouldUnpackTy bang_opts unpk_prag fam_envs arg_ty'
   = case mb_co of
       Nothing   -> HsUnpack Nothing
       Just redn -> HsUnpack (Just $ reductionCoercion redn)
@@ -1043,7 +1038,6 @@ dataConSrcToImplBang bang_opts fam_envs arg_ty
   | otherwise -- Record the strict-but-no-unpack decision
   = HsStrict
 
-
 -- | Wrappers/Workers and representation following Unpack/Strictness
 -- decisions
 dataConArgRep
@@ -1059,8 +1053,7 @@ dataConArgRep arg_ty HsStrict
   = ([(arg_ty, MarkedStrict)], (seqUnboxer, unitBoxer))
 
 dataConArgRep arg_ty (HsUnpack Nothing)
-  | (rep_tys, wrappers) <- dataConArgUnpack arg_ty
-  = (rep_tys, wrappers)
+  = dataConArgUnpack arg_ty
 
 dataConArgRep (Scaled w _) (HsUnpack (Just co))
   | let co_rep_ty = coercionRKind co
@@ -1097,50 +1090,231 @@ unitBoxer :: Boxer
 unitBoxer = UnitBox
 
 -------------------------
+
+{- Note [UNPACK for sum types]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Suppose we have a data type D, for example:
+    data D = D1 [Int] [Bool]
+           | D2
+
+and another data type which unpacks a field of type D:
+    data U a = MkU {-# UNPACK #-} !D
+                   {-# UNPACK #-} !(a,a)
+                   {-# UNPACK #-} !D
+
+Then the wrapper and worker for MkU have these types
+
+  -- Wrapper
+  $WMkU :: D -> (a,a) -> D -> U a
+
+  -- Worker
+  MkU :: (# (# [Int],[Bool] #) | (# #) #)
+      -> a
+      -> a
+      -> (# (# [Int],[Bool] #) | (# #) #)
+      -> U a
+
+For each unpacked /sum/-type argument, the worker gets one argument.
+But for each unpacked /product/-type argument, the worker gets N
+arguments (here two).
+
+Why treat them differently?  See Note [Why sums and products are treated differently].
+
+The wrapper $WMkU looks like this:
+
+  $WMkU :: D -> (a,a) -> D -> U a
+  $WMkU x1 y x2
+    = case (case x1 of {
+              D1 a b -> (# (# a,b #) | #)
+              D2     -> (# | (# #) #) }) of { x1_ubx ->
+      case y of { (y1, y2) ->
+      case (case x2 of {
+              D1 a b -> (# (# a,b #) | #)
+              D2     -> (# | (# #) #) }) of { x2_ubx ->
+      MkU x1_ubx y1 y2 x2_ubx
+
+Notice the nested case needed for sums.
+
+This different treatment for sums and product is implemented in
+dataConArgUnpackSum and dataConArgUnpackProduct respectively.
+
+Note [Why sums and products are treated differently]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Can we handle sums like products, with each wrapper argument
+occupying multiple argument slots in the worker?  No: for a sum
+type the number of argument slots varies, and that's exactly what
+unboxed sums are designed for.
+
+Can we handle products like sums, with each wrapper argument occupying
+exactly one argument slot (and unboxed tuple) in the worker?  Yes,
+we could.  For example
+   data P = MkP {-# UNPACK #-} !Q
+   data Q = MkQ {-# NOUNPACK #-} !Int
+                {-# NOUNPACK #-} Int
+
+Currently could unpack P thus, taking two slots in the worker
+   $WMkP :: Q -> P
+   $WMkP x = case x of { MkQ a b -> MkP a b }
+   MkP :: Int -> Int -> P  -- Worker
+
+We could instead do this (uniformly with sums)
+
+   $WMkP1 :: Q -> P
+   $WMkP1 x = case (case x of { MkQ a b -> (# a, b #) }) of ubx_x
+              MkP1 ubx_x
+   MkP1 :: (# Int, Int #) -> P  -- Worker
+
+The representation of MkP and MkP1 would be identical (a constructor
+with two fields).
+
+BUT, with MkP (as with every data constructor) we record its argument
+strictness as a bit-vector, actually [StrictnessMark]
+   MkP strictness:  SL
+This information is used in Core to record which fields are sure to
+be evaluated.  (Look for calls to dataConRepStrictness.)  E.g. in Core
+    case v of MkP x y -> ....<here x is known to be evald>....
+
+Alas, with MkP1 this information is hidden by the unboxed pair,
+In Core there will be an auxiliary case expression to take apart the pair:
+    case v of MkP1 xy -> case xy of (# x,y #) -> ...
+And now we have no easy way to know that x is evaluated in the "...".
+
+Fixing this might be possible, but it'd be tricky.  So we avoid the
+problem entirely by treating sums and products differently here.
+-}
+
 dataConArgUnpack
    :: Scaled Type
    ->  ( [(Scaled Type, StrictnessMark)]   -- Rep types
        , (Unboxer, Boxer) )
-
-dataConArgUnpack (Scaled arg_mult arg_ty)
+dataConArgUnpack scaledTy@(Scaled _ arg_ty)
   | Just (tc, tc_args) <- splitTyConApp_maybe arg_ty
-  , Just con <- tyConSingleAlgDataCon_maybe tc
-      -- NB: check for an *algebraic* data type
-      -- A recursive newtype might mean that
-      -- 'arg_ty' is a newtype
-  , let rep_tys = map (scaleScaled arg_mult) $ dataConInstArgTys con tc_args
-  = assert (null (dataConExTyCoVars con))
-      -- Note [Unpacking GADTs and existentials]
-    ( rep_tys `zip` dataConRepStrictness con
-    ,( \ arg_id ->
-       do { rep_ids <- mapM (newLocal (fsLit "unbx")) rep_tys
-          ; let r_mult = idMult arg_id
-          ; let rep_ids' = map (scaleIdBy r_mult) rep_ids
-          ; let unbox_fn body
-                  = mkSingleAltCase (Var arg_id) arg_id
-                             (DataAlt con) rep_ids' body
-          ; return (rep_ids, unbox_fn) }
-     , Boxer $ \ subst ->
-       do { rep_ids <- mapM (newLocal (fsLit "bx") . TcType.substScaledTyUnchecked subst) rep_tys
-          ; return (rep_ids, Var (dataConWorkId con)
-                             `mkTyApps` (substTysUnchecked subst tc_args)
-                             `mkVarApps` rep_ids ) } ) )
+  = assert (not (isNewTyCon tc)) $
+    case tyConDataCons tc of
+      [con] -> dataConArgUnpackProduct scaledTy tc_args con
+      cons  -> dataConArgUnpackSum scaledTy tc_args cons
   | otherwise
   = pprPanic "dataConArgUnpack" (ppr arg_ty)
     -- An interface file specified Unpacked, but we couldn't unpack it
 
-isUnpackableType :: BangOpts -> FamInstEnvs -> Type -> Bool
--- True if we can unpack the UNPACK the argument type
+dataConArgUnpackProduct
+  :: Scaled Type
+  -> [Type]
+  -> DataCon
+  -> ( [(Scaled Type, StrictnessMark)]   -- Rep types
+     , (Unboxer, Boxer) )
+dataConArgUnpackProduct (Scaled arg_mult _) tc_args con =
+  assert (null (dataConExTyCoVars con)) $
+    -- Note [Unpacking GADTs and existentials]
+  let rep_tys = map (scaleScaled arg_mult) $ dataConInstArgTys con tc_args
+  in ( rep_tys `zip` dataConRepStrictness con
+     , ( \ arg_id ->
+         do { rep_ids <- mapM (newLocal (fsLit "unbx")) rep_tys
+            ; let r_mult = idMult arg_id
+            ; let rep_ids' = map (scaleIdBy r_mult) rep_ids
+            ; let unbox_fn body
+                    = mkSingleAltCase (Var arg_id) arg_id
+                               (DataAlt con) rep_ids' body
+            ; return (rep_ids, unbox_fn) }
+       , Boxer $ \ subst ->
+         do { rep_ids <- mapM (newLocal (fsLit "bx") . TcType.substScaledTyUnchecked subst) rep_tys
+            ; return (rep_ids, Var (dataConWorkId con)
+                               `mkTyApps` (substTysUnchecked subst tc_args)
+                               `mkVarApps` rep_ids ) } ) )
+
+dataConArgUnpackSum
+  :: Scaled Type
+  -> [Type]
+  -> [DataCon]
+  -> ( [(Scaled Type, StrictnessMark)]   -- Rep types
+     , (Unboxer, Boxer) )
+dataConArgUnpackSum (Scaled arg_mult arg_ty) tc_args cons =
+  ( [ (sum_ty, MarkedStrict) ] -- The idea: Unpacked variant will
+                               -- be one field only, and the type of the
+                               -- field will be an unboxed sum.
+  , ( unboxer, boxer ) )
+  where
+    !ubx_sum_arity = length cons
+    src_tys = map (\con -> map scaledThing $ dataConInstArgTys con tc_args) cons
+    sum_alt_tys = map mkUbxSumAltTy src_tys
+    sum_ty_unscaled = mkSumTy sum_alt_tys
+    sum_ty = Scaled arg_mult sum_ty_unscaled
+    newLocal' fs = newLocal fs . Scaled arg_mult
+
+    -- See Note [UNPACK for sum types]
+    unboxer :: Unboxer
+    unboxer arg_id = do
+      con_arg_binders <- mapM (mapM (newLocal' (fsLit "unbx"))) src_tys
+      ubx_sum_bndr <- newLocal (fsLit "unbx") sum_ty
+
+      let
+        mk_ubx_sum_alt :: Int -> DataCon -> [Var] -> CoreAlt
+        mk_ubx_sum_alt alt con [bndr] = Alt (DataAlt con) [bndr]
+            (mkCoreUnboxedSum ubx_sum_arity alt sum_alt_tys (Var bndr))
+
+        mk_ubx_sum_alt alt con bndrs =
+          let tuple = mkCoreUnboxedTuple (map Var bndrs)
+           in Alt (DataAlt con) bndrs (mkCoreUnboxedSum ubx_sum_arity alt sum_alt_tys tuple )
+
+        ubx_sum :: CoreExpr
+        ubx_sum =
+          let alts = zipWith3 mk_ubx_sum_alt [ 1 .. ] cons con_arg_binders
+           in Case (Var arg_id) arg_id (coreAltsType alts) alts
+
+        unbox_fn :: CoreExpr -> CoreExpr
+        unbox_fn body =
+          mkSingleAltCase ubx_sum ubx_sum_bndr DEFAULT [] body
+
+      return ([ubx_sum_bndr], unbox_fn)
+
+    boxer :: Boxer
+    boxer = Boxer $ \ subst -> do
+              unboxed_field_id <- newLocal' (fsLit "bx") (TcType.substTy subst sum_ty_unscaled)
+              tuple_bndrs <- mapM (newLocal' (fsLit "bx") . TcType.substTy subst) sum_alt_tys
+
+              let tc_args' = substTys subst tc_args
+                  arg_ty' = substTy subst arg_ty
+
+              con_arg_binders <-
+                mapM (mapM (newLocal' (fsLit "bx")) . map (TcType.substTy subst)) src_tys
+
+              let mk_sum_alt :: Int -> DataCon -> Var -> [Var] -> CoreAlt
+                  mk_sum_alt alt con _ [datacon_bndr] =
+                    ( Alt (DataAlt (sumDataCon alt ubx_sum_arity)) [datacon_bndr]
+                      (Var (dataConWorkId con) `mkTyApps`  tc_args'
+                                              `mkVarApps` [datacon_bndr] ))
+
+                  mk_sum_alt alt con tuple_bndr datacon_bndrs =
+                    ( Alt (DataAlt (sumDataCon alt ubx_sum_arity)) [tuple_bndr] (
+                      Case (Var tuple_bndr) tuple_bndr arg_ty'
+                        [ Alt (DataAlt (tupleDataCon Unboxed (length datacon_bndrs))) datacon_bndrs
+                            (Var (dataConWorkId con) `mkTyApps`  tc_args'
+                                                    `mkVarApps` datacon_bndrs ) ] ))
+
+              return ( [unboxed_field_id],
+                       Case (Var unboxed_field_id) unboxed_field_id arg_ty'
+                            (zipWith4 mk_sum_alt [ 1 .. ] cons tuple_bndrs con_arg_binders) )
+
+-- | Every alternative of an unboxed sum has exactly one field, and we use
+-- unboxed tuples when we need more than one field. This generates an unboxed
+-- tuple when necessary, to be used in unboxed sum alts.
+mkUbxSumAltTy :: [Type] -> Type
+mkUbxSumAltTy [ty] = ty
+mkUbxSumAltTy tys  = mkTupleTy Unboxed tys
+
+shouldUnpackTy :: BangOpts -> SrcUnpackedness -> FamInstEnvs -> Scaled Type -> Bool
+-- True if we ought to unpack the UNPACK the argument type
 -- See Note [Recursive unboxing]
 -- We look "deeply" inside rather than relying on the DataCons
 -- we encounter on the way, because otherwise we might well
 -- end up relying on ourselves!
-isUnpackableType bang_opts fam_envs ty
-  | Just data_con <- unpackable_type ty
-  = ok_con_args emptyNameSet data_con
+shouldUnpackTy bang_opts prag fam_envs ty
+  | Just data_cons <- unpackable_type_datacons (scaledThing ty)
+  = all (ok_con_args emptyNameSet) data_cons && should_unpack data_cons
   | otherwise
   = False
   where
+    ok_con_args :: NameSet -> DataCon -> Bool
     ok_con_args dcs con
        | dc_name `elemNameSet` dcs
        = False
@@ -1153,17 +1327,20 @@ isUnpackableType bang_opts fam_envs ty
          dc_name = getName con
          dcs' = dcs `extendNameSet` dc_name
 
+    ok_arg :: NameSet -> (Scaled Type, HsSrcBang) -> Bool
     ok_arg dcs (Scaled _ ty, bang)
       = not (attempt_unpack bang) || ok_ty dcs norm_ty
       where
         norm_ty = topNormaliseType fam_envs ty
 
+    ok_ty :: NameSet -> Type -> Bool
     ok_ty dcs ty
-      | Just data_con <- unpackable_type ty
-      = ok_con_args dcs data_con
+      | Just data_cons <- unpackable_type_datacons ty
+      = all (ok_con_args dcs) data_cons
       | otherwise
       = True        -- NB True here, in contrast to False at top level
 
+    attempt_unpack :: HsSrcBang -> Bool
     attempt_unpack (HsSrcBang _ SrcUnpack NoSrcStrict)
       = bang_opt_strict_data bang_opts
     attempt_unpack (HsSrcBang _ SrcUnpack SrcStrict)
@@ -1174,16 +1351,40 @@ isUnpackableType bang_opts fam_envs ty
       = bang_opt_strict_data bang_opts -- Be conservative
     attempt_unpack _ = False
 
-    unpackable_type :: Type -> Maybe DataCon
-    -- Works just on a single level
-    unpackable_type ty
-      | Just (tc, _) <- splitTyConApp_maybe ty
-      , Just data_con <- tyConSingleAlgDataCon_maybe tc
-      , null (dataConExTyCoVars data_con)
-          -- See Note [Unpacking GADTs and existentials]
-      = Just data_con
-      | otherwise
-      = Nothing
+    -- Determine whether we ought to unpack a field based on user annotations if present and heuristics if not.
+    should_unpack data_cons =
+      case prag of
+        SrcNoUnpack -> False -- {-# NOUNPACK #-}
+        SrcUnpack   -> True  -- {-# UNPACK #-}
+        NoSrcUnpack -- No explicit unpack pragma, so use heuristics
+          | (_:_:_) <- data_cons
+          -> False -- don't unpack sum types automatically, but they can be unpacked with an explicit source UNPACK.
+          | otherwise
+          -> bang_opt_unbox_strict bang_opts
+             || (bang_opt_unbox_small bang_opts
+                 && rep_tys `lengthAtMost` 1)  -- See Note [Unpack one-wide fields]
+      where (rep_tys, _) = dataConArgUnpack ty
+
+
+-- Given a type already assumed to have been normalized by topNormaliseType,
+-- unpackable_type_datacons ty = Just datacons
+-- iff ty is of the form
+--     T ty1 .. tyn
+-- and T is an algebraic data type (not newtype), in which no data
+-- constructors have existentials, and datacons is the list of data
+-- constructors of T.
+unpackable_type_datacons :: Type -> Maybe [DataCon]
+unpackable_type_datacons ty
+  | Just (tc, _) <- splitTyConApp_maybe ty
+  , not (isNewTyCon tc)
+    -- Even though `ty` has been normalised, it could still
+    -- be a /recursive/ newtype, so we must check for that
+  , Just cons <- tyConDataCons_maybe tc
+  , not (null cons)
+  , all (null . dataConExTyCoVars) cons
+  = Just cons -- See Note [Unpacking GADTs and existentials]
+  | otherwise
+  = Nothing
 
 {-
 Note [Unpacking GADTs and existentials]


=====================================
docs/users_guide/exts/pragmas.rst
=====================================
@@ -845,8 +845,14 @@ flattening the pair. Multi-level unpacking is also supported: ::
 will store two unboxed ``Int#``\ s directly in the ``T`` constructor.
 The unpacker can see through newtypes, too.
 
+Since 9.6.1, data types with multiple constructors can also be unpacked, effectively
+transforming the field into an unboxed sum of the unpackings of each
+constructor (see :extension:`UnboxedSums`).
+
 See also the :ghc-flag:`-funbox-strict-fields` flag, which essentially has the
-effect of adding ``{-# UNPACK #-}`` to every strict constructor field.
+effect of adding ``{-# UNPACK #-}`` to every strict constructor field which is
+of a single-constructor data type. Sum types won't be unpacked automatically
+by this though, only with the explicit pragma.
 
 .. [1]
    In fact, :pragma:`UNPACK` has no effect without :ghc-flag:`-O`, for technical


=====================================
testsuite/tests/ghci/should_run/T18594.script
=====================================
@@ -1,5 +1,6 @@
 :m GHC.Types
 :i (->)
+:i Constraint
 :set -XStarIsType
 :i Type
 :set -XNoStarIsType


=====================================
testsuite/tests/ghci/should_run/T18594.stdout
=====================================
@@ -7,6 +7,9 @@ instance Semigroup b => Semigroup (a -> b) -- Defined in ‘GHC.Base’
 instance Applicative ((->) r) -- Defined in ‘GHC.Base’
 instance Functor ((->) r) -- Defined in ‘GHC.Base’
 instance Monad ((->) r) -- Defined in ‘GHC.Base’
+type Constraint :: *
+type Constraint = CONSTRAINT LiftedRep
+  	-- Defined in ‘GHC.Types’
 type Type :: *
 type Type = TYPE LiftedRep
   	-- Defined in ‘GHC.Types’


=====================================
testsuite/tests/unboxedsums/Makefile
=====================================
@@ -0,0 +1,11 @@
+TOP=../..
+include $(TOP)/mk/boilerplate.mk
+include $(TOP)/mk/test.mk
+
+.PHONY: unpack_sums_7
+
+unpack_sums_7:
+	$(RM) -f unpack_sums_7.o unpack_sums_7.hi
+	'$(TEST_HC)' $(TEST_HC_OPTS) -c unpack_sums_7.hs  -O -dsuppress-all -ddump-simpl | grep -q '\(# |_ #\)'
+	# This is a test to check for the presence of an unboxed sum in the core for a program using UNPACK
+	# on a sum type which is evidence that the field has been correctly unpacked.


=====================================
testsuite/tests/unboxedsums/all.T
=====================================
@@ -40,3 +40,18 @@ test('T22187',[only_ways(llvm_ways)],compile,[''])
 test('T22187_run',[only_ways(llvm_ways)
                   ,unless(arch('x86_64'), skip)],compile_and_run,[''])
 
+test('unpack_sums_1', normal, compile_and_run, ['-O'])
+test('unpack_sums_2', normal, compile, ['-O'])
+test('unpack_sums_3', normal, compile_and_run, ['-O'])
+test('unpack_sums_4', normal, compile_and_run, ['-O'])
+test('unpack_sums_5', normal, compile, ['-O'])
+test('unpack_sums_6', normal, compile_and_run, ['-O'])
+test('unpack_sums_7', [], makefile_test, [])
+test('unpack_sums_8', normal, compile_and_run, [""])
+test('unpack_sums_9', normal, compile, [""])
+
+# TODO: Need to run this in --slow mode only
+# test('sum_api_annots',
+#      [only_ways(['normal']),
+#       extra_files([ "unboxedsums" + str(i) + ".hs" for i in range(1, 12) ])],
+#      makefile_test, [])


=====================================
testsuite/tests/unboxedsums/unpack_sums_1.hs
=====================================
@@ -0,0 +1,22 @@
+module Main where
+
+data T = T1 Int | T2 String
+  deriving (Show, Eq, Ord, Read)
+
+data T' = T' {-# UNPACK #-} !T
+  deriving (Show, Eq, Ord, Read)
+
+t1, t2 :: T
+t1 = T1 123
+t2 = T2 "OK"
+{-# NOINLINE t1 #-}
+{-# NOINLINE t2 #-}
+
+t'1, t'2 :: T'
+t'1 = T' t1
+t'2 = T' t2
+
+main :: IO ()
+main = do
+  print t'1
+  print t'2


=====================================
testsuite/tests/unboxedsums/unpack_sums_1.stdout
=====================================
@@ -0,0 +1,2 @@
+T' (T1 123)
+T' (T2 "OK")


=====================================
testsuite/tests/unboxedsums/unpack_sums_2.hs
=====================================
@@ -0,0 +1,9 @@
+module Lib where
+
+data Number = F {-# UNPACK #-} !Float | I {-# UNPACK #-} !Int
+
+-- This UNPACK was causing a panic:
+--   ghc-stage1: panic! (the 'impossible' happened)
+--     (GHC version 8.1.20160722 for x86_64-unknown-linux):
+--           LocalReg's live-in to graph crG {_grh::F32, _gri::I64}
+data T = T {-# UNPACK #-} !Number


=====================================
testsuite/tests/unboxedsums/unpack_sums_3.hs
=====================================
@@ -0,0 +1,14 @@
+-- Check that we can unpack a strict Maybe Int field.
+import System.Exit
+
+data T = MkT {-# UNPACK #-} !(Maybe Int)
+
+xs = Nothing : [Just n | n <- [1..10]]
+
+ts = map MkT xs
+
+main = if xs == map (\(MkT m) -> m) ts
+  then return ()
+  else do
+    putStrLn "Error in packing and unpacking!"
+    exitFailure


=====================================
testsuite/tests/unboxedsums/unpack_sums_4.hs
=====================================
@@ -0,0 +1,8 @@
+-- Check that nothing goes wrong with UNPACK in recursive case.
+data T = MkT {-# UNPACK #-} !(Maybe T)
+  deriving Show
+
+t :: T
+t = MkT (Just t)
+
+main = print $ take 100 (show t)


=====================================
testsuite/tests/unboxedsums/unpack_sums_4.stdout
=====================================
@@ -0,0 +1 @@
+"MkT (Just (MkT (Just (MkT (Just (MkT (Just (MkT (Just (MkT (Just (MkT (Just (MkT (Just (MkT (Just (M"


=====================================
testsuite/tests/unboxedsums/unpack_sums_5.hs
=====================================
@@ -0,0 +1,11 @@
+module UnpackSumsFive where
+-- Check that failure to unpack is warned about.
+
+data SMaybeT = NoT | JustT {-# UNPACK #-} !T
+  deriving Show
+
+data T = MkT {-# UNPACK #-} !SMaybeT
+  deriving Show
+
+t :: T
+t = MkT (JustT (MkT (JustT (MkT NoT))))


=====================================
testsuite/tests/unboxedsums/unpack_sums_5.stderr
=====================================
@@ -0,0 +1,10 @@
+
+unpack_sums_5.hs:4:22: warning:
+    • Ignoring unusable UNPACK pragma on the first argument of ‘JustT’
+    • In the definition of data constructor ‘JustT’
+      In the data type declaration for ‘SMaybeT’
+
+unpack_sums_5.hs:7:10: warning:
+    • Ignoring unusable UNPACK pragma on the first argument of ‘MkT’
+    • In the definition of data constructor ‘MkT’
+      In the data type declaration for ‘T’


=====================================
testsuite/tests/unboxedsums/unpack_sums_6.hs
=====================================
@@ -0,0 +1,55 @@
+{-# LANGUAGE BangPatterns #-}
+-- This perhaps overly simple test check if code involving
+-- unbacked sums is faster than non-unpacked ones which at
+-- least in this case we expect to be the case.
+-- However this test isn't quite robust, should it fail in
+-- the future we might want to redo it or mark it fragile.
+import Data.Time.Clock
+
+import Data.Int
+import System.Exit
+
+data A = ANothing | AJust {-# UNPACK #-} !Int64
+data B = BNothing | BJust {-# UNPACK #-} !A
+data C = CNothing | CJust {-# UNPACK #-} !B
+data D = DNothing | DJust {-# UNPACK #-} !C
+
+data Unlayered = Unlayered {-# UNPACK #-} !D
+
+data Layered = Layered !(Maybe (Maybe (Maybe (Maybe Int64))))
+
+makeUnlayered :: Int64 -> [Unlayered]
+makeUnlayered n = Unlayered . DJust . CJust . BJust . AJust <$> [1..n]
+
+makeLayered :: Int64 -> [Layered]
+makeLayered n = Layered . Just . Just . Just . Just <$> [1..n]
+
+sumUnlayered :: [Unlayered] -> Int64
+sumUnlayered = go 0
+ where
+  go !n [] = n
+  go !n (w:ws) = case w of
+    Unlayered (DJust (CJust (BJust (AJust i)))) -> go (n+i) ws
+    Unlayered _ -> go n ws
+
+sumLayered :: [Layered] -> Int64
+sumLayered = go 0
+ where
+  go !n [] = n
+  go !n (w:ws) = case w of
+    Layered (Just (Just (Just (Just i)))) -> go (n+i) ws
+    Layered _ -> go n ws
+
+main :: IO ()
+main = do
+  let magnitude = 10000000
+      unlayeredInts = makeUnlayered magnitude
+      layeredInts = makeLayered magnitude
+  now <- getCurrentTime
+  print $ sumUnlayered unlayeredInts
+  unlayeredTime <- getCurrentTime
+  print $ sumLayered layeredInts
+  layeredTime <- getCurrentTime
+  case (unlayeredTime `diffUTCTime` now) < (layeredTime `diffUTCTime` unlayeredTime) of
+    True -> exitSuccess
+    False -> exitFailure


=====================================
testsuite/tests/unboxedsums/unpack_sums_6.stdout
=====================================
@@ -0,0 +1,2 @@
+50000005000000
+50000005000000


=====================================
testsuite/tests/unboxedsums/unpack_sums_7.hs
=====================================
@@ -0,0 +1,10 @@
+-- NB: Compiling this module throws an exception involving Weak# at the end of compilation.
+-- This is unrelated to unpacked sums but we need to include the error in the expected output for the test to pass.
+
+module UnpackedSums7 where
+
+data T = MkT {-# UNPACK #-} !MI
+
+data MI = NoI | JI Int
+
+t = MkT (JI 5)


=====================================
testsuite/tests/unboxedsums/unpack_sums_7.stderr
=====================================
@@ -0,0 +1,2 @@
+Exception during Weak# finalization (ignored): <stdout>: hFlush: resource vanished (Broken pipe)
+Exception during Weak# finalization (ignored): <stdout>: hFlush: resource vanished (Broken pipe)


=====================================
testsuite/tests/unboxedsums/unpack_sums_8.hs
=====================================
@@ -0,0 +1,29 @@
+{-# LANGUAGE UnboxedTuples #-}
+{-# LANGUAGE UnboxedSums #-}
+
+module Main where
+
+data Void
+data WithVoid = LV Void | RV
+data EnumT = L | R
+    deriving Show
+
+data BoxEnum = BoxEnum {-# UNPACK #-} !EnumT
+    deriving Show
+
+l = BoxEnum L
+r = BoxEnum R
+
+main = do
+    print l
+    print r
+
+
+data BoxWithVoid = BoxWithVoid {-# UNPACK #-} !WithVoid
+wv = BoxWithVoid (LV undefined)
+
+data BoxVoid = BoxVoid {-# UNPACK #-} Void
+bv = BoxVoid undefined
+
+data BoxSum = BoxS {-# UNPACK #-} !(# Int | Char #)
+bs = BoxS (# 1 | #)


=====================================
testsuite/tests/unboxedsums/unpack_sums_8.stdout
=====================================
@@ -0,0 +1,2 @@
+BoxEnum L
+BoxEnum R


=====================================
testsuite/tests/unboxedsums/unpack_sums_9.hs
=====================================
@@ -0,0 +1,39 @@
+
+module UnpackedSums8 where
+
+-- Unpack a sum of 100 ints in each constructor
+data Unpackee
+   = U !Int Int Int Int Int Int Int Int Int Int
+        Int Int Int Int Int Int Int Int Int Int
+        Int Int Int Int Int Int Int Int Int Int
+        Int Int Int Int Int Int Int Int Int Int
+        Int Int Int Int Int Int Int Int Int Int
+        Int Int Int Int Int Int Int Int Int Int
+        Int Int Int Int Int Int Int Int Int Int
+        Int Int Int Int Int Int Int Int Int Int
+        Int Int Int Int Int Int Int Int Int Int
+        Int Int Int Int Int Int Int Int Int Int
+
+    | O Word Int Int Int Int Int Int Int Int Int
+        Int Int Int Int Int Int Int Int Int Int
+        Int Int Int Int Int Int Int Int Int Int
+        Int Int Int Int Int Int Int Int Int Int
+        Int Int Int Int Int Int Int Int Int Int
+        Int Int Int Int Int Int Int Int Int Int
+        Int Int Int Int Int Int Int Int Int Int
+        Int Int Int Int Int Int Int Int Int Int
+        Int Int Int Int Int Int Int Int Int Int
+        Int Int Int Int Int Int Int Int Int Int
+
+data Box = Box {-# UNPACK #-} !Unpackee
+
+b = Box $ U 0 0 0 0 0 0 0 0 0 0
+            0 0 0 0 0 0 0 0 0 0
+            0 0 0 0 0 0 0 0 0 0
+            0 0 0 0 0 0 0 0 0 0
+            0 0 0 0 0 0 0 0 0 0
+            0 0 0 0 0 0 0 0 0 0
+            0 0 0 0 0 0 0 0 0 0
+            0 0 0 0 0 0 0 0 0 0
+            0 0 0 0 0 0 0 0 0 0
+            0 0 0 0 0 0 0 0 0 0



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/dbefb3f64f262987e9a77432fa946198f221d2ca...612cc569bcb1f77db8d9a9ced006156c0f73e0af

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/dbefb3f64f262987e9a77432fa946198f221d2ca...612cc569bcb1f77db8d9a9ced006156c0f73e0af
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20221114/4d299f06/attachment-0001.html>


More information about the ghc-commits mailing list