[Git][ghc/ghc][wip/T17521] Rewrite and document

Jaro Reinders (@Noughtmare) gitlab at gitlab.haskell.org
Fri Aug 11 11:19:58 UTC 2023



Jaro Reinders pushed to branch wip/T17521 at Glasgow Haskell Compiler / GHC


Commits:
8d90cc76 by Jaro Reinders at 2023-08-11T13:19:50+02:00
Rewrite and document

- - - - -


2 changed files:

- compiler/GHC/Core.hs
- compiler/GHC/Core/Utils.hs


Changes:

=====================================
compiler/GHC/Core.hs
=====================================
@@ -399,11 +399,9 @@ At top level, however, there are three exceptions to this rule:
              S1 = S1
       We allow this top-level unlifted binding to exist.
 
-(TL3) A boxed top-level binding is allowed to bind the application of a data
-      constructor worker to trivial arguments. These bindings are guaranteed
-      to not require any evaluation and can thus be compiled to static data.
-      Unboxed top-level bindings are still not allowed because references
-      to them might have to be pointers.
+(TL3) A boxed top-level binding is allowed to bind the application of
+      unlifted data constructor values.
+      See Note [Core top-level unlifted data-con values].
 
 Note [Core let-can-float invariant]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -498,6 +496,46 @@ parts of the compilation pipeline.
   in the object file, the content of the exported literal is given a label with
   the _bytes suffix.
 
+Note [Core top-level unlifted data-con values]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+As another exception to the usual rule that top-level binders must be lifted,
+we allow binding unlifted data constructor values at the top level. This allows
+us to store these values directly as data in the binary that we produce, instead
+of allocating them potentially many times if they're inside a tight loop.
+
+However, we have to be very careful that we only allow data constructors that
+are really values.
+
+* We only consider data constructor workers and not wrappers, because wrappers
+  are generally not fully evaluated. See Note [The need for a wrapper].
+
+* Even data constructor workers might still be expanded by the STG rewriter to
+  perform some work, if they have arguments that are marked strict.
+  See Note [Data-con worker strictness].
+
+* If the data constructor has unlifted arguments, then those could cause further
+  evaluation to be necessary, unless they are fully evaluated data constructor
+  values themselves.
+
+Furthermore, there is another complication. The data constructor worker may be
+applied to a variable which is defined in another module, or worse, in an
+hs-boot file. So, we cannot always get all the information we need and even for
+variables defined in the same module it might still be hard or computationally
+expensive to collect the necessary information.
+
+So, for the first incarnation of this feature we choose very restrictive
+conditions, which are still useful in practice. We allow top-level unlifted
+data constructor workers if they are applied to arguments that are one of:
+
+* A Literal
+
+* Any expressions of lifted type, but only if that argument is not
+  marked strict.
+
+* An unlifted variable
+
+In the future, we hope to relax this condition (#23811).
+
 Note [NON-BOTTOM-DICTS invariant]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 It is a global invariant (not checkable by Lint) that


=====================================
compiler/GHC/Core/Utils.hs
=====================================
@@ -2017,43 +2017,65 @@ exprIsTopLevelBindable expr ty
     -- consequently we must use 'mightBeUnliftedType' rather than 'isUnliftedType',
     -- as the latter would panic.
   || exprIsTickedString expr
-  || isBoxedType ty && exprIsDefinitelyWHNF expr
+  || isBoxedType ty && exprIsDataConValue expr
 
 -- | Check if the expression is zero or more Ticks wrapped around a literal
 -- string.
 exprIsTickedString :: CoreExpr -> Bool
 exprIsTickedString = isJust . exprIsTickedString_maybe
 
--- | This function is a very conservative approximation that only deals with
--- trivial expressions and datacon worker applications. Additionally, this
--- conservatively rejects data constructors that are applied to lifted variables
--- in argument positions that are marked strict.
--- In the future, we hope to loosen this requirement (See #23811).
-exprIsDefinitelyWHNF :: CoreExpr -> Bool
-exprIsDefinitelyWHNF x = is_definitely_whnf False x
+-- | This function checks if its argument is a data constructor value (WHNF).
+-- We use this function to determine if unlifted expressions can be floated
+-- to the top level.
+-- See Note [Core top-level unlifted data-con values] in GHC.Core
+exprIsDataConValue :: CoreExpr -> Bool
+exprIsDataConValue x = is_datacon_app x
   where
-    -- The first argument of 'is_definitely_whnf' indicates whether the
-    -- expression is forced due to being a strict argument of an enclosing data
-    -- constructor. If that is the case and the expression is a variable, then
-    -- we are only sure it is fully evaluated if the variable has an unlifted
-    -- type.
-    is_definitely_whnf True v at Var{} = definitelyUnliftedType (exprType v)
-    is_definitely_whnf _ x
+    -- We cannot use @exprIsHNF@ because it does not handle
+    -- data constructor worker strictness properly.
+    -- See Note [Data-con worker strictness] in GHC.Core.DataCon.
+    is_datacon_app x
       | (Var v, xs) <- collect_args x
       , Just dc <- isDataConWorkId_maybe v
-      = and (zipWith is_definitely_whnf (map isMarkedStrict (dataConRepStrictness dc)) xs)
-      | otherwise = exprIsTrivial x
-
-    -- This is slightly different from 'collectArgs' as it looks through ticks and casts
-    -- and it only collects run-time arguments.
+      = and $
+          zipWith3 arg_ok
+            (map (typeLevity_maybe . scaledThing) (dataConRepArgTys dc))
+            (map isMarkedStrict (dataConRepStrictness dc))
+            xs
+    is_datacon_app _ = False
+
+    arg_ok _ _ Lit{} = True
+    arg_ok (Just Lifted) False _ = True
+    arg_ok (Just Unlifted) _ Var{} = True
+    -- We allow nested constructor applications; we trust that they will
+    -- be ANFised in the future. E.g. given the unlifted data type:
+    --
+    --    data UNat :: UnliftedType = UZero | USucc UNat
+    --
+    -- If we lift an expression to the top level we might get:
+    --
+    --    foo = USucc (USucc UNil)
+    --
+    -- But that needs to be expanded (ANFised) to:
+    --
+    --    foo = USucc foo1
+    --    foo1 = USucc foo2
+    --    foo2 = UNil
+    --
+    -- This does always seem to happen and is checked by the Core linter.
+    arg_ok (Just Unlifted) _ x = is_datacon_app x
+    arg_ok _ _ _ = False
+
+    -- This is slightly different from 'collectArgs' as it looks through ticks
+    -- and casts and it only collects run-time arguments.
     collect_args expr = go expr []
       where
         go (App f a)  as
           | isRuntimeArg a = go f (a:as)
-          | otherwise      = go f as
-        go (Tick _ e) as   = go e as
-        go (Cast e _) as   = go e as
-        go e          as   = (e, as)
+          | otherwise = go f as
+        go (Tick _ e) as = go e as
+        go (Cast e _) as = go e as
+        go e          as = (e, as)
 
 -- | Extract a literal string from an expression that is zero or more Ticks
 -- wrapped around a literal string. Returns Nothing if the expression has a



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/8d90cc76d4b3d2156bddf35a48ec8906ca2e7a9c

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/8d90cc76d4b3d2156bddf35a48ec8906ca2e7a9c
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20230811/b80d973f/attachment-0001.html>


More information about the ghc-commits mailing list