[Git][ghc/ghc][master] Take more care with unlifted bindings in the specialiser

Marge Bot (@marge-bot) gitlab at gitlab.haskell.org
Tue Feb 28 23:55:21 UTC 2023



Marge Bot pushed to branch master at Glasgow Haskell Compiler / GHC


Commits:
7192ef91 by Simon Peyton Jones at 2023-02-28T18:54:59-05:00
Take more care with unlifted bindings in the specialiser

As #22998 showed, we were floating an unlifted binding to top
level, which breaks a Core invariant.

The fix is easy, albeit a little bit conservative.  See
Note [Care with unlifted bindings] in GHC.Core.Opt.Specialise

- - - - -


5 changed files:

- compiler/GHC/Core.hs
- compiler/GHC/Core/Opt/Specialise.hs
- + testsuite/tests/simplCore/should_run/T22998.hs
- + testsuite/tests/simplCore/should_run/T22998.stdout
- testsuite/tests/simplCore/should_run/all.T


Changes:

=====================================
compiler/GHC/Core.hs
=====================================
@@ -366,68 +366,32 @@ a Coercion, (sym c).
 
 Note [Core letrec invariant]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The right hand sides of all top-level and recursive @let at s
-/must/ be of lifted type (see "Type#type_classification" for
-the meaning of /lifted/ vs. /unlifted/).
+The Core letrec invariant:
 
-There is one exception to this rule, top-level @let at s are
-allowed to bind primitive string literals: see
-Note [Core top-level string literals].
+    The right hand sides of all
+      /top-level/ or /recursive/
+    bindings must be of lifted type
 
-Note [Core top-level string literals]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-As an exception to the usual rule that top-level binders must be lifted,
-we allow binding primitive string literals (of type Addr#) of type Addr# at the
-top level. This allows us to share string literals earlier in the pipeline and
-crucially allows other optimizations in the Core2Core pipeline to fire.
-Consider,
+    There is one exception to this rule, top-level @let at s are
+    allowed to bind primitive string literals: see
+    Note [Core top-level string literals].
 
-  f n = let a::Addr# = "foo"#
-        in \x -> blah
+See "Type#type_classification" in GHC.Core.Type
+for the meaning of "lifted" vs. "unlifted").
 
-In order to be able to inline `f`, we would like to float `a` to the top.
-Another option would be to inline `a`, but that would lead to duplicating string
-literals, which we want to avoid. See #8472.
-
-The solution is simply to allow top-level unlifted binders. We can't allow
-arbitrary unlifted expression at the top-level though, unlifted binders cannot
-be thunks, so we just allow string literals.
-
-We allow the top-level primitive string literals to be wrapped in Ticks
-in the same way they can be wrapped when nested in an expression.
-CoreToSTG currently discards Ticks around top-level primitive string literals.
-See #14779.
-
-Also see Note [Compilation plan for top-level string literals].
-
-Note [Compilation plan for top-level string literals]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Here is a summary on how top-level string literals are handled by various
-parts of the compilation pipeline.
-
-* In the source language, there is no way to bind a primitive string literal
-  at the top level.
-
-* In Core, we have a special rule that permits top-level Addr# bindings. See
-  Note [Core top-level string literals]. Core-to-core passes may introduce
-  new top-level string literals.
-
-* In STG, top-level string literals are explicitly represented in the syntax
-  tree.
-
-* A top-level string literal may end up exported from a module. In this case,
-  in the object file, the content of the exported literal is given a label with
-  the _bytes suffix.
+For the non-top-level, non-recursive case see Note [Core let-can-float invariant].
 
 Note [Core let-can-float invariant]
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The let-can-float invariant:
 
-    The right hand side of a non-recursive 'Let'
-    /may/ be of unlifted type, but only if
+    The right hand side of a /non-top-level/, /non-recursive/ binding
+    may be of unlifted type, but only if
     the expression is ok-for-speculation
     or the 'Let' is for a join point.
 
+    (For top-level or recursive lets see Note [Core letrec invariant].)
+
 This means that the let can be floated around
 without difficulty. For example, this is OK:
 
@@ -466,6 +430,53 @@ we need to allow lots of things in the arguments of a call.
 
 TL;DR: we relaxed the let/app invariant to become the let-can-float invariant.
 
+Note [Core top-level string literals]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+As an exception to the usual rule that top-level binders must be lifted,
+we allow binding primitive string literals (of type Addr#) of type Addr# at the
+top level. This allows us to share string literals earlier in the pipeline and
+crucially allows other optimizations in the Core2Core pipeline to fire.
+Consider,
+
+  f n = let a::Addr# = "foo"#
+        in \x -> blah
+
+In order to be able to inline `f`, we would like to float `a` to the top.
+Another option would be to inline `a`, but that would lead to duplicating string
+literals, which we want to avoid. See #8472.
+
+The solution is simply to allow top-level unlifted binders. We can't allow
+arbitrary unlifted expression at the top-level though, unlifted binders cannot
+be thunks, so we just allow string literals.
+
+We allow the top-level primitive string literals to be wrapped in Ticks
+in the same way they can be wrapped when nested in an expression.
+CoreToSTG currently discards Ticks around top-level primitive string literals.
+See #14779.
+
+Also see Note [Compilation plan for top-level string literals].
+
+Note [Compilation plan for top-level string literals]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Here is a summary on how top-level string literals are handled by various
+parts of the compilation pipeline.
+
+* In the source language, there is no way to bind a primitive string literal
+  at the top level.
+
+* In Core, we have a special rule that permits top-level Addr# bindings. See
+  Note [Core top-level string literals]. Core-to-core passes may introduce
+  new top-level string literals.
+
+  See GHC.Core.Utils.exprIsTopLevelBindable, and exprIsTickedString
+
+* In STG, top-level string literals are explicitly represented in the syntax
+  tree.
+
+* A top-level string literal may end up exported from a module. In this case,
+  in the object file, the content of the exported literal is given a label with
+  the _bytes suffix.
+
 Note [NON-BOTTOM-DICTS invariant]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 It is a global invariant (not checkable by Lint) that


=====================================
compiler/GHC/Core/Opt/Specialise.hs
=====================================
@@ -27,7 +27,7 @@ import GHC.Core
 import GHC.Core.Make      ( mkLitRubbish )
 import GHC.Core.Unify     ( tcMatchTy )
 import GHC.Core.Rules
-import GHC.Core.Utils     ( exprIsTrivial
+import GHC.Core.Utils     ( exprIsTrivial, exprIsTopLevelBindable
                           , mkCast, exprType
                           , stripTicksTop, mkInScopeSetBndrs )
 import GHC.Core.FVs
@@ -1515,7 +1515,10 @@ specBind top_lvl env (NonRec fn rhs) do_body
                          = [mkDB $ NonRec b r | (b,r) <- pairs]
                            ++ fromOL dump_dbs
 
-       ; if float_all then
+             can_float_this_one = exprIsTopLevelBindable rhs (idType fn)
+             -- exprIsTopLevelBindable: see Note [Care with unlifted bindings]
+
+       ; if float_all && can_float_this_one then
              -- Rather than discard the calls mentioning the bound variables
              -- we float this (dictionary) binding along with the others
               return ([], body', all_free_uds `snocDictBinds` final_binds)
@@ -1876,6 +1879,28 @@ even in the case that @x = False@! Instead, we add a dummy 'Void#' argument to
 the specialisation '$sfInt' ($sfInt :: Void# -> Array# Int) in order to
 preserve laziness.
 
+Note [Care with unlifted bindings]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Consider (#22998)
+    f x = let x::ByteArray# = <some literal>
+              n::Natural    = NB x
+          in wombat @192827 (n |> co)
+where
+  co :: Natural ~ KnownNat 192827
+  wombat :: forall (n:Nat). KnownNat n => blah
+
+Left to itself, the specialiser would float the bindings for `x` and `n` to top
+level, so we can specialise `wombat`.  But we can't have a top-level ByteArray#
+(see Note [Core letrec invariant] in GHC.Core).  Boo.
+
+This is pretty exotic, so we take a simple way out: in specBind (the NonRec
+case) do not float the binding itself unless it satisfies exprIsTopLevelBindable.
+This is conservative: maybe the RHS of `x` has a free var that would stop it
+floating to top level anyway; but that is hard to spot (since we don't know what
+the non-top-level in-scope binders are) and rare (since the binding must satisfy
+Note [Core let-can-float invariant] in GHC.Core).
+
+
 Note [Specialising Calls]
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 Suppose we have a function with a complicated type:


=====================================
testsuite/tests/simplCore/should_run/T22998.hs
=====================================
@@ -0,0 +1,10 @@
+{-# LANGUAGE DataKinds #-}
+module Main where
+
+import Data.Proxy (Proxy(Proxy))
+import GHC.TypeLits (natVal)
+
+main :: IO ()
+main = print x
+  where
+    x = natVal @18446744073709551616 Proxy + natVal @18446744073709551616 Proxy


=====================================
testsuite/tests/simplCore/should_run/T22998.stdout
=====================================
@@ -0,0 +1 @@
+36893488147419103232


=====================================
testsuite/tests/simplCore/should_run/all.T
=====================================
@@ -108,3 +108,5 @@ test('T21575', normal, compile_and_run, ['-O'])
 test('T21575b', [], multimod_compile_and_run, ['T21575b', '-O'])
 test('T20836', normal, compile_and_run, ['-O0']) # Should not time out; See #20836
 test('T22448', normal, compile_and_run, ['-O1'])
+test('T22998', normal, compile_and_run, ['-O0 -fspecialise -dcore-lint'])
+



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/7192ef91c855e1fae6997f75cfde76aafd0b4bcf

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/7192ef91c855e1fae6997f75cfde76aafd0b4bcf
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20230228/d3de20f8/attachment-0001.html>


More information about the ghc-commits mailing list