[Git][ghc/ghc][wip/T23536-teo] Make template-haskell a stage1 package

Teo Camarasu (@teo) gitlab at gitlab.haskell.org
Wed Apr 10 13:21:51 UTC 2024



Teo Camarasu pushed to branch wip/T23536-teo at Glasgow Haskell Compiler / GHC


Commits:
d2a7ca25 by Teo Camarasu at 2024-04-10T14:21:32+01:00
Make template-haskell a stage1 package

Promoting template-haskell from a stage0 to a stage1 package means that
we can much more easily refactor template-haskell.

We implement this by vendoring the in-tree `template-haskell` into
`ghc-boot` thus allowing `stage1:ghc` to depend on the new interface of
the library including the `Binary` instances.

This is controlled by a `bootstrap-th` cabal flag on `ghc-boot`.

When building `template-haskell` modules as part of this vendoring we do
not have access to quote syntax, so we cannot use variable quote
notation (`'Just`). So we either replace these with hand-written `Name`s
or hide the code behind CPP.
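
As a rough illustration of the trade-off described above, the sketch below
contrasts a name obtained via quote notation with a hand-written one. It is
not code from this patch: the module layout and the bindings `quotedJust` and
`handWrittenMany` are hypothetical, and it assumes a released
`template-haskell` where `mkNameG_d` (which, to my understanding, is
`mkNameG DataName`) is exported from `Language.Haskell.TH.Syntax`.

```haskell
{-# LANGUAGE TemplateHaskellQuotes #-}
module Main (main) where

import Language.Haskell.TH.Syntax
  (Name, mkNameG_d, nameBase, nameModule, namePackage)

-- Quote notation: the compiler resolves the constructor and fills in its
-- defining package and module. This needs quote syntax, which is what the
-- vendored build cannot rely on.
quotedJust :: Name
quotedJust = 'Just

-- Hand-written name (hypothetical binding, mirroring the style of the patch):
-- package, module and occurrence name are spelled out as strings, so no quote
-- syntax is needed, but the strings must be kept in sync by hand if the
-- definition ever moves.
handWrittenMany :: Name
handWrittenMany = mkNameG_d "ghc-prim" "GHC.Types" "Many"

main :: IO ()
main = do
  -- Only the base name is printed for the quoted name, because the package
  -- that defines `Just` differs between GHC versions.
  putStrLn (nameBase quotedJust)
  print ( namePackage handWrittenMany
        , nameModule handWrittenMany
        , nameBase handWrittenMany
        )
```

The hand-written form trades compiler-checked resolution for independence
from quote syntax, which is presumably why the patch uses it only for the few
names that must stay available under `BOOTSTRAP_TH`.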

We can remove the `th_hack` from hadrian, which was required when
building stage0 packages using the in-tree `template-haskell` library.

For more details see Note [Bootstrapping Template Haskell].

Resolves #23536

Co-Authored-By: Sebastian Graf <sgraf1337 at gmail.com>

- - - - -


11 changed files:

- compiler/GHC/Tc/Gen/Splice.hs
- compiler/ghc.cabal.in
- hadrian/src/Rules/Dependencies.hs
- hadrian/src/Rules/ToolArgs.hs
- hadrian/src/Settings/Default.hs
- hadrian/src/Settings/Packages.hs
- libraries/ghc-boot/ghc-boot.cabal.in
- libraries/ghci/ghci.cabal.in
- libraries/template-haskell/Language/Haskell/TH/Syntax.hs
- libraries/template-haskell/vendored-filepath/System/FilePath/Posix.hs
- libraries/template-haskell/vendored-filepath/System/FilePath/Windows.hs


Changes:

=====================================
compiler/GHC/Tc/Gen/Splice.hs
=====================================
@@ -2916,3 +2916,98 @@ tcGetInterp = do
    case hsc_interp hsc_env of
       Nothing -> liftIO $ throwIO (InstallationError "Template haskell requires a target code interpreter")
       Just i  -> pure i
+
+-- Note [Bootstrapping Template Haskell]
+-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+-- Staged Metaprogramming as implemented in Template Haskell introduces a whole
+-- new dimension of staging to the already staged bootstrapping process.
+-- The `template-haskell` library plays a crucial role in this process.
+--
+-- Nomenclature:
+--
+--   boot/stage0 compiler: An already released compiler used to compile GHC
+--   stage(N+1) compiler: The result of compiling GHC from source with stage(N)
+--       Recall that any code compiled by the stage1 compiler should be binary
+--       identical to the same code compiled by later stages.
+--   boot TH: the `template-haskell` that comes with (and is linked to) the
+--       boot/stage0 compiler
+--   in-tree TH: the `template-haskell` library that lives in GHC's repository.
+--       Recall that building in-tree TH with the stage1 compiler yields a binary
+--       that is identical to the in-tree TH compiled by stage2.
+--   boot library: A library such as bytestring or containers that GHC depends on.
+--       CONFUSINGLY, we build these libraries with the boot compiler as well as
+--       the stage1 compiler; thus the "boot" in boot library does not refer to a
+--       stage.
+--
+-- Here is how we bootstrap `template-haskell` in tandem with GHC:
+--
+--  1. Link the stage1 compiler against the boot TH library.
+--  2. When building the stage1 compiler, vendor the parts relevant to serialising
+--     the (new, in-tree) TH AST into `ghc-boot`, thus shadowing definitions in the
+--     implicitly linked boot TH.
+--  3. Build the in-tree TH with the stage1 compiler.
+--  4. Build and link the stage2 compiler against the in-tree TH.
+--
+-- Observations:
+--
+--  A. The vendoring in (2) means that the fully qualified name of the in-tree TH
+--     AST will be, e.g., `ghc-boot:...VarE`, not `template-haskell:...VarE`.
+--     That is OK, because we need it just for the `Binary` instance, which does
+--     not depend on the fully qualified name of the type to serialise!
+--     Importantly, Note [Hard-wiring in-tree template-haskell for desugaring quotes]
+--     is unaffected, because the desugaring refers to names in stage1 TH, i.e.,
+--     the next compiler stage.
+--
+-- (Rejected) alternative designs:
+--
+--  1b. Build the in-tree TH with the stage0 compiler and link the stage1 compiler
+--      against it. This is what we did until Apr 24 and it is problematic (#23536):
+--        * (It rules out using TH in GHC, for example to derive GHC.Core.Map types,
+--           because the boot compiler expects the boot TH AST in splices, but, e.g.,
+--           splice functions in GHC.Core.Map.TH would return the in-tree TH AST.
+--           However, at the moment, we are not using TH in GHC anyway.)
+--        * Ultimately, we must link the stage1 compiler against a single version
+--          of template-haskell.
+--          (SG: at least I think that is the case. Can someone verify? Otherwise
+--               it would be conceivable to build just ghc against in-tree TH and
+--               keep the boot libraries built against boot TH.)
+--          (TC: this might be unblocked in the future if we have a separate
+--               package DB for splices.)
+--        * If the single version is the in-tree TH, we have to recompile all boot
+--          libraries (e.g. bytestring, containers) with this new TH version.
+--        * But the boot libraries must *not* be built against a non-boot TH version.
+--          The reason is Note [Hard-wiring in-tree template-haskell for desugaring quotes]:
+--          The boot compiler will desugar quotes wrt. names in the boot TH version.
+--          A quote like `[| unsafePackLenLiteral |]` in bytestring will desugar
+--          to `varE (mkNameS "unsafePackLenLiteral")`, and all
+--          those smart constructors refer to locations in *boot TH*, because that
+--          is all that the boot GHC knows about.
+--          If the in-tree TH were to move or rename the definition of
+--          `mkNameS`, the boot compiler would report a linker error when
+--          compiling bytestring.
+--        * (Stopping to use quotes in bytestring is no solution, either, because
+--           the `Lift` type class is wired-in as well.
+--           Only remaining option: provide an entirely TH-less variant of every
+--           boot library. That would place a huge burden on maintainers and is
+--           thus rejected.)
+--        * We have thus made it impossible to refactor in-tree TH.
+--          This problem was discussed in #23536.
+--  2b. Instead of vendoring, build a CPP'd version of in-tree TH by the boot
+--      compiler under a changed package-id, e.g., `template-haskell-next`, and
+--      build stage1 GHC against that.
+--      SG: Why not?
+
+-- Note [Hard-wiring in-tree template-haskell for desugaring quotes]
+-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+-- To desugar Template Haskell quotes, GHC needs to wire in a bunch of Names in the
+-- `template-haskell` library as Note [Known-key names], in GHC.Builtin.Names.TH.
+-- Consider
+-- > foo :: Q Exp
+-- > foo = [| unwords ["hello", "world"] |]
+-- this desugars to Core that looks like this
+-- > varE (mkNameS "unwords") `appE` listE [litE (stringE "hello"), litE (stringE "world")]
+-- And all these smart constructors are known-key.
+-- NB: Since the constructors are known-key, it is impossible to link this program
+-- against another template-haskell library in which, e.g., `varE` was moved into a
+-- different module. So effectively, GHC is hard-wired against the in-tree
+-- template-haskell library.


=====================================
compiler/ghc.cabal.in
=====================================
@@ -115,7 +115,6 @@ Library
                    containers >= 0.6.2.1 && < 0.8,
                    array      >= 0.1 && < 0.6,
                    filepath   >= 1   && < 1.6,
-                   template-haskell == 2.22.*,
                    hpc        >= 0.6 && < 0.8,
                    transformers >= 0.5 && < 0.7,
                    exceptions == 0.10.*,


=====================================
hadrian/src/Rules/Dependencies.hs
=====================================
@@ -35,7 +35,10 @@ extra_dependencies =
 
   where
     th_internal = (templateHaskell, "Language.Haskell.TH.Lib.Internal")
-    dep (p1, m1) (p2, m2) s = do
+    dep (p1, m1) (p2, m2) s =
+      -- We use the boot compiler's `template-haskell` library when building stage0,
+      -- so we don't need to register dependencies.
+      if isStage0 s then pure [] else do
         let context = Context s p1 (error "extra_dependencies: way not set") (error "extra_dependencies: iplace not set")
         ways <- interpretInContext context getLibraryWays
         mapM (\way -> (,) <$> path s way p1 m1 <*> path s way p2 m2) (S.toList ways)


=====================================
hadrian/src/Rules/ToolArgs.hs
=====================================
@@ -85,25 +85,13 @@ multiSetup pkg_s = do
       need (srcs ++ gens)
       let rexp m = ["-reexported-module", m]
       let hidir = root </> "interfaces" </> pkgPath p
-      writeFile' (resp_file root p) (intercalate "\n" (th_hack arg_list
+      writeFile' (resp_file root p) (intercalate "\n" (arg_list
                                                       ++  modules cd
                                                       ++ concatMap rexp (reexportModules cd)
                                                       ++ ["-outputdir", hidir]))
       return (resp_file root p)
 
 
-    -- The template-haskell package is compiled with -this-unit-id=template-haskell but
-    -- everything which depends on it depends on `-package-id-template-haskell-2.17.0.0`
-    -- and so the logic for detetecting which home-units depend on what is defeated.
-    -- The workaround here is just to rewrite all the `-package-id` arguments to
-    -- point to `template-haskell` instead which works for the multi-repl case.
-    -- See #20887
-    th_hack :: [String] -> [String]
-    th_hack ((isPrefixOf "-package-id template-haskell" -> True) : xs) = "-package-id" : "template-haskell" : xs
-    th_hack (x:xs) = x : th_hack xs
-    th_hack [] = []
-
-
 toolRuleBody :: FilePath -> Action ()
 toolRuleBody fp = do
   mm <- dirMap
@@ -158,7 +146,6 @@ toolTargets = [ binary
               -- , ghc     -- # depends on ghc library
               -- , runGhc  -- # depends on ghc library
               , ghcBoot
-              , ghcBootTh
               , ghcPlatform
               , ghcToolchain
               , ghcToolchainBin
@@ -172,7 +159,6 @@ toolTargets = [ binary
               , mtl
               , parsec
               , time
-              , templateHaskell
               , text
               , transformers
               , semaphoreCompat


=====================================
hadrian/src/Settings/Default.hs
=====================================
@@ -93,7 +93,6 @@ stage0Packages = do
              , ghc
              , runGhc
              , ghcBoot
-             , ghcBootTh
              , ghcPlatform
              , ghcHeap
              , ghcToolchain
@@ -108,7 +107,6 @@ stage0Packages = do
              , parsec
              , semaphoreCompat
              , time
-             , templateHaskell
              , text
              , transformers
              , unlit
@@ -143,6 +141,7 @@ stage1Packages = do
         , deepseq
         , exceptions
         , ghc
+        , ghcBootTh
         , ghcBignum
         , ghcCompact
         , ghcExperimental
@@ -156,6 +155,7 @@ stage1Packages = do
         , pretty
         , rts
         , semaphoreCompat
+        , templateHaskell
         , stm
         , unlit
         , xhtml


=====================================
hadrian/src/Settings/Packages.hs
=====================================
@@ -121,6 +121,10 @@ packageArgs = do
           , builder (Cc CompileC) ? (not <$> flag CcLlvmBackend) ?
             input "**/cbits/atomic.c"  ? arg "-Wno-sync-nand" ]
 
+        -------------------------------- ghcBoot ------------------------------
+        , package ghcBoot ?
+            builder (Cabal Flags) ? (stage0 `cabalFlag` "bootstrap-th")
+
         --------------------------------- ghci ---------------------------------
         , package ghci ? mconcat
           [


=====================================
libraries/ghc-boot/ghc-boot.cabal.in
=====================================
@@ -35,6 +35,15 @@ source-repository head
     location: https://gitlab.haskell.org/ghc/ghc.git
     subdir:   libraries/ghc-boot
 
+Flag bootstrap-th
+        Description:
+          Enabled when building the stage1 compiler in order to vendor the in-tree
+          `template-haskell` library, while allowing dependencies to depend on the
+          boot `template-haskell` library.
+          See Note [Bootstrapping Template Haskell]
+        Default: False
+        Manual: True
+
 Library
     default-language: Haskell2010
     other-extensions: DeriveGeneric, RankNTypes, ScopedTypeVariables
@@ -56,13 +65,6 @@ Library
             GHC.UniqueSubdir
             GHC.Version
 
-    -- reexport modules from ghc-boot-th so that packages don't have to import
-    -- both ghc-boot and ghc-boot-th. It makes the dependency graph easier to
-    -- understand and to refactor.
-    reexported-modules:
-              GHC.LanguageExtensions.Type
-            , GHC.ForeignSrcLang.Type
-            , GHC.Lexeme
 
     -- reexport platform modules from ghc-platform
     reexported-modules:
@@ -81,7 +83,49 @@ Library
                    filepath   >= 1.3 && < 1.6,
                    deepseq    >= 1.4 && < 1.6,
                    ghc-platform >= 0.1,
-                   ghc-boot-th == @ProjectVersionMunged@
+    if flag(bootstrap-th)
+      cpp-options: -DBOOTSTRAP_TH
+      build-depends:
+              ghc-prim
+              , pretty
+      -- we vendor ghc-boot-th and template-haskell while bootstrapping TH.
+      -- This is to avoid having two copies of ghc-boot-th and template-haskell
+      -- in the build graph: one from the boot compiler and the in-tree one.
+      hs-source-dirs: . ../ghc-boot-th ../template-haskell ../template-haskell/vendored-filepath
+      exposed-modules:
+              GHC.LanguageExtensions.Type
+            , GHC.ForeignSrcLang.Type
+            , GHC.Lexeme
+            , Language.Haskell.TH
+            , Language.Haskell.TH.Lib
+            , Language.Haskell.TH.Ppr
+            , Language.Haskell.TH.PprLib
+            , Language.Haskell.TH.Quote
+            , Language.Haskell.TH.Syntax
+            , Language.Haskell.TH.LanguageExtensions
+            , Language.Haskell.TH.CodeDo
+            , Language.Haskell.TH.Lib.Internal
+
+      other-modules:
+              Language.Haskell.TH.Lib.Map
+            , System.FilePath
+            , System.FilePath.Posix
+            , System.FilePath.Windows
+    else
+      hs-source-dirs: .
+      build-depends:
+              ghc-boot-th         == @ProjectVersionMunged@
+            , template-haskell    == 2.22.0.0
+      -- reexport modules from ghc-boot-th and template-haskell so that packages
+      -- don't have to import all of ghc-boot, ghc-boot-th and template-haskell.
+      -- It makes the dependency graph easier to understand and to refactor
+      -- and reduces the amount of cabal flags we need to use for bootstrapping TH.
+      reexported-modules:
+              GHC.LanguageExtensions.Type
+            , GHC.ForeignSrcLang.Type
+            , GHC.Lexeme
+            , Language.Haskell.TH
+            , Language.Haskell.TH.Syntax
     if !os(windows)
         build-depends:
                    unix       >= 2.7 && < 2.9


=====================================
libraries/ghci/ghci.cabal.in
=====================================
@@ -84,7 +84,6 @@ library
         filepath         >= 1.4 && < 1.6,
         ghc-boot         == @ProjectVersionMunged@,
         ghc-heap         == @ProjectVersionMunged@,
-        template-haskell == 2.22.*,
         transformers     >= 0.5 && < 0.7
 
     if !os(windows)


=====================================
libraries/template-haskell/Language/Haskell/TH/Syntax.hs
=====================================
@@ -34,49 +34,52 @@ module Language.Haskell.TH.Syntax
     -- $infix
     ) where
 
-import qualified Data.Fixed as Fixed
+import Prelude
 import Data.Data hiding (Fixity(..))
 import Data.IORef
 import System.IO.Unsafe ( unsafePerformIO )
 import System.FilePath
 import GHC.IO.Unsafe    ( unsafeDupableInterleaveIO )
-import Control.Monad (liftM)
 import Control.Monad.IO.Class (MonadIO (..))
 import Control.Monad.Fix (MonadFix (..))
-import Control.Applicative (Applicative(..))
 import Control.Exception (BlockedIndefinitelyOnMVar (..), catch, throwIO)
 import Control.Exception.Base (FixIOException (..))
 import Control.Concurrent.MVar (newEmptyMVar, readMVar, putMVar)
 import System.IO        ( hPutStrLn, stderr )
-import Data.Char        ( isAlpha, isAlphaNum, isUpper, ord )
+import Data.Char        ( isAlpha, isAlphaNum, isUpper )
 import Data.Int
 import Data.List.NonEmpty ( NonEmpty(..) )
-import Data.Void        ( Void, absurd )
 import Data.Word
 import Data.Ratio
-import GHC.CString      ( unpackCString# )
 import GHC.Generics     ( Generic )
-import GHC.Types        ( Int(..), Word(..), Char(..), Double(..), Float(..),
-                          TYPE, RuntimeRep(..), Levity(..), Multiplicity (..) )
 import qualified Data.Kind as Kind (Type)
-import GHC.Prim         ( Int#, Word#, Char#, Double#, Float#, Addr# )
 import GHC.Ptr          ( Ptr, plusPtr )
 import GHC.Lexeme       ( startsVarSym, startsVarId )
 import GHC.ForeignSrcLang.Type
 import Language.Haskell.TH.LanguageExtensions
-import Numeric.Natural
 import Prelude hiding (Applicative(..))
 import Foreign.ForeignPtr
 import Foreign.C.String
 import Foreign.C.Types
+import GHC.Types        (TYPE, RuntimeRep(..), Levity(..))
 
+#ifndef BOOTSTRAP_TH
+import Control.Monad (liftM)
+import Data.Char (ord)
+import qualified Data.Fixed as Fixed
+import GHC.Prim         ( Int#, Word#, Char#, Double#, Float#, Addr# )
+import GHC.Types        ( Int(..), Word(..), Char(..), Double(..), Float(..))
+import GHC.CString      ( unpackCString# )
+import GHC.ForeignPtr (ForeignPtr(..), ForeignPtrContents(..))
+import Data.Void        ( Void, absurd )
+import Numeric.Natural
 import Data.Array.Byte (ByteArray(..))
 import GHC.Exts
   ( ByteArray#, unsafeFreezeByteArray#, copyAddrToByteArray#, newByteArray#
   , isByteArrayPinned#, isTrue#, sizeofByteArray#, unsafeCoerce#, byteArrayContents#
   , copyByteArray#, newPinnedByteArray#)
-import GHC.ForeignPtr (ForeignPtr(..), ForeignPtrContents(..))
 import GHC.ST (ST(..), runST)
+#endif
 
 -----------------------------------------------------
 --
@@ -1018,6 +1021,8 @@ class Lift (t :: TYPE r) where
   liftTyped :: Quote m => t -> Code m t
 
 
+-- See Note [Bootstrapping Template Haskell]
+#ifndef BOOTSTRAP_TH
 -- If you add any instances here, consider updating test th/TH_Lift
 instance Lift Integer where
   liftTyped x = unsafeCodeCoerce (lift x)
@@ -1384,10 +1389,11 @@ rightName = 'Right
 
 nonemptyName :: Name
 nonemptyName = '(:|)
+#endif
 
 oneName, manyName :: Name
-oneName  = 'One
-manyName = 'Many
+oneName  = mkNameG DataName "ghc-prim" "GHC.Types" "One"
+manyName = mkNameG DataName "ghc-prim" "GHC.Types" "Many"
 
 -----------------------------------------------------
 --


=====================================
libraries/template-haskell/vendored-filepath/System/FilePath/Posix.hs
=====================================
@@ -102,6 +102,7 @@ module System.FilePath.Posix
     )
     where
 
+import Prelude
 import Data.Char(toLower, toUpper, isAsciiLower, isAsciiUpper)
 import Data.Maybe(isJust)
 import Data.List(stripPrefix, isSuffixOf)


=====================================
libraries/template-haskell/vendored-filepath/System/FilePath/Windows.hs
=====================================
@@ -102,6 +102,7 @@ module System.FilePath.Windows
     )
     where
 
+import Prelude
 import Data.Char(toLower, toUpper, isAsciiLower, isAsciiUpper)
 import Data.Maybe(isJust)
 import Data.List(stripPrefix, isSuffixOf)



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/d2a7ca25b24b1963e33a55a6e00cbd886815de21
