[Git][ghc/ghc][wip/T23536-teo] Make template-haskell a stage1 package
Teo Camarasu (@teo)
gitlab at gitlab.haskell.org
Wed Apr 10 13:21:51 UTC 2024
Teo Camarasu pushed to branch wip/T23536-teo at Glasgow Haskell Compiler / GHC
Commits:
d2a7ca25 by Teo Camarasu at 2024-04-10T14:21:32+01:00
Make template-haskell a stage1 package
Promoting template-haskell from a stage0 to a stage1 package means that
we can much more easily refactor template-haskell.
We implement this by vendoring the in-tree `template-haskell` into
`ghc-boot` thus allowing `stage1:ghc` to depend on the new interface of
the library including the `Binary` instances.
This is controlled by a `bootstrap-th` cabal flag on `ghc-boot`.
When building the `template-haskell` modules as part of this vendoring,
we do not have access to quote syntax, so we cannot use name quotes
(e.g. `'Just`). We therefore either replace them with hand-written
`Name`s or hide the code behind CPP.
We can remove the `th_hack` from hadrian, which was required when
building stage0 packages using the in-tree `template-haskell` library.
For more details see Note [Bootstrapping Template Haskell].
Resolves #23536
Co-Authored-By: Sebastian Graf <sgraf1337 at gmail.com>
- - - - -
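The replacement of a name quote by a hand-written `Name`, as described in the commit message, can be sketched as follows. This is an illustration only, not code from the commit: it spells the name out with the `Name` and `NameG` constructors exported by `Language.Haskell.TH.Syntax` instead of the `mkNameG` helper used in the patch, and the identifiers `justQuoted` and `oneByHand` are hypothetical.

```haskell
{-# LANGUAGE TemplateHaskellQuotes #-}
module Main (main) where

import Language.Haskell.TH.Syntax
  ( Name (..), NameFlavour (..), NameSpace (..)
  , mkModName, mkOccName, mkPkgName
  , nameBase, nameModule, namePackage )

-- With quote syntax available, the compiler resolves the name for us.
justQuoted :: Name
justQuoted = 'Just

-- Without quote syntax, the package, module and occurrence name have to be
-- written out by hand (values taken from the diff's definition of oneName).
oneByHand :: Name
oneByHand =
  Name (mkOccName "One")
       (NameG DataName (mkPkgName "ghc-prim") (mkModName "GHC.Types"))

main :: IO ()
main = do
  putStrLn (nameBase justQuoted)
  print (nameBase oneByHand, namePackage oneByHand, nameModule oneByHand)
```

A hand-written name hard-codes the defining package and module, so it has to be updated by hand if the definition ever moves; a name quote would track such a move automatically.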
11 changed files:
- compiler/GHC/Tc/Gen/Splice.hs
- compiler/ghc.cabal.in
- hadrian/src/Rules/Dependencies.hs
- hadrian/src/Rules/ToolArgs.hs
- hadrian/src/Settings/Default.hs
- hadrian/src/Settings/Packages.hs
- libraries/ghc-boot/ghc-boot.cabal.in
- libraries/ghci/ghci.cabal.in
- libraries/template-haskell/Language/Haskell/TH/Syntax.hs
- libraries/template-haskell/vendored-filepath/System/FilePath/Posix.hs
- libraries/template-haskell/vendored-filepath/System/FilePath/Windows.hs
Changes:
=====================================
compiler/GHC/Tc/Gen/Splice.hs
=====================================
@@ -2916,3 +2916,98 @@ tcGetInterp = do
case hsc_interp hsc_env of
Nothing -> liftIO $ throwIO (InstallationError "Template haskell requires a target code interpreter")
Just i -> pure i
+
+-- Note [Bootstrapping Template Haskell]
+-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+-- Staged Metaprogramming as implemented in Template Haskell introduces a whole
+-- new dimension of staging to the already staged bootstrapping process.
+-- The `template-haskell` library plays a crucial role in this process.
+--
+-- Nomenclature:
+--
+-- boot/stage0 compiler: An already released compiler used to compile GHC
+-- stage(N+1) compiler: The result of compiling GHC from source with stage(N)
+-- Recall that any code compiled by the stage1 compiler should be binary
+-- identical to the same code compiled by later stages.
+-- boot TH: the `template-haskell` that comes with (and is linked to) the
+-- boot/stage0 compiler
+-- in-tree TH: the `template-haskell` library that lives in GHC's repository.
+-- Recall that building in-tree TH with the stage1 compiler yields a binary
+-- that is identical to the in-tree TH compiled by stage2.
+-- boot library: A library such as bytestring or containers that GHC depends on.
+-- CONFUSINGLY, we build these libraries with the boot compiler as well as
+-- the stage1 compiler; thus the "boot" in boot library does not refer to a
+-- stage.
+--
+-- Here is how we bootstrap `template-haskell` in tandem with GHC:
+--
+-- 1. Link the stage1 compiler against the boot TH library.
+-- 2. When building the stage1 compiler, vendor the parts relevant to serialising
+-- the (new, in-tree) TH AST into `ghc-boot`, thus shadowing definitions in the
+-- implicitly linked boot TH.
+-- 3. Build the in-tree TH with the stage1 compiler.
+-- 4. Build and link the stage2 compiler against the in-tree TH.
+--
+-- Observations:
+--
+-- A. The vendoring in (2) means that the fully qualified name of the in-tree TH
+-- AST will be, e.g., `ghc-boot:...VarE`, not `template-haskell:...VarE`.
+-- That is OK, because we need it just for the `Binary` instance, which does
+-- not depend on the fully qualified name of the type to serialise!
+-- Importantly, Note [Hard-wiring in-tree template-haskell for desugaring quotes]
+-- is unaffected, because the desugaring refers to names in stage1 TH, i.e.,
+-- the next compiler stage.
+--
+-- (Rejected) alternative designs:
+--
+-- 1b. Build the in-tree TH with the stage0 compiler and link the stage1 compiler
+-- against it. This is what we did until Apr 24 and it is problematic (#23536):
+-- * (It rules out using TH in GHC, for example to derive GHC.Core.Map types,
+-- because the boot compiler expects the boot TH AST in splices, but, e.g.,
+-- splice functions in GHC.Core.Map.TH would return the in-tree TH AST.
+-- However, at the moment, we are not using TH in GHC anyway.)
+-- * Ultimately, we must link the stage1 compiler against a single version
+-- of template-haskell.
+-- (SG: at least I think that is the case. Can someone verify? Otherwise
+-- it would be conceivable to build just ghc against in-tree TH and
+-- keep the boot libraries built against boot TH.)
+-- (TC: this might be unblocked in the future if we have a separate
+-- package DB for splices.)
+-- * If the single version is the in-tree TH, we have to recompile all boot
+-- libraries (e.g. bytestring, containers) with this new TH version.
+-- * But the boot libraries must *not* be built against a non-boot TH version.
+-- The reason is Note [Hard-wiring in-tree template-haskell for desugaring quotes]:
+-- The boot compiler will desugar quotes wrt. names in the boot TH version.
+-- A quote like `[| unsafePackLenLiteral |]` in bytestring will desugar
+-- to `varE (mkNameS "unsafePackLenLiteral")`, and all
+-- those smart constructors refer to locations in *boot TH*, because that
+-- is all that the boot GHC knows about.
+-- If the in-tree TH were to move or rename the definition of
+-- `mkNameS`, the boot compiler would report a linker error when
+-- compiling bytestring.
+-- * (Stopping to use quotes in bytestring is no solution, either, because
+-- the `Lift` type class is wired-in as well.
+-- Only remaining option: provide an entirely TH-less variant of every
+-- boot library. That would place a huge burden on maintainers and is
+-- thus rejected.)
+-- * We have thus made it impossible to refactor in-tree TH.
+-- This problem was discussed in #23536.
+-- 2b. Instead of vendoring, build a CPP'd version of in-tree TH by the boot
+-- compiler under a changed package-id, e.g., `template-haskell-next`, and
+-- build stage1 GHC against that.
+-- SG: Why not?
+
+-- Note [Hard-wiring in-tree template-haskell for desugaring quotes]
+-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+-- To desugar Template Haskell quotes, GHC needs to wire in a bunch of Names in the
+-- `template-haskell` library as Note [Known-key names], in GHC.Builtin.Names.TH.
+-- Consider
+-- > foo :: Q Exp
+-- > foo = [| unwords ["hello", "world"] |]
+-- this desugars to Core that looks like this
+-- > varE (mkNameS "unwords") `appE` listE [litE (stringE "hello"), litE (stringE "world")]
+-- And all these smart constructors are known-key.
+-- NB: Since the constructors are known-key, it is impossible to link this program
+-- against another template-haskell library in which, e.g., `varE` was moved into a
+-- different module. So effectively, GHC is hard-wired against the in-tree
+-- template-haskell library.
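The desugaring described in Note [Hard-wiring in-tree template-haskell for desugaring quotes] can be mimicked by hand with the public combinators from `Language.Haskell.TH`. The sketch below is an approximation under stated assumptions, not GHC's actual desugarer output: it uses a name quote (`'unwords`) where the Note writes `mkNameS`, and `stringL` for the literals because `litE` takes a `Lit`. The names `quoted`, `byHand` and `hasExpectedShape` are hypothetical.

```haskell
{-# LANGUAGE TemplateHaskellQuotes #-}
module Main (main) where

import Language.Haskell.TH

-- The quote from the Note.
quoted :: Q Exp
quoted = [| unwords ["hello", "world"] |]

-- Roughly what the quote stands for, built from smart constructors.
byHand :: Q Exp
byHand =
  varE 'unwords `appE` listE [litE (stringL "hello"), litE (stringL "world")]

-- Checks that an expression is `unwords` applied to the two string literals.
hasExpectedShape :: Exp -> Bool
hasExpectedShape (AppE (VarE f) (ListE [LitE (StringL a), LitE (StringL b)])) =
  nameBase f == "unwords" && a == "hello" && b == "world"
hasExpectedShape _ = False

main :: IO ()
main = do
  q <- runQ quoted
  h <- runQ byHand
  print (hasExpectedShape h)  -- the hand-built AST has the shape above
  print (q == h)              -- compares the quote against the hand-built AST
```

Every combinator used in `byHand` lives at a fixed location in `template-haskell`, which is the coupling the Note describes: the compiler that desugars the quote must agree with the library about where those definitions are.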
=====================================
compiler/ghc.cabal.in
=====================================
@@ -115,7 +115,6 @@ Library
containers >= 0.6.2.1 && < 0.8,
array >= 0.1 && < 0.6,
filepath >= 1 && < 1.6,
- template-haskell == 2.22.*,
hpc >= 0.6 && < 0.8,
transformers >= 0.5 && < 0.7,
exceptions == 0.10.*,
=====================================
hadrian/src/Rules/Dependencies.hs
=====================================
@@ -35,7 +35,10 @@ extra_dependencies =
where
th_internal = (templateHaskell, "Language.Haskell.TH.Lib.Internal")
- dep (p1, m1) (p2, m2) s = do
+ dep (p1, m1) (p2, m2) s =
+ -- We use the boot compiler's `template-haskell` library when building stage0,
+ -- so we don't need to register dependencies.
+ if isStage0 s then pure [] else do
let context = Context s p1 (error "extra_dependencies: way not set") (error "extra_dependencies: iplace not set")
ways <- interpretInContext context getLibraryWays
mapM (\way -> (,) <$> path s way p1 m1 <*> path s way p2 m2) (S.toList ways)
=====================================
hadrian/src/Rules/ToolArgs.hs
=====================================
@@ -85,25 +85,13 @@ multiSetup pkg_s = do
need (srcs ++ gens)
let rexp m = ["-reexported-module", m]
let hidir = root </> "interfaces" </> pkgPath p
- writeFile' (resp_file root p) (intercalate "\n" (th_hack arg_list
+ writeFile' (resp_file root p) (intercalate "\n" (arg_list
++ modules cd
++ concatMap rexp (reexportModules cd)
++ ["-outputdir", hidir]))
return (resp_file root p)
- -- The template-haskell package is compiled with -this-unit-id=template-haskell but
- -- everything which depends on it depends on `-package-id-template-haskell-2.17.0.0`
- -- and so the logic for detetecting which home-units depend on what is defeated.
- -- The workaround here is just to rewrite all the `-package-id` arguments to
- -- point to `template-haskell` instead which works for the multi-repl case.
- -- See #20887
- th_hack :: [String] -> [String]
- th_hack ((isPrefixOf "-package-id template-haskell" -> True) : xs) = "-package-id" : "template-haskell" : xs
- th_hack (x:xs) = x : th_hack xs
- th_hack [] = []
-
-
toolRuleBody :: FilePath -> Action ()
toolRuleBody fp = do
mm <- dirMap
@@ -158,7 +146,6 @@ toolTargets = [ binary
-- , ghc -- # depends on ghc library
-- , runGhc -- # depends on ghc library
, ghcBoot
- , ghcBootTh
, ghcPlatform
, ghcToolchain
, ghcToolchainBin
@@ -172,7 +159,6 @@ toolTargets = [ binary
, mtl
, parsec
, time
- , templateHaskell
, text
, transformers
, semaphoreCompat
=====================================
hadrian/src/Settings/Default.hs
=====================================
@@ -93,7 +93,6 @@ stage0Packages = do
, ghc
, runGhc
, ghcBoot
- , ghcBootTh
, ghcPlatform
, ghcHeap
, ghcToolchain
@@ -108,7 +107,6 @@ stage0Packages = do
, parsec
, semaphoreCompat
, time
- , templateHaskell
, text
, transformers
, unlit
@@ -143,6 +141,7 @@ stage1Packages = do
, deepseq
, exceptions
, ghc
+ , ghcBootTh
, ghcBignum
, ghcCompact
, ghcExperimental
@@ -156,6 +155,7 @@ stage1Packages = do
, pretty
, rts
, semaphoreCompat
+ , templateHaskell
, stm
, unlit
, xhtml
=====================================
hadrian/src/Settings/Packages.hs
=====================================
@@ -121,6 +121,10 @@ packageArgs = do
, builder (Cc CompileC) ? (not <$> flag CcLlvmBackend) ?
input "**/cbits/atomic.c" ? arg "-Wno-sync-nand" ]
+ -------------------------------- ghcBoot ------------------------------
+ , package ghcBoot ?
+ builder (Cabal Flags) ? (stage0 `cabalFlag` "bootstrap-th")
+
--------------------------------- ghci ---------------------------------
, package ghci ? mconcat
[
=====================================
libraries/ghc-boot/ghc-boot.cabal.in
=====================================
@@ -35,6 +35,15 @@ source-repository head
location: https://gitlab.haskell.org/ghc/ghc.git
subdir: libraries/ghc-boot
+Flag bootstrap-th
+ Description:
+ Enabled when building the stage1 compiler in order to vendor the in-tree
+ `template-haskell` library, while allowing dependencies to depend on the
+ boot `template-haskell` library.
+ See Note [Bootstrapping Template Haskell]
+ Default: False
+ Manual: True
+
Library
default-language: Haskell2010
other-extensions: DeriveGeneric, RankNTypes, ScopedTypeVariables
@@ -56,13 +65,6 @@ Library
GHC.UniqueSubdir
GHC.Version
- -- reexport modules from ghc-boot-th so that packages don't have to import
- -- both ghc-boot and ghc-boot-th. It makes the dependency graph easier to
- -- understand and to refactor.
- reexported-modules:
- GHC.LanguageExtensions.Type
- , GHC.ForeignSrcLang.Type
- , GHC.Lexeme
-- reexport platform modules from ghc-platform
reexported-modules:
@@ -81,7 +83,49 @@ Library
filepath >= 1.3 && < 1.6,
deepseq >= 1.4 && < 1.6,
ghc-platform >= 0.1,
- ghc-boot-th == @ProjectVersionMunged@
+ if flag(bootstrap-th)
+ cpp-options: -DBOOTSTRAP_TH
+ build-depends:
+ ghc-prim
+ , pretty
+ -- we vendor ghc-boot-th and template-haskell while bootstrapping TH.
+ -- This is to avoid having two copies of ghc-boot-th and template-haskell
+ -- in the build graph: one from the boot compiler and the in-tree one.
+ hs-source-dirs: . ../ghc-boot-th ../template-haskell ../template-haskell/vendored-filepath
+ exposed-modules:
+ GHC.LanguageExtensions.Type
+ , GHC.ForeignSrcLang.Type
+ , GHC.Lexeme
+ , Language.Haskell.TH
+ , Language.Haskell.TH.Lib
+ , Language.Haskell.TH.Ppr
+ , Language.Haskell.TH.PprLib
+ , Language.Haskell.TH.Quote
+ , Language.Haskell.TH.Syntax
+ , Language.Haskell.TH.LanguageExtensions
+ , Language.Haskell.TH.CodeDo
+ , Language.Haskell.TH.Lib.Internal
+
+ other-modules:
+ Language.Haskell.TH.Lib.Map
+ , System.FilePath
+ , System.FilePath.Posix
+ , System.FilePath.Windows
+ else
+ hs-source-dirs: .
+ build-depends:
+ ghc-boot-th == @ProjectVersionMunged@
+ , template-haskell == 2.22.0.0
+ -- reexport modules from ghc-boot-th and template-haskell so that packages
+ -- don't have to import all of ghc-boot, ghc-boot-th and template-haskell.
+ -- It makes the dependency graph easier to understand and to refactor
+ -- and reduces the amount of cabal flags we need to use for bootstrapping TH.
+ reexported-modules:
+ GHC.LanguageExtensions.Type
+ , GHC.ForeignSrcLang.Type
+ , GHC.Lexeme
+ , Language.Haskell.TH
+ , Language.Haskell.TH.Syntax
if !os(windows)
build-depends:
unix >= 2.7 && < 2.9
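How the `bootstrap-th` flag reaches Haskell source can be sketched in a few lines. In the cabal stanza above, enabling the flag adds `-DBOOTSTRAP_TH` to `cpp-options`, and the vendored modules then guard quote-dependent code with `#ifndef BOOTSTRAP_TH`. The module below is a hypothetical stand-alone illustration of that guard pattern, not part of the patch; `buildMode` is an invented name.

```haskell
{-# LANGUAGE CPP #-}
module Main (main) where

-- Imports that are only needed by the guarded code are guarded as well,
-- mirroring the import block in Language.Haskell.TH.Syntax.
#ifndef BOOTSTRAP_TH
import Data.Char (ord)
#endif

-- Reports which variant of the module was compiled.
buildMode :: String
#ifdef BOOTSTRAP_TH
buildMode = "bootstrap build: guarded code compiled out"
#else
buildMode = "regular build: ord 'a' = " ++ show (ord 'a')
#endif

main :: IO ()
main = putStrLn buildMode
```

Compiled without any extra options this takes the regular branch; passing `-DBOOTSTRAP_TH` to GHC selects the bootstrap branch, which is what the `cpp-options` line does when the cabal flag is set.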
=====================================
libraries/ghci/ghci.cabal.in
=====================================
@@ -84,7 +84,6 @@ library
filepath >= 1.4 && < 1.6,
ghc-boot == @ProjectVersionMunged@,
ghc-heap == @ProjectVersionMunged@,
- template-haskell == 2.22.*,
transformers >= 0.5 && < 0.7
if !os(windows)
=====================================
libraries/template-haskell/Language/Haskell/TH/Syntax.hs
=====================================
@@ -34,49 +34,52 @@ module Language.Haskell.TH.Syntax
-- $infix
) where
-import qualified Data.Fixed as Fixed
+import Prelude
import Data.Data hiding (Fixity(..))
import Data.IORef
import System.IO.Unsafe ( unsafePerformIO )
import System.FilePath
import GHC.IO.Unsafe ( unsafeDupableInterleaveIO )
-import Control.Monad (liftM)
import Control.Monad.IO.Class (MonadIO (..))
import Control.Monad.Fix (MonadFix (..))
-import Control.Applicative (Applicative(..))
import Control.Exception (BlockedIndefinitelyOnMVar (..), catch, throwIO)
import Control.Exception.Base (FixIOException (..))
import Control.Concurrent.MVar (newEmptyMVar, readMVar, putMVar)
import System.IO ( hPutStrLn, stderr )
-import Data.Char ( isAlpha, isAlphaNum, isUpper, ord )
+import Data.Char ( isAlpha, isAlphaNum, isUpper )
import Data.Int
import Data.List.NonEmpty ( NonEmpty(..) )
-import Data.Void ( Void, absurd )
import Data.Word
import Data.Ratio
-import GHC.CString ( unpackCString# )
import GHC.Generics ( Generic )
-import GHC.Types ( Int(..), Word(..), Char(..), Double(..), Float(..),
- TYPE, RuntimeRep(..), Levity(..), Multiplicity (..) )
import qualified Data.Kind as Kind (Type)
-import GHC.Prim ( Int#, Word#, Char#, Double#, Float#, Addr# )
import GHC.Ptr ( Ptr, plusPtr )
import GHC.Lexeme ( startsVarSym, startsVarId )
import GHC.ForeignSrcLang.Type
import Language.Haskell.TH.LanguageExtensions
-import Numeric.Natural
import Prelude hiding (Applicative(..))
import Foreign.ForeignPtr
import Foreign.C.String
import Foreign.C.Types
+import GHC.Types (TYPE, RuntimeRep(..), Levity(..))
+#ifndef BOOTSTRAP_TH
+import Control.Monad (liftM)
+import Data.Char (ord)
+import qualified Data.Fixed as Fixed
+import GHC.Prim ( Int#, Word#, Char#, Double#, Float#, Addr# )
+import GHC.Types ( Int(..), Word(..), Char(..), Double(..), Float(..))
+import GHC.CString ( unpackCString# )
+import GHC.ForeignPtr (ForeignPtr(..), ForeignPtrContents(..))
+import Data.Void ( Void, absurd )
+import Numeric.Natural
import Data.Array.Byte (ByteArray(..))
import GHC.Exts
( ByteArray#, unsafeFreezeByteArray#, copyAddrToByteArray#, newByteArray#
, isByteArrayPinned#, isTrue#, sizeofByteArray#, unsafeCoerce#, byteArrayContents#
, copyByteArray#, newPinnedByteArray#)
-import GHC.ForeignPtr (ForeignPtr(..), ForeignPtrContents(..))
import GHC.ST (ST(..), runST)
+#endif
-----------------------------------------------------
--
@@ -1018,6 +1021,8 @@ class Lift (t :: TYPE r) where
liftTyped :: Quote m => t -> Code m t
+-- See Note [Bootstrapping Template Haskell]
+#ifndef BOOTSTRAP_TH
-- If you add any instances here, consider updating test th/TH_Lift
instance Lift Integer where
liftTyped x = unsafeCodeCoerce (lift x)
@@ -1384,10 +1389,11 @@ rightName = 'Right
nonemptyName :: Name
nonemptyName = '(:|)
+#endif
oneName, manyName :: Name
-oneName = 'One
-manyName = 'Many
+oneName = mkNameG DataName "ghc-prim" "GHC.Types" "One"
+manyName = mkNameG DataName "ghc-prim" "GHC.Types" "Many"
-----------------------------------------------------
--
=====================================
libraries/template-haskell/vendored-filepath/System/FilePath/Posix.hs
=====================================
@@ -102,6 +102,7 @@ module System.FilePath.Posix
)
where
+import Prelude
import Data.Char(toLower, toUpper, isAsciiLower, isAsciiUpper)
import Data.Maybe(isJust)
import Data.List(stripPrefix, isSuffixOf)
=====================================
libraries/template-haskell/vendored-filepath/System/FilePath/Windows.hs
=====================================
@@ -102,6 +102,7 @@ module System.FilePath.Windows
)
where
+import Prelude
import Data.Char(toLower, toUpper, isAsciiLower, isAsciiUpper)
import Data.Maybe(isJust)
import Data.List(stripPrefix, isSuffixOf)
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/d2a7ca25b24b1963e33a55a6e00cbd886815de21