[Git][ghc/ghc][wip/T23536-teo] 2 commits: fixup! Make template-haskell a stage1 package
Teo Camarasu (@teo)
gitlab at gitlab.haskell.org
Wed Apr 10 09:34:02 UTC 2024
Teo Camarasu pushed to branch wip/T23536-teo at Glasgow Haskell Compiler / GHC
Commits:
11b974fe by Teo Camarasu at 2024-04-10T10:11:17+01:00
fixup! Make template-haskell a stage1 package
- - - - -
01b8ad82 by Teo Camarasu at 2024-04-10T10:15:46+01:00
try not always re-exporting template-haskell in ghc-boot
- - - - -
5 changed files:
- compiler/GHC/Tc/Gen/Splice.hs
- compiler/ghc.cabal.in
- hadrian/src/Settings/Packages.hs
- libraries/ghc-boot/ghc-boot.cabal.in
- libraries/ghci/ghci.cabal.in
Changes:
=====================================
compiler/GHC/Tc/Gen/Splice.hs
=====================================
@@ -2919,105 +2919,125 @@ tcGetInterp = do
-- Note [Bootstrapping Template Haskell]
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+-- Template Haskell requires special attention when bootstrapping GHC.
+-- The boot compiler (stage0) comes bundled with one version of
+-- `template-haskell`. The bootstrap (stage1) compiler transitions to
+-- using the in-tree version, which we distribute with our final compiler
+-- (stage2). This is tricky because implementing Template Haskell requires cross
+-- stage dependencies. We can divide Template Haskell into 3 capabilitie:
+-- implementing running splices either using the internal interpreter, or the
+-- external interpreter, and desugaring quote syntax. A splice (next stage) depends on the
+-- interface of the `template-haskell` library (built in previous stage), which
+-- must match the version built into the compiler, in order for that splice to
+-- be runnable. The exposed interface includes the `Q` monad, the `Quasi` and
+-- `Quote` typeclasses, and the TH AST. To desugar quote syntax, we use a sort
+-- of late binding where we emit references to known-key definitions in
+-- `GHC.Builtin.Names.TH` pointing to identifiers matching the in-tree library.
--
--- Template Haskell requires special attention when compiling GHC.
--- The implementation of the Template Haskell set of features requires tight
--- coupling between the compiler and the `template-haskell` library.
--- This complicates the bootstrapping story as compatibility constraints are
--- placed on the version of `template-haskell` used to compile GHC during a
--- particular stage and the version bundled with it.
+-- In our current implementation we resolve these constraints by vendoring the
+-- in-tree `template-haskell` library into the `ghc-boot` package while
+-- compiling the stage1 compiler. This allows the stage1 compiler to depend on
+-- the in-tree version of the interface without requiring the `template-haskell`
+-- library to be built in stage0.
--
--- These constraints can be divided by the features they are used to implement,
--- namely running splices either directly or via the external interpreter, and
--- desugaring bracket syntax.
+-- Let's illustrate how Template Haskell works through an example.
+-- Take the following code, which uses both quotes and splices.
+-- > main = print $([|unwords ["hello", "world"]|])
--
--- (C1) Executing splices within the compiler: In order to execute a splice
--- within the compiler, we must be able to compile and load code built against
--- the same version of the `template-haskell` library as the compiler. This
--- is an ABI compatibility constraint between the `template-haskell` version of
--- the compiler and the splice.
--- (C2) Executing splices through the external interpreter: In order to execute
--- a splice via the external interpreter, we serialise bytecode, run it with the
--- external interpreter, and communicate back the result through a binary
--- serialised interface. This is a binary serialisation compatibilty constraint
--- between the `template-haskell` version of the compiler and the splice.
--- (C3) Desugaring bracket syntax: Bracket syntax is desugared by referring to a
--- special wired-in package whose package id is `template-haskell`. So for
--- instance an expression `'Just` gets desugared to something of type
--- `template-haskell:Language.Haskell.TH.Syntax.Name`. Importantly, while this
--- identifier is wired-in, the identity of the `template-haskell` package is
--- not. So for instance we can successfully use an expression like
--- `'Just :: Name` while compiling the `template-haskell` package as long as its
--- package id is set to `template-haskell` as `Name` will resolve the the local
--- identifier in the package (and the LHS and RHS will align). On the other
--- hand, if we don't set the special package id, the type of the expression will
--- be `template-haskell:...Name` while the `Name` on the RHS will resolve to the
--- local identifier and we will get a type error. So, bracket syntax assumes the
--- presence of a particular API in the `template-haskell` package, but it allows
---
--- These constraints are ranked from strongest to weakest. They only apply if we
--- want to support the particular feature associated with them.
+-- The splice is run during renaming. This consists of the following steps.
+-- 1. The quote is desugared becoming something equivalent to
+-- `varE (mkNameS "unwords") `appE` listE [litE (stringE "hello"), litE (stringE "world")]`
+-- Notice we are introducing references to identifiers from the
+-- `template-haskell` library.
+-- This desguaring is implemented in `GHC.HsToCore.Quote.dsBracket`.
+-- 2. The core expression of the quote is turned into bytecode.
+-- 3. The bytecode is run in the interpreter. If we are using the internal
+-- interpreter this is the `ghc` executable. If we are using the external
+-- interpreter this is the `ghc-iserv` executable.
+-- 4. Optionally the interpreter must load object code for packages mentioned in
+-- the bytecode. In this instance that is just `template-haskell`.
+-- 5. The bytecode is run.
+-- 6. The result is a TH AST, which we convert back to a GHC AST using the
+-- functions from `GHC.ThToHs`.
--
--- The tricky case is what do to when building the bootstrapping (stage1) GHC.
--- The stage2 GHC is simpler as it can use the in-tree `template-haskell`
--- package built by the stage1 GHC.
+-- At several points in the process different features required depending on the
+-- `template-haskell` library. These constraints conditionally apply to any version of the
+-- compiler that enables the relevant feature.
--
--- We should note that we cannot feasibly use the internal interpreter with a
--- stage1 GHC. This is because the stage1 GHC was compiled with the stage0 GHC,
--- which we assume is a different version. In order to run a splice that too
--- would need to be compiled with the stage0 GHC, and so would all its
--- dependencies.
--- This allows us to disregard (C1) for the stage1 case.
+-- (C1) running a splice using the internal interpreter: if we run a splice by
+-- loading the bytecode into the internal interpreter, then our compiler must be
+-- linked against the same version of the `template-haskell` library as the
+-- splice. For the stage2 compiler, this means a user splice (stage3) is linked
+-- against the in-tree `template-haskell` library (stage1), which is the same
+-- as what the stage2 compiler is linked against.
+-- This constraint is introduced by step 4 and step 6. Usages of `reify`
+-- functions from `Quasi` would also entail this constraint. If there were a
+-- discrepancy between these two versions of `template-haskell`, for instance if
+-- the splice's AST laccked the `StringE` constructor, then we would run into a
+-- runtime or linker error. This constraint doesn't apply to the stage1 compiler
+-- because we currently don't support using the internal interpreter at that
+-- stage at all.
--
--- In the past, we used to build the stage1 GHC and all its dependencies against
--- the in-tree `template-haskell` library. This meant that we sacrificed (C2)
--- because they are likely not serialisation compatible. We could not sacrifice
--- (C3) because dependencies of GHC (such as `containers` and
--- `template-haskell`) used bracket syntax to define `Lift` instances. This
--- meant that the interface assumed by the boot compiler to implement bracket
--- desugaring could not be modified (not even through CPP as (C1) would
--- constrain us in future stages where we do support the internal interpreter).
--- Yet, bracket syntax did work and gave us splices that desugared to code that
--- referenced the in-tree version of `template-haskell` not the one the boot
--- compiler required. So they could never be run.
+-- (C2) running a splice using the external interpreter: if we run a splice by
+-- loading the bytecode into the external interpreter, then our `iserv`
+-- executable must be linked against the same version of the `template-haskell`
+-- library as the splice. The compiler on the other hand only needs to be depend
+-- on a binary serialisation that is compatible with `template-haskell` library.
+-- This is a similar but weaker constraint on the compiler than the previous
+-- one. It is possible to use this with the stage1 compiler by using an `iserv`
+-- executable that is compiled against the stage1 `template-haskell` in-tree
+-- library. If there is a discrepancy in the binary interface then we will run
+-- into a runtime error if we fail to parse a message.
--
--- Our current strategy is to not build `template-haskell` as a stage0 package.
--- All of GHCs dependencies depend on the boot compilers version, and produce
--- runnable splices. How do we deal with the stage1 compiler's dependency on
--- `template-haskell`? There are two options. (D1) depend on the boot
--- compiler's version for stage1 and then depend on the in-tree one in stage2.
--- This violates (C1) and (C2), so we wouldn't be able to run splices at all
--- with the stage1 compiler. Additionally this would introduce quite a bit of
--- CPP into the compiler and mean we would have to stub out much of the
--- template-haskell machinery or have an unrunable compatibilty shim. Or (D2)
--- depend on the in-tree version.
+-- (C3) desugaring quote syntax: the desugaring of quote syntax is hard coded
+-- into each version of the compiler. `GHC.Builtin.Names.TH` declares many names
+-- as known-key, inducing a coupling relationship quite to that of `ghc-internal`.
+-- This is so that quotes can be desugared to calls to TH AST (smart)
+-- constructors and uses of the Lift type class.
+-- Furthermore, name quotes such as `'Just` are resolved relative to the module
+-- under compilation and then desugar to a
+-- `template-haskell:Language.Haskell.TH.Syntax.Name`, where the exact location
+-- is determined by `GHC.Builtin.Names.TH`. This location must match the
+-- `template-haskell` library against which the quote is linked.
+-- This implements a sort of late binding, where identifiers will be mapped to
+-- whatever package called `template-haskell` is in scope.
+-- The late binding makes it possible to use quotes while compiling the
+-- `template-haskell` library itself as identifiers will correctly be mapped to
+-- the current package, rather than the one from the previous stage.
+-- If we attempt to use quotes, but don't have a `template-haskell` package in
+-- the package db with the appropriate interface we will run into type errors.
+-- For instance if it lacked the `varE` function or had a function with the
+-- wrong arity, then we would encounter an error when a module using the quote
+-- is compiled.
--
--- (D2) is what we implement, but it is complicated by the fact that it means we
--- practically have two versions of `template-haskell` in the dependency graph
--- of the stage1 compiler. To avoid this, we recall that we only need
--- serliasation compatibility (as per (C2)), so we can avoid a direct dependency
--- on the in-tree version by vendoring it. We choose to vendor it into the
--- `ghc-boot` package as both `lib:ghc` and `ghci` require a dependency on the
--- `template-haskell` interface as they define the two ends of the protocol.
--- This allows us to still run splices through the external interpreter.
+-- These 3 constraints inform our implementation strategy when building the
+-- stage1 compiler. We are making use of the stage0 compiler, so if we want to
+-- keep the possibility of using Template Haskell in the bootstrapping process
+-- open (#22069), then our package DB must contain the `template-haskell`
+-- library that our stage0 compiler is linked against (C1-3).
+-- Yet, for the stage1 compiler, (C1) doesn't apply as we cannot use the
+-- internal interpreter. We do not need to link the stage1 compiler against
+-- the exact same `template-haskell` library that user splices compiled using
+-- the stage1 compiler will need to use. Therefore, we vendor the modules from
+-- the in-tree `template-haskell` library into the `ghc-boot` package. As a
+-- common dependency of both, this allows the `ghc` and `ghci` libraries to
+-- depend on the interface and serialisation from the in-tree `template-haskell`
+-- library. This comes with a trade-off: we cannot make use of quotes while
+-- compiling these modules as the identifiers would resolve to reflect the
+-- stage0 compiler's expected structure (C3).
--
--- We should note a futher edge-case with this approach. When compiling our
--- vendored `template-haskell` library, we run afoul of (C3). The library
--- defines several `Name`s using bracket syntax. As this package doesn't claim
--- to be the wired-in package but it defines its own `Name` type, we get a type
--- discrepancy with the `Name` type from the boot compiler's `template-haskell`
--- library. Most of these are only used to define `Lift` instances, so in the
--- vendored case we simply hide them behind CPP. Yet, there is one distinct use
--- of a `Name`. We have a `Name` for the constructors of the `Multiplicity`
--- type, which are also used in the pretty-printing module. We construct these
--- manulally instead. This allows us to completely avoid using bracket syntax
--- for compiling the vendored `template-haskell` package.
+-- Then the stage1 compiler is used to compile the in-tree `template-haskell`
+-- library, which is linked into the stage2 compiler and distributed alongside
+-- it.
--
--- To summarise, our current approach allows us to use the external interpreter
--- to run splices and allows bracket syntax to be desugared correctly. In order
--- to implement this we vendor the `template-haskell` library into `ghc-boot`
--- and take special care to not use bracket syntax in those modules as that
--- would incorrectly produce code that uses identifiers from the boot compiler's
--- `template-haskell` library.
+-- Historically, we used to build the in-tree `template-haskell` library in
+-- stage0. This meant that qoutes would resolve to splices that couldn't be run
+-- by the stage0 compiler (C1-2), eg, the in-tree library might contain a new
+-- constructor the stage0 compiler wasn't aware of. It also made it difficult to
+-- refactor the `template-haskell` library as it needed to be conservative with
+-- regard to the identifiers the stage0 compiler was expecting in order to
+-- desugar quotes (C3). See #23536.
--
--- See #23536.
+-- The current setup allows both stage0 and stage2 to use the full gamut of
+-- Template Haskell functionality, while stage1 can use all but the internal
+-- interpreter to run splices.
=====================================
compiler/ghc.cabal.in
=====================================
@@ -82,6 +82,15 @@ Flag hadrian-stage0
Default: False
Manual: True
+Flag bootstrap-th
+ Description:
+ Enabled when building the stage1 compiler in order to vendor the in-tree
+ `template-haskell` library, while allowing dependencies to depend on the
+ boot `template-haskell` library.
+ See Note [Bootstrapping Template Haskell]
+ Default: False
+ Manual: True
+
Library
Default-Language: GHC2021
Exposed: False
@@ -124,6 +133,10 @@ Library
ghc-heap == @ProjectVersionMunged@,
ghci == @ProjectVersionMunged@
+ if !flag(bootstrap-th)
+ Build-Depends:
+ template-haskell == 2.22.0.0
+
if os(windows)
Build-Depends: Win32 >= 2.3 && < 2.15
else
=====================================
hadrian/src/Settings/Packages.hs
=====================================
@@ -87,6 +87,7 @@ packageArgs = do
-- We do it through a cabal flag in ghc.cabal
, stageVersion < makeVersion [9,8,1] ? arg "+hadrian-stage0"
, flag StaticLibzstd `cabalFlag` "static-libzstd"
+ , stage0 `cabalFlag` "bootstrap-th"
]
, builder (Haddock BuildPackage) ? arg ("--optghc=-I" ++ path) ]
@@ -155,9 +156,12 @@ packageArgs = do
-- compiler comes with the same versions as the one we are building.
--
builder (Cabal Setup) ? cabalExtraDirs ffiIncludeDir ffiLibraryDir
- , builder (Cabal Flags) ? ifM stage0
- (andM [cross, bootCross] `cabalFlag` "internal-interpreter")
- (arg "internal-interpreter")
+ , builder (Cabal Flags) ? mconcat
+ [ ifM stage0
+ (andM [cross, bootCross] `cabalFlag` "internal-interpreter")
+ (arg "internal-interpreter")
+ , stage0 `cabalFlag` "bootstrap-th"
+ ]
]
=====================================
libraries/ghc-boot/ghc-boot.cabal.in
=====================================
@@ -115,17 +115,13 @@ Library
hs-source-dirs: .
build-depends:
ghc-boot-th == @ProjectVersionMunged@
- , template-haskell == 2.22.0.0
- -- reexport modules from ghc-boot-th and template-haskell so that packages
- -- don't have to import all of ghc-boot, ghc-boot-th and template-haskell.
- -- It makes the dependency graph easier to understand and to refactor
- -- and reduces the amount of cabal flags we need to use for bootstrapping TH.
+ -- reexport modules from ghc-boot-th that packages
+ -- don't have to import all of ghc-boot, and ghc-boot-th.
+ -- It makes the dependency graph easier to understand.
reexported-modules:
GHC.LanguageExtensions.Type
, GHC.ForeignSrcLang.Type
, GHC.Lexeme
- , Language.Haskell.TH
- , Language.Haskell.TH.Syntax
if !os(windows)
build-depends:
unix >= 2.7 && < 2.9
=====================================
libraries/ghci/ghci.cabal.in
=====================================
@@ -22,6 +22,15 @@ Flag internal-interpreter
Default: False
Manual: True
+Flag bootstrap-th
+ Description:
+ Enabled when building the stage1 compiler in order to vendor the in-tree
+ `template-haskell` library, while allowing dependencies to depend on the
+ boot `template-haskell` library.
+ See Note [Bootstrapping Template Haskell]
+ Default: False
+ Manual: True
+
source-repository head
type: git
location: https://gitlab.haskell.org/ghc/ghc.git
@@ -86,5 +95,10 @@ library
ghc-heap == @ProjectVersionMunged@,
transformers >= 0.5 && < 0.7
+ if !flag(bootstrap-th)
+ Build-Depends:
+ template-haskell == 2.22.0.0
+
+
if !os(windows)
Build-Depends: unix >= 2.7 && < 2.9
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/1f9392adfba41999c47bc632379f101bef7d9eed...01b8ad82b641ecb1b6859774e342548049a3bd32
--
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/1f9392adfba41999c47bc632379f101bef7d9eed...01b8ad82b641ecb1b6859774e342548049a3bd32
You're receiving this email because of your account on gitlab.haskell.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240410/4df83d52/attachment-0001.html>
More information about the ghc-commits
mailing list