[Git][ghc/ghc][wip/T23536-teo] 2 commits: fixup! Make template-haskell a stage1 package

Teo Camarasu (@teo) gitlab at gitlab.haskell.org
Wed Apr 10 09:34:02 UTC 2024



Teo Camarasu pushed to branch wip/T23536-teo at Glasgow Haskell Compiler / GHC


Commits:
11b974fe by Teo Camarasu at 2024-04-10T10:11:17+01:00
fixup! Make template-haskell a stage1 package

- - - - -
01b8ad82 by Teo Camarasu at 2024-04-10T10:15:46+01:00
try not always re-exporting template-haskell in ghc-boot

- - - - -


5 changed files:

- compiler/GHC/Tc/Gen/Splice.hs
- compiler/ghc.cabal.in
- hadrian/src/Settings/Packages.hs
- libraries/ghc-boot/ghc-boot.cabal.in
- libraries/ghci/ghci.cabal.in


Changes:

=====================================
compiler/GHC/Tc/Gen/Splice.hs
=====================================
@@ -2919,105 +2919,125 @@ tcGetInterp = do
 
 -- Note [Bootstrapping Template Haskell]
 -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+-- Template Haskell requires special attention when bootstrapping GHC.
+-- The boot compiler (stage0) comes bundled with one version of
+-- `template-haskell`. The bootstrap (stage1) compiler transitions to
+-- using the in-tree version, which we distribute with our final compiler
+-- (stage2). This is tricky because implementing Template Haskell requires cross
+-- stage dependencies. We can divide Template Haskell into 3 capabilitie:
+-- implementing running splices either using the internal interpreter, or the
+-- external interpreter, and desugaring quote syntax. A splice (next stage) depends on the
+-- interface of the `template-haskell` library (built in previous stage), which
+-- must match the version built into the compiler, in order for that splice to
+-- be runnable. The exposed interface includes the `Q` monad, the `Quasi` and
+-- `Quote` typeclasses, and the TH AST. To desugar quote syntax, we use a sort
+-- of late binding where we emit references to known-key definitions in
+-- `GHC.Builtin.Names.TH` pointing to identifiers matching the in-tree library.
 --
--- Template Haskell requires special attention when compiling GHC.
--- The implementation of the Template Haskell set of features requires tight
--- coupling between the compiler and the `template-haskell` library.
--- This complicates the bootstrapping story as compatibility constraints are
--- placed on the version of `template-haskell` used to compile GHC during a
--- particular stage and the version bundled with it.
+-- In our current implementation we resolve these constraints by vendoring the
+-- in-tree `template-haskell` library into the `ghc-boot` package while
+-- compiling the stage1 compiler. This allows the stage1 compiler to depend on
+-- the in-tree version of the interface without requiring the `template-haskell`
+-- library to be built in stage0.
 --
--- These constraints can be divided by the features they are used to implement,
--- namely running splices either directly or via the external interpreter, and
--- desugaring bracket syntax.
+-- Let's illustrate how Template Haskell works through an example.
+-- Take the following code, which uses both quotes and splices.
+-- > main = print $([|unwords ["hello", "world"]|])
 --
--- (C1) Executing splices within the compiler: In order to execute a splice
--- within the compiler, we must be able to compile and load code built against
--- the same version of the `template-haskell` library as the compiler. This
--- is an ABI compatibility constraint between the `template-haskell` version of
--- the compiler and the splice.
--- (C2) Executing splices through the external interpreter: In order to execute
--- a splice via the external interpreter, we serialise bytecode, run it with the
--- external interpreter, and communicate back the result through a binary
--- serialised interface. This is a binary serialisation compatibilty constraint
--- between the `template-haskell` version of the compiler and the splice.
--- (C3) Desugaring bracket syntax: Bracket syntax is desugared by referring to a
--- special wired-in package whose package id is `template-haskell`. So for
--- instance an expression `'Just` gets desugared to something of type
--- `template-haskell:Language.Haskell.TH.Syntax.Name`. Importantly, while this
--- identifier is wired-in, the identity of the `template-haskell` package is
--- not. So for instance we can successfully use an expression like
--- `'Just :: Name` while compiling the `template-haskell` package as long as its
--- package id is set to `template-haskell` as `Name` will resolve the the local
--- identifier in the package (and the LHS and RHS will align). On the other
--- hand, if we don't set the special package id, the type of the expression will
--- be `template-haskell:...Name` while the `Name` on the RHS will resolve to the
--- local identifier and we will get a type error. So, bracket syntax assumes the
--- presence of a particular API in the `template-haskell` package, but it allows
---
--- These constraints are ranked from strongest to weakest. They only apply if we
--- want to support the particular feature associated with them.
+-- The splice is run during renaming. This consists of the following steps.
+-- 1. The quote is desugared becoming something equivalent to
+-- `varE (mkNameS "unwords") `appE` listE [litE (stringE "hello"), litE (stringE "world")]`
+-- Notice we are introducing references to identifiers from the
+-- `template-haskell` library.
+-- This desguaring is implemented in `GHC.HsToCore.Quote.dsBracket`.
+-- 2. The core expression of the quote is turned into bytecode.
+-- 3. The bytecode is run in the interpreter. If we are using the internal
+-- interpreter this is the `ghc` executable. If we are using the external
+-- interpreter this is the `ghc-iserv` executable.
+-- 4. Optionally the interpreter must load object code for packages mentioned in
+-- the bytecode. In this instance that is just `template-haskell`.
+-- 5. The bytecode is run.
+-- 6. The result is a TH AST, which we convert back to a GHC AST using the
+-- functions from `GHC.ThToHs`.
 --
--- The tricky case is what do to when building the bootstrapping (stage1) GHC.
--- The stage2 GHC is simpler as it can use the in-tree `template-haskell`
--- package built by the stage1 GHC.
+-- At several points in the process different features required depending on the
+-- `template-haskell` library. These constraints conditionally apply to any version of the
+-- compiler that enables the relevant feature.
 --
--- We should note that we cannot feasibly use the internal interpreter with a
--- stage1 GHC. This is because the stage1 GHC was compiled with the stage0 GHC,
--- which we assume is a different version. In order to run a splice that too
--- would need to be compiled with the stage0 GHC, and so would all its
--- dependencies.
--- This allows us to disregard (C1) for the stage1 case.
+-- (C1) running a splice using the internal interpreter: if we run a splice by
+-- loading the bytecode into the internal interpreter, then our compiler must be
+-- linked against the same version of the `template-haskell` library as the
+-- splice. For the stage2 compiler, this means a user splice (stage3) is linked
+-- against the in-tree `template-haskell` library (stage1), which is the same
+-- as what the stage2 compiler is linked against.
+-- This constraint is introduced by step 4 and step 6. Usages of `reify`
+-- functions from `Quasi` would also entail this constraint. If there were a
+-- discrepancy between these two versions of `template-haskell`, for instance if
+-- the splice's AST laccked the `StringE` constructor, then we would run into a
+-- runtime or linker error. This constraint doesn't apply to the stage1 compiler
+-- because we currently don't support using the internal interpreter at that
+-- stage at all.
 --
--- In the past, we used to build the stage1 GHC and all its dependencies against
--- the in-tree `template-haskell` library. This meant that we sacrificed (C2)
--- because they are likely not serialisation compatible. We could not sacrifice
--- (C3) because dependencies of GHC (such as `containers` and
--- `template-haskell`) used bracket syntax to define `Lift` instances. This
--- meant that the interface assumed by the boot compiler to implement bracket
--- desugaring could not be modified (not even through CPP as (C1) would
--- constrain us in future stages where we do support the internal interpreter).
--- Yet, bracket syntax did work and gave us splices that desugared to code that
--- referenced the in-tree version of `template-haskell` not the one the boot
--- compiler required. So they could never be run.
+-- (C2) running a splice using the external interpreter: if we run a splice by
+-- loading the bytecode into the external interpreter, then our `iserv`
+-- executable must be linked against the same version of the `template-haskell`
+-- library as the splice. The compiler on the other hand only needs to be depend
+-- on a binary serialisation that is compatible with `template-haskell` library.
+-- This is a similar but weaker constraint on the compiler than the previous
+-- one. It is possible to use this with the stage1 compiler by using an `iserv`
+-- executable that is compiled against the stage1 `template-haskell` in-tree
+-- library. If there is a discrepancy in the binary interface then we will run
+-- into a runtime error if we fail to parse a message.
 --
--- Our current strategy is to not build `template-haskell` as a stage0 package.
--- All of GHCs dependencies depend on the boot compilers version, and produce
--- runnable splices. How do we deal with the stage1 compiler's dependency on
--- `template-haskell`? There are two options. (D1) depend on the boot
--- compiler's version for stage1 and then depend on the in-tree one in stage2.
--- This violates (C1) and (C2), so we wouldn't be able to run splices at all
--- with the stage1 compiler. Additionally this would introduce quite a bit of
--- CPP into the compiler and mean we would have to stub out much of the
--- template-haskell machinery or have an unrunable compatibilty shim. Or (D2)
--- depend on the in-tree version.
+-- (C3) desugaring quote syntax: the desugaring of quote syntax is hard coded
+-- into each version of the compiler. `GHC.Builtin.Names.TH` declares many names
+-- as known-key, inducing a coupling relationship quite to that of `ghc-internal`.
+-- This is so that quotes can be desugared to calls to TH AST (smart)
+-- constructors and uses of the Lift type class.
+-- Furthermore, name quotes such as `'Just` are resolved relative to the module
+-- under compilation and then desugar to a
+-- `template-haskell:Language.Haskell.TH.Syntax.Name`, where the exact location
+-- is determined by `GHC.Builtin.Names.TH`. This location must match the
+-- `template-haskell` library against which the quote is linked.
+-- This implements a sort of late binding, where identifiers will be mapped to
+-- whatever package called `template-haskell` is in scope.
+-- The late binding makes it possible to use quotes while compiling the
+-- `template-haskell` library itself as identifiers will correctly be mapped to
+-- the current package, rather than the one from the previous stage.
+-- If we attempt to use quotes, but don't have a `template-haskell` package in
+-- the package db with the appropriate interface we will run into type errors.
+-- For instance if it lacked the `varE` function or had a function with the
+-- wrong arity, then we would encounter an error when a module using the quote
+-- is compiled.
 --
--- (D2) is what we implement, but it is complicated by the fact that it means we
--- practically have two versions of `template-haskell` in the dependency graph
--- of the stage1 compiler. To avoid this, we recall that we only need
--- serliasation compatibility (as per (C2)), so we can avoid a direct dependency
--- on the in-tree version by vendoring it. We choose to vendor it into the
--- `ghc-boot` package as both `lib:ghc` and `ghci` require a dependency on the
--- `template-haskell` interface as they define the two ends of the protocol.
--- This allows us to still run splices through the external interpreter.
+-- These 3 constraints inform our implementation strategy when building the
+-- stage1 compiler. We are making use of the stage0 compiler, so if we want to
+-- keep the possibility of using Template Haskell in the bootstrapping process
+-- open (#22069), then our package DB must contain the `template-haskell`
+-- library that our stage0 compiler is linked against (C1-3).
+-- Yet, for the stage1 compiler, (C1) doesn't apply as we cannot use the
+-- internal interpreter. We do not need to link the stage1 compiler against
+-- the exact same `template-haskell` library that user splices compiled using
+-- the stage1 compiler will need to use. Therefore, we vendor the modules from
+-- the in-tree `template-haskell` library into the `ghc-boot` package. As a
+-- common dependency of both, this allows the `ghc` and `ghci` libraries to
+-- depend on the interface and serialisation from the in-tree `template-haskell`
+-- library. This comes with a trade-off: we cannot make use of quotes while
+-- compiling these modules as the identifiers would resolve to reflect the
+-- stage0 compiler's expected structure (C3).
 --
--- We should note a futher edge-case with this approach. When compiling our
--- vendored `template-haskell` library, we run afoul of (C3). The library
--- defines several `Name`s using bracket syntax. As this package doesn't claim
--- to be the wired-in package but it defines its own `Name` type, we get a type
--- discrepancy with the `Name` type from the boot compiler's `template-haskell`
--- library. Most of these are only used to define `Lift` instances, so in the
--- vendored case we simply hide them behind CPP. Yet, there is one distinct use
--- of a `Name`. We have a `Name` for the constructors of the `Multiplicity`
--- type, which are also used in the pretty-printing module. We construct these
--- manulally instead. This allows us to completely avoid using bracket syntax
--- for compiling the vendored `template-haskell` package.
+-- Then the stage1 compiler is used to compile the in-tree `template-haskell`
+-- library, which is linked into the stage2 compiler and distributed alongside
+-- it.
 --
--- To summarise, our current approach allows us to use the external interpreter
--- to run splices and allows bracket syntax to be desugared correctly. In order
--- to implement this we vendor the `template-haskell` library into `ghc-boot`
--- and take special care to not use bracket syntax in those modules as that
--- would incorrectly produce code that uses identifiers from the boot compiler's
--- `template-haskell` library.
+-- Historically, we used to build the in-tree `template-haskell` library in
+-- stage0. This meant that qoutes would resolve to splices that couldn't be run
+-- by the stage0 compiler (C1-2), eg, the in-tree library might contain a new
+-- constructor the stage0 compiler wasn't aware of. It also made it difficult to
+-- refactor the `template-haskell` library as it needed to be conservative with
+-- regard to the identifiers the stage0 compiler was expecting in order to
+-- desugar quotes (C3). See #23536.
 --
--- See #23536.
+-- The current setup allows both stage0 and stage2 to use the full gamut of
+-- Template Haskell functionality, while stage1 can use all but the internal
+-- interpreter to run splices.


=====================================
compiler/ghc.cabal.in
=====================================
@@ -82,6 +82,15 @@ Flag hadrian-stage0
     Default: False
     Manual: True
 
+Flag bootstrap-th
+    Description:
+      Enabled when building the stage1 compiler in order to vendor the in-tree
+      `template-haskell` library, while allowing dependencies to depend on the
+      boot `template-haskell` library.
+      See Note [Bootstrapping Template Haskell]
+    Default: False
+    Manual: True
+
 Library
     Default-Language: GHC2021
     Exposed: False
@@ -124,6 +133,10 @@ Library
                    ghc-heap   == @ProjectVersionMunged@,
                    ghci == @ProjectVersionMunged@
 
+    if !flag(bootstrap-th)
+        Build-Depends:
+            template-haskell    == 2.22.0.0
+
     if os(windows)
         Build-Depends: Win32  >= 2.3 && < 2.15
     else


=====================================
hadrian/src/Settings/Packages.hs
=====================================
@@ -87,6 +87,7 @@ packageArgs = do
             -- We do it through a cabal flag in ghc.cabal
             , stageVersion < makeVersion [9,8,1] ? arg "+hadrian-stage0"
             , flag StaticLibzstd `cabalFlag` "static-libzstd"
+            , stage0 `cabalFlag` "bootstrap-th"
             ]
 
           , builder (Haddock BuildPackage) ? arg ("--optghc=-I" ++ path) ]
@@ -155,9 +156,12 @@ packageArgs = do
           -- compiler comes with the same versions as the one we are building.
           --
             builder (Cabal Setup) ? cabalExtraDirs ffiIncludeDir ffiLibraryDir
-          , builder (Cabal Flags) ? ifM stage0
-              (andM [cross, bootCross] `cabalFlag` "internal-interpreter")
-              (arg "internal-interpreter")
+          , builder (Cabal Flags) ? mconcat
+             [ ifM stage0
+                (andM [cross, bootCross] `cabalFlag` "internal-interpreter")
+                (arg "internal-interpreter")
+             , stage0 `cabalFlag` "bootstrap-th"
+             ]
 
           ]
 


=====================================
libraries/ghc-boot/ghc-boot.cabal.in
=====================================
@@ -115,17 +115,13 @@ Library
       hs-source-dirs: .
       build-depends:
               ghc-boot-th         == @ProjectVersionMunged@
-            , template-haskell  == 2.22.0.0
-      -- reexport modules from ghc-boot-th and template-haskell so that packages
-      -- don't have to import all of ghc-boot, ghc-boot-th and template-haskell.
-      -- It makes the dependency graph easier to understand and to refactor
-      -- and reduces the amount of cabal flags we need to use for bootstrapping TH.
+      -- reexport modules from ghc-boot-th that packages
+      -- don't have to import all of ghc-boot, and ghc-boot-th.
+      -- It makes the dependency graph easier to understand.
       reexported-modules:
               GHC.LanguageExtensions.Type
             , GHC.ForeignSrcLang.Type
             , GHC.Lexeme
-            , Language.Haskell.TH
-            , Language.Haskell.TH.Syntax
     if !os(windows)
         build-depends:
                    unix       >= 2.7 && < 2.9


=====================================
libraries/ghci/ghci.cabal.in
=====================================
@@ -22,6 +22,15 @@ Flag internal-interpreter
     Default: False
     Manual: True
 
+Flag bootstrap-th
+    Description:
+      Enabled when building the stage1 compiler in order to vendor the in-tree
+      `template-haskell` library, while allowing dependencies to depend on the
+      boot `template-haskell` library.
+      See Note [Bootstrapping Template Haskell]
+    Default: False
+    Manual: True
+
 source-repository head
     type:     git
     location: https://gitlab.haskell.org/ghc/ghc.git
@@ -86,5 +95,10 @@ library
         ghc-heap         == @ProjectVersionMunged@,
         transformers     >= 0.5 && < 0.7
 
+    if !flag(bootstrap-th)
+        Build-Depends:
+            template-haskell    == 2.22.0.0
+
+
     if !os(windows)
         Build-Depends: unix >= 2.7 && < 2.9



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/1f9392adfba41999c47bc632379f101bef7d9eed...01b8ad82b641ecb1b6859774e342548049a3bd32

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/1f9392adfba41999c47bc632379f101bef7d9eed...01b8ad82b641ecb1b6859774e342548049a3bd32
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240410/4df83d52/attachment-0001.html>


More information about the ghc-commits mailing list