[Git][ghc/ghc][wip/fendor/ghc-iface-sharing-avoid-reserialisation] 5 commits: Add deduplication table for `IfaceType`

Hannes Siebenhandl (@fendor) gitlab at gitlab.haskell.org
Mon May 13 10:14:07 UTC 2024



Hannes Siebenhandl pushed to branch wip/fendor/ghc-iface-sharing-avoid-reserialisation at Glasgow Haskell Compiler / GHC


Commits:
be8ab7c1 by Fendor at 2024-05-10T13:44:40+02:00
Add deduplication table for `IfaceType`

The type `IfaceType` is a highly redundant, tree-like data structure.
While benchmarking, we realised that the high redundancy of `IfaceType`
causes high memory consumption in GHCi sessions when byte code is
embedded into the `.hi` file via `-fwrite-if-simplified-core` or
`-fbyte-code-and-object-code`.
Loading such `.hi` files from disk introduces many duplicates of
memory expensive values in `IfaceType`, such as `IfaceTyCon`,
`IfaceTyConApp`, `IA_Arg` and many more.

We improve the memory behaviour of GHCi by adding an additional
deduplication table for `IfaceType` to the serialisation of `ModIface`,
similar to how we deduplicate `Name`s and `FastString`s.
When reading the interface file back, the table allows us to automatically
share identical values of `IfaceType`.

To provide some numbers, we evaluated this patch on the agda code base.
We loaded the full library from the `.hi` files, which contained the
embedded core expressions (`-fwrite-if-simplified-core`).

Before this patch:

* Load time: 11.7 s, 2.5 GB maximum residency.

After this patch:

* Load time:  7.3 s, 1.7 GB maximum residency.

This deduplication has the beneficial side effect to additionally reduce
the size of the on-disk interface files tremendously.

For example, on agda, we reduce the size of `.hi` files (with
`-fwrite-if-simplified-core`):

* Before: 101 MB on disk
* Now:     24 MB on disk

This has even a beneficial side effect on the cabal store. We reduce the
size of the store on disk:

* Before: 341 MB on disk
* Now:    310 MB on disk

Note, none of the dependencies have been compiled with
`-fwrite-if-simplified-core`, but `IfaceType` occurs in multiple
locations in a `ModIface`.

We also add IfaceType deduplication table to .hie serialisation and
refactor .hie file serialisation to use the same infrastrucutre as
`putWithTables`.

Bump haddock submodule to accomodate for changes to the deduplication
table layout and binary interface.

- - - - -
060dbe69 by Fendor at 2024-05-10T13:45:29+02:00
Add run-time configurability of `.hi` file compression

Introduce the flag `-fwrite-if-compression=<n>` which allows to
configure the compression level of writing .hi files.

The motivation is that some deduplication operations are too expensive
for the average use case. Hence, we introduce multiple compression
levels with variable impact on performance, but still reduce the
memory residency and `.hi` file size on disk considerably.

We introduce three compression levels:

* `1`: `Normal` mode. This is the least amount of compression.
    It deduplicates only `Name` and `FastString`s, and is naturally the
    fastest compression mode.
* `2`: `Safe` mode. It has a noticeable impact on .hi file size and is
  marginally slower than `Normal` mode. In general, it should be safe to
  always use `Safe` mode.
* `3`: `Full` deduplication mode. Deduplicate as much as we can,
  resulting in minimal .hi files, but at the cost of additional
  compilation time.

Reading .hi files doesn't need to know the initial compression level,
and can always deserialise a `ModIface`, as we write out a byte that
indicates the next value has been deduplicated.
This allows users to experiment with different compression levels for
packages, without recompilation of dependencies.

Note, the deduplication also has an additional side effect of reduced
memory consumption to implicit sharing of deduplicated elements.
See https://gitlab.haskell.org/ghc/ghc/-/issues/24540 for example where
that matters.

-------------------------
Metric Decrease:
    MultiLayerModulesDefsGhciWithCore
    T16875
    T21839c
    T24471
    hard_hole_fits
    libdir
-------------------------

- - - - -
a2ad64e5 by Matthew Pickering at 2024-05-10T13:45:29+02:00
Introduce regression tests for `.hi` file sizes

Add regression tests to track how `-fwrite-if-compression` levels affect
the size of `.hi` files.

- - - - -
3f5d1595 by Fendor at 2024-05-13T12:12:54+02:00
Improve sharing of duplicated values in `ModIface`

As a `ModIface` contains often duplicated values that are not
necessarily shared, we improve sharing by serialising the `ModIface`
to an in-memory byte array. Serialisation uses deduplication tables, and
deserialisation implicitly shares duplicated values.

This helps reducing the peak memory usage while compiling in
`--make` mode. The peak memory usage is especially reduced when
generating interface files with core expressions
(`-fwrite-if-simplified-core`).

On agda, this reduces the peak memory usage:

* `2.2 GB` to `1.9 GB` for a ghci session.

On `lib:Cabal`, we report:

* `570 MB` to `500 MB` for a ghci session
* `790 MB` to `667 MB` for compiling `lib:Cabal` with ghc

There is a small impact on execution time, around 2% on the agda code
base.

- - - - -
4b6741fc by Fendor at 2024-05-13T12:13:05+02:00
Avoid unneccessarily re-serialising the `ModIface`

To reduce memory usage of `ModIface`, we serialise `ModIface` to an
in-memory byte array, which implicitly shares duplicated values.

This serailised byte array can be reused to avoid work when we actually
write the `ModIface` to disk.
We introduce a new field to `ModIface` which allows us to save the byte
array, and write it to disk if the `ModIface` wasn't changed after the
initial serialisation.

This requires us to change absolute offsets, for example to jump to the
deduplication table for `Name` or `FastString` with relative offsets, as
the deduplication byte array doesn't contain header information, such as
fingerprints.
To allow us to dump the binary blob to disk, we need to replace all
absolute offsets with relative ones.

This leads to new primitives for `ModIface`, which help to construct
relative offsets.

Bump Haddock submodule, to account for interface file changes.

-------------------------
Metric Increase:
    MultiComponentModules
    MultiLayerModules
    T10421
    T12425
    T13035
    T13701
    T13719
    T14697
    T18730
    T9198
    mhu-perf
-------------------------

These metric increases may look bad, but they are all completely benign,
we simply allocate 1 MB per module for `shareIface`. As this allocation
is quite quick, it has a neglible impact on run-time performance.

- - - - -


30 changed files:

- compiler/GHC.hs
- compiler/GHC/Driver/DynFlags.hs
- compiler/GHC/Driver/Main.hs
- compiler/GHC/Driver/Session.hs
- compiler/GHC/Iface/Binary.hs
- compiler/GHC/Iface/Ext/Binary.hs
- compiler/GHC/Iface/Ext/Fields.hs
- compiler/GHC/Iface/Load.hs
- compiler/GHC/Iface/Make.hs
- compiler/GHC/Iface/Recomp.hs
- compiler/GHC/Iface/Recomp/Binary.hs
- compiler/GHC/Iface/Rename.hs
- compiler/GHC/Iface/Type.hs
- compiler/GHC/StgToJS/Object.hs
- compiler/GHC/Tc/Errors/Hole.hs
- compiler/GHC/Tc/Gen/Splice.hs
- compiler/GHC/Tc/Utils/Backpack.hs
- compiler/GHC/Unit/Module/ModIface.hs
- compiler/GHC/Utils/Binary.hs
- docs/users_guide/using-optimisation.rst
- + testsuite/tests/iface/IfaceSharingIfaceType.hs
- + testsuite/tests/iface/IfaceSharingName.hs
- + testsuite/tests/iface/Lib.hs
- + testsuite/tests/iface/Makefile
- + testsuite/tests/iface/all.T
- + testsuite/tests/iface/if_faststring.hs
- + testsuite/tests/iface/if_ifacetype.hs
- + testsuite/tests/iface/if_name.hs
- testsuite/tests/plugins/simple-plugin/Simple/RemovePlugin.hs
- utils/haddock


Changes:

=====================================
compiler/GHC.hs
=====================================
@@ -98,7 +98,32 @@ module GHC (
         lookupGlobalName,
         findGlobalAnns,
         mkNamePprCtxForModule,
-        ModIface, ModIface_(..),
+        ModIface,
+        mi_module,
+        mi_sig_of,
+        mi_hsc_src,
+        mi_src_hash,
+        mi_hi_bytes,
+        mi_deps,
+        mi_usages,
+        mi_exports,
+        mi_used_th,
+        mi_fixities,
+        mi_warns,
+        mi_anns,
+        mi_insts,
+        mi_fam_insts,
+        mi_rules,
+        mi_decls,
+        mi_extra_decls,
+        mi_globals,
+        mi_hpc,
+        mi_trust,
+        mi_trust_pkg,
+        mi_complete_matches,
+        mi_docs,
+        mi_final_exts,
+        mi_ext_fields,
         SafeHaskellMode(..),
 
         -- * Printing


=====================================
compiler/GHC/Driver/DynFlags.hs
=====================================
@@ -207,6 +207,7 @@ data DynFlags = DynFlags {
   dmdUnboxWidth         :: !Int,        -- ^ Whether DmdAnal should optimistically put an
                                         --   Unboxed demand on returned products with at most
                                         --   this number of fields
+  ifCompression         :: Int,
   specConstrThreshold   :: Maybe Int,   -- ^ Threshold for SpecConstr
   specConstrCount       :: Maybe Int,   -- ^ Max number of specialisations for any one function
   specConstrRecursive   :: Int,         -- ^ Max number of specialisations for recursive types
@@ -546,6 +547,7 @@ defaultDynFlags mySettings =
         maxPmCheckModels        = 30,
         simplTickFactor         = 100,
         dmdUnboxWidth           = 3,      -- Default: Assume an unboxed demand on function bodies returning a triple
+        ifCompression           = 2,      -- Default: Apply safe compressions
         specConstrThreshold     = Just 2000,
         specConstrCount         = Just 3,
         specConstrRecursive     = 3,


=====================================
compiler/GHC/Driver/Main.hs
=====================================
@@ -163,7 +163,7 @@ import GHC.JS.Syntax
 
 import GHC.IfaceToCore  ( typecheckIface, typecheckWholeCoreBindings )
 
-import GHC.Iface.Load   ( ifaceStats, writeIface )
+import GHC.Iface.Load   ( ifaceStats, writeIface, flagsToIfCompression )
 import GHC.Iface.Make
 import GHC.Iface.Recomp
 import GHC.Iface.Tidy
@@ -612,7 +612,7 @@ extract_renamed_stuff mod_summary tc_result = do
         -- enables the option which keeps the renamed source.
         hieFile <- mkHieFile mod_summary tc_result (fromJust rn_info)
         let out_file = ml_hie_file $ ms_location mod_summary
-        liftIO $ writeHieFile out_file hieFile
+        liftIO $ writeHieFile (flagsToIfCompression dflags) out_file hieFile
         liftIO $ putDumpFileMaybe logger Opt_D_dump_hie "HIE AST" FormatHaskell (ppr $ hie_asts hieFile)
 
         -- Validate HIE files
@@ -966,10 +966,11 @@ loadByteCode iface mod_sum = do
 --------------------------------------------------------------
 
 
+
 -- Knot tying!  See Note [Knot-tying typecheckIface]
 -- See Note [ModDetails and --make mode]
 initModDetails :: HscEnv -> ModIface -> IO ModDetails
-initModDetails hsc_env iface =
+initModDetails hsc_env iface = do
   fixIO $ \details' -> do
     let act hpt  = addToHpt hpt (moduleName $ mi_module iface)
                                 (HomeModInfo iface details' emptyHomeModInfoLinkable)
@@ -1207,7 +1208,7 @@ hscMaybeWriteIface logger dflags is_simple iface old_iface mod_location = do
           withTiming logger
               (text "WriteIface"<+>brackets (text iface_name))
               (const ())
-              (writeIface logger profile iface_name iface)
+              (writeIface logger profile (flagsToIfCompression dflags) iface_name iface)
 
     if (write_interface || force_write_interface) then do
 


=====================================
compiler/GHC/Driver/Session.hs
=====================================
@@ -1695,6 +1695,9 @@ dynamic_flags_deps = [
   , make_ord_flag defFlag "fno-refinement-level-hole-fits"
       (noArg (\d -> d { refLevelHoleFits = Nothing }))
 
+  , make_ord_flag defFlag "fwrite-if-compression"
+      (intSuffix (\n d -> d { ifCompression = n }))
+
   , make_dep_flag defGhcFlag "fllvm-pass-vectors-in-regs"
             (noArg id)
             "vectors registers are now passed in registers by default."


=====================================
compiler/GHC/Iface/Binary.hs
=====================================
@@ -14,9 +14,12 @@ module GHC.Iface.Binary (
         writeBinIface,
         readBinIface,
         readBinIfaceHeader,
+        CompressionIFace(..),
         getSymtabName,
         CheckHiWay(..),
         TraceBinIFace(..),
+        getIfaceWithExtFields,
+        putIfaceWithExtFields,
         getWithUserData,
         putWithUserData,
 
@@ -25,6 +28,8 @@ module GHC.Iface.Binary (
         putName,
         putSymbolTable,
         BinSymbolTable(..),
+        initWriteIfaceType, initReadIfaceTypeTable,
+        putAllTables,
     ) where
 
 import GHC.Prelude
@@ -46,14 +51,21 @@ import GHC.Types.SrcLoc
 import GHC.Platform
 import GHC.Settings.Constants
 import GHC.Utils.Fingerprint
+import GHC.Iface.Type (IfaceType(..), getIfaceType, putIfaceType, ifaceTypeSharedByte)
 
+import Control.Monad
 import Data.Array
 import Data.Array.IO
 import Data.Array.Unsafe
 import Data.Char
-import Data.Word
 import Data.IORef
-import Control.Monad
+import Data.Map.Strict (Map)
+import Data.Word
+import System.IO.Unsafe
+import Data.Typeable (Typeable)
+import qualified GHC.Data.Strict as Strict
+import Data.Function ((&))
+
 
 -- ---------------------------------------------------------------------------
 -- Reading and writing binary interface files
@@ -66,6 +78,21 @@ data TraceBinIFace
    = TraceBinIFace (SDoc -> IO ())
    | QuietBinIFace
 
+data CompressionIFace
+  = NormalCompression
+  -- ^ Perform the normal compression operations,
+  -- such as deduplicating 'Name's and 'FastString's
+  | SafeExtraCompression
+  -- ^ Perform some extra compression steps that have minimal impact
+  -- on the run-time of 'ghc'.
+  --
+  -- This reduces the size of '.hi' files significantly in some cases
+  -- and reduces overall memory usage in certain scenarios.
+  | MaximalCompression
+  -- ^ Try to compress as much as possible.
+  --
+  -- Yields the smallest '.hi' files but at the cost of additional run-time.
+
 -- | Read an interface file header, checking the magic number, version, and
 -- way. Returns the hash of the source file and a BinHandle which points at the
 -- start of the rest of the interface file data.
@@ -133,17 +160,27 @@ readBinIface
 readBinIface profile name_cache checkHiWay traceBinIface hi_path = do
     (src_hash, bh) <- readBinIfaceHeader profile name_cache checkHiWay traceBinIface hi_path
 
-    extFields_p <- get bh
+    mod_iface <- getIfaceWithExtFields name_cache bh
 
-    mod_iface <- getWithUserData name_cache bh
+    return $ mod_iface
+      & addSourceFingerprint src_hash
 
-    seekBinReader bh extFields_p
-    extFields <- get bh
 
-    return mod_iface
-      { mi_ext_fields = extFields
-      , mi_src_hash = src_hash
-      }
+getIfaceWithExtFields :: NameCache -> ReadBinHandle -> IO ModIface
+getIfaceWithExtFields name_cache bh = do
+  start <- tellBinReader bh
+  extFields_p_rel <- getRelBin bh
+
+  mod_iface <- getWithUserData name_cache bh
+
+  seekBinReader bh start
+  seekBinReaderRel bh extFields_p_rel
+  extFields <- get bh
+  modIfaceData <- freezeBinHandle2 bh start
+  pure $ mod_iface
+    & set_mi_ext_fields extFields
+    & set_mi_hi_bytes (FullIfaceBinHandle $ Strict.Just modIfaceData)
+
 
 -- | This performs a get action after reading the dictionary and symbol
 -- table. It is necessary to run this before trying to deserialise any
@@ -158,30 +195,43 @@ getWithUserData name_cache bh = do
 -- Reading names has the side effect of adding them into the given NameCache.
 getTables :: NameCache -> ReadBinHandle -> IO ReadBinHandle
 getTables name_cache bh = do
+    bhRef <- newIORef (error "used too soon")
+    -- It is important this is passed to 'getTable'
+    -- See Note [Lazy ReaderUserData during IfaceType serialisation]
+    ud <- unsafeInterleaveIO (readIORef bhRef)
+
     fsReaderTable <- initFastStringReaderTable
     nameReaderTable <- initNameReaderTable name_cache
-
-
-    -- The order of these deserialisation matters!
-    --
-    -- See Note [Order of deduplication tables during iface binary serialisation] for details.
-    fsTable <- Binary.forwardGet bh (getTable fsReaderTable bh)
-    let
-      fsReader = mkReaderFromTable fsReaderTable fsTable
-      bhFs = addReaderToUserData (mkSomeBinaryReader fsReader) bh
-
-    nameTable <- Binary.forwardGet bh (getTable nameReaderTable bhFs)
-    let
-      nameReader = mkReaderFromTable nameReaderTable nameTable
-      bhName = addReaderToUserData (mkSomeBinaryReader nameReader) bhFs
-
-    pure bhName
+    ifaceTypeReaderTable <- initReadIfaceTypeTable ud
+
+    let -- For any 'ReaderTable', we decode the table that is found at the location
+        -- the forward reference points to.
+        -- After decoding the table, we create a 'BinaryReader' and immediately
+        -- add it to the 'ReaderUserData' of 'ReadBinHandle'.
+        decodeReaderTable :: Typeable a => ReaderTable a -> ReadBinHandle -> IO ReadBinHandle
+        decodeReaderTable tbl bh0 = do
+          table <- Binary.forwardGetRel bh (getTable tbl bh0)
+          let binaryReader = mkReaderFromTable tbl table
+          pure $ addReaderToUserData binaryReader bh0
+
+    -- Decode all the tables and populate the 'ReaderUserData'.
+    bhFinal <- foldM (\bh0 act -> act bh0) bh
+      -- The order of these deserialisation matters!
+      --
+      -- See Note [Order of deduplication tables during iface binary serialisation] for details.
+      [ decodeReaderTable fsReaderTable
+      , decodeReaderTable nameReaderTable
+      , decodeReaderTable ifaceTypeReaderTable
+      ]
+
+    writeIORef bhRef (getReaderUserData bhFinal)
+    pure bhFinal
 
 -- | Write an interface file.
 --
 -- See Note [Deduplication during iface binary serialisation] for details.
-writeBinIface :: Profile -> TraceBinIFace -> FilePath -> ModIface -> IO ()
-writeBinIface profile traceBinIface hi_path mod_iface = do
+writeBinIface :: Profile -> TraceBinIFace -> CompressionIFace -> FilePath -> ModIface -> IO ()
+writeBinIface profile traceBinIface compressionLevel hi_path mod_iface = do
     bh <- openBinMem initBinMemSize
     let platform = profilePlatform profile
     put_ bh (binaryInterfaceMagic platform)
@@ -192,26 +242,31 @@ writeBinIface profile traceBinIface hi_path mod_iface = do
     put_  bh tag
     put_  bh (mi_src_hash mod_iface)
 
-    extFields_p_p <- tellBinWriter bh
-    put_ bh extFields_p_p
-
-    putWithUserData traceBinIface bh mod_iface
-
-    extFields_p <- tellBinWriter bh
-    putAt bh extFields_p_p extFields_p
-    seekBinWriter bh extFields_p
-    put_ bh (mi_ext_fields mod_iface)
+    putIfaceWithExtFields traceBinIface compressionLevel bh mod_iface
 
     -- And send the result to the file
     writeBinMem bh hi_path
 
+-- | Puts the 'ModIface' to the 'WriteBinHandle'.
+--
+-- This avoids serialisation of the 'ModIface' if the fields 'mi_hi_bytes' contains a
+-- 'Just' value. This fields can only be populated by reading the 'ModIface' using
+-- 'getIfaceWithExtFields' and not modifying it in any way afterwards.
+putIfaceWithExtFields :: TraceBinIFace -> CompressionIFace -> WriteBinHandle -> ModIface -> IO ()
+putIfaceWithExtFields traceBinIface compressionLevel bh mod_iface =
+  case mi_hi_bytes mod_iface of
+    FullIfaceBinHandle Strict.Nothing -> do
+      forwardPutRel_ bh (\_ -> put_ bh (mi_ext_fields mod_iface)) $ do
+        putWithUserData traceBinIface compressionLevel bh mod_iface
+    FullIfaceBinHandle (Strict.Just binData) -> putFullBinData bh binData
+
 -- | Put a piece of data with an initialised `UserData` field. This
 -- is necessary if you want to serialise Names or FastStrings.
 -- It also writes a symbol table and the dictionary.
 -- This segment should be read using `getWithUserData`.
-putWithUserData :: Binary a => TraceBinIFace -> WriteBinHandle -> a -> IO ()
-putWithUserData traceBinIface bh payload = do
-  (name_count, fs_count, _b) <- putWithTables bh (\bh' -> put bh' payload)
+putWithUserData :: Binary a => TraceBinIFace -> CompressionIFace -> WriteBinHandle -> a -> IO ()
+putWithUserData traceBinIface compressionLevel bh payload = do
+  (name_count, fs_count, _b) <- putWithTables compressionLevel bh (\bh' -> put bh' payload)
 
   case traceBinIface of
     QuietBinIFace         -> return ()
@@ -234,11 +289,12 @@ putWithUserData traceBinIface bh payload = do
 -- It returns (number of names, number of FastStrings, payload write result)
 --
 -- See Note [Order of deduplication tables during iface binary serialisation]
-putWithTables :: WriteBinHandle -> (WriteBinHandle -> IO b) -> IO (Int, Int, b)
-putWithTables bh' put_payload = do
+putWithTables :: CompressionIFace -> WriteBinHandle -> (WriteBinHandle -> IO b) -> IO (Int, Int, b)
+putWithTables compressionLevel bh' put_payload = do
   -- Initialise deduplicating tables.
   (fast_wt, fsWriter) <- initFastStringWriterTable
   (name_wt, nameWriter) <- initNameWriterTable
+  (ifaceType_wt, ifaceTypeWriter) <- initWriteIfaceType compressionLevel
 
   -- Initialise the 'WriterUserData'.
   let writerUserData = mkWriterUserData
@@ -250,6 +306,7 @@ putWithTables bh' put_payload = do
         --
         -- See Note [Binary UserData]
         , mkSomeBinaryWriter @BindingName  $ mkWriter (\bh name -> putEntry nameWriter bh (getBindingName name))
+        , mkSomeBinaryWriter @IfaceType ifaceTypeWriter
         ]
   let bh = setWriterUserData bh' writerUserData
 
@@ -257,18 +314,24 @@ putWithTables bh' put_payload = do
     -- The order of these entries matters!
     --
     -- See Note [Order of deduplication tables during iface binary serialisation] for details.
-    putAllTables bh [fast_wt, name_wt] $ do
+    putAllTables bh [fast_wt, name_wt, ifaceType_wt] $ do
       put_payload bh
 
   return (name_count, fs_count, r)
- where
-  putAllTables _ [] act = do
-    a <- act
-    pure ([], a)
-  putAllTables bh (x : xs) act = do
-    (r, (res, a)) <- forwardPut bh (const $ putTable x bh) $ do
-      putAllTables bh xs act
-    pure (r : res, a)
+
+-- | Write all deduplication tables to disk after serialising the
+-- main payload.
+--
+-- Writes forward pointers to the deduplication tables before writing the payload
+-- to allow deserialisation *before* the payload is read again.
+putAllTables :: WriteBinHandle -> [WriterTable] -> IO b -> IO ([Int], b)
+putAllTables _ [] act = do
+  a <- act
+  pure ([], a)
+putAllTables bh (x : xs) act = do
+  (r, (res, a)) <- forwardPutRel bh (const $ putTable x bh) $ do
+    putAllTables bh xs act
+  pure (r : res, a)
 
 -- | Initial ram buffer to allocate for writing interface files
 initBinMemSize :: Int
@@ -418,7 +481,7 @@ to the table we need to deserialise first.
 What deduplication tables exist and the order of serialisation is currently statically specified
 in 'putWithTables'. 'putWithTables' also takes care of the serialisation of used deduplication tables.
 The deserialisation of the deduplication tables happens 'getTables', using 'Binary' utility
-functions such as 'forwardGet'.
+functions such as 'forwardGetRel'.
 
 Here, a visualisation of the table structure we currently have (ignoring 'ExtensibleFields'):
 
@@ -445,11 +508,86 @@ Here, a visualisation of the table structure we currently have (ignoring 'Extens
 
 -}
 
+{-
+Note [Lazy ReaderUserData during IfaceType serialisation]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Serialising recursive data types, such as 'IfaceType', requires some trickery
+to inject the deduplication table at the right moment.
+
+When we serialise a value of 'IfaceType', we might encounter new 'IfaceType' values.
+For example, 'IfaceAppTy' has an 'IfaceType' field, which we want to deduplicate as well.
+Thus, when we serialise an 'IfaceType', we might add new 'IfaceType's to the 'GenericSymbolTable'
+(i.e., the deduplication table). These 'IfaceType's are then subsequently also serialised to disk,
+and uncover new 'IfaceType' values, etc...
+In other words, when we serialise an 'IfaceType' we write it out using a post-order traversal.
+See 'putGenericSymbolTable' for the implementation.
+
+Now, when we deserialise the deduplication table, reading the first element of the deduplication table
+will fail, as deserialisation requires that we read the child elements first. Remember, we wrote them to disk
+using a post-order traversal.
+To make this work, we therefore use 'lazyGet'' to lazily read the parent 'IfaceType', but delay the actual
+deserialisation. We just assume that once you need to force a value, the deduplication table for 'IfaceType'
+will be available.
+
+That's where 'bhRef' comes into play:
+
+@
+    bhRef <- newIORef (error "used too soon")
+    ud <- unsafeInterleaveIO (readIORef bhRef)
+    ...
+    ifaceTypeReaderTable <- initReadIfaceTypeTable ud
+    ...
+    writeIORef bhRef (getReaderUserData bhFinal)
+@
+
+'ud' is the 'ReaderUserData' that will eventually contain the deduplication table for 'IfaceType'.
+As deserialisation of the 'IfaceType' needs the deduplication table, we provide a
+promise that it will exist in the future (represented by @unsafeInterleaveIO (readIORef bhRef)@).
+We pass 'ud' to 'initReadIfaceTypeTable', so the deserialisation will use the promised deduplication table.
+
+Once we have "read" the deduplication table, it will be available in 'bhFinal', and we fulfill the promise
+that the deduplication table for 'IfaceType' exists when forced.
+-}
 
 -- -----------------------------------------------------------------------------
 -- The symbol table
 --
 
+initReadIfaceTypeTable :: ReaderUserData -> IO (ReaderTable IfaceType)
+initReadIfaceTypeTable ud = do
+  pure $
+    ReaderTable
+      { getTable = getGenericSymbolTable (\bh -> lazyGet' getIfaceType (setReaderUserData bh ud))
+      , mkReaderFromTable = \tbl -> mkReader (getGenericSymtab tbl)
+      }
+
+initWriteIfaceType :: CompressionIFace -> IO (WriterTable, BinaryWriter IfaceType)
+initWriteIfaceType compressionLevel = do
+  sym_tab <- initGenericSymbolTable @(Map IfaceType)
+  pure
+    ( WriterTable
+        { putTable = putGenericSymbolTable sym_tab (lazyPut' putIfaceType)
+        }
+    , mkWriter $ ifaceWriter sym_tab
+    )
+  where
+    ifaceWriter sym_tab = case compressionLevel of
+      NormalCompression -> literalIfaceTypeSerialiser
+      SafeExtraCompression -> ifaceTyConAppSerialiser sym_tab
+      MaximalCompression -> fullIfaceTypeSerialiser sym_tab
+
+    ifaceTyConAppSerialiser sym_tab bh ty = case ty of
+      IfaceTyConApp {} -> do
+        put_ bh ifaceTypeSharedByte
+        putGenericSymTab sym_tab bh ty
+      _ -> putIfaceType bh ty
+
+    fullIfaceTypeSerialiser sym_tab bh ty = do
+      put_ bh ifaceTypeSharedByte
+      putGenericSymTab sym_tab bh ty
+
+    literalIfaceTypeSerialiser = putIfaceType
+
 
 initNameReaderTable :: NameCache -> IO (ReaderTable Name)
 initNameReaderTable cache = do


=====================================
compiler/GHC/Iface/Ext/Binary.hs
=====================================
@@ -38,22 +38,21 @@ import Data.ByteString            ( ByteString )
 import qualified Data.ByteString  as BS
 import qualified Data.ByteString.Char8 as BSC
 import Data.Word                  ( Word8, Word32 )
-import Control.Monad              ( replicateM, when, forM_ )
+import Control.Monad              ( replicateM, when, forM_, foldM )
 import System.Directory           ( createDirectoryIfMissing )
 import System.FilePath            ( takeDirectory )
 
 import GHC.Iface.Ext.Types
+import GHC.Iface.Binary (initWriteIfaceType, putAllTables, initReadIfaceTypeTable, CompressionIFace)
+import GHC.Iface.Type (IfaceType)
+import System.IO.Unsafe (unsafeInterleaveIO)
+import qualified GHC.Utils.Binary as Binary
 
 data HieSymbolTable = HieSymbolTable
   { hie_symtab_next :: !FastMutInt
   , hie_symtab_map  :: !(IORef (UniqFM Name (Int, HieName)))
   }
 
-data HieDictionary = HieDictionary
-  { hie_dict_next :: !FastMutInt -- The next index to use
-  , hie_dict_map  :: !(IORef (UniqFM FastString (Int,FastString))) -- indexed by FastString
-  }
-
 initBinMemSize :: Int
 initBinMemSize = 1024*1024
 
@@ -73,9 +72,12 @@ putBinLine bh xs = do
   putByte bh 10 -- newline char
 
 -- | Write a `HieFile` to the given `FilePath`, with a proper header and
--- symbol tables for `Name`s and `FastString`s
-writeHieFile :: FilePath -> HieFile -> IO ()
-writeHieFile hie_file_path hiefile = do
+-- symbol tables for `Name`s, `FastString`s and `IfaceType`.
+--
+-- The compression level specified via `CompressionIFace` is currently a no-op and
+-- has no effect on the compression of the `HieFile`.
+writeHieFile :: CompressionIFace -> FilePath -> HieFile -> IO ()
+writeHieFile compression hie_file_path hiefile = do
   bh0 <- openBinMem initBinMemSize
 
   -- Write the header: hieHeader followed by the
@@ -84,58 +86,58 @@ writeHieFile hie_file_path hiefile = do
   putBinLine bh0 $ BSC.pack $ show hieVersion
   putBinLine bh0 $ ghcVersion
 
-  -- remember where the dictionary pointer will go
-  dict_p_p <- tellBinWriter bh0
-  put_ bh0 dict_p_p
+  (fs_tbl, fs_w) <- initFastStringWriterTable
+  (name_tbl, name_w) <- initWriteNameTable
+  (iface_tbl, iface_w) <- initWriteIfaceType compression
 
-  -- remember where the symbol table pointer will go
-  symtab_p_p <- tellBinWriter bh0
-  put_ bh0 symtab_p_p
+  let bh = setWriterUserData bh0 $ mkWriterUserData
+        [ mkSomeBinaryWriter @IfaceType iface_w
+        , mkSomeBinaryWriter @Name name_w
+        , mkSomeBinaryWriter @BindingName (simpleBindingNameWriter name_w)
+        , mkSomeBinaryWriter @FastString fs_w
+        ]
 
-  -- Make some initial state
-  symtab_next <- newFastMutInt 0
-  symtab_map <- newIORef emptyUFM :: IO (IORef (UniqFM Name (Int, HieName)))
-  let hie_symtab = HieSymbolTable {
-                      hie_symtab_next = symtab_next,
-                      hie_symtab_map  = symtab_map }
-  dict_next_ref <- newFastMutInt 0
-  dict_map_ref <- newIORef emptyUFM
-  let hie_dict = HieDictionary {
-                      hie_dict_next = dict_next_ref,
-                      hie_dict_map  = dict_map_ref }
-
-  -- put the main thing
-  let bh = setWriterUserData bh0
-          $ newWriteState (putName hie_symtab)
-                          (putName hie_symtab)
-                          (putFastString hie_dict)
-  put_ bh hiefile
-
-  -- write the symtab pointer at the front of the file
-  symtab_p <- tellBinWriter bh
-  putAt bh symtab_p_p symtab_p
-  seekBinWriter bh symtab_p
-
-  -- write the symbol table itself
-  symtab_next' <- readFastMutInt symtab_next
-  symtab_map'  <- readIORef symtab_map
-  putSymbolTable bh symtab_next' symtab_map'
-
-  -- write the dictionary pointer at the front of the file
-  dict_p <- tellBinWriter bh
-  putAt bh dict_p_p dict_p
-  seekBinWriter bh dict_p
-
-  -- write the dictionary itself
-  dict_next <- readFastMutInt dict_next_ref
-  dict_map  <- readIORef dict_map_ref
-  putDictionary bh dict_next dict_map
+  -- Discard number of written elements
+  -- Order matters! See Note [Order of deduplication tables during iface binary serialisation]
+  _ <- putAllTables bh [fs_tbl, name_tbl, iface_tbl] $ do
+    put_ bh hiefile
 
   -- and send the result to the file
   createDirectoryIfMissing True (takeDirectory hie_file_path)
   writeBinMem bh hie_file_path
   return ()
 
+initWriteNameTable :: IO (WriterTable, BinaryWriter Name)
+initWriteNameTable = do
+  symtab_next <- newFastMutInt 0
+  symtab_map <- newIORef emptyUFM
+  let bin_symtab =
+        HieSymbolTable
+          { hie_symtab_next = symtab_next
+          , hie_symtab_map = symtab_map
+          }
+
+  let put_symtab bh = do
+        name_count <- readFastMutInt symtab_next
+        symtab_map <- readIORef symtab_map
+        putSymbolTable bh name_count symtab_map
+        pure name_count
+
+  return
+    ( WriterTable
+        { putTable = put_symtab
+        }
+    , mkWriter $ putName bin_symtab
+    )
+
+initReadNameTable :: NameCache -> IO (ReaderTable Name)
+initReadNameTable cache = do
+  return $
+    ReaderTable
+      { getTable = \bh -> getSymbolTable bh cache
+      , mkReaderFromTable = \tbl -> mkReader (getSymTabName tbl)
+      }
+
 data HieFileResult
   = HieFileResult
   { hie_file_result_version :: Integer
@@ -216,50 +218,37 @@ readHieFileHeader file bh0 = do
 
 readHieFileContents :: ReadBinHandle -> NameCache -> IO HieFile
 readHieFileContents bh0 name_cache = do
-  dict <- get_dictionary bh0
-  -- read the symbol table so we are capable of reading the actual data
-  bh1 <- do
-      let bh1 = setReaderUserData bh0
-              $ newReadState (error "getSymtabName")
-                             (getDictFastString dict)
-      symtab <- get_symbol_table bh1
-      let bh1' = setReaderUserData bh1
-               $ newReadState (getSymTabName symtab)
-                              (getDictFastString dict)
-      return bh1'
+  bhRef <- newIORef (error "used too soon")
+  -- It is important this is passed to 'getTable'
+  -- See Note [Lazy ReaderUserData during IfaceType serialisation]
+  ud <- unsafeInterleaveIO (readIORef bhRef)
+
+  fsReaderTable <- initFastStringReaderTable
+  nameReaderTable <- initReadNameTable name_cache
+  ifaceTypeReaderTable <- initReadIfaceTypeTable ud
 
+  -- read the symbol table so we are capable of reading the actual data
+  bh1 <-
+    foldM (\bh tblReader -> tblReader bh) bh0
+      -- The order of these deserialisation matters!
+      --
+      -- See Note [Order of deduplication tables during iface binary serialisation] for details.
+      [ get_dictionary fsReaderTable
+      , get_dictionary nameReaderTable
+      , get_dictionary ifaceTypeReaderTable
+      ]
+
+  writeIORef bhRef (getReaderUserData bh1)
   -- load the actual data
   get bh1
   where
-    get_dictionary bin_handle = do
-      dict_p <- get bin_handle
-      data_p <- tellBinReader bin_handle
-      seekBinReader bin_handle dict_p
-      dict <- getDictionary bin_handle
-      seekBinReader bin_handle data_p
-      return dict
-
-    get_symbol_table bh1 = do
-      symtab_p <- get bh1
-      data_p'  <- tellBinReader bh1
-      seekBinReader bh1 symtab_p
-      symtab <- getSymbolTable bh1 name_cache
-      seekBinReader bh1 data_p'
-      return symtab
-
-putFastString :: HieDictionary -> WriteBinHandle -> FastString -> IO ()
-putFastString HieDictionary { hie_dict_next = j_r,
-                              hie_dict_map  = out_r}  bh f
-  = do
-    out <- readIORef out_r
-    let !unique = getUnique f
-    case lookupUFM_Directly out unique of
-        Just (j, _)  -> put_ bh (fromIntegral j :: Word32)
-        Nothing -> do
-           j <- readFastMutInt j_r
-           put_ bh (fromIntegral j :: Word32)
-           writeFastMutInt j_r (j + 1)
-           writeIORef out_r $! addToUFM_Directly out unique (j, f)
+    get_dictionary tbl bin_handle = do
+      fsTable <- Binary.forwardGetRel bin_handle (getTable tbl bin_handle)
+      let
+        fsReader = mkReaderFromTable tbl fsTable
+        bhFs = addReaderToUserData fsReader bin_handle
+      pure bhFs
+
 
 putSymbolTable :: WriteBinHandle -> Int -> UniqFM Name (Int,HieName) -> IO ()
 putSymbolTable bh next_off symtab = do


=====================================
compiler/GHC/Iface/Ext/Fields.hs
=====================================
@@ -41,7 +41,7 @@ instance Binary ExtensibleFields where
     -- to point to the start of each payload:
     forM_ header_entries $ \(field_p_p, dat) -> do
       field_p <- tellBinWriter bh
-      putAt bh field_p_p field_p
+      putAtRel bh field_p_p field_p
       seekBinWriter bh field_p
       put_ bh dat
 
@@ -50,11 +50,11 @@ instance Binary ExtensibleFields where
 
     -- Get the names and field pointers:
     header_entries <- replicateM n $
-      (,) <$> get bh <*> get bh
+      (,) <$> get bh <*> getRelBin bh
 
     -- Seek to and get each field's payload:
     fields <- forM header_entries $ \(name, field_p) -> do
-      seekBinReader bh field_p
+      seekBinReaderRel bh field_p
       dat <- get bh
       return (name, dat)
 


=====================================
compiler/GHC/Iface/Load.hs
=====================================
@@ -26,6 +26,7 @@ module GHC.Iface.Load (
         loadInterface,
         loadSysInterface, loadUserInterface, loadPluginInterface,
         findAndReadIface, readIface, writeIface,
+        flagsToIfCompression,
         moduleFreeHolesPrecise,
         needWiredInHomeIface, loadWiredInHomeIface,
 
@@ -116,6 +117,7 @@ import System.FilePath
 import System.Directory
 import GHC.Driver.Env.KnotVars
 import GHC.Iface.Errors.Types
+import Data.Function ((&))
 
 {-
 ************************************************************************
@@ -514,14 +516,12 @@ loadInterface doc_str mod from
         ; new_eps_anns      <- tcIfaceAnnotations (mi_anns iface)
         ; new_eps_complete_matches <- tcIfaceCompleteMatches (mi_complete_matches iface)
 
-        ; let { final_iface = iface {
-                                mi_decls     = panic "No mi_decls in PIT",
-                                mi_insts     = panic "No mi_insts in PIT",
-                                mi_fam_insts = panic "No mi_fam_insts in PIT",
-                                mi_rules     = panic "No mi_rules in PIT",
-                                mi_anns      = panic "No mi_anns in PIT"
-                              }
-               }
+        ; let final_iface = iface
+                               & set_mi_decls     (panic "No mi_decls in PIT")
+                               & set_mi_insts     (panic "No mi_insts in PIT")
+                               & set_mi_fam_insts (panic "No mi_fam_insts in PIT")
+                               & set_mi_rules     (panic "No mi_rules in PIT")
+                               & set_mi_anns      (panic "No mi_anns in PIT")
 
         ; let bad_boot = mi_boot iface == IsBoot
                           && isJust (lookupKnotVars (if_rec_types gbl_env) mod)
@@ -965,11 +965,19 @@ read_file logger name_cache unit_state dflags wanted_mod file_path = do
 
 
 -- | Write interface file
-writeIface :: Logger -> Profile -> FilePath -> ModIface -> IO ()
-writeIface logger profile hi_file_path new_iface
+writeIface :: Logger -> Profile -> CompressionIFace -> FilePath -> ModIface -> IO ()
+writeIface logger profile compression_level hi_file_path new_iface
     = do createDirectoryIfMissing True (takeDirectory hi_file_path)
          let printer = TraceBinIFace (debugTraceMsg logger 3)
-         writeBinIface profile printer hi_file_path new_iface
+         writeBinIface profile printer compression_level hi_file_path new_iface
+
+flagsToIfCompression :: DynFlags -> CompressionIFace
+flagsToIfCompression dflags
+  | n <= 1 = NormalCompression
+  | n == 2 = SafeExtraCompression
+  -- n >= 3
+  | otherwise = MaximalCompression
+  where n = ifCompression dflags
 
 -- | @readIface@ tries just the one file.
 --
@@ -1009,13 +1017,13 @@ readIface dflags name_cache wanted_mod file_path = do
 -- See Note [GHC.Prim] in primops.txt.pp.
 ghcPrimIface :: ModIface
 ghcPrimIface
-  = empty_iface {
-        mi_exports  = ghcPrimExports,
-        mi_decls    = [],
-        mi_fixities = fixities,
-        mi_final_exts = (mi_final_exts empty_iface){ mi_fix_fn = mkIfaceFixCache fixities },
-        mi_docs = Just ghcPrimDeclDocs -- See Note [GHC.Prim Docs]
-        }
+  = empty_iface
+      & set_mi_exports  ghcPrimExports
+      & set_mi_decls    []
+      & set_mi_fixities fixities
+      & set_mi_final_exts ((mi_final_exts empty_iface){ mi_fix_fn = mkIfaceFixCache fixities })
+      & set_mi_docs (Just ghcPrimDeclDocs) -- See Note [GHC.Prim Docs]
+
   where
     empty_iface = emptyFullModIface gHC_PRIM
 
@@ -1099,7 +1107,7 @@ pprModIfaceSimple unit_state iface =
 --
 -- The UnitState is used to pretty-print units
 pprModIface :: UnitState -> ModIface -> SDoc
-pprModIface unit_state iface at ModIface{ mi_final_exts = exts }
+pprModIface unit_state iface
  = vcat [ text "interface"
                 <+> ppr (mi_module iface) <+> pp_hsc_src (mi_hsc_src iface)
                 <+> (if mi_orphan exts then text "[orphan module]" else Outputable.empty)
@@ -1140,6 +1148,7 @@ pprModIface unit_state iface at ModIface{ mi_final_exts = exts }
         , text "extensible fields:" $$ nest 2 (pprExtensibleFields (mi_ext_fields iface))
         ]
   where
+    exts = mi_final_exts iface
     pp_hsc_src HsBootFile = text "[boot]"
     pp_hsc_src HsigFile   = text "[hsig]"
     pp_hsc_src HsSrcFile  = Outputable.empty


=====================================
compiler/GHC/Iface/Make.hs
=====================================
@@ -69,10 +69,13 @@ import GHC.Types.HpcInfo
 import GHC.Types.CompleteMatch
 import GHC.Types.SourceText
 import GHC.Types.SrcLoc ( unLoc )
+import GHC.Types.Name.Cache
 
 import GHC.Utils.Outputable
 import GHC.Utils.Panic
 import GHC.Utils.Logger
+import GHC.Utils.Binary
+import GHC.Iface.Binary
 
 import GHC.Data.FastString
 import GHC.Data.Maybe
@@ -141,14 +144,44 @@ mkFullIface hsc_env partial_iface mb_stg_infos mb_cmm_infos = do
 
     full_iface <-
       {-# SCC "addFingerprints" #-}
-      addFingerprints hsc_env partial_iface{ mi_decls = decls }
+      addFingerprints hsc_env (set_mi_decls decls partial_iface)
 
     -- Debug printing
     let unit_state = hsc_units hsc_env
     putDumpFileMaybe (hsc_logger hsc_env) Opt_D_dump_hi "FINAL INTERFACE" FormatText
       (pprModIface unit_state full_iface)
+    final_iface <- shareIface (hsc_NC hsc_env) (flagsToIfCompression $ hsc_dflags hsc_env) full_iface
+    return final_iface
+
+-- | Compress an 'ModIface' and share as many values as possible, depending on the 'CompressionIFace' level.
+-- See Note [Sharing of ModIface].
+--
+-- We compress the 'ModIface' by serialising the 'ModIface' to an in-memory byte array, and then deserialising it.
+-- The deserialisation will deduplicate certain values depending on the 'CompressionIFace' level.
+-- See Note [Deduplication during iface binary serialisation] for how we do that.
+--
+-- Additionally, we cache the serialised byte array, so if the 'ModIface' is not modified
+-- after calling 'shareIface', 'writeBinIface' will reuse that buffer without serialising the 'ModIface' again.
+-- Modifying the 'ModIface' forces us to re-serialise it again.
+shareIface :: NameCache -> CompressionIFace -> ModIface -> IO ModIface
+shareIface _ NormalCompression mi = do
+  -- In 'NormalCompression', the sharing isn't reducing the memory usage, as 'Name's and 'FastString's are
+  -- already shared, and at this compression level, we don't compress/share anything else.
+  -- Thus, for a brief moment we simply double the memory residency for no reason.
+  -- Therefore, we only try to share expensive values if the compression mode is higher than
+  -- 'NormalCompression'
+  pure mi
+shareIface nc compressionLevel  mi = do
+  bh <- openBinMem (1024 * 1024)
+  start <- tellBinWriter bh
+  putIfaceWithExtFields QuietBinIFace compressionLevel bh mi
+  rbh <- shrinkBinBuffer bh
+  seekBinReader rbh start
+  res <- getIfaceWithExtFields nc rbh
+  let resiface = restoreFromOldModIface mi res
+  forceModIface resiface
+  return resiface
 
-    return full_iface
 
 updateDecl :: [IfaceDecl] -> Maybe StgCgInfos -> Maybe CmmCgInfos -> [IfaceDecl]
 updateDecl decls Nothing Nothing = decls
@@ -302,40 +335,40 @@ mkIface_ hsc_env
         icomplete_matches = map mkIfaceCompleteMatch complete_matches
         !rdrs = maybeGlobalRdrEnv rdr_env
 
-    ModIface {
-          mi_module      = this_mod,
+    emptyPartialModIface this_mod
           -- Need to record this because it depends on the -instantiated-with flag
           -- which could change
-          mi_sig_of      = if semantic_mod == this_mod
+          & set_mi_sig_of      ( if semantic_mod == this_mod
                             then Nothing
-                            else Just semantic_mod,
-          mi_hsc_src     = hsc_src,
-          mi_deps        = deps,
-          mi_usages      = usages,
-          mi_exports     = mkIfaceExports exports,
+                            else Just semantic_mod)
+          & set_mi_hsc_src     ( hsc_src)
+          & set_mi_deps        ( deps)
+          & set_mi_usages      ( usages)
+          & set_mi_exports     ( mkIfaceExports exports)
 
           -- Sort these lexicographically, so that
           -- the result is stable across compilations
-          mi_insts       = sortBy cmp_inst     iface_insts,
-          mi_fam_insts   = sortBy cmp_fam_inst iface_fam_insts,
-          mi_rules       = sortBy cmp_rule     iface_rules,
-
-          mi_fixities    = fixities,
-          mi_warns       = warns,
-          mi_anns        = annotations,
-          mi_globals     = rdrs,
-          mi_used_th     = used_th,
-          mi_decls       = decls,
-          mi_extra_decls = extra_decls,
-          mi_hpc         = isHpcUsed hpc_info,
-          mi_trust       = trust_info,
-          mi_trust_pkg   = pkg_trust_req,
-          mi_complete_matches = icomplete_matches,
-          mi_docs        = docs,
-          mi_final_exts  = (),
-          mi_ext_fields  = emptyExtensibleFields,
-          mi_src_hash = ms_hs_hash mod_summary
-          }
+          & set_mi_insts       ( sortBy cmp_inst     iface_insts)
+          & set_mi_fam_insts   ( sortBy cmp_fam_inst iface_fam_insts)
+          & set_mi_rules       ( sortBy cmp_rule     iface_rules)
+
+          & set_mi_fixities    ( fixities)
+          & set_mi_warns       ( warns)
+          & set_mi_anns        ( annotations)
+          & set_mi_globals     ( rdrs)
+          & set_mi_used_th     ( used_th)
+          & set_mi_decls       ( decls)
+          & set_mi_extra_decls ( extra_decls)
+          & set_mi_hpc         ( isHpcUsed hpc_info)
+          & set_mi_trust       ( trust_info)
+          & set_mi_trust_pkg   ( pkg_trust_req)
+          & set_mi_complete_matches ( icomplete_matches)
+          & set_mi_docs        ( docs)
+          & set_mi_final_exts  ( ())
+          & set_mi_ext_fields  ( emptyExtensibleFields)
+          & set_mi_src_hash ( ms_hs_hash mod_summary)
+          & set_mi_hi_bytes ( PartialIfaceBinHandle)
+
   where
      cmp_rule     = lexicalCompareFS `on` ifRuleName
      -- Compare these lexicographically by OccName, *not* by unique,
@@ -502,3 +535,22 @@ That is, in Y,
 In the result of mkIfaceExports, the names are grouped by defining module,
 so we may need to split up a single Avail into multiple ones.
 -}
+
+{-
+Note [Sharing of ModIface]
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+A 'ModIface' contains many duplicated values such as 'Name', 'FastString' and 'IfaceType'.
+'Name's and 'FastString's are already deduplicated by default using the 'NameCache' and
+'FastStringTable' respectively.
+However, 'IfaceType' can be quite expensive in terms of memory usage.
+To improve the sharing of 'IfaceType', we introduced deduplication tables during
+serialisation of 'ModIface', see Note [Deduplication during iface binary serialisation].
+
+We can improve the sharing of 'ModIface' at run-time as well, by serialising the 'ModIface' to
+an in-memory buffer, and then deserialising it again.
+This implicitly shares duplicated values.
+
+To avoid re-serialising the 'ModIface' when writing it to disk, we save the serialised 'ModIface' buffer
+in 'mi_hi_bytes_' field of said 'ModIface'. This buffer is written to disk directly in 'putIfaceWithExtFields'.
+If we have to modify the 'ModIface' after 'shareIface' is called, the buffer needs to be discarded.
+-}


=====================================
compiler/GHC/Iface/Recomp.hs
=====================================
@@ -1283,7 +1283,9 @@ addFingerprints hsc_env iface0
       , mi_fix_fn         = fix_fn
       , mi_hash_fn        = lookupOccEnv local_env
       }
-    final_iface = iface0 { mi_decls = sorted_decls, mi_extra_decls = sorted_extra_decls, mi_final_exts = final_iface_exts }
+    final_iface = completePartialModIface iface0
+        (sorted_decls) (sorted_extra_decls) (final_iface_exts)
+
    --
    return final_iface
 


=====================================
compiler/GHC/Iface/Recomp/Binary.hs
=====================================
@@ -14,6 +14,7 @@ import GHC.Utils.Fingerprint
 import GHC.Utils.Binary
 import GHC.Types.Name
 import GHC.Utils.Panic.Plain
+import GHC.Iface.Type (putIfaceType)
 
 fingerprintBinMem :: WriteBinHandle -> IO Fingerprint
 fingerprintBinMem bh = withBinBuffer bh f
@@ -34,8 +35,12 @@ computeFingerprint put_nonbinding_name a = do
     put_ bh a
     fingerprintBinMem bh
   where
-    set_user_data bh =
-      setWriterUserData bh $ newWriteState put_nonbinding_name putNameLiterally putFS
+    set_user_data bh = setWriterUserData bh $ mkWriterUserData
+      [ mkSomeBinaryWriter $ mkWriter putIfaceType
+      , mkSomeBinaryWriter $ mkWriter put_nonbinding_name
+      , mkSomeBinaryWriter $ simpleBindingNameWriter $ mkWriter putNameLiterally
+      , mkSomeBinaryWriter $ mkWriter putFS
+      ]
 
 -- | Used when we want to fingerprint a structure without depending on the
 -- fingerprints of external Names that it refers to.


=====================================
compiler/GHC/Iface/Rename.hs
=====================================
@@ -44,6 +44,7 @@ import GHC.Utils.Panic
 import qualified Data.Traversable as T
 
 import Data.IORef
+import Data.Function ((&))
 
 tcRnMsgMaybe :: IO (Either (Messages TcRnMessage) a) -> TcM a
 tcRnMsgMaybe do_this = do
@@ -108,13 +109,14 @@ rnModIface hsc_env insts nsubst iface =
         deps <- rnDependencies (mi_deps iface)
         -- TODO:
         -- mi_rules
-        return iface { mi_module = mod
-                     , mi_sig_of = sig_of
-                     , mi_insts = insts
-                     , mi_fam_insts = fams
-                     , mi_exports = exports
-                     , mi_decls = decls
-                     , mi_deps = deps }
+        return $ iface
+          & set_mi_module mod
+          & set_mi_sig_of sig_of
+          & set_mi_insts insts
+          & set_mi_fam_insts fams
+          & set_mi_exports exports
+          & set_mi_decls decls
+          & set_mi_deps deps
 
 -- | Rename just the exports of a 'ModIface'.  Useful when we're doing
 -- shaping prior to signature merging.


=====================================
compiler/GHC/Iface/Type.hs
=====================================
@@ -33,6 +33,8 @@ module GHC.Iface.Type (
         ifForAllBndrVar, ifForAllBndrName, ifaceBndrName,
         ifTyConBinderVar, ifTyConBinderName,
 
+        -- Binary utilities
+        putIfaceType, getIfaceType, ifaceTypeSharedByte,
         -- Equality testing
         isIfaceLiftedTypeKind,
 
@@ -90,11 +92,13 @@ import GHC.Utils.Misc
 import GHC.Utils.Panic
 import {-# SOURCE #-} GHC.Tc.Utils.TcType ( isMetaTyVar, isTyConableTyVar )
 
+import Data.Maybe (isJust)
+import Data.Proxy
+import qualified Data.Semigroup as Semi
+import Data.Word (Word8)
+import Control.Arrow (first)
 import Control.DeepSeq
 import Control.Monad ((<$!>))
-import Control.Arrow (first)
-import qualified Data.Semigroup as Semi
-import Data.Maybe( isJust )
 
 {-
 ************************************************************************
@@ -2194,39 +2198,80 @@ ppr_parend_preds :: [IfacePredType] -> SDoc
 ppr_parend_preds preds = parens (fsep (punctuate comma (map ppr preds)))
 
 instance Binary IfaceType where
-    put_ _ (IfaceFreeTyVar tv)
-       = pprPanic "Can't serialise IfaceFreeTyVar" (ppr tv)
-           -- See Note [Free TyVars and CoVars in IfaceType]
+   put_ bh ty =
+    case findUserDataWriter Proxy bh of
+      tbl -> putEntry tbl bh ty
 
-    put_ bh (IfaceForAllTy aa ab) = do
-            putByte bh 0
-            put_ bh aa
-            put_ bh ab
-    put_ bh (IfaceTyVar ad) = do
-            putByte bh 1
-            put_ bh ad
-    put_ bh (IfaceAppTy ae af) = do
-            putByte bh 2
-            put_ bh ae
-            put_ bh af
-    put_ bh (IfaceFunTy af aw ag ah) = do
-            putByte bh 3
-            put_ bh af
-            put_ bh aw
-            put_ bh ag
-            put_ bh ah
-    put_ bh (IfaceTyConApp tc tys)
-      = do { putByte bh 5; put_ bh tc; put_ bh tys }
-    put_ bh (IfaceCastTy a b)
-      = do { putByte bh 6; put_ bh a; put_ bh b }
-    put_ bh (IfaceCoercionTy a)
-      = do { putByte bh 7; put_ bh a }
-    put_ bh (IfaceTupleTy s i tys)
-      = do { putByte bh 8; put_ bh s; put_ bh i; put_ bh tys }
-    put_ bh (IfaceLitTy n)
-      = do { putByte bh 9; put_ bh n }
+   get bh = getIfaceTypeShared bh
 
-    get bh = do
+-- | This is the byte tag we expect to read when the next
+-- value is not an 'IfaceType' value, but an offset into a
+-- lookup table.
+-- See Note [Deduplication during iface binary serialisation].
+--
+-- Must not overlap with any byte tag in 'getIfaceType'.
+ifaceTypeSharedByte :: Word8
+ifaceTypeSharedByte = 99
+
+-- | Like 'getIfaceType' but checks for a specific byte tag
+-- that indicates that we won't be able to read a 'IfaceType' value
+-- but rather an offset into a lookup table. Consequentially,
+-- we look up the value for the 'IfaceType' in the look up table.
+--
+-- See Note [Deduplication during iface binary serialisation]
+-- for details.
+getIfaceTypeShared :: ReadBinHandle -> IO IfaceType
+getIfaceTypeShared bh = do
+  start <- tellBinReader bh
+  tag <- getByte bh
+  if ifaceTypeSharedByte == tag
+    then case findUserDataReader Proxy bh of
+            tbl -> getEntry tbl bh
+    else seekBinReader bh start >> getIfaceType bh
+
+-- | Serialises an 'IfaceType' to the given 'WriteBinHandle'.
+--
+-- Serialising inner 'IfaceType''s uses the 'Binary.put' of 'IfaceType' which may be using
+-- a deduplication table. See Note [Deduplication during iface binary serialisation].
+putIfaceType :: WriteBinHandle -> IfaceType -> IO ()
+putIfaceType _ (IfaceFreeTyVar tv)
+  = pprPanic "Can't serialise IfaceFreeTyVar" (ppr tv)
+  -- See Note [Free TyVars and CoVars in IfaceType]
+
+putIfaceType bh (IfaceForAllTy aa ab) = do
+        putByte bh 0
+        put_ bh aa
+        put_ bh ab
+putIfaceType bh (IfaceTyVar ad) = do
+        putByte bh 1
+        put_ bh ad
+putIfaceType bh (IfaceAppTy ae af) = do
+        putByte bh 2
+        put_ bh ae
+        put_ bh af
+putIfaceType bh (IfaceFunTy af aw ag ah) = do
+        putByte bh 3
+        put_ bh af
+        put_ bh aw
+        put_ bh ag
+        put_ bh ah
+putIfaceType bh (IfaceTyConApp tc tys)
+  = do { putByte bh 5; put_ bh tc; put_ bh tys }
+putIfaceType bh (IfaceCastTy a b)
+  = do { putByte bh 6; put_ bh a; put_ bh b }
+putIfaceType bh (IfaceCoercionTy a)
+  = do { putByte bh 7; put_ bh a }
+putIfaceType bh (IfaceTupleTy s i tys)
+  = do { putByte bh 8; put_ bh s; put_ bh i; put_ bh tys }
+putIfaceType bh (IfaceLitTy n)
+  = do { putByte bh 9; put_ bh n }
+
+-- | Deserialises an 'IfaceType' from the given 'ReadBinHandle'.
+--
+-- Reading inner 'IfaceType''s uses the 'Binary.get' of 'IfaceType' which may be using
+-- a deduplication table. See Note [Deduplication during iface binary serialisation].
+getIfaceType :: HasCallStack => ReadBinHandle -> IO IfaceType
+getIfaceType bh = do
             h <- getByte bh
             case h of
               0 -> do aa <- get bh


=====================================
compiler/GHC/StgToJS/Object.hs
=====================================
@@ -314,7 +314,7 @@ putObject bh mod_name deps os = do
   put_ bh (moduleNameString mod_name)
 
   (fs_tbl, fs_writer) <- initFastStringWriterTable
-  let bh_fs = addWriterToUserData (mkSomeBinaryWriter fs_writer) bh
+  let bh_fs = addWriterToUserData fs_writer bh
 
   forwardPut_ bh (const (putTable fs_tbl bh_fs)) $ do
     put_ bh_fs deps


=====================================
compiler/GHC/Tc/Errors/Hole.hs
=====================================
@@ -76,7 +76,7 @@ import GHC.Tc.Utils.Unify ( tcSubTypeSigma )
 
 import GHC.HsToCore.Docs ( extractDocs )
 import GHC.Hs.Doc
-import GHC.Unit.Module.ModIface ( ModIface_(..) )
+import GHC.Unit.Module.ModIface ( mi_docs )
 import GHC.Iface.Load  ( loadInterfaceForName )
 
 import GHC.Builtin.Utils (knownKeyNames)


=====================================
compiler/GHC/Tc/Gen/Splice.hs
=====================================
@@ -1562,7 +1562,8 @@ lookupDeclDoc nm = do
       -- Wasn't in the current module. Try searching other external ones!
       mIface <- getExternalModIface nm
       case mIface of
-        Just ModIface { mi_docs = Just Docs{docs_decls = dmap} } ->
+        Just iface
+          | Just Docs{docs_decls = dmap} <- mi_docs iface ->
           pure $ renderHsDocStrings . map hsDocString <$> lookupUniqMap dmap nm
         _ -> pure Nothing
 
@@ -1578,7 +1579,8 @@ lookupArgDoc i nm = do
     Nothing -> do
       mIface <- getExternalModIface nm
       case mIface of
-        Just ModIface { mi_docs = Just Docs{docs_args = amap} } ->
+        Just iface
+          | Just Docs{docs_args = amap} <- mi_docs iface->
           pure $ renderHsDocString . hsDocString <$> (lookupUniqMap amap nm >>= IntMap.lookup i)
         _ -> pure Nothing
 


=====================================
compiler/GHC/Tc/Utils/Backpack.hs
=====================================
@@ -87,6 +87,7 @@ import Control.Monad
 import Data.List (find)
 
 import GHC.Iface.Errors.Types
+import Data.Function ((&))
 
 checkHsigDeclM :: ModIface -> TyThing -> TyThing -> TcRn ()
 checkHsigDeclM sig_iface sig_thing real_thing = do
@@ -369,8 +370,8 @@ tcRnMergeSignatures hsc_env hpm orig_tcg_env iface =
 
 thinModIface :: [AvailInfo] -> ModIface -> ModIface
 thinModIface avails iface =
-    iface {
-        mi_exports = avails,
+    iface
+        & set_mi_exports avails
         -- mi_fixities = ...,
         -- mi_warns = ...,
         -- mi_anns = ...,
@@ -378,10 +379,9 @@ thinModIface avails iface =
         -- perhaps there might be two IfaceTopBndr that are the same
         -- OccName but different Name.  Requires better understanding
         -- of invariants here.
-        mi_decls = exported_decls ++ non_exported_decls ++ dfun_decls
+        & set_mi_decls (exported_decls ++ non_exported_decls ++ dfun_decls)
         -- mi_insts = ...,
         -- mi_fam_insts = ...,
-    }
   where
     decl_pred occs decl = nameOccName (ifName decl) `elemOccSet` occs
     filter_decls occs = filter (decl_pred occs . snd) (mi_decls iface)


=====================================
compiler/GHC/Unit/Module/ModIface.hs
=====================================
@@ -7,7 +7,61 @@
 
 module GHC.Unit.Module.ModIface
    ( ModIface
-   , ModIface_ (..)
+   , ModIface_
+   , restoreFromOldModIface
+   , addSourceFingerprint
+   , mi_module
+   , mi_sig_of
+   , mi_hsc_src
+   , mi_src_hash
+   , mi_hi_bytes
+   , mi_deps
+   , mi_usages
+   , mi_exports
+   , mi_used_th
+   , mi_fixities
+   , mi_warns
+   , mi_anns
+   , mi_insts
+   , mi_fam_insts
+   , mi_rules
+   , mi_decls
+   , mi_extra_decls
+   , mi_globals
+   , mi_hpc
+   , mi_trust
+   , mi_trust_pkg
+   , mi_complete_matches
+   , mi_docs
+   , mi_final_exts
+   , mi_ext_fields
+   , set_mi_module
+   , set_mi_sig_of
+   , set_mi_hsc_src
+   , set_mi_src_hash
+   , set_mi_hi_bytes
+   , set_mi_deps
+   , set_mi_usages
+   , set_mi_exports
+   , set_mi_used_th
+   , set_mi_fixities
+   , set_mi_warns
+   , set_mi_anns
+   , set_mi_insts
+   , set_mi_fam_insts
+   , set_mi_rules
+   , set_mi_decls
+   , set_mi_extra_decls
+   , set_mi_globals
+   , set_mi_hpc
+   , set_mi_trust
+   , set_mi_trust_pkg
+   , set_mi_complete_matches
+   , set_mi_docs
+   , set_mi_final_exts
+   , set_mi_ext_fields
+   , completePartialModIface
+   , IfaceBinHandle(..)
    , PartialModIface
    , ModIfaceBackend (..)
    , IfaceDeclExts
@@ -58,6 +112,7 @@ import GHC.Utils.Binary
 
 import Control.DeepSeq
 import Control.Exception
+import qualified GHC.Data.Strict as Strict
 
 {- Note [Interface file stages]
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -139,7 +194,9 @@ type family IfaceBackendExts (phase :: ModIfacePhase) = bk | bk -> phase where
   IfaceBackendExts 'ModIfaceCore = ()
   IfaceBackendExts 'ModIfaceFinal = ModIfaceBackend
 
-
+data IfaceBinHandle (phase :: ModIfacePhase) where
+  PartialIfaceBinHandle :: IfaceBinHandle 'ModIfaceCore
+  FullIfaceBinHandle :: Strict.Maybe FullBinData -> IfaceBinHandle 'ModIfaceFinal
 
 -- | A 'ModIface' plus a 'ModDetails' summarises everything we know
 -- about a compiled module.  The 'ModIface' is the stuff *before* linking,
@@ -153,62 +210,65 @@ type family IfaceBackendExts (phase :: ModIfacePhase) = bk | bk -> phase where
 --
 -- See Note [Strictness in ModIface] to learn about why some fields are
 -- strict and others are not.
+--
+-- See Note [Private fields in ModIface] to learn why we don't export any of the
+-- fields.
 data ModIface_ (phase :: ModIfacePhase)
   = ModIface {
-        mi_module     :: !Module,             -- ^ Name of the module we are for
-        mi_sig_of     :: !(Maybe Module),     -- ^ Are we a sig of another mod?
+        mi_module_     :: !Module,             -- ^ Name of the module we are for
+        mi_sig_of_     :: !(Maybe Module),     -- ^ Are we a sig of another mod?
 
-        mi_hsc_src    :: !HscSource,          -- ^ Boot? Signature?
+        mi_hsc_src_    :: !HscSource,          -- ^ Boot? Signature?
 
-        mi_deps     :: Dependencies,
+        mi_deps_     :: Dependencies,
                 -- ^ The dependencies of the module.  This is
                 -- consulted for directly-imported modules, but not
                 -- for anything else (hence lazy)
 
-        mi_usages   :: [Usage],
+        mi_usages_   :: [Usage],
                 -- ^ Usages; kept sorted so that it's easy to decide
                 -- whether to write a new iface file (changing usages
                 -- doesn't affect the hash of this module)
                 -- NOT STRICT!  we read this field lazily from the interface file
                 -- It is *only* consulted by the recompilation checker
 
-        mi_exports  :: ![IfaceExport],
+        mi_exports_  :: ![IfaceExport],
                 -- ^ Exports
                 -- Kept sorted by (mod,occ), to make version comparisons easier
                 -- Records the modules that are the declaration points for things
                 -- exported by this module, and the 'OccName's of those things
 
 
-        mi_used_th  :: !Bool,
+        mi_used_th_  :: !Bool,
                 -- ^ Module required TH splices when it was compiled.
                 -- This disables recompilation avoidance (see #481).
 
-        mi_fixities :: [(OccName,Fixity)],
+        mi_fixities_ :: [(OccName,Fixity)],
                 -- ^ Fixities
                 -- NOT STRICT!  we read this field lazily from the interface file
 
-        mi_warns    :: IfaceWarnings,
+        mi_warns_    :: IfaceWarnings,
                 -- ^ Warnings
                 -- NOT STRICT!  we read this field lazily from the interface file
 
-        mi_anns     :: [IfaceAnnotation],
+        mi_anns_     :: [IfaceAnnotation],
                 -- ^ Annotations
                 -- NOT STRICT!  we read this field lazily from the interface file
 
 
-        mi_decls    :: [IfaceDeclExts phase],
+        mi_decls_    :: [IfaceDeclExts phase],
                 -- ^ Type, class and variable declarations
                 -- The hash of an Id changes if its fixity or deprecations change
                 --      (as well as its type of course)
                 -- Ditto data constructors, class operations, except that
                 -- the hash of the parent class/tycon changes
 
-        mi_extra_decls :: Maybe [IfaceBindingX IfaceMaybeRhs IfaceTopBndrInfo],
+        mi_extra_decls_ :: Maybe [IfaceBindingX IfaceMaybeRhs IfaceTopBndrInfo],
                 -- ^ Extra variable definitions which are **NOT** exposed but when
                 -- combined with mi_decls allows us to restart code generation.
                 -- See Note [Interface Files with Core Definitions] and Note [Interface File with Core: Sharing RHSs]
 
-        mi_globals  :: !(Maybe IfGlobalRdrEnv),
+        mi_globals_  :: !(Maybe IfGlobalRdrEnv),
                 -- ^ Binds all the things defined at the top level in
                 -- the /original source/ code for this module. which
                 -- is NOT the same as mi_exports, nor mi_decls (which
@@ -224,36 +284,36 @@ data ModIface_ (phase :: ModIfacePhase)
                 -- 'HomeModInfo', but that leads to more plumbing.
 
                 -- Instance declarations and rules
-        mi_insts       :: [IfaceClsInst],     -- ^ Sorted class instance
-        mi_fam_insts   :: [IfaceFamInst],  -- ^ Sorted family instances
-        mi_rules       :: [IfaceRule],     -- ^ Sorted rules
+        mi_insts_       :: [IfaceClsInst],     -- ^ Sorted class instance
+        mi_fam_insts_   :: [IfaceFamInst],  -- ^ Sorted family instances
+        mi_rules_       :: [IfaceRule],     -- ^ Sorted rules
 
-        mi_hpc       :: !AnyHpcUsage,
+        mi_hpc_       :: !AnyHpcUsage,
                 -- ^ True if this program uses Hpc at any point in the program.
 
-        mi_trust     :: !IfaceTrustInfo,
+        mi_trust_     :: !IfaceTrustInfo,
                 -- ^ Safe Haskell Trust information for this module.
 
-        mi_trust_pkg :: !Bool,
+        mi_trust_pkg_ :: !Bool,
                 -- ^ Do we require the package this module resides in be trusted
                 -- to trust this module? This is used for the situation where a
                 -- module is Safe (so doesn't require the package be trusted
                 -- itself) but imports some trustworthy modules from its own
                 -- package (which does require its own package be trusted).
                 -- See Note [Trust Own Package] in GHC.Rename.Names
-        mi_complete_matches :: ![IfaceCompleteMatch],
+        mi_complete_matches_ :: ![IfaceCompleteMatch],
 
-        mi_docs :: !(Maybe Docs),
+        mi_docs_ :: !(Maybe Docs),
                 -- ^ Docstrings and related data for use by haddock, the ghci
                 -- @:doc@ command, and other tools.
                 --
                 -- @Just _@ @<=>@ the module was built with @-haddock at .
 
-        mi_final_exts :: !(IfaceBackendExts phase),
+        mi_final_exts_ :: !(IfaceBackendExts phase),
                 -- ^ Either `()` or `ModIfaceBackend` for
                 -- a fully instantiated interface.
 
-        mi_ext_fields :: !ExtensibleFields,
+        mi_ext_fields_ :: !ExtensibleFields,
                 -- ^ Additional optional fields, where the Map key represents
                 -- the field name, resulting in a (size, serialized data) pair.
                 -- Because the data is intended to be serialized through the
@@ -262,8 +322,13 @@ data ModIface_ (phase :: ModIfacePhase)
                 -- chosen over `ByteString`s.
                 --
 
-        mi_src_hash :: !Fingerprint
+        mi_src_hash_ :: !Fingerprint,
                 -- ^ Hash of the .hs source, used for recompilation checking.
+        mi_hi_bytes_ :: !(IfaceBinHandle phase)
+                -- ^ A serialised in-memory buffer of this 'ModIface'.
+                -- If this handle is given, we can avoid serialising the 'ModIface'
+                -- when writing this 'ModIface' to disk, and write this buffer to disk instead.
+                -- See Note [Sharing of ModIface].
      }
 
 {-
@@ -343,33 +408,34 @@ renameFreeHoles fhs insts =
 -- See Note [Strictness in ModIface] about where we use lazyPut vs put
 instance Binary ModIface where
    put_ bh (ModIface {
-                 mi_module    = mod,
-                 mi_sig_of    = sig_of,
-                 mi_hsc_src   = hsc_src,
-                 mi_src_hash = _src_hash, -- Don't `put_` this in the instance
+                 mi_module_    = mod,
+                 mi_sig_of_    = sig_of,
+                 mi_hsc_src_   = hsc_src,
+                 mi_src_hash_ = _src_hash, -- Don't `put_` this in the instance
                                           -- because we are going to write it
                                           -- out separately in the actual file
-                 mi_deps      = deps,
-                 mi_usages    = usages,
-                 mi_exports   = exports,
-                 mi_used_th   = used_th,
-                 mi_fixities  = fixities,
-                 mi_warns     = warns,
-                 mi_anns      = anns,
-                 mi_decls     = decls,
-                 mi_extra_decls = extra_decls,
-                 mi_insts     = insts,
-                 mi_fam_insts = fam_insts,
-                 mi_rules     = rules,
-                 mi_hpc       = hpc_info,
-                 mi_trust     = trust,
-                 mi_trust_pkg = trust_pkg,
-                 mi_complete_matches = complete_matches,
-                 mi_docs      = docs,
-                 mi_ext_fields = _ext_fields, -- Don't `put_` this in the instance so we
+                 mi_hi_bytes_  = _hi_bytes, -- TODO: explain
+                 mi_deps_      = deps,
+                 mi_usages_    = usages,
+                 mi_exports_   = exports,
+                 mi_used_th_   = used_th,
+                 mi_fixities_  = fixities,
+                 mi_warns_     = warns,
+                 mi_anns_      = anns,
+                 mi_decls_     = decls,
+                 mi_extra_decls_ = extra_decls,
+                 mi_insts_     = insts,
+                 mi_fam_insts_ = fam_insts,
+                 mi_rules_     = rules,
+                 mi_hpc_       = hpc_info,
+                 mi_trust_     = trust,
+                 mi_trust_pkg_ = trust_pkg,
+                 mi_complete_matches_ = complete_matches,
+                 mi_docs_      = docs,
+                 mi_ext_fields_ = _ext_fields, -- Don't `put_` this in the instance so we
                                               -- can deal with it's pointer in the header
                                               -- when we write the actual file
-                 mi_final_exts = ModIfaceBackend {
+                 mi_final_exts_ = ModIfaceBackend {
                    mi_iface_hash = iface_hash,
                    mi_mod_hash = mod_hash,
                    mi_flag_hash = flag_hash,
@@ -444,33 +510,34 @@ instance Binary ModIface where
         complete_matches <- get bh
         docs        <- lazyGetMaybe bh
         return (ModIface {
-                 mi_module      = mod,
-                 mi_sig_of      = sig_of,
-                 mi_hsc_src     = hsc_src,
-                 mi_src_hash = fingerprint0, -- placeholder because this is dealt
+                 mi_module_      = mod,
+                 mi_sig_of_      = sig_of,
+                 mi_hsc_src_     = hsc_src,
+                 mi_src_hash_ = fingerprint0, -- placeholder because this is dealt
                                              -- with specially when the file is read
-                 mi_deps        = deps,
-                 mi_usages      = usages,
-                 mi_exports     = exports,
-                 mi_used_th     = used_th,
-                 mi_anns        = anns,
-                 mi_fixities    = fixities,
-                 mi_warns       = warns,
-                 mi_decls       = decls,
-                 mi_extra_decls = extra_decls,
-                 mi_globals     = Nothing,
-                 mi_insts       = insts,
-                 mi_fam_insts   = fam_insts,
-                 mi_rules       = rules,
-                 mi_hpc         = hpc_info,
-                 mi_trust       = trust,
-                 mi_trust_pkg   = trust_pkg,
+                 mi_hi_bytes_    = FullIfaceBinHandle Strict.Nothing,
+                 mi_deps_        = deps,
+                 mi_usages_      = usages,
+                 mi_exports_     = exports,
+                 mi_used_th_     = used_th,
+                 mi_anns_        = anns,
+                 mi_fixities_    = fixities,
+                 mi_warns_       = warns,
+                 mi_decls_       = decls,
+                 mi_extra_decls_ = extra_decls,
+                 mi_globals_     = Nothing,
+                 mi_insts_       = insts,
+                 mi_fam_insts_   = fam_insts,
+                 mi_rules_       = rules,
+                 mi_hpc_         = hpc_info,
+                 mi_trust_       = trust,
+                 mi_trust_pkg_   = trust_pkg,
                         -- And build the cached values
-                 mi_complete_matches = complete_matches,
-                 mi_docs        = docs,
-                 mi_ext_fields  = emptyExtensibleFields, -- placeholder because this is dealt
+                 mi_complete_matches_ = complete_matches,
+                 mi_docs_        = docs,
+                 mi_ext_fields_  = emptyExtensibleFields, -- placeholder because this is dealt
                                                          -- with specially when the file is read
-                 mi_final_exts = ModIfaceBackend {
+                 mi_final_exts_ = ModIfaceBackend {
                    mi_iface_hash = iface_hash,
                    mi_mod_hash = mod_hash,
                    mi_flag_hash = flag_hash,
@@ -487,42 +554,45 @@ instance Binary ModIface where
                    mi_hash_fn = mkIfaceHashCache decls
                  }})
 
+
 -- | The original names declared of a certain module that are exported
 type IfaceExport = AvailInfo
 
 emptyPartialModIface :: Module -> PartialModIface
 emptyPartialModIface mod
-  = ModIface { mi_module      = mod,
-               mi_sig_of      = Nothing,
-               mi_hsc_src     = HsSrcFile,
-               mi_src_hash    = fingerprint0,
-               mi_deps        = noDependencies,
-               mi_usages      = [],
-               mi_exports     = [],
-               mi_used_th     = False,
-               mi_fixities    = [],
-               mi_warns       = IfWarnSome [] [],
-               mi_anns        = [],
-               mi_insts       = [],
-               mi_fam_insts   = [],
-               mi_rules       = [],
-               mi_decls       = [],
-               mi_extra_decls = Nothing,
-               mi_globals     = Nothing,
-               mi_hpc         = False,
-               mi_trust       = noIfaceTrustInfo,
-               mi_trust_pkg   = False,
-               mi_complete_matches = [],
-               mi_docs        = Nothing,
-               mi_final_exts  = (),
-               mi_ext_fields  = emptyExtensibleFields
+  = ModIface { mi_module_      = mod,
+               mi_sig_of_      = Nothing,
+               mi_hsc_src_     = HsSrcFile,
+               mi_src_hash_    = fingerprint0,
+               mi_hi_bytes_    = PartialIfaceBinHandle,
+               mi_deps_        = noDependencies,
+               mi_usages_      = [],
+               mi_exports_     = [],
+               mi_used_th_     = False,
+               mi_fixities_    = [],
+               mi_warns_       = IfWarnSome [] [],
+               mi_anns_        = [],
+               mi_insts_       = [],
+               mi_fam_insts_   = [],
+               mi_rules_       = [],
+               mi_decls_       = [],
+               mi_extra_decls_ = Nothing,
+               mi_globals_     = Nothing,
+               mi_hpc_         = False,
+               mi_trust_       = noIfaceTrustInfo,
+               mi_trust_pkg_   = False,
+               mi_complete_matches_ = [],
+               mi_docs_        = Nothing,
+               mi_final_exts_  = (),
+               mi_ext_fields_  = emptyExtensibleFields
              }
 
 emptyFullModIface :: Module -> ModIface
 emptyFullModIface mod =
     (emptyPartialModIface mod)
-      { mi_decls = []
-      , mi_final_exts = ModIfaceBackend
+      { mi_decls_ = []
+      , mi_hi_bytes_ = FullIfaceBinHandle Strict.Nothing
+      , mi_final_exts_ = ModIfaceBackend
         { mi_iface_hash = fingerprint0,
           mi_mod_hash = fingerprint0,
           mi_flag_hash = fingerprint0,
@@ -557,36 +627,36 @@ emptyIfaceHashCache _occ = Nothing
 instance ( NFData (IfaceBackendExts (phase :: ModIfacePhase))
          , NFData (IfaceDeclExts (phase :: ModIfacePhase))
          ) => NFData (ModIface_ phase) where
-  rnf (ModIface{ mi_module, mi_sig_of, mi_hsc_src, mi_deps, mi_usages
-               , mi_exports, mi_used_th, mi_fixities, mi_warns, mi_anns
-               , mi_decls, mi_extra_decls, mi_globals, mi_insts
-               , mi_fam_insts, mi_rules, mi_hpc, mi_trust, mi_trust_pkg
-               , mi_complete_matches, mi_docs, mi_final_exts
-               , mi_ext_fields, mi_src_hash })
-    =     rnf mi_module
-    `seq` rnf mi_sig_of
-    `seq`     mi_hsc_src
-    `seq`     mi_deps
-    `seq`     mi_usages
-    `seq`     mi_exports
-    `seq` rnf mi_used_th
-    `seq`     mi_fixities
-    `seq` rnf mi_warns
-    `seq` rnf mi_anns
-    `seq` rnf mi_decls
-    `seq` rnf mi_extra_decls
-    `seq` rnf mi_globals
-    `seq` rnf mi_insts
-    `seq` rnf mi_fam_insts
-    `seq` rnf mi_rules
-    `seq` rnf mi_hpc
-    `seq`     mi_trust
-    `seq` rnf mi_trust_pkg
-    `seq` rnf mi_complete_matches
-    `seq` rnf mi_docs
-    `seq`     mi_final_exts
-    `seq`     mi_ext_fields
-    `seq` rnf mi_src_hash
+  rnf (ModIface{ mi_module_, mi_sig_of_, mi_hsc_src_, mi_deps_, mi_usages_
+               , mi_exports_, mi_used_th_, mi_fixities_, mi_warns_, mi_anns_
+               , mi_decls_, mi_extra_decls_, mi_globals_, mi_insts_
+               , mi_fam_insts_, mi_rules_, mi_hpc_, mi_trust_, mi_trust_pkg_
+               , mi_complete_matches_, mi_docs_, mi_final_exts_
+               , mi_ext_fields_, mi_src_hash_ })
+    =     rnf mi_module_
+    `seq` rnf mi_sig_of_
+    `seq`     mi_hsc_src_
+    `seq`     mi_deps_
+    `seq`     mi_usages_
+    `seq`     mi_exports_
+    `seq` rnf mi_used_th_
+    `seq`     mi_fixities_
+    `seq` rnf mi_warns_
+    `seq` rnf mi_anns_
+    `seq` rnf mi_decls_
+    `seq` rnf mi_extra_decls_
+    `seq` rnf mi_globals_
+    `seq` rnf mi_insts_
+    `seq` rnf mi_fam_insts_
+    `seq` rnf mi_rules_
+    `seq` rnf mi_hpc_
+    `seq`     mi_trust_
+    `seq` rnf mi_trust_pkg_
+    `seq` rnf mi_complete_matches_
+    `seq` rnf mi_docs_
+    `seq`     mi_final_exts_
+    `seq`     mi_ext_fields_
+    `seq` rnf mi_src_hash_
     `seq` ()
 
 instance NFData (ModIfaceBackend) where
@@ -626,5 +696,226 @@ type WhetherHasOrphans   = Bool
 -- | Does this module define family instances?
 type WhetherHasFamInst = Bool
 
+-- ----------------------------------------------------------------------------
+-- Modify a 'ModIface'.
+-- ----------------------------------------------------------------------------
+
+{-
+Note [Private fields in ModIface]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The fields of 'ModIface' are private, e.g., not exported, to make the API
+impossible to misuse. A 'ModIface' can be "compressed" in-memory using
+'shareIface', which serialises the 'ModIface' to an in-memory buffer.
+This has the advantage of reducing memory usage of 'ModIface', reducing the
+overall memory usage of GHC.
+See Note [Sharing of ModIface].
+
+This in-memory buffer can be reused, if and only if the 'ModIface' is not
+modified after it has been "compressed"/shared via 'shareIface'. Instead of
+serialising 'ModIface', we simply write the in-memory buffer to disk directly.
+
+However, we can't rely that a 'ModIface' isn't modified after 'shareIface' has
+been called. Thus, we make all fields of 'ModIface' private and modification
+only happens via exported update functions, such as 'set_mi_decls'.
+These functions unconditionally clear any in-memory buffer if used, forcing us
+to serialise the 'ModIface' to disk again.
+-}
+
+-- | Given a 'PartialModIface', turn it into a 'ModIface' by completing
+-- missing fields.
+completePartialModIface :: PartialModIface
+  -> [(Fingerprint, IfaceDecl)]
+  -> Maybe [IfaceBindingX IfaceMaybeRhs IfaceTopBndrInfo]
+  -> ModIfaceBackend
+  -> ModIface
+completePartialModIface partial decls extra_decls final_exts = partial
+  { mi_decls_ = decls
+  , mi_extra_decls_ = extra_decls
+  , mi_final_exts_ = final_exts
+  , mi_hi_bytes_ = FullIfaceBinHandle Strict.Nothing
+  }
+
+-- | Add a source fingerprint to a 'ModIface_' without invalidating the byte array
+-- buffer 'mi_hi_bytes'.
+-- This is a variant of 'set_mi_src_hash' which does invalidate the buffer.
+--
+-- The 'mi_src_hash' is computed outside of 'ModIface_' based on the 'ModSummary'.
+addSourceFingerprint :: Fingerprint -> ModIface_ phase -> ModIface_ phase
+addSourceFingerprint val iface = iface { mi_src_hash_ = val }
+
+-- | Copy fields that aren't serialised to disk to the new 'ModIface_'.
+-- This includes especially hashes that are usually stored in the interface
+-- file header and 'mi_globals'.
+--
+-- We need this function after calling 'shareIface', to make sure the
+-- 'ModIface_' doesn't lose any information. This function does not discard
+-- the in-memory byte array buffer 'mi_hi_bytes'.
+restoreFromOldModIface :: ModIface_ phase -> ModIface_ phase -> ModIface_ phase
+restoreFromOldModIface old new = new
+  { mi_globals_ = mi_globals_ old
+  , mi_hsc_src_ = mi_hsc_src_ old
+  , mi_src_hash_ = mi_src_hash_ old
+  }
+
+set_mi_module :: Module -> ModIface_ phase -> ModIface_ phase
+set_mi_module val iface = clear_mi_hi_bytes $ iface { mi_module_ = val }
+
+set_mi_sig_of :: Maybe Module -> ModIface_ phase -> ModIface_ phase
+set_mi_sig_of val iface = clear_mi_hi_bytes $ iface { mi_sig_of_ = val }
+
+set_mi_hsc_src :: HscSource -> ModIface_ phase -> ModIface_ phase
+set_mi_hsc_src val iface = clear_mi_hi_bytes $ iface { mi_hsc_src_ = val }
+
+set_mi_src_hash :: Fingerprint -> ModIface_ phase -> ModIface_ phase
+set_mi_src_hash val iface = clear_mi_hi_bytes $ iface { mi_src_hash_ = val }
+
+set_mi_hi_bytes :: IfaceBinHandle phase -> ModIface_ phase -> ModIface_ phase
+set_mi_hi_bytes val iface = iface { mi_hi_bytes_ = val }
+
+set_mi_deps :: Dependencies -> ModIface_ phase -> ModIface_ phase
+set_mi_deps val iface = clear_mi_hi_bytes $ iface { mi_deps_ = val }
+
+set_mi_usages :: [Usage] -> ModIface_ phase -> ModIface_ phase
+set_mi_usages val iface = clear_mi_hi_bytes $ iface { mi_usages_ = val }
+
+set_mi_exports :: [IfaceExport] -> ModIface_ phase -> ModIface_ phase
+set_mi_exports val iface = clear_mi_hi_bytes $ iface { mi_exports_ = val }
+
+set_mi_used_th :: Bool -> ModIface_ phase -> ModIface_ phase
+set_mi_used_th val iface = clear_mi_hi_bytes $ iface { mi_used_th_ = val }
+
+set_mi_fixities :: [(OccName, Fixity)] -> ModIface_ phase -> ModIface_ phase
+set_mi_fixities val iface = clear_mi_hi_bytes $ iface { mi_fixities_ = val }
+
+set_mi_warns :: IfaceWarnings -> ModIface_ phase -> ModIface_ phase
+set_mi_warns val iface = clear_mi_hi_bytes $ iface { mi_warns_ = val }
+
+set_mi_anns :: [IfaceAnnotation] -> ModIface_ phase -> ModIface_ phase
+set_mi_anns val iface = clear_mi_hi_bytes $ iface { mi_anns_ = val }
+
+set_mi_insts :: [IfaceClsInst] -> ModIface_ phase -> ModIface_ phase
+set_mi_insts val iface = clear_mi_hi_bytes $ iface { mi_insts_ = val }
+
+set_mi_fam_insts :: [IfaceFamInst] -> ModIface_ phase -> ModIface_ phase
+set_mi_fam_insts val iface = clear_mi_hi_bytes $ iface { mi_fam_insts_ = val }
+
+set_mi_rules :: [IfaceRule] -> ModIface_ phase -> ModIface_ phase
+set_mi_rules val iface = clear_mi_hi_bytes $ iface { mi_rules_ = val }
+
+set_mi_decls :: [IfaceDeclExts phase] -> ModIface_ phase -> ModIface_ phase
+set_mi_decls val iface = clear_mi_hi_bytes $ iface { mi_decls_ = val }
+
+set_mi_extra_decls :: Maybe [IfaceBindingX IfaceMaybeRhs IfaceTopBndrInfo] -> ModIface_ phase -> ModIface_ phase
+set_mi_extra_decls val iface = clear_mi_hi_bytes $ iface { mi_extra_decls_ = val }
+
+set_mi_globals :: Maybe IfGlobalRdrEnv -> ModIface_ phase -> ModIface_ phase
+set_mi_globals val iface = clear_mi_hi_bytes $ iface { mi_globals_ = val }
+
+set_mi_hpc :: AnyHpcUsage -> ModIface_ phase -> ModIface_ phase
+set_mi_hpc val iface = clear_mi_hi_bytes $ iface { mi_hpc_ = val }
+
+set_mi_trust :: IfaceTrustInfo -> ModIface_ phase -> ModIface_ phase
+set_mi_trust val iface = clear_mi_hi_bytes $ iface { mi_trust_ = val }
+
+set_mi_trust_pkg :: Bool -> ModIface_ phase -> ModIface_ phase
+set_mi_trust_pkg val iface = clear_mi_hi_bytes $ iface { mi_trust_pkg_ = val }
+
+set_mi_complete_matches :: [IfaceCompleteMatch] -> ModIface_ phase -> ModIface_ phase
+set_mi_complete_matches val iface = clear_mi_hi_bytes $ iface { mi_complete_matches_ = val }
+
+set_mi_docs :: Maybe Docs -> ModIface_ phase -> ModIface_ phase
+set_mi_docs val iface = clear_mi_hi_bytes $  iface { mi_docs_ = val }
+
+set_mi_final_exts :: IfaceBackendExts phase -> ModIface_ phase -> ModIface_ phase
+set_mi_final_exts val iface = clear_mi_hi_bytes $ iface { mi_final_exts_ = val }
+
+set_mi_ext_fields :: ExtensibleFields -> ModIface_ phase -> ModIface_ phase
+set_mi_ext_fields val iface = clear_mi_hi_bytes $ iface { mi_ext_fields_ = val }
+
+-- | Invalidate any byte array buffer we might have.
+clear_mi_hi_bytes :: ModIface_ phase -> ModIface_ phase
+clear_mi_hi_bytes iface = iface
+  { mi_hi_bytes_ = case mi_hi_bytes iface of
+      PartialIfaceBinHandle -> PartialIfaceBinHandle
+      FullIfaceBinHandle _ -> FullIfaceBinHandle Strict.Nothing
+  }
+
+-- ----------------------------------------------------------------------------
+-- Accessor functions of 'ModIface'.
+-- We use these as 'ModIface''s fields are private.
+-- ----------------------------------------------------------------------------
+
+mi_module :: ModIface_ phase -> Module
+mi_module = mi_module_
+
+mi_sig_of :: ModIface_ phase -> Maybe Module
+mi_sig_of = mi_sig_of_
+
+mi_hsc_src :: ModIface_ phase -> HscSource
+mi_hsc_src = mi_hsc_src_
+
+mi_deps :: ModIface_ phase -> Dependencies
+mi_deps = mi_deps_
+
+mi_usages :: ModIface_ phase -> [Usage]
+mi_usages = mi_usages_
+
+mi_exports :: ModIface_ phase -> [IfaceExport]
+mi_exports = mi_exports_
+
+mi_used_th :: ModIface_ phase -> Bool
+mi_used_th = mi_used_th_
+
+mi_fixities :: ModIface_ phase -> [(OccName, Fixity)]
+mi_fixities = mi_fixities_
+
+mi_warns :: ModIface_ phase -> IfaceWarnings
+mi_warns = mi_warns_
+
+mi_anns :: ModIface_ phase -> [IfaceAnnotation]
+mi_anns = mi_anns_
+
+mi_decls :: ModIface_ phase -> [IfaceDeclExts phase]
+mi_decls = mi_decls_
+
+mi_extra_decls :: ModIface_ phase -> Maybe [IfaceBindingX IfaceMaybeRhs IfaceTopBndrInfo]
+mi_extra_decls = mi_extra_decls_
+
+mi_globals :: ModIface_ phase -> Maybe IfGlobalRdrEnv
+mi_globals = mi_globals_
+
+mi_insts :: ModIface_ phase -> [IfaceClsInst]
+mi_insts = mi_insts_
+
+mi_fam_insts :: ModIface_ phase -> [IfaceFamInst]
+mi_fam_insts = mi_fam_insts_
+
+mi_rules :: ModIface_ phase -> [IfaceRule]
+mi_rules = mi_rules_
+
+mi_hpc :: ModIface_ phase -> AnyHpcUsage
+mi_hpc = mi_hpc_
+
+mi_trust :: ModIface_ phase -> IfaceTrustInfo
+mi_trust = mi_trust_
+
+mi_trust_pkg :: ModIface_ phase -> Bool
+mi_trust_pkg = mi_trust_pkg_
+
+mi_complete_matches :: ModIface_ phase -> [IfaceCompleteMatch]
+mi_complete_matches = mi_complete_matches_
+
+mi_docs :: ModIface_ phase -> Maybe Docs
+mi_docs = mi_docs_
+
+mi_final_exts :: ModIface_ phase -> IfaceBackendExts phase
+mi_final_exts = mi_final_exts_
+
+mi_ext_fields :: ModIface_ phase -> ExtensibleFields
+mi_ext_fields = mi_ext_fields_
 
+mi_src_hash :: ModIface_ phase -> Fingerprint
+mi_src_hash = mi_src_hash_
 
+mi_hi_bytes :: ModIface_ phase -> IfaceBinHandle phase
+mi_hi_bytes = mi_hi_bytes_


=====================================
compiler/GHC/Utils/Binary.hs
=====================================
@@ -19,7 +19,7 @@
 --     http://www.cs.york.ac.uk/fp/nhc98/
 
 module GHC.Utils.Binary
-  ( {-type-}  Bin,
+  ( {-type-}  Bin, RelBin(..), getRelBin,
     {-class-} Binary(..),
     {-type-}  ReadBinHandle, WriteBinHandle,
     SymbolTable, Dictionary,
@@ -32,10 +32,14 @@ module GHC.Utils.Binary
 
    seekBinWriter,
    seekBinReader,
+   seekBinReaderRel,
    tellBinReader,
    tellBinWriter,
    castBin,
    withBinBuffer,
+   freezeWriteHandle,
+   shrinkBinBuffer,
+   thawReadHandle,
 
    foldGet, foldGet',
 
@@ -44,7 +48,9 @@ module GHC.Utils.Binary
    readBinMemN,
 
    putAt, getAt,
+   putAtRel,
    forwardPut, forwardPut_, forwardGet,
+   forwardPutRel, forwardPutRel_, forwardGetRel,
 
    -- * For writing instances
    putByte,
@@ -64,6 +70,8 @@ module GHC.Utils.Binary
    -- * Lazy Binary I/O
    lazyGet,
    lazyPut,
+   lazyGet',
+   lazyPut',
    lazyGetMaybe,
    lazyPutMaybe,
 
@@ -86,10 +94,20 @@ module GHC.Utils.Binary
    initFastStringReaderTable, initFastStringWriterTable,
    putDictionary, getDictionary, putFS,
    FSTable(..), getDictFastString, putDictFastString,
+   -- * Generic deduplication table
+   GenericSymbolTable(..),
+   initGenericSymbolTable,
+   getGenericSymtab, putGenericSymTab,
+   getGenericSymbolTable, putGenericSymbolTable,
    -- * Newtype wrappers
    BinSpan(..), BinSrcSpan(..), BinLocated(..),
    -- * Newtypes for types that have canonically more than one valid encoding
    BindingName(..),
+   simpleBindingNameWriter,
+   simpleBindingNameReader,
+   FullBinData(..), freezeBinHandle, thawBinHandle, putFullBinData,
+   freezeBinHandle2,
+   BinArray,
   ) where
 
 import GHC.Prelude
@@ -102,11 +120,11 @@ import GHC.Utils.Panic.Plain
 import GHC.Types.Unique.FM
 import GHC.Data.FastMutInt
 import GHC.Utils.Fingerprint
-import GHC.Utils.Misc (HasCallStack)
 import GHC.Types.SrcLoc
 import GHC.Types.Unique
 import qualified GHC.Data.Strict as Strict
 import GHC.Utils.Outputable( JoinPointHood(..) )
+import GHC.Utils.Misc ( HasCallStack, HasDebugCallStack )
 
 import Control.DeepSeq
 import Control.Monad            ( when, (<$!>), unless, forM_, void )
@@ -114,7 +132,7 @@ import Foreign hiding (shiftL, shiftR, void)
 import Data.Array
 import Data.Array.IO
 import Data.Array.Unsafe
-import Data.ByteString (ByteString)
+import Data.ByteString (ByteString, copy)
 import Data.Coerce
 import qualified Data.ByteString.Internal as BS
 import qualified Data.ByteString.Unsafe   as BS
@@ -132,6 +150,7 @@ import Data.List (unfoldr)
 import System.IO as IO
 import System.IO.Unsafe         ( unsafeInterleaveIO )
 import System.IO.Error          ( mkIOError, eofErrorType )
+import Type.Reflection          ( Typeable, SomeTypeRep(..) )
 import qualified Type.Reflection as Refl
 import GHC.Real                 ( Ratio(..) )
 import Data.IntMap (IntMap)
@@ -142,6 +161,7 @@ import GHC.ForeignPtr           ( unsafeWithForeignPtr )
 
 import Unsafe.Coerce (unsafeCoerce)
 
+import GHC.Data.TrieMap
 type BinArray = ForeignPtr Word8
 
 #if !MIN_VERSION_base(4,15,0)
@@ -181,6 +201,51 @@ dataHandle (BinData size bin) = do
 handleData :: WriteBinHandle -> IO BinData
 handleData (WriteBinMem _ ixr _ binr) = BinData <$> readFastMutInt ixr <*> readIORef binr
 
+---------------------------------------------------------------
+-- FullBinData
+---------------------------------------------------------------
+
+data FullBinData = FullBinData
+  { fbd_readerUserData :: ReaderUserData
+  , fbd_off_s :: {-# UNPACK #-} !Int
+  -- ^ start offset
+  , fbd_off_e :: {-# UNPACK #-} !Int
+  -- ^ end offset
+  , fbd_size :: {-# UNPACK #-} !Int
+  -- ^ total buffer size
+  , fbd_buffer :: {-# UNPACK #-} !BinArray
+  }
+
+-- Equality and Ord assume that two distinct buffers are different, even if they compare the same things.
+instance Eq FullBinData where
+  (FullBinData _ b c d e) == (FullBinData _ b1 c1 d1 e1) = b == b1 && c == c1 && d == d1 && e == e1
+
+instance Ord FullBinData where
+  compare (FullBinData _ b c d e) (FullBinData _ b1 c1 d1 e1) =
+    compare b b1 `mappend` compare c c1 `mappend` compare d d1 `mappend` compare e e1
+
+putFullBinData :: WriteBinHandle -> FullBinData -> IO ()
+putFullBinData bh (FullBinData _ o1 o2 _sz ba) = do
+  let sz = o2 - o1
+  putPrim bh sz $ \dest ->
+    unsafeWithForeignPtr (ba `plusForeignPtr` o1) $ \orig ->
+    copyBytes dest orig sz
+
+freezeBinHandle :: Bin () -> ReadBinHandle -> IO FullBinData
+freezeBinHandle (BinPtr len) (ReadBinMem user_data ixr sz binr) = do
+  ix <- readFastMutInt ixr
+  pure (FullBinData user_data ix len sz binr)
+
+freezeBinHandle2 :: ReadBinHandle -> Bin () -> IO FullBinData
+freezeBinHandle2 (ReadBinMem user_data ixr sz binr) (BinPtr start) = do
+  ix <- readFastMutInt ixr
+  pure (FullBinData user_data start ix sz binr)
+
+thawBinHandle :: FullBinData -> IO ReadBinHandle
+thawBinHandle (FullBinData user_data ix _end sz ba) = do
+  ixr <- newFastMutInt ix
+  return $ ReadBinMem user_data ixr sz ba
+
 ---------------------------------------------------------------
 -- BinHandle
 ---------------------------------------------------------------
@@ -230,20 +295,28 @@ setReaderUserData bh us = bh { rbm_userData = us }
 -- | Add 'SomeBinaryReader' as a known binary decoder.
 -- If a 'BinaryReader' for the associated type already exists in 'ReaderUserData',
 -- it is overwritten.
-addReaderToUserData :: SomeBinaryReader -> ReadBinHandle -> ReadBinHandle
-addReaderToUserData cache@(SomeBinaryReader typRep _) bh = bh
+addReaderToUserData :: forall a. Typeable a => BinaryReader a -> ReadBinHandle -> ReadBinHandle
+addReaderToUserData reader bh = bh
   { rbm_userData = (rbm_userData bh)
-      { ud_reader_data = Map.insert (Refl.SomeTypeRep typRep) cache (ud_reader_data (rbm_userData bh))
+      { ud_reader_data =
+          let
+            typRep = Refl.typeRep @a
+          in
+            Map.insert (SomeTypeRep typRep) (SomeBinaryReader typRep reader) (ud_reader_data (rbm_userData bh))
       }
   }
 
 -- | Add 'SomeBinaryWriter' as a known binary encoder.
 -- If a 'BinaryWriter' for the associated type already exists in 'WriterUserData',
 -- it is overwritten.
-addWriterToUserData :: SomeBinaryWriter -> WriteBinHandle -> WriteBinHandle
-addWriterToUserData cache@(SomeBinaryWriter typRep _) bh = bh
+addWriterToUserData :: forall a . Typeable a => BinaryWriter a -> WriteBinHandle -> WriteBinHandle
+addWriterToUserData writer bh = bh
   { wbm_userData = (wbm_userData bh)
-      { ud_writer_data = Map.insert (Refl.SomeTypeRep typRep) cache (ud_writer_data (wbm_userData bh))
+      { ud_writer_data =
+          let
+            typRep = Refl.typeRep @a
+          in
+            Map.insert (SomeTypeRep typRep) (SomeBinaryWriter typRep writer) (ud_writer_data (wbm_userData bh))
       }
   }
 
@@ -266,9 +339,47 @@ unsafeUnpackBinBuffer (BS.BS arr len) = do
 newtype Bin a = BinPtr Int
   deriving (Eq, Ord, Show, Bounded)
 
+-- | Like a 'Bin' but is used to store relative offset pointers.
+-- Relative offset pointers store a relative location, but also contain an
+-- anchor that allow to obtain the absolute offset.
+data RelBin a = RelBin
+  { relBin_anchor :: {-# UNPACK #-} !(Bin a)
+  -- ^ Absolute position from where we read 'relBin_offset'.
+  , relBin_offset :: {-# UNPACK #-} !(RelBinPtr a)
+  -- ^ Relative offset to 'relBin_anchor'.
+  -- The absolute position of the 'RelBin' is @relBin_anchor + relBin_offset@
+  }
+  deriving (Eq, Ord, Show, Bounded)
+
+-- | A 'RelBinPtr' is like a 'Bin', but contains a relative offset pointer
+-- instead of an absolute offset.
+newtype RelBinPtr a = RelBinPtr (Bin a)
+  deriving (Eq, Ord, Show, Bounded)
+
 castBin :: Bin a -> Bin b
 castBin (BinPtr i) = BinPtr i
 
+-- | Read a relative offset location and wrap it in 'RelBin'.
+--
+-- The resulting 'RelBin' can be translated into an absolute offset location using
+-- 'makeAbsoluteBin'
+getRelBin :: ReadBinHandle -> IO (RelBin a)
+getRelBin bh = do
+  start <- tellBinReader bh
+  off <- get bh
+  pure $ RelBin start off
+
+makeAbsoluteBin ::  RelBin a -> Bin a
+makeAbsoluteBin (RelBin (BinPtr !start) (RelBinPtr (BinPtr !offset))) =
+  BinPtr $ start + offset
+
+makeRelativeBin :: RelBin a -> RelBinPtr a
+makeRelativeBin (RelBin _ offset) = offset
+
+toRelBin :: Bin (RelBinPtr a) -> Bin a -> RelBin a
+toRelBin (BinPtr !start) (BinPtr !goal) =
+  RelBin (BinPtr start) (RelBinPtr $ BinPtr $ goal - start)
+
 ---------------------------------------------------------------
 -- class Binary
 ---------------------------------------------------------------
@@ -289,6 +400,9 @@ class Binary a where
 putAt  :: Binary a => WriteBinHandle -> Bin a -> a -> IO ()
 putAt bh p x = do seekBinWriter bh p; put_ bh x; return ()
 
+putAtRel :: WriteBinHandle -> Bin (RelBinPtr a) -> Bin a -> IO ()
+putAtRel bh from to = putAt bh from (makeRelativeBin $ toRelBin from to)
+
 getAt  :: Binary a => ReadBinHandle -> Bin a -> IO a
 getAt bh p = do seekBinReader bh p; get bh
 
@@ -307,6 +421,44 @@ openBinMem size
     , wbm_arr_r = arr_r
     }
 
+-- | Freeze the given 'WriteBinHandle' and turn it into an equivalent 'ReadBinHandle'.
+--
+-- The current offset of the 'WriteBinHandle' is maintained in the new 'ReadBinHandle'.
+freezeWriteHandle :: WriteBinHandle -> IO ReadBinHandle
+freezeWriteHandle wbm = do
+  rbm_off_r <- newFastMutInt =<< readFastMutInt (wbm_off_r wbm)
+  rbm_sz_r <- readFastMutInt (wbm_sz_r wbm)
+  rbm_arr_r <- readIORef (wbm_arr_r wbm)
+  pure $ ReadBinMem
+    { rbm_userData = noReaderUserData
+    , rbm_off_r = rbm_off_r
+    , rbm_sz_r = rbm_sz_r
+    , rbm_arr_r = rbm_arr_r
+    }
+
+-- Copy the BinBuffer to a new BinBuffer which is exactly the right size.
+-- This performs a copy of the underlying buffer.
+-- The buffer may be truncated if the offset is not at the end of the written
+-- output.
+--
+-- UserData is also discarded during the copy
+-- You should just use this when translating a Put handle into a Get handle.
+shrinkBinBuffer :: WriteBinHandle -> IO ReadBinHandle
+shrinkBinBuffer bh = withBinBuffer bh $ \bs -> do
+  unsafeUnpackBinBuffer (copy bs)
+
+thawReadHandle :: ReadBinHandle -> IO WriteBinHandle
+thawReadHandle rbm = do
+  wbm_off_r <- newFastMutInt =<< readFastMutInt (rbm_off_r rbm)
+  wbm_sz_r <- newFastMutInt (rbm_sz_r rbm)
+  wbm_arr_r <- newIORef (rbm_arr_r rbm)
+  pure $ WriteBinMem
+    { wbm_userData = noWriterUserData
+    , wbm_off_r = wbm_off_r
+    , wbm_sz_r = wbm_sz_r
+    , wbm_arr_r = wbm_arr_r
+    }
+
 tellBinWriter :: WriteBinHandle -> IO (Bin a)
 tellBinWriter (WriteBinMem _ r _ _) = do ix <- readFastMutInt r; return (BinPtr ix)
 
@@ -332,12 +484,19 @@ seekBinNoExpandWriter (WriteBinMem _ ix_r sz_r _) (BinPtr !p) = do
         else writeFastMutInt ix_r p
 
 -- | SeekBin but without calling expandBin
-seekBinReader :: ReadBinHandle -> Bin a -> IO ()
+seekBinReader :: HasCallStack => ReadBinHandle -> Bin a -> IO ()
 seekBinReader (ReadBinMem _ ix_r sz_r _) (BinPtr !p) = do
   if (p > sz_r)
         then panic "seekBinReader: seek out of range"
         else writeFastMutInt ix_r p
 
+seekBinReaderRel :: HasCallStack => ReadBinHandle -> RelBin a -> IO ()
+seekBinReaderRel (ReadBinMem _ ix_r sz_r _) relBin = do
+  let (BinPtr !p) = makeAbsoluteBin relBin
+  if (p > sz_r)
+        then panic "seekBinReaderRel: seek out of range"
+        else writeFastMutInt ix_r p
+
 writeBinMem :: WriteBinHandle -> FilePath -> IO ()
 writeBinMem (WriteBinMem _ ix_r _ arr_r) fn = do
   h <- openBinaryFile fn WriteMode
@@ -1058,12 +1217,17 @@ instance Binary (Bin a) where
   put_ bh (BinPtr i) = putWord32 bh (fromIntegral i :: Word32)
   get bh = do i <- getWord32 bh; return (BinPtr (fromIntegral (i :: Word32)))
 
+-- Instance uses fixed-width encoding to allow inserting
+-- Bin placeholders in the stream.
+instance Binary (RelBinPtr a) where
+  put_ bh (RelBinPtr i) = put_ bh i
+  get bh = RelBinPtr <$> get bh
 
 -- -----------------------------------------------------------------------------
 -- Forward reading/writing
 
--- | "forwardPut put_A put_B" outputs A after B but allows A to be read before B
--- by using a forward reference
+-- | @'forwardPut' put_A put_B@ outputs A after B but allows A to be read before B
+-- by using a forward reference.
 forwardPut :: WriteBinHandle -> (b -> IO a) -> IO b -> IO (a,b)
 forwardPut bh put_A put_B = do
   -- write placeholder pointer to A
@@ -1086,7 +1250,9 @@ forwardPut_ :: WriteBinHandle -> (b -> IO a) -> IO b -> IO ()
 forwardPut_ bh put_A put_B = void $ forwardPut bh put_A put_B
 
 -- | Read a value stored using a forward reference
-forwardGet :: ReadBinHandle -> IO a -> IO a
+--
+-- The forward reference is expected to be an absolute offset.
+forwardGet :: HasCallStack => ReadBinHandle -> IO a -> IO a
 forwardGet bh get_A = do
     -- read forward reference
     p <- get bh -- a BinPtr
@@ -1098,29 +1264,79 @@ forwardGet bh get_A = do
     seekBinReader bh p_a
     pure r
 
+-- | @'forwardPutRel' put_A put_B@ outputs A after B but allows A to be read before B
+-- by using a forward reference.
+--
+-- This forward reference is a relative offset that allows us to skip over the
+-- result of 'put_A'.
+forwardPutRel :: WriteBinHandle -> (b -> IO a) -> IO b -> IO (a,b)
+forwardPutRel bh put_A put_B = do
+  -- write placeholder pointer to A
+  pre_a <- tellBinWriter bh
+  put_ bh pre_a
+
+  -- write B
+  r_b <- put_B
+
+  -- update A's pointer
+  a <- tellBinWriter bh
+  putAtRel bh pre_a a
+  seekBinNoExpandWriter bh a
+
+  -- write A
+  r_a <- put_A r_b
+  pure (r_a,r_b)
+
+-- | Like 'forwardGetRel', but discard the result.
+forwardPutRel_ :: WriteBinHandle -> (b -> IO a) -> IO b -> IO ()
+forwardPutRel_ bh put_A put_B = void $ forwardPutRel bh put_A put_B
+
+-- | Read a value stored using a forward reference.
+--
+-- The forward reference is expected to be a relative offset.
+forwardGetRel :: ReadBinHandle -> IO a -> IO a
+forwardGetRel bh get_A = do
+    -- read forward reference
+    p <- getRelBin bh
+    -- store current position
+    p_a <- tellBinReader bh
+    -- go read the forward value, then seek back
+    seekBinReader bh $ makeAbsoluteBin p
+    r <- get_A
+    seekBinReader bh p_a
+    pure r
+
 -- -----------------------------------------------------------------------------
 -- Lazy reading/writing
 
 lazyPut :: Binary a => WriteBinHandle -> a -> IO ()
-lazyPut bh a = do
+lazyPut = lazyPut' put_
+
+lazyGet :: Binary a => ReadBinHandle -> IO a
+lazyGet = lazyGet' get
+
+lazyPut' :: (WriteBinHandle -> a -> IO ()) -> WriteBinHandle -> a -> IO ()
+lazyPut' f bh a = do
     -- output the obj with a ptr to skip over it:
     pre_a <- tellBinWriter bh
     put_ bh pre_a       -- save a slot for the ptr
-    put_ bh a           -- dump the object
+    f bh a           -- dump the object
     q <- tellBinWriter bh     -- q = ptr to after object
-    putAt bh pre_a q    -- fill in slot before a with ptr to q
+    putAtRel bh pre_a q    -- fill in slot before a with ptr to q
     seekBinWriter bh q        -- finally carry on writing at q
 
-lazyGet :: Binary a => ReadBinHandle -> IO a
-lazyGet bh = do
-    p <- get bh -- a BinPtr
+lazyGet' :: HasDebugCallStack => (ReadBinHandle -> IO a) -> ReadBinHandle -> IO a
+lazyGet' f bh = do
+    p <- getRelBin bh -- a BinPtr
     p_a <- tellBinReader bh
     a <- unsafeInterleaveIO $ do
-        -- NB: Use a fresh off_r variable in the child thread, for thread
+        -- NB: Use a fresh rbm_off_r variable in the child thread, for thread
         -- safety.
         off_r <- newFastMutInt 0
-        getAt bh { rbm_off_r = off_r } p_a
-    seekBinReader bh p -- skip over the object for now
+        let bh' = bh { rbm_off_r = off_r }
+        seekBinReader bh' p_a
+        f bh'
+    seekBinReader bh (makeAbsoluteBin p) -- skip over the object for now
     return a
 
 -- | Serialize the constructor strictly but lazily serialize a value inside a
@@ -1173,6 +1389,12 @@ lazyGetMaybe bh = do
 newtype BindingName = BindingName { getBindingName :: Name }
   deriving ( Eq )
 
+simpleBindingNameWriter :: BinaryWriter Name -> BinaryWriter BindingName
+simpleBindingNameWriter = coerce
+
+simpleBindingNameReader :: BinaryReader Name -> BinaryReader BindingName
+simpleBindingNameReader = coerce
+
 -- | Existential for 'BinaryWriter' with a type witness.
 data SomeBinaryWriter = forall a . SomeBinaryWriter (Refl.TypeRep a) (BinaryWriter a)
 
@@ -1184,7 +1406,7 @@ data SomeBinaryReader = forall a . SomeBinaryReader (Refl.TypeRep a) (BinaryRead
 -- See Note [Binary UserData]
 data WriterUserData =
    WriterUserData {
-      ud_writer_data :: Map Refl.SomeTypeRep SomeBinaryWriter
+      ud_writer_data :: Map SomeTypeRep SomeBinaryWriter
       -- ^ A mapping from a type witness to the 'Writer' for the associated type.
       -- This is a 'Map' because microbenchmarks indicated this is more efficient
       -- than other representations for less than ten elements.
@@ -1201,7 +1423,7 @@ data WriterUserData =
 -- See Note [Binary UserData]
 data ReaderUserData =
    ReaderUserData {
-      ud_reader_data :: Map Refl.SomeTypeRep SomeBinaryReader
+      ud_reader_data :: Map SomeTypeRep SomeBinaryReader
       -- ^ A mapping from a type witness to the 'Reader' for the associated type.
       -- This is a 'Map' because microbenchmarks indicated this is more efficient
       -- than other representations for less than ten elements.
@@ -1215,12 +1437,12 @@ data ReaderUserData =
 
 mkWriterUserData :: [SomeBinaryWriter] -> WriterUserData
 mkWriterUserData caches = noWriterUserData
-  { ud_writer_data = Map.fromList $ map (\cache@(SomeBinaryWriter typRep _) -> (Refl.SomeTypeRep typRep, cache)) caches
+  { ud_writer_data = Map.fromList $ map (\cache@(SomeBinaryWriter typRep _) -> (SomeTypeRep typRep, cache)) caches
   }
 
 mkReaderUserData :: [SomeBinaryReader] -> ReaderUserData
 mkReaderUserData caches = noReaderUserData
-  { ud_reader_data = Map.fromList $ map (\cache@(SomeBinaryReader typRep _) -> (Refl.SomeTypeRep typRep, cache)) caches
+  { ud_reader_data = Map.fromList $ map (\cache@(SomeBinaryReader typRep _) -> (SomeTypeRep typRep, cache)) caches
   }
 
 mkSomeBinaryWriter :: forall a . Refl.Typeable a => BinaryWriter a -> SomeBinaryWriter
@@ -1346,6 +1568,110 @@ newtype WriterTable = WriterTable
   -- ^ Serialise a table to disk. Returns the number of written elements.
   }
 
+-- ----------------------------------------------------------------------------
+-- Common data structures for constructing and maintaining lookup tables for
+-- binary serialisation and deserialisation.
+-- ----------------------------------------------------------------------------
+
+-- | The 'GenericSymbolTable' stores a mapping from already seen elements to an index.
+-- If an element wasn't seen before, it is added to the mapping together with a fresh
+-- index.
+--
+-- 'GenericSymbolTable' is a variant of a 'BinSymbolTable' that is polymorphic in the table implementation.
+-- As such it can be used with any container that implements the 'TrieMap' type class.
+--
+-- While 'GenericSymbolTable' is similar to the 'BinSymbolTable', it supports storing tree-like
+-- structures such as 'Type' and 'IfaceType' more efficiently.
+--
+data GenericSymbolTable m = GenericSymbolTable
+  { gen_symtab_next :: !FastMutInt
+  -- ^ The next index to use.
+  , gen_symtab_map  :: !(IORef (m Int))
+  -- ^ Given a symbol, find the symbol and return its index.
+  , gen_symtab_to_write :: !(IORef [Key m])
+  -- ^ Reversed list of values to write into the buffer.
+  -- This is an optimisation, as it allows us to write out quickly all
+  -- newly discovered values that are discovered when serialising 'Key m'
+  -- to disk.
+  }
+
+-- | Initialise a 'GenericSymbolTable', initialising the index to '0'.
+initGenericSymbolTable :: TrieMap m => IO (GenericSymbolTable m)
+initGenericSymbolTable = do
+  symtab_next <- newFastMutInt 0
+  symtab_map <- newIORef emptyTM
+  symtab_todo <- newIORef []
+  pure $ GenericSymbolTable
+        { gen_symtab_next = symtab_next
+        , gen_symtab_map  = symtab_map
+        , gen_symtab_to_write = symtab_todo
+        }
+
+-- | Serialise the 'GenericSymbolTable' to disk.
+--
+-- Since 'GenericSymbolTable' stores tree-like structures, such as 'IfaceType',
+-- serialising an element can add new elements to the mapping.
+-- Thus, 'putGenericSymbolTable' first serialises all values, and then checks whether any
+-- new elements have been discovered. If so, repeat the loop.
+putGenericSymbolTable :: forall m. (TrieMap m) => GenericSymbolTable m -> (WriteBinHandle -> Key m -> IO ()) -> WriteBinHandle -> IO Int
+{-# INLINE putGenericSymbolTable #-}
+putGenericSymbolTable gen_sym_tab serialiser bh = do
+  putGenericSymbolTable bh
+  where
+    symtab_next = gen_symtab_next gen_sym_tab
+    symtab_to_write = gen_symtab_to_write gen_sym_tab
+    putGenericSymbolTable :: WriteBinHandle -> IO Int
+    putGenericSymbolTable bh  = do
+      let loop = do
+            vs <- atomicModifyIORef' symtab_to_write (\a -> ([], a))
+            case vs of
+              [] -> readFastMutInt symtab_next
+              todo -> do
+                mapM_ (\n -> serialiser bh n) (reverse todo)
+                loop
+      snd <$>
+        (forwardPutRel bh (const $ readFastMutInt symtab_next >>= put_ bh) $
+          loop)
+
+-- | Read the elements of a 'GenericSymbolTable' from disk into a 'SymbolTable'.
+getGenericSymbolTable :: forall a . (ReadBinHandle -> IO a) -> ReadBinHandle -> IO (SymbolTable a)
+getGenericSymbolTable deserialiser bh = do
+  sz <- forwardGetRel bh (get bh) :: IO Int
+  mut_arr <- newArray_ (0, sz-1) :: IO (IOArray Int a)
+  forM_ [0..(sz-1)] $ \i -> do
+    f <- deserialiser bh
+    writeArray mut_arr i f
+  unsafeFreeze mut_arr
+
+-- | Write an element 'Key m' to the given 'WriteBinHandle'.
+--
+-- If the element was seen before, we simply write the index of that element to the
+-- 'WriteBinHandle'. If we haven't seen it before, we add the element to
+-- the 'GenericSymbolTable', increment the index, and return this new index.
+putGenericSymTab :: (TrieMap m) => GenericSymbolTable m -> WriteBinHandle -> Key m -> IO ()
+{-# INLINE putGenericSymTab #-}
+putGenericSymTab GenericSymbolTable{
+               gen_symtab_map = symtab_map_ref,
+               gen_symtab_next = symtab_next,
+               gen_symtab_to_write = symtab_todo }
+        bh val = do
+  symtab_map <- readIORef symtab_map_ref
+  case lookupTM val symtab_map of
+    Just off -> put_ bh (fromIntegral off :: Word32)
+    Nothing -> do
+      off <- readFastMutInt symtab_next
+      writeFastMutInt symtab_next (off+1)
+      writeIORef symtab_map_ref
+          $! insertTM val off symtab_map
+      atomicModifyIORef symtab_todo (\todo -> (val : todo, ()))
+      put_ bh (fromIntegral off :: Word32)
+
+-- | Read a value from a 'SymbolTable'.
+getGenericSymtab :: Binary a => SymbolTable a -> ReadBinHandle -> IO a
+getGenericSymtab symtab bh = do
+  i :: Word32 <- get bh
+  return $! symtab ! fromIntegral i
+
 ---------------------------------------------------------
 -- The Dictionary
 ---------------------------------------------------------


=====================================
docs/users_guide/using-optimisation.rst
=====================================
@@ -1777,3 +1777,23 @@ as such you shouldn't need to set any of them explicitly. A flag
     This flag sets the size (in bytes) threshold above which the second approach
     is used. You can disable the second approach entirely by setting the
     threshold to 0.
+
+.. ghc-flag:: -fwrite-if-compression=⟨n⟩
+    :shortdesc: *default: 2.* Tweak the level of interface file compression.
+    :type: dynamic
+    :category: optimization
+
+    :default: 2
+
+    This flag defines the level of compression of interface files when writing to disk.
+    The higher the flag, the more we deduplicate the interface file, at the cost of a higher compilation time.
+    Deduplication (when applied to :ghc-flag:`--make` mode and :ghc-flag:`--interactive` mode) decreases the size of interface files as well as reducing
+    the overall memory usage of GHC.
+
+    Compression cannot be fully turned off, GHC always compresses interface files to a certain degree.
+    Currently, we support values of ``1``, ``2`` and ``3``.
+    Lower or higher values are clamped to ``1`` and ``3`` respectively.
+
+    * ``1``: Compress as little as possible. No run-time impact, at the cost of interface file size and memory usage.
+    * ``2``: Apply compression with minimal run-time overhead, reducing the interface file size and memory usage.
+    * ``3``: Apply all possible compressions, minimal interface file sizes and memory usage, at the cost of run-time overhead.


=====================================
testsuite/tests/iface/IfaceSharingIfaceType.hs
=====================================
@@ -0,0 +1,44 @@
+{-# LANGUAGE FlexibleContexts #-}
+{-# LANGUAGE OverloadedStrings #-}
+
+module IfaceSharingIfaceType (types) where
+
+import GHC.Data.FastString
+import GHC.Builtin.Uniques
+import GHC.Builtin.Names
+import GHC.Builtin.Types
+import GHC.Types.Name
+import GHC.Types.SrcLoc
+import GHC.Iface.Type
+import GHC.CoreToIface
+import GHC.Core.TyCo.Rep
+import GHC
+
+[f1,f2,f3,f4,f5] = map mkVarOcc ["a", "b","c","d","e"]
+
+[u1,u2,u3,u4,u5] = map mkPreludeMiscIdUnique [10000..10004]
+
+names = [ mkExternalName u1 pRELUDE f1 noSrcSpan
+        , mkExternalName u2 pRELUDE f2 noSrcSpan
+        , mkExternalName u3 pRELUDE f3 noSrcSpan
+        , mkExternalName u4 pRELUDE f4 noSrcSpan
+        , mkExternalName u5 pRELUDE f5 noSrcSpan ]
+
+-- Int
+intIfaceTy = toIfaceType intTy
+
+wordIfaceTy = toIfaceType wordTy
+
+listIntTy = toIfaceType (mkListTy intTy)
+
+funTy = (intTy `mkVisFunTyMany` wordTy `mkVisFunTyMany` mkListTy intTy)
+
+funIfaceTy = toIfaceType funTy
+
+reallyBigFunTy = toIfaceType (funTy `mkVisFunTyMany` funTy `mkVisFunTyMany` funTy `mkVisFunTyMany` funTy)
+
+forallIfaceTy = toIfaceType (dataConType justDataCon)
+
+
+types = [intIfaceTy, wordIfaceTy, listIntTy, funIfaceTy, reallyBigFunTy, forallIfaceTy]
+


=====================================
testsuite/tests/iface/IfaceSharingName.hs
=====================================
@@ -0,0 +1,21 @@
+{-# LANGUAGE FlexibleContexts #-}
+{-# LANGUAGE OverloadedStrings #-}
+
+module IfaceSharingName where
+
+import Lib
+import GHC.Data.FastString
+import GHC.Builtin.Uniques
+import GHC.Builtin.Names
+import GHC.Types.Name
+import GHC.Types.SrcLoc
+
+[f1,f2,f3,f4,f5] = map mkVarOcc ["a", "b","c","d","e"]
+
+[u1,u2,u3,u4,u5] = map mkPreludeMiscIdUnique [10000..10004]
+
+names = [ mkExternalName u1 pRELUDE f1 noSrcSpan
+        , mkExternalName u2 pRELUDE f2 noSrcSpan
+        , mkExternalName u3 pRELUDE f3 noSrcSpan
+        , mkExternalName u4 pRELUDE f4 noSrcSpan
+        , mkExternalName u5 pRELUDE f5 noSrcSpan ]


=====================================
testsuite/tests/iface/Lib.hs
=====================================
@@ -0,0 +1,15 @@
+module Lib where
+
+import GHC.Utils.Binary
+import GHC.Iface.Binary
+import qualified Data.ByteString as B
+import System.Environment
+import Data.Maybe
+
+testSize :: Binary a => CompressionIFace -> a -> IO Int
+testSize compLvl payload = do
+  args <- getArgs
+  bh <- openBinMem 1024
+  putWithUserData QuietBinIFace compLvl bh payload
+  withBinBuffer bh (\b -> return (B.length b))
+


=====================================
testsuite/tests/iface/Makefile
=====================================
@@ -0,0 +1,4 @@
+TOP=../..
+include $(TOP)/mk/boilerplate.mk
+include $(TOP)/mk/test.mk
+


=====================================
testsuite/tests/iface/all.T
=====================================
@@ -0,0 +1,24 @@
+test( 'if_faststring'
+    , [ stat_from_file('normal', 5, 'NORMALSIZE')
+      , stat_from_file('medium', 5, 'MEDIUMSIZE')
+      , stat_from_file('full', 5, 'FULLSIZE')
+      , extra_files(["Lib.hs"])]
+    , compile_and_run
+    , ['-package ghc'])
+
+test( 'if_name'
+    , [ stat_from_file('normal', 5, 'NORMALSIZE')
+      , stat_from_file('medium', 5, 'MEDIUMSIZE')
+      , stat_from_file('full', 5, 'FULLSIZE')
+      , extra_files(["Lib.hs", "IfaceSharingName.hs"])]
+    , compile_and_run
+    , ['-package ghc'])
+
+test( 'if_ifacetype'
+    , [ stat_from_file('normal', 5, 'NORMALSIZE')
+      , stat_from_file('medium', 5, 'MEDIUMSIZE')
+      , stat_from_file('full', 5, 'FULLSIZE')
+      , extra_files(["Lib.hs", "IfaceSharingIfaceType.hs"])]
+    , compile_and_run
+    , ['-package ghc'])
+


=====================================
testsuite/tests/iface/if_faststring.hs
=====================================
@@ -0,0 +1,15 @@
+{-# LANGUAGE FlexibleContexts #-}
+{-# LANGUAGE OverloadedStrings #-}
+
+import Lib
+import GHC.Data.FastString
+import GHC.Iface.Binary
+
+main :: IO ()
+main = do
+  sz <- testSize MaximalCompression (concat (replicate 1000 ["abc", "cde", "efg" :: FastString]))
+  writeFile "FULLSIZE" (show sz)
+  sz <- testSize SafeExtraCompression (concat (replicate 1000 ["abc", "cde", "efg" :: FastString]))
+  writeFile "MEDIUMSIZE" (show sz)
+  sz <- testSize NormalCompression (concat (replicate 1000 ["abc", "cde", "efg" :: FastString]))
+  writeFile "NORMALSIZE" (show sz)


=====================================
testsuite/tests/iface/if_ifacetype.hs
=====================================
@@ -0,0 +1,13 @@
+import Lib
+import IfaceSharingIfaceType
+import GHC.Iface.Binary
+
+main :: IO ()
+main = do
+  sz <- testSize MaximalCompression (concat (replicate 500 types))
+  writeFile "FULLSIZE" (show sz)
+  sz <- testSize SafeExtraCompression (concat (replicate 500 types))
+  writeFile "MEDIUMSIZE" (show sz)
+  sz <- testSize NormalCompression (concat (replicate 500 types))
+  writeFile "NORMALSIZE" (show sz)
+


=====================================
testsuite/tests/iface/if_name.hs
=====================================
@@ -0,0 +1,12 @@
+import Lib
+import IfaceSharingName
+import GHC.Iface.Binary
+
+main :: IO ()
+main = do
+  sz <- testSize MaximalCompression (concat (replicate 1000 names))
+  writeFile "FULLSIZE" (show sz)
+  sz <- testSize SafeExtraCompression (concat (replicate 1000 names))
+  writeFile "MEDIUMSIZE" (show sz)
+  sz <- testSize NormalCompression (concat (replicate 1000 names))
+  writeFile "NORMALSIZE" (show sz)


=====================================
testsuite/tests/plugins/simple-plugin/Simple/RemovePlugin.hs
=====================================
@@ -64,9 +64,10 @@ metaPlugin' _ meta = pprPanic "meta" (showAstData BlankSrcSpan BlankEpAnnotation
 
 interfaceLoadPlugin' :: [CommandLineOption] -> ModIface -> IfM lcl ModIface
 interfaceLoadPlugin' [name, "interface"] iface
-  = return $ iface { mi_exports = filter (availNotNamedAs name)
-                                         (mi_exports iface)
-                   }
+  = return $ set_mi_exports (filter (availNotNamedAs name)
+                                    (mi_exports iface))
+                            iface
+
 interfaceLoadPlugin' _ iface = return iface
 
 availNotNamedAs :: String -> AvailInfo -> Bool


=====================================
utils/haddock
=====================================
@@ -1 +1 @@
-Subproject commit a711607e29b925f3d69e27c5fde4ba655c711ff1
+Subproject commit 6f9e0cc9ae9f2dfbd671e5970ec9d7bbf9a9dcf1



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/5a4e116a2842dba90e87c55cb66875221a18225f...4b6741fc662667bec2e927d2c755c1dc83b4d712

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/5a4e116a2842dba90e87c55cb66875221a18225f...4b6741fc662667bec2e927d2c755c1dc83b4d712
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240513/b4af49dc/attachment-0001.html>


More information about the ghc-commits mailing list