[Git][ghc/ghc][wip/fendor/ghc-iface-refact] 2 commits: Fixup: rename name tables
Hannes Siebenhandl (@fendor)
gitlab at gitlab.haskell.org
Mon Apr 8 12:45:24 UTC 2024
Hannes Siebenhandl pushed to branch wip/fendor/ghc-iface-refact at Glasgow Haskell Compiler / GHC
Commits:
72116c2a by Fendor at 2024-04-08T14:42:55+02:00
Fixup: rename name tables
- - - - -
e79b4636 by Fendor at 2024-04-08T14:45:17+02:00
Fixup: add note about deduplication
- - - - -
1 changed file:
- compiler/GHC/Iface/Binary.hs
Changes:
=====================================
compiler/GHC/Iface/Binary.hs
=====================================
@@ -159,7 +159,7 @@ getWithUserData name_cache bh = do
getTables :: NameCache -> ReadBinHandle -> IO ReadBinHandle
getTables name_cache bh = do
fsReaderTable <- initFastStringReaderTable
- nameReaderTable <- (initReadNameCachedBinary name_cache)
+ nameReaderTable <- initNameReaderTable name_cache
-- The order of these deserialisation matters!
@@ -237,7 +237,7 @@ putWithUserData traceBinIface bh payload = do
putWithTables :: WriteBinHandle -> (WriteBinHandle -> IO b) -> IO (Int, Int, b)
putWithTables bh' put_payload = do
(fast_wt, fsWriter) <- initFastStringWriterTable
- (name_wt, nameWriter) <- initWriteNameTable
+ (name_wt, nameWriter) <- initNameWriterTable
let writerUserData = mkWriterUserData
[ mkSomeBinaryWriter @FastString fsWriter
@@ -277,8 +277,8 @@ binaryInterfaceMagic platform
Note [Iface Binary Serialisation]
~~~~~~~~~~~~~~~~~~~
When we serialise a 'ModIface', many symbols are redundant.
-For example, there can be duplicated 'FastString's and 'Name's.
-To save space, we deduplicate some symbols, such as 'FastString' and 'Name',
+For example, there can be many duplicated 'FastString's and 'Name's.
+To save space, we deduplicate duplicated symbols, such as 'FastString' and 'Name',
by maintaining a table of already seen symbols.
When serialising a symbol, we lookup whether we have encountered the symbol before.
If yes, we write the index of the symbol, otherwise we generate a new index and store it in the table.
@@ -286,6 +286,48 @@ If yes, we write the index of the symbol, otherwise we generate a new index and
Besides saving a lot of disk space, this additionally enables us to automatically share
these symbols when we read the 'ModIface' from disk, without additional mechanisms such as 'FastStringTable'.
+To add a deduplication table for a type, let us assume 'IfaceTyCon', you need to do the following:
+
+* The 'Binary' instance 'IfaceTyCon' needs to dynamically look up the serialiser instead of
+ directly serialising itself. It needs to look up the serialiser in the 'ReaderUserData' and
+ 'WriterUserData' respectively.
+
+ For example:
+
+ @
+ instance Binary IfaceTyCon where
+ put_ bh ty = case findUserDataWriter (Proxy @IfaceTyCon) bh of
+ tbl -> putEntry tbl bh ty
+ get bh = case findUserDataReader (Proxy @IfaceTyCon) bh of
+ tbl -> getEntry tbl bh
+ @
+
+* Whenever a value of 'IfaceTyCon' is serialised, the real serialisation function needs to
+ be configured in the User Data.
+
+ For example, for 'IfaceTyCon':
+
+ @
+ let bh0 :: WriteBinHandle = ...
+ putIfaceTyCon = ... -- Serialises 'IfaceTyCon' to disk
+ bh = addWriterToUserData (mkSomeBinaryWriter putIfaceTyCon) bh0
+ @
+
+ Naturally, you have to do something similar for reading values of 'IfaceTyCon'.
+
+ Now, here we can introduce the deduplication table!
+ Instead of specifying 'putIfaceTyCon', which writes a value of 'IfaceTyCon' directly to disk,
+ we provide a function that looks up values in a 'Map' and remembers the offset
+ of each value we have already seen.
+ Instead of serialising the full 'IfaceTyCon', we only write the index of the value to disk.
+
+* Storing the deduplication table.
+
+ After the deduplicating the elements in the payload (e.g., 'ModIface'), we now have a deduplication
+ table full with all the values.
+ We serialise this table to disk using the real serialiser (e.g., 'putIfaceTyCon').
+
+
Note [Iface Binary Serialiser Order]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Serialisation of 'ModIface' uses tables to deduplicate symbols that occur often.
@@ -336,8 +378,8 @@ Here, a visualisation of the table structure we currently have:
--
-initReadNameCachedBinary :: NameCache -> IO (ReaderTable Name)
-initReadNameCachedBinary cache = do
+initNameReaderTable :: NameCache -> IO (ReaderTable Name)
+initNameReaderTable cache = do
return $
ReaderTable
{ getTable = \bh -> getSymbolTable bh cache
@@ -350,8 +392,8 @@ data BinSymbolTable = BinSymbolTable {
-- indexed by Name
}
-initWriteNameTable :: IO (WriterTable, BinaryWriter Name)
-initWriteNameTable = do
+initNameWriterTable :: IO (WriterTable, BinaryWriter Name)
+initNameWriterTable = do
symtab_next <- newFastMutInt 0
symtab_map <- newIORef emptyUFM
let bin_symtab =
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/adf68fbabc8b0654e324959421496a59d85deb14...e79b463691f12528ca6e4bf7b1744388c248d183
--
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/adf68fbabc8b0654e324959421496a59d85deb14...e79b463691f12528ca6e4bf7b1744388c248d183
You're receiving this email because of your account on gitlab.haskell.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240408/e0a95df6/attachment-0001.html>
More information about the ghc-commits
mailing list