[Git][ghc/ghc][wip/iface-sharing-tyconapp] 8 commits: Refactor the Binary serialisation interface

Matthew Pickering (@mpickering) gitlab at gitlab.haskell.org
Wed Apr 10 10:12:30 UTC 2024



Matthew Pickering pushed to branch wip/iface-sharing-tyconapp at Glasgow Haskell Compiler / GHC


Commits:
68d406f8 by Fendor at 2024-04-10T10:11:27+02:00
Refactor the Binary serialisation interface

The end goal is to dynamically add deduplication tables for `ModIface`
interface serialisation.

We identify two main points of interest that make this difficult:

1. UserData hardcodes what `Binary` instances can have deduplication
   tables. Moreover, it heavily uses partial functions.
2. GHC.Iface.Binary hardcodes the deduplication tables for 'Name' and
   'FastString', making it difficult to add more deduplication.

Instead of having a single `UserData` record with fields for all the
types that can have deduplication tables, we allow to provide custom
serialisers for any `Typeable`.
These are wrapped in existentials and stored in a `Map` indexed by their
respective `TypeRep`.
The `Binary` instance of the type to deduplicate still needs to
explicitly look up the decoder via `findUserDataReader` and
`findUserDataWriter`, which is no worse than the status-quo.

`Map` was chosen as microbenchmarks indicate it is the fastest for a
small number of keys (< 10).

To generalise the deduplication table serialisation mechanism, we
introduce the types `ReaderTable` and `WriterTable` which provide a
simple interface that is sufficient to implement a general purpose
deduplication mechanism for `writeBinIface` and `readBinIface`.

This allows us to provide a list of deduplication tables for
serialisation that can be extended more easily, for example for
`IfaceTyCon`, see the issue https://gitlab.haskell.org/ghc/ghc/-/issues/24540
for more motivation.

In addition to ths refactoring, we split `UserData` into `ReaderUserData`
and `WriterUserData`, to avoid partial functions.

Bump haddock submodule to accomodate for `UserData` split.

-------------------------
Metric Increase:
    T21839c
-------------------------

- - - - -
026372f7 by Fendor at 2024-04-10T10:11:27+02:00
Split `BinHandle` into `ReadBinHandle` and `WriteBinHandle`

A `BinHandle` contains too much information for reading data.
For example, it needs to keep a `FastMutInt` and a `IORef BinData`,
when the non-mutable variants would suffice.

Additionally, this change has the benefit that anyone can immediately
tell whether the `BinHandle` is used for reading or writing.

Bump haddock submodule BinHandle split.

- - - - -
12a3a5d5 by Matthew Pickering at 2024-04-10T10:44:28+02:00
Add deduplication table for `IfaceType`

The type `IfaceType` is a highly redundant, tree-like data structure.
While benchmarking, we realised that the high redundancy of `IfaceType`
causes high memory consumption in GHCi sessions.
We fix this by adding a deduplication table to the serialisation of
`ModIface`, similar to how we deduplicate `Name`s and `FastString`s.
When reading the interface file back, the table allows us to automatically
share identical values of `IfaceType`.

This deduplication has the beneficial side effect to additionally reduce
the size of the on-disk interface files tremendously. On the agda code
base, we reduce the size from 28 MB to 16 MB. When `-fwrite-simplified-core`
is enabled, we reduce the size from 112 MB to 22 MB.

We have to add an `Ord` instance to `IfaceType` in order to store it
efficiently for look up operations. This is mostly straightforward, we
change occurrences of `FastString` with `LexicalFastString` and add a
newtype definition for `IfLclName = LexicalFastString`.

Bump haddock submodule for `IfLclName` newtype changes.

- - - - -
4ed1ed63 by Fendor at 2024-04-10T10:44:30+02:00
Add IfaceType deduplication table to .hie serialisation

Refactor .hie file serialisation to use the same infrastrucutre as
`putWithTables`.

- - - - -
c4a518b4 by Matthew Pickering at 2024-04-10T10:44:30+02:00
WIP: TrieMap for IfaceType

Bump haddock submodule

- - - - -
829f88c9 by Fendor at 2024-04-10T11:58:08+02:00
WIP: CI results for Data.Map deduplication

- - - - -
21c66902 by Matthew Pickering at 2024-04-10T11:12:09+01:00
Add some tests to check for size of interface files when serialising
various types

- - - - -
aa5cd630 by Matthew Pickering at 2024-04-10T11:12:09+01:00
Only share IfaceTyConApp

- - - - -


30 changed files:

- compiler/GHC/Core/Map/Expr.hs
- compiler/GHC/Core/TyCo/Rep.hs
- compiler/GHC/CoreToIface.hs
- compiler/GHC/Data/FastString.hs
- compiler/GHC/Data/TrieMap.hs
- compiler/GHC/Iface/Binary.hs
- compiler/GHC/Iface/Decl.hs
- compiler/GHC/Iface/Env.hs
- compiler/GHC/Iface/Ext/Binary.hs
- compiler/GHC/Iface/Ext/Fields.hs
- compiler/GHC/Iface/Ext/Utils.hs
- compiler/GHC/Iface/Recomp.hs
- compiler/GHC/Iface/Recomp/Binary.hs
- compiler/GHC/Iface/Recomp/Flags.hs
- compiler/GHC/Iface/Syntax.hs
- compiler/GHC/Iface/Type.hs
- + compiler/GHC/Iface/Type/Map.hs
- compiler/GHC/IfaceToCore.hs
- compiler/GHC/Stg/CSE.hs
- compiler/GHC/StgToJS/Object.hs
- compiler/GHC/Types/Basic.hs
- compiler/GHC/Types/FieldLabel.hs
- compiler/GHC/Types/Name.hs
- compiler/GHC/Types/Var.hs
- compiler/GHC/Utils/Binary.hs
- compiler/GHC/Utils/Binary/Typeable.hs
- compiler/Language/Haskell/Syntax/Type.hs
- compiler/Language/Haskell/Syntax/Type.hs-boot
- compiler/ghc.cabal.in
- + testsuite/tests/iface/IfaceSharingFastString.hs


The diff was not included because it is too large.


View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/c3080267851a9cb69641cf43b2d3c68da8267d3b...aa5cd6305545e33fc255a8f669aad72907cdec41

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/c3080267851a9cb69641cf43b2d3c68da8267d3b...aa5cd6305545e33fc255a8f669aad72907cdec41
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240410/cf13eb62/attachment.html>


More information about the ghc-commits mailing list