[Git][ghc/ghc][wip/fendor/ghc-iface-sharing-avoid-reserialisation] 6 commits: Add run-time configurability of .hi file compression

Hannes Siebenhandl (@fendor) gitlab at gitlab.haskell.org
Mon May 6 10:00:35 UTC 2024



Hannes Siebenhandl pushed to branch wip/fendor/ghc-iface-sharing-avoid-reserialisation at Glasgow Haskell Compiler / GHC


Commits:
5abb60f3 by Matthew Pickering at 2024-04-30T15:51:23+02:00
Add run-time configurability of .hi file compression

Introduce the flag `-fwrite-if-compression=<n>` which allows to
configure the compression level of writing .hi files.

The motivation is that some deduplication operations are too expensive
for the average use case. Hence, we introduce multiple compression
levels that have a minimal impact on performance, but still reduce the
memory residency and `.hi` file size on disk considerably.

We introduce three compression levels:

* `1`: `Normal` mode. This is the least amount of compression.
    It deduplicates only `Name` and `FastString`s, and is naturally the
    fastest compression mode.
* `2`: `Safe` mode. It has a noticeable impact on .hi file size and is
  marginally slower than `Normal` mode. In general, it should be safe to
  always use `Safe` mode.
* `3`: `Full` deduplication mode. Deduplicate as much as we can,
  resulting in minimal .hi files, but at the cost of additional
  compilation time.

Reading .hi files doesn't need to know the initial compression level,
and can always deserialise a `ModIface`.
This allows users to experiment with different compression levels for
packages, without recompilation of dependencies.

Note, the deduplication also has an additional side effect of reduced
memory consumption to implicit sharing of deduplicated elements.
See https://gitlab.haskell.org/ghc/ghc/-/issues/24540 for example where
that matters.

-------------------------
Metric Decrease:
    MultiLayerModulesDefsGhciWithCore
    T16875
    T21839c
    T24471
    hard_hole_fits
-------------------------

- - - - -
1b91e077 by Matthew Pickering at 2024-04-30T15:53:05+02:00
Introduce regression tests for `.hi` file sizes

Add regression tests to track how `-fwrite-if-compression` levels affect
the size of `.hi` files.

- - - - -
53880781 by Fendor at 2024-04-30T15:53:05+02:00
Implement TrieMap for IfaceType

- - - - -
3134b745 by Fendor at 2024-05-06T12:00:04+02:00
Improve sharing of duplicated values in `ModIface`

As a `ModIface` contains often duplicated values that are not
necessarily shared, we improve sharing by serialising the `ModIface`
to an in-memory byte array. Serialisation uses deduplication tables, and
deserialisation implicitly shares duplicated values.

This helps reducing the peak memory usage while compiling in
`--make` mode. The peak memory usage is especially reduced when
generating interface files with core expressions
(`-fwrite-if-simplified-core`).

On agda, this reduces the peak memory usage:

* `2.2 GB` to `1.9 GB` for a ghci session.

On `lib:Cabal`, we report:

* `570 MB` to `500 MB` for a ghci session
* `790 MB` to `667 MB` for compiling `lib:Cabal` with ghc

The execution time is not affected.

- - - - -
be433156 by Fendor at 2024-05-06T12:00:04+02:00
FullBinData Commit

- - - - -
cbd79025 by Fendor at 2024-05-06T12:00:04+02:00
Avoid unneccessarily re-serialising the `ModIface`

To reduce memory usage of `ModIface`, we serialise `ModIface` to an
in-memory byte array, which implicitly shares duplicated values.

This serailised byte array can be reused to avoid work when we actually
write the `ModIface` to disk.
We introduce a new field to `ModIface` which allows us to save the byte
array, and write it to disk if the `ModIface` wasn't changed after the
initial serialisation.

This requires us to change absolute offsets, for example to jump to the
deduplication table for `Name` or `FastString` with relative offsets, as
the deduplication byte array doesn't contain header information, such as
fingerprints.
To allow us to dump the binary blob to disk, we need to replace all
absolute offsets with relative ones.

This leads to new primitives for `ModIface`, which help to construct
relative offsets.

- - - - -


30 changed files:

- compiler/GHC.hs
- compiler/GHC/Core/Map/Expr.hs
- compiler/GHC/Data/TrieMap.hs
- compiler/GHC/Driver/DynFlags.hs
- compiler/GHC/Driver/Main.hs
- compiler/GHC/Driver/Session.hs
- compiler/GHC/Iface/Binary.hs
- compiler/GHC/Iface/Ext/Binary.hs
- compiler/GHC/Iface/Ext/Fields.hs
- compiler/GHC/Iface/Load.hs
- compiler/GHC/Iface/Make.hs
- compiler/GHC/Iface/Recomp.hs
- compiler/GHC/Iface/Rename.hs
- compiler/GHC/Iface/Type.hs
- + compiler/GHC/Iface/Type/Map.hs
- compiler/GHC/Stg/CSE.hs
- compiler/GHC/Tc/Errors/Hole.hs
- compiler/GHC/Tc/Gen/Splice.hs
- compiler/GHC/Tc/Utils/Backpack.hs
- compiler/GHC/Unit/Module/ModIface.hs
- compiler/GHC/Utils/Binary.hs
- compiler/ghc.cabal.in
- docs/users_guide/using-optimisation.rst
- + testsuite/tests/iface/IfaceSharingIfaceType.hs
- + testsuite/tests/iface/IfaceSharingName.hs
- + testsuite/tests/iface/Lib.hs
- + testsuite/tests/iface/Makefile
- + testsuite/tests/iface/all.T
- + testsuite/tests/iface/if_faststring.hs
- + testsuite/tests/iface/if_ifacetype.hs


The diff was not included because it is too large.


View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/4bcd2b69c92d8ea2cb465a0d3188a669abdfefde...cbd7902510ffeb80e96d23afcf693b684c63fa09

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/4bcd2b69c92d8ea2cb465a0d3188a669abdfefde...cbd7902510ffeb80e96d23afcf693b684c63fa09
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240506/ea28cffe/attachment.html>


More information about the ghc-commits mailing list