[Git][ghc/ghc][wip/fendor/ghc-iface-sharing-avoid-reserialisation] 28 commits: Rename Solo# data constructor to MkSolo# (#24673)

Hannes Siebenhandl (@fendor) gitlab at gitlab.haskell.org
Tue May 14 11:32:21 UTC 2024



Hannes Siebenhandl pushed to branch wip/fendor/ghc-iface-sharing-avoid-reserialisation at Glasgow Haskell Compiler / GHC


Commits:
3b51995c by Andrei Borzenkov at 2024-05-07T14:39:40-04:00
Rename Solo# data constructor to MkSolo# (#24673)

- data Solo# a = (# a #)
+ data Solo# a = MkSolo# a

The `(# foo #)` syntax is now just syntactic
sugar for `MkSolo# foo`.
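
As a small illustration of the surface syntax involved, the sketch below builds and matches a one-element unboxed tuple. The helper names `wrap`, `unwrap` and `roundTrip` are made up for this example; only the `(# x #)` syntax comes from the commit message, and the sketch does not use `MkSolo#` directly so that it does not depend on a GHC containing this change.

```haskell
{-# LANGUAGE UnboxedTuples #-}

-- Build a one-element unboxed tuple. After the rename described above,
-- `(# x #)` is sugar for an application of the MkSolo# constructor.
wrap :: a -> (# a #)
wrap x = (# x #)

-- Pattern matching uses the same sugar.
unwrap :: (# a #) -> a
unwrap (# x #) = x

-- Wrapping and unwrapping gives back the original value.
roundTrip :: Int -> Int
roundTrip n = unwrap (wrap n)
```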

- - - - -
4d59abf2 by Arsen Arsenović at 2024-05-07T14:40:24-04:00
Add the cmm_cpp_is_gcc predicate to the testsuite

A future C-- test called T24474-cmm-override-g0 relies on the
GCC-specific behaviour of -g3 implying -dD, which, in turn, leads to it
emitting #defines past the preprocessing stage.  Clang, at least, does
not do this, so the test would fail if run on Clang.

As the behaviour being tested here is ``-optCmmP-g3'' undoing the effects of
the workaround we apply as a fix for bug #24474, and the workaround was
for GCC-specific behaviour, the test needs to be marked as fragile on
other compilers.

- - - - -
25b0b404 by Arsen Arsenović at 2024-05-07T14:40:24-04:00
Split out the C-- preprocessor, and make it pass -g0

Previously, C-- was processed with the C preprocessor program.  This
means that it inherited flags passed via -optc.  A flag that is somewhat
often passed through -optc is -g.  At certain -g levels (>=2), GCC
starts emitting defines *after* preprocessing, for the purposes of
debug info generation.  This is not useful for the C-- compiler, and, in
fact, causes lexer errors.  We can suppress this effect (safely, if
supported) via -g0.

As a workaround, in older versions of GCC (<=10), GCC only emitted
defines if a certain set of -g*3 flags was passed.  Newer versions check
the debug level.  For the former, we filter out those -g*3 flags and,
for the latter, we specify -g0 on top of that.

As a compatible and effective solution, this change adds a C--
preprocessor distinct from the C compiler and preprocessor, but that
keeps its flags.  The command line produced for C-- preprocessing now
looks like:

  $pgmCmmP $optCs_without_g3 $g0_if_supported $optCmmP
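
The argument order above can be sketched as a pure function. This is only an illustration under stated assumptions: the name `cmmCppArgs` is hypothetical, and `g3Flags` is a deliberately incomplete stand-in for whatever set of -g*3 flags the real implementation filters.

```haskell
-- Illustrative, incomplete list of "-g*3" style flags (an assumption,
-- not the list used by GHC).
g3Flags :: [String]
g3Flags = ["-g3", "-ggdb3"]

-- Assemble the arguments that follow $pgmCmmP:
--   $optCs_without_g3 $g0_if_supported $optCmmP
cmmCppArgs
  :: Bool      -- does the preprocessor support -g0?
  -> [String]  -- flags passed via -optc
  -> [String]  -- flags passed via -optCmmP
  -> [String]
cmmCppArgs g0Supported optCs optCmmP =
  filter (`notElem` g3Flags) optCs
    ++ ["-g0" | g0Supported]
    ++ optCmmP
```

Because the -optCmmP flags come last, a user who passes `-optCmmP-g3` can still undo the workaround, which is the behaviour the T24474-cmm-override-g0 test relies on.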

Closes: https://gitlab.haskell.org/ghc/ghc/-/issues/24474

- - - - -
9b4129a5 by Andreas Klebinger at 2024-05-08T13:24:20-04:00
-fprof-late: Only insert cost centres on functions/non-workfree cafs.

Cost centres on data values are usually useless, and inserting them comes with
a large compile time/code size overhead.

Fixes #24103

- - - - -
259b63d3 by Sebastian Graf at 2024-05-08T13:24:57-04:00
Simplifier: Preserve OccInfo on DataAlt fields when case binder is dead (#24770)

See the adjusted `Note [DataAlt occ info]`.
This change also has a positive repercussion on
`Note [Combine case alts: awkward corner]`.

Fixes #24770.

We now try not to call `dataConRepStrictness` in `adjustFieldsIdInfo` when all
fields are lazy anyway, leading to a 2% ghc/alloc decrease in T9675.

Metric Decrease:
    T9675

- - - - -
31b28cdb by Sebastian Graf at 2024-05-08T13:24:57-04:00
Kill seqRule, discard dead seq# in Prep (#24334)

Discarding seq#s in Core land via `seqRule` was problematic; see #24334.
So instead we discard certain dead, discardable seq#s in Prep now.
See the updated `Note [seq# magic]`.

This fixes the symptoms of #24334.

- - - - -
b2682534 by Rodrigo Mesquita at 2024-05-10T01:47:51-04:00
Document NcgImpl methods

Fixes #19914

- - - - -
4d3acbcf by Zejun Wu at 2024-05-10T01:48:28-04:00
Make the renamer more flexible with parens in the LHS of rules

We used to reject an LHS like `(f a) b` in RULES and required it to be written as
`f a b`. It is handy to allow both, as the expression may be more
readable with extra parens in some cases when an infix operator is involved.
Especially when TemplateHaskell is used, extra parens may be added outside the
user's control, resulting in "valid" rules being rejected, and there
is not always a way to work around it.
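
A minimal sketch of the two spellings, using a made-up function `myMap` and a made-up rule name. The active rule uses the form that was always accepted; the parenthesised form is shown only in a comment so the example also loads on compilers without this change.

```haskell
-- A hypothetical function to attach a rewrite rule to.
myMap :: (a -> b) -> [a] -> [b]
myMap _ [] = []
myMap f (x : xs) = f x : myMap f xs
{-# NOINLINE myMap #-}

-- LHS written as a plain application spine, accepted before and after
-- the change:
{-# RULES "myMap/fuse" forall f g xs. myMap f (myMap g xs) = myMap (f . g) xs #-}

-- With the change described above, the same LHS with redundant parens,
--   (myMap f) (myMap g xs)
-- is meant to be accepted as well instead of being rejected.
```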

Fixes #24621

- - - - -
ab840ce6 by Ben Gamari at 2024-05-10T01:49:04-04:00
IPE: Eliminate dependency on Read

Instead of encoding the closure type as decimal string we now simply
represent it as an integer, eliminating the need for `Read` in
`GHC.Internal.InfoProv.Types.peekInfoProv`.

Closes #24504.

-------------------------
Metric Decrease:
    T24602_perf_size
    size_hello_artifact
-------------------------

- - - - -
a9979f55 by Cheng Shao at 2024-05-10T01:49:43-04:00
testsuite: fix testwsdeque with recent clang

This patch fixes compilation of testwsdeque.c with recent versions of
clang, which will fail with the error below:

```
testwsdeque.c:95:33: error:
     warning: format specifies type 'long' but the argument has type 'void *' [-Wformat]
       95 |         barf("FAIL: %ld %d %d", p, n, val);
          |                     ~~~         ^

testwsdeque.c:95:39: error:
     warning: format specifies type 'int' but the argument has type 'StgWord' (aka 'unsigned long') [-Wformat]
       95 |         barf("FAIL: %ld %d %d", p, n, val);
          |                            ~~         ^~~
          |                            %lu

testwsdeque.c:133:42: error:
     error: incompatible function pointer types passing 'void (void *)' to parameter of type 'OSThreadProc *' (aka 'void *(*)(void *)') [-Wincompatible-function-pointer-types]
      133 |         createOSThread(&ids[n], "thief", thief, (void*)(StgWord)n);
          |                                          ^~~~~

/workspace/ghc/_build/stage1/lib/../lib/x86_64-linux-ghc-9.11.20240502/rts-1.0.2/include/rts/OSThreads.h:193:51: error:
     note: passing argument to parameter 'startProc' here
      193 |                                     OSThreadProc *startProc, void *param);
          |                                                   ^

2 warnings and 1 error generated.
```

- - - - -
c2b33fc9 by Rodrigo Mesquita at 2024-05-10T01:50:20-04:00
Rename pre-processor invocation args

Small clean up. Uses proper names for the various groups of arguments
that make up the pre-processor invocation.

- - - - -
2b1af08b by Cheng Shao at 2024-05-10T01:50:55-04:00
ghc-heap: fix typo in ghc-heap cbits

- - - - -
fc2d6de1 by Jade at 2024-05-10T21:07:16-04:00
Improve performance of Data.List.sort(By)

This patch improves the algorithm to sort lists in base.
It does so using two strategies:

1) Use a four-way-merge instead of the 'default' two-way-merge.
This is able to save comparisons and allocations.

2) Use `(>) a b` over `compare a b == GT` and allow inlining and specialization.
This mainly benefits types with a fast (>).

Note that this *may* break instances with a *malformed* Ord instance
where `a > b` is *not* equal to `compare a b == GT`.
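
To make the caveat concrete, here is a sketch of such a malformed instance. The type `Bad` and the helper `agrees` are invented for this illustration; they only show how `(>)` and `compare` can disagree, not how the new sort reacts to it.

```haskell
-- Hypothetical type with a deliberately inconsistent Ord instance.
newtype Bad = Bad Int
  deriving (Eq, Show)

instance Ord Bad where
  compare (Bad x) (Bad y) = compare x y
  -- Malformed on purpose: (>) contradicts compare.
  Bad x > Bad y = x < y

-- Check the law the new sort relies on: a > b  <=>  compare a b == GT.
agrees :: Bad -> Bad -> Bool
agrees a b = (a > b) == (compare a b == GT)
```

For a lawful instance `agrees` holds for every pair of values; for `Bad` it fails whenever the two wrapped numbers differ.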

CLC proposal: https://github.com/haskell/core-libraries-committee/issues/236

Fixes #24280

-------------------------
Metric Decrease:
    MultiLayerModulesTH_Make
    T10421
    T13719
    T15164
    T18698a
    T18698b
    T1969
    T9872a
    T9961
    T18730
    WWRec
    T12425
    T15703
-------------------------

- - - - -
1012e8aa by Matthew Pickering at 2024-05-10T21:07:52-04:00
Revert "ghcup-metadata: Drop output_name field"

This reverts commit ecbf22a6ac397a791204590f94c0afa82e29e79f.

This breaks the ghcup metadata generation on the nightly jobs.

- - - - -
daff1e30 by Jannis at 2024-05-12T13:38:35-04:00
Division by constants optimization

- - - - -
413217ba by Andreas Klebinger at 2024-05-12T13:39:11-04:00
Tidy: Add flag to expose unfoldings if they take dictionary arguments.

Add the flag `-fexpose-overloaded-unfoldings` to be able to control this
behaviour.

For GHC's boot libraries, file size grew by less than 1% when it was
enabled. However, I refrained from enabling it by default for now.

I've also added a section on specialization more broadly to the users
guide.
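
For orientation, this is the kind of definition the flag is about: an overloaded function, which takes a dictionary argument once type classes are desugared. The function `sumSquares` is a made-up example; the `INLINABLE` pragma shown is the existing per-function way to keep an unfolding available, used here so the sketch does not depend on the new flag being present.

```haskell
-- After desugaring, this function receives a `Num a` dictionary as an
-- extra argument. Callers in other modules can only specialise it to a
-- concrete type if its unfolding is available in the interface file.
sumSquares :: Num a => [a] -> a
sumSquares = foldr (\x acc -> x * x + acc) 0

-- Existing, per-function way to request that the unfolding is exposed.
-- The new flag is described as controlling this for overloaded
-- functions without annotating each one.
{-# INLINABLE sumSquares #-}
```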

-------------------------
Metric Decrease:
    MultiLayerModulesTH_OneShot
Metric Increase:
    T12425
    T13386
    hard_hole_fits
-------------------------

- - - - -
c5d89412 by Zubin Duggal at 2024-05-13T22:19:53-04:00
Don't store a GlobalRdrEnv in `mi_globals` for GHCi.

GHCi only needs the `mi_globals` field for modules imported with
:module +*SomeModule.

It uses this field to make the top level environment in `SomeModule` available
to the repl.

By default, only the first target in the command line parameters is
"star" loaded into GHCi. Other modules have to be manually "star" loaded
into the repl.

Storing the top level GlobalRdrEnv for each module is very wasteful, especially
given that we will most likely never need most of these environments.

Instead we store only the information needed to reconstruct the top level environment
in a module, which is the `IfaceTopEnv` data structure, consisting of all import statements
as well as all top level symbols defined in the module (not taking export lists into account).

When a particular module is "star-loaded" into GHCi (as the first commandline target, or via
an explicit `:module +*SomeModule`), we reconstruct the top level environment on demand using
the `IfaceTopEnv`.
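
The idea of storing a small summary and rebuilding the environment on demand can be sketched as follows. Everything here is a simplified, hypothetical model: `TopEnvSummary`, its fields and `reconstructTopEnv` are not GHC's definitions, and import lists, qualification and hiding are ignored.

```haskell
-- Simplified stand-in for the stored summary: just the imports and the
-- names defined at top level (exported or not).
data TopEnvSummary = TopEnvSummary
  { summaryImports :: [String]
  , summaryLocals  :: [String]
  }

-- Rebuild the names in scope only when a module is star-loaded.
-- `exportsOf` looks up what an imported module exports.
reconstructTopEnv :: (String -> [String]) -> TopEnvSummary -> [String]
reconstructTopEnv exportsOf s =
  summaryLocals s ++ concatMap exportsOf (summaryImports s)
```

The summary is cheap to keep for every module, while the full list of names is only computed for the few modules that are actually star-loaded.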

- - - - -
d65bf4a2 by Fendor at 2024-05-13T22:20:30-04:00
Add perf regression test for `-fwrite-if-simplified-core`

- - - - -
2c0f8ddb by Andrei Borzenkov at 2024-05-13T22:21:07-04:00
Improve pattern to type pattern transformation (#23739)

The `pat_to_type_pat` function can now handle more patterns:
  - TuplePat
  - ListPat
  - LitPat
  - NPat
  - ConPat

Allowing these new constructors in type patterns significantly
increases possible shapes of type patterns without `type` keyword.

This patch also changes how lookups in `lookupOccRnConstr` are
performed, because we need to fall back to the type level
when no constructor is found at the data level, so that the
`ConPat` to type transformation is performed properly.

- - - - -
be514bb4 by Cheng Shao at 2024-05-13T22:21:43-04:00
hadrian: fix hadrian building with ghc-9.10.1

- - - - -
ad38e954 by Cheng Shao at 2024-05-13T22:21:43-04:00
linters: fix lint-whitespace compilation with ghc-9.10.1

- - - - -
aef81310 by Fendor at 2024-05-14T13:00:24+02:00
Add Eq and Ord instance to `IfaceType`

We add an `Ord` instance so that we can store `IfaceType` in a
`Data.Map` container.
This is required to deduplicate `IfaceType` while writing `.hi` files to
disk. Deduplication has many beneficial consequences to both file size
and memory usage, as the deduplication enables implicit sharing of
values.
See issue #24540 for more motivation.

The `Ord` instance would be unnecessary if we used a `TrieMap` instead
of `Data.Map` for the deduplication process. While in theory this is
clearly the better option, experiments on the agda code base showed
that a `TrieMap` implementation has worse run-time performance
characteristics.

As for the change itself, we mostly derive `Eq` and `Ord`. This requires us
to replace occurrences of `FastString` with `LexicalFastString`, since
`FastString` has no `Ord` instance.
We change the definition of `IfLclName` to a newtype of
`LexicalFastString`, to make such changes in the future easier.
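
Why the `Ord` instance matters can be seen in a toy version of the deduplication step. The type `Ty` below is a small stand-in and not GHC's `IfaceType`; `intern` and `internAll` are hypothetical helper names.

```haskell
import qualified Data.Map.Strict as Map

-- Toy tree-like type representation (NOT GHC's IfaceType).
data Ty
  = TyVar String
  | TyCon String
  | TyApp Ty Ty
  deriving (Eq, Ord, Show)

-- Maps each distinct value to the index it was given on first sight.
type Table = Map.Map Ty Int

-- Look a value up, inserting it if it is new. Both Map.lookup and
-- Map.insert require the derived Ord instance.
intern :: Ty -> Table -> (Int, Table)
intern ty table =
  case Map.lookup ty table of
    Just ix -> (ix, table)
    Nothing ->
      let ix = Map.size table
      in (ix, Map.insert ty ix table)

-- Intern a list of values from left to right.
internAll :: [Ty] -> ([Int], Table)
internAll = go Map.empty
  where
    go table [] = ([], table)
    go table (t : ts) =
      let (ix, table') = intern t table
          (ixs, final) = go table' ts
      in (ix : ixs, final)
```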

Bump haddock submodule for IfLclName changes

- - - - -
531dfd3e by Fendor at 2024-05-14T13:00:24+02:00
Move out LiteralMap to avoid cyclic module dependencies

- - - - -
3447248e by Fendor at 2024-05-14T13:00:24+02:00
Add deduplication table for `IfaceType`

The type `IfaceType` is a highly redundant, tree-like data structure.
While benchmarking, we realised that the high redundancy of `IfaceType`
causes high memory consumption in GHCi sessions when byte code is
embedded into the `.hi` file via `-fwrite-if-simplified-core` or
`-fbyte-code-and-object-code`.
Loading such `.hi` files from disk introduces many duplicates of
memory expensive values in `IfaceType`, such as `IfaceTyCon`,
`IfaceTyConApp`, `IA_Arg` and many more.

We improve the memory behaviour of GHCi by adding an additional
deduplication table for `IfaceType` to the serialisation of `ModIface`,
similar to how we deduplicate `Name`s and `FastString`s.
When reading the interface file back, the table allows us to automatically
share identical values of `IfaceType`.
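
A toy model of such a table, assuming a much simpler type than `IfaceType` and a list in place of a real binary format. `Ty`, `Encoded`, `encode` and `decode` are names invented for this sketch; the linear search via `elemIndex` stands in for the map-based lookup a real implementation would use.

```haskell
import Data.List (elemIndex)

-- Toy tree-like type representation (NOT GHC's IfaceType).
data Ty = TyCon String | TyApp Ty Ty
  deriving (Eq, Show)

-- "Serialised" form: a table of distinct values plus indices into it.
data Encoded = Encoded { table :: [Ty], refs :: [Int] }
  deriving (Eq, Show)

-- Write each distinct value once and refer to it by index afterwards.
encode :: [Ty] -> Encoded
encode = foldl step (Encoded [] [])
  where
    step (Encoded tbl rs) ty =
      case elemIndex ty tbl of
        Just ix -> Encoded tbl (rs ++ [ix])
        Nothing -> Encoded (tbl ++ [ty]) (rs ++ [length tbl])

-- Reading back resolves indices against the table, so every reference
-- to the same index yields the same table entry instead of a fresh copy.
decode :: Encoded -> [Ty]
decode (Encoded tbl rs) = map (tbl !!) rs
```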

To provide some numbers, we evaluated this patch on the agda code base.
We loaded the full library from the `.hi` files, which contained the
embedded core expressions (`-fwrite-if-simplified-core`).

Before this patch:

* Load time: 11.7 s, 2.5 GB maximum residency.

After this patch:

* Load time:  7.3 s, 1.7 GB maximum residency.

This deduplication has the beneficial side effect of additionally reducing
the size of the on-disk interface files tremendously.

For example, on agda, we reduce the size of `.hi` files (with
`-fwrite-if-simplified-core`):

* Before: 101 MB on disk
* Now:     24 MB on disk

This even has a beneficial side effect on the cabal store. We reduce the
size of the store on disk:

* Before: 341 MB on disk
* Now:    310 MB on disk

Note, none of the dependencies have been compiled with
`-fwrite-if-simplified-core`, but `IfaceType` occurs in multiple
locations in a `ModIface`.

We also add an `IfaceType` deduplication table to .hie serialisation and
refactor .hie file serialisation to use the same infrastructure as
`putWithTables`.

Bump haddock submodule to accommodate changes to the deduplication
table layout and binary interface.

- - - - -
7e99b350 by Fendor at 2024-05-14T13:00:24+02:00
Add run-time configurability of `.hi` file compression

Introduce the flag `-fwrite-if-compression=<n>`, which allows configuring
the compression level used when writing .hi files.

The motivation is that some deduplication operations are too expensive
for the average use case. Hence, we introduce multiple compression
levels with variable impact on performance, but still reduce the
memory residency and `.hi` file size on disk considerably.

We introduce three compression levels:

* `1`: `Normal` mode. This is the least amount of compression.
  It deduplicates only `Name` and `FastString`s, and is naturally the
  fastest compression mode.
* `2`: `Safe` mode. It has a noticeable impact on .hi file size and is
  marginally slower than `Normal` mode. In general, it should be safe to
  always use `Safe` mode.
* `3`: `Full` deduplication mode. Deduplicate as much as we can,
  resulting in minimal .hi files, but at the cost of additional
  compilation time.

Reading .hi files doesn't need to know the initial compression level,
and can always deserialise a `ModIface`, as we write out a byte that
indicates the next value has been deduplicated.
This allows users to experiment with different compression levels for
packages, without recompilation of dependencies.
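
The two ideas above, a numeric level and a per-value marker that makes reading independent of the level, can be sketched like this. The type and function names (`CompressionLevel`, `levelFromInt`, `Slot`, `readSlot`) are assumptions made for the illustration; only the mode names and the numbers 1 to 3 come from the text. Returning `Nothing` for other numbers is a choice of this sketch, not a statement about GHC's behaviour.

```haskell
-- The three modes named above.
data CompressionLevel = Normal | Safe | Full
  deriving (Eq, Show)

-- Map the numeric flag argument to a mode.
levelFromInt :: Int -> Maybe CompressionLevel
levelFromInt 1 = Just Normal
levelFromInt 2 = Just Safe
levelFromInt 3 = Just Full
levelFromInt _ = Nothing

-- A serialised slot is tagged: either the value is stored inline, or it
-- is a reference into a deduplication table.
data Slot a = Inline a | TableRef Int

-- The reader only looks at the tag, so it never needs to know which
-- compression level the writer used.
readSlot :: [a] -> Slot a -> Maybe a
readSlot _ (Inline x) = Just x
readSlot tbl (TableRef ix)
  | ix >= 0 && ix < length tbl = Just (tbl !! ix)
  | otherwise = Nothing
```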

Note, the deduplication also has the additional side effect of reduced
memory consumption due to implicit sharing of deduplicated elements.
See https://gitlab.haskell.org/ghc/ghc/-/issues/24540 for an example where
that matters.

-------------------------
Metric Decrease:
    MultiLayerModulesDefsGhciWithCore
    T16875
    T21839c
    T24471
    hard_hole_fits
    libdir
-------------------------

- - - - -
0c246d82 by Matthew Pickering at 2024-05-14T13:00:24+02:00
Introduce regression tests for `.hi` file sizes

Add regression tests to track how `-fwrite-if-compression` levels affect
the size of `.hi` files.

- - - - -
c05ff9fe by Fendor at 2024-05-14T13:01:32+02:00
Improve sharing of duplicated values in `ModIface`

As a `ModIface` often contains duplicated values that are not
necessarily shared, we improve sharing by serialising the `ModIface`
to an in-memory byte array. Serialisation uses deduplication tables, and
deserialisation implicitly shares duplicated values.

This helps reduce the peak memory usage while compiling in
`--make` mode. The peak memory usage is especially reduced when
generating interface files with core expressions
(`-fwrite-if-simplified-core`).

On agda, this reduces the peak memory usage:

* `2.2 GB` to `1.9 GB` for a ghci session.

On `lib:Cabal`, we report:

* `570 MB` to `500 MB` for a ghci session
* `790 MB` to `667 MB` for compiling `lib:Cabal` with ghc

There is a small impact on execution time, around 2% on the agda code
base.

- - - - -
e4e4f7a7 by Fendor at 2024-05-14T13:31:57+02:00
Avoid unnecessarily re-serialising the `ModIface`

To reduce memory usage of `ModIface`, we serialise `ModIface` to an
in-memory byte array, which implicitly shares duplicated values.

This serialised byte array can be reused to avoid work when we actually
write the `ModIface` to disk.
We introduce a new field to `ModIface` which allows us to save the byte
array, and write it to disk if the `ModIface` wasn't changed after the
initial serialisation.
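
A minimal sketch of that caching discipline, using a made-up `Iface` record and `String` in place of a byte array. None of the names below are GHC's; the point is only that every modification clears the cached bytes, so a stale blob is never written.

```haskell
-- Hypothetical, heavily simplified interface record.
data Iface = Iface
  { ifaceDecls :: [String]      -- the actual contents
  , ifaceCache :: Maybe String  -- serialised form, if still valid
  }

-- Stand-in for real binary serialisation.
serialise :: [String] -> String
serialise = unlines

-- Serialise once and remember the result.
cacheBytes :: Iface -> Iface
cacheBytes i = i { ifaceCache = Just (serialise (ifaceDecls i)) }

-- Any change to the contents invalidates the cached bytes.
addDecl :: String -> Iface -> Iface
addDecl d i = i { ifaceDecls = ifaceDecls i ++ [d], ifaceCache = Nothing }

-- When writing to disk, reuse the cached bytes if they are still valid
-- and fall back to serialising again otherwise.
bytesToWrite :: Iface -> String
bytesToWrite i = maybe (serialise (ifaceDecls i)) id (ifaceCache i)
```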

This requires us to replace absolute offsets, for example the ones used
to jump to the deduplication table for `Name` or `FastString`, with
relative offsets, as the serialised byte array doesn't contain header
information, such as fingerprints.
To allow us to dump the binary blob to disk, we need to replace all
absolute offsets with relative ones.

This leads to new primitives for `ModIface`, which help to construct
relative offsets.
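
The reason relative offsets survive relocation can be checked with a few lines of arithmetic. The function names are hypothetical, and measuring offsets from an "anchor" position inside the blob is an assumption of this sketch rather than a description of the actual primitives.

```haskell
-- Turn an absolute target position into an offset relative to an anchor.
makeRelative :: Int -> Int -> Int
makeRelative anchor target = target - anchor

-- Recover the absolute position once the anchor's position is known.
makeAbsolute :: Int -> Int -> Int
makeAbsolute anchor rel = anchor + rel

-- Prepending a header of `headerSize` bytes shifts anchor and target by
-- the same amount, so the stored relative offset still points at the
-- right place.
survivesHeader :: Int -> Int -> Int -> Bool
survivesHeader headerSize anchor target =
  let rel = makeRelative anchor target
  in makeAbsolute (anchor + headerSize) rel == target + headerSize
```

An absolute offset stored in the blob would instead have to be rewritten whenever the header size changes.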

Bump Haddock submodule, to account for interface file changes.

-------------------------
Metric Increase:
    MultiComponentModules
    MultiLayerModules
    T10421
    T12425
    T13035
    T13701
    T13719
    T14697
    T18730
    T9198
    mhu-perf
-------------------------

These metric increases may look bad, but they are all completely benign:
we simply allocate 1 MB per module for `shareIface`. As this allocation
is quite quick, it has a negligible impact on run-time performance.

- - - - -


30 changed files:

- .gitlab/ci.sh
- .gitlab/rel_eng/mk-ghcup-metadata/mk_ghcup_metadata.py
- compiler/GHC.hs
- compiler/GHC/Builtin/Types.hs
- compiler/GHC/Cmm.hs
- compiler/GHC/Cmm/Config.hs
- compiler/GHC/Cmm/MachOp.hs
- compiler/GHC/Cmm/Opt.hs
- compiler/GHC/Cmm/Pipeline.hs
- compiler/GHC/Cmm/Sink.hs
- compiler/GHC/CmmToAsm/AArch64/RegInfo.hs
- compiler/GHC/CmmToAsm/Monad.hs
- compiler/GHC/CmmToAsm/X86/Instr.hs
- compiler/GHC/Core/LateCC.hs
- compiler/GHC/Core/LateCC/TopLevelBinds.hs
- compiler/GHC/Core/LateCC/Types.hs
- compiler/GHC/Core/Map/Expr.hs
- compiler/GHC/Core/Opt/CSE.hs
- compiler/GHC/Core/Opt/ConstantFold.hs
- compiler/GHC/Core/Opt/OccurAnal.hs
- compiler/GHC/Core/Opt/SetLevels.hs
- compiler/GHC/Core/Opt/Simplify/Iteration.hs
- compiler/GHC/Core/Opt/SpecConstr.hs
- compiler/GHC/Core/TyCo/Rep.hs
- compiler/GHC/Core/Type.hs
- compiler/GHC/CoreToIface.hs
- compiler/GHC/CoreToStg/Prep.hs
- compiler/GHC/Data/FastString.hs
- compiler/GHC/Data/TrieMap.hs
- compiler/GHC/Driver/Backend.hs


The diff was not included because it is too large.


View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/4b6741fc662667bec2e927d2c755c1dc83b4d712...e4e4f7a7aa21bbbe8b86be94f6bb4b211e554955
