[Git][ghc/ghc][wip/torsten.schmits/parallel-depanal-downsweep] 24 commits: Expand the `inline` rule to look through casts/ticks.
Torsten Schmits (@torsten.schmits)
gitlab at gitlab.haskell.org
Fri May 17 11:09:48 UTC 2024
Torsten Schmits pushed to branch wip/torsten.schmits/parallel-depanal-downsweep at Glasgow Haskell Compiler / GHC
Commits:
a593f284 by Andreas Klebinger at 2024-05-15T07:32:10-04:00
Expand the `inline` rule to look through casts/ticks.
Fixes #24808
- - - - -
b1e0c313 by Cheng Shao at 2024-05-15T07:32:46-04:00
testsuite: bump PartialDownSweep timeout to 5x on wasm32
- - - - -
b2227487 by Fendor at 2024-05-15T17:14:06-04:00
Add Eq and Ord instance to `IfaceType`
We add an `Ord` instance so that we can store `IfaceType` in a
`Data.Map` container.
This is required to deduplicate `IfaceType` while writing `.hi` files to
disk. Deduplication has many beneficial consequences to both file size
and memory usage, as the deduplication enables implicit sharing of
values.
See issue #24540 for more motivation.
The `Ord` instance would be unnecessary if we used a `TrieMap` instead
of `Data.Map` for the deduplication process. While in theory this is
clearly the better option, experiments on the agda code base showed
that a `TrieMap` implementation has worse run-time performance
characteristics.
As for the change itself, we mostly derive `Eq` and `Ord`. This requires
us to replace occurrences of `FastString` with `LexicalFastString`, since
`FastString` has no `Ord` instance.
We change the definition of `IfLclName` to a newtype of
`LexicalFastString`, to make such changes in the future easier.
Bump haddock submodule for IfLclName changes
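To illustrate why a derived `Ord` instance needs every field to be
orderable, here is a minimal sketch using toy types. `Interned`,
`Lexical`, `ToyType` and `countOccurrences` are hypothetical stand-ins
invented for this example; they are not GHC's `FastString`,
`LexicalFastString` or `IfaceType`.

```haskell
import qualified Data.Map.Strict as Map
import Data.Ord (comparing)

-- Toy stand-in for an interned string: equality is cheap (by unique id),
-- and there is deliberately no Ord instance.
data Interned = Interned { internedId :: Int, internedText :: String }

instance Eq Interned where
  a == b = internedId a == internedId b

-- Newtype wrapper that compares by the string contents.
newtype Lexical = Lexical Interned

instance Eq Lexical where
  Lexical a == Lexical b = internedText a == internedText b

instance Ord Lexical where
  compare (Lexical a) (Lexical b) = comparing internedText a b

-- Toy type tree; deriving Ord works because every field has an Ord instance.
data ToyType
  = TyVar Lexical
  | TyApp ToyType ToyType
  deriving (Eq, Ord)

-- Count how often each type occurs, which needs ToyType as a Map key.
countOccurrences :: [ToyType] -> Map.Map ToyType Int
countOccurrences = Map.fromListWith (+) . map (\t -> (t, 1))
```

Had `ToyType` stored `Interned` directly, the `deriving (Eq, Ord)` clause
would be rejected, since `Interned` has no `Ord` instance.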
- - - - -
d368f9a6 by Fendor at 2024-05-15T17:14:06-04:00
Move out LiteralMap to avoid cyclic module dependencies
- - - - -
2fcc09fd by Fendor at 2024-05-15T17:14:06-04:00
Add deduplication table for `IfaceType`
The type `IfaceType` is a highly redundant, tree-like data structure.
While benchmarking, we realised that the high redundancy of `IfaceType`
causes high memory consumption in GHCi sessions when byte code is
embedded into the `.hi` file via `-fwrite-if-simplified-core` or
`-fbyte-code-and-object-code`.
Loading such `.hi` files from disk introduces many duplicates of
memory expensive values in `IfaceType`, such as `IfaceTyCon`,
`IfaceTyConApp`, `IA_Arg` and many more.
We improve the memory behaviour of GHCi by adding an additional
deduplication table for `IfaceType` to the serialisation of `ModIface`,
similar to how we deduplicate `Name`s and `FastString`s.
When reading the interface file back, the table allows us to automatically
share identical values of `IfaceType`.
To provide some numbers, we evaluated this patch on the agda code base.
We loaded the full library from the `.hi` files, which contained the
embedded core expressions (`-fwrite-if-simplified-core`).
Before this patch:
* Load time: 11.7 s, 2.5 GB maximum residency.
After this patch:
* Load time: 7.3 s, 1.7 GB maximum residency.
This deduplication has the beneficial side effect of also greatly
reducing the size of the on-disk interface files.
For example, on agda, we reduce the size of `.hi` files (with
`-fwrite-if-simplified-core`):
* Before: 101 MB on disk
* Now: 24 MB on disk
This even has a beneficial side effect on the cabal store. We reduce the
size of the store on disk:
* Before: 341 MB on disk
* Now: 310 MB on disk
Note, none of the dependencies have been compiled with
`-fwrite-if-simplified-core`, but `IfaceType` occurs in multiple
locations in a `ModIface`.
We also add an `IfaceType` deduplication table to `.hie` serialisation and
refactor `.hie` file serialisation to use the same infrastructure as
`putWithTables`.
Bump haddock submodule to accommodate changes to the deduplication
table layout and binary interface.
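The idea of a deduplication table can be sketched in a few lines. This is
an illustration under simplifying assumptions, not the code in
`GHC.Iface.Binary`: the `Ty` type and the `intern`, `encode` and `decode`
functions are hypothetical, and plain lists stand in for the binary
buffers a real serialiser would use.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (sortOn)

-- Toy stand-in for a tree-like interface type.
data Ty = Con String | App Ty Ty
  deriving (Eq, Ord, Show)

-- A table maps each distinct value to a small integer index.
type Table = Map.Map Ty Int

-- Look a value up in the table, adding it with a fresh index if it is new.
intern :: Table -> Ty -> (Table, Int)
intern table ty = case Map.lookup ty table of
  Just ix -> (table, ix)
  Nothing -> let ix = Map.size table in (Map.insert ty ix table, ix)

-- "Write" a list of types: the payload holds only indices, and the table
-- is emitted once, with each distinct value in index order.
encode :: [Ty] -> ([Ty], [Int])
encode tys = (map fst (sortOn snd (Map.toList table)), reverse revIxs)
  where
    (table, revIxs) = foldl step (Map.empty, []) tys
    step (t, acc) ty = let (t', ix) = intern t ty in (t', ix : acc)

-- "Read" the payload back: every occurrence of an index refers to the same
-- table entry, so equal values are shared instead of duplicated.
decode :: ([Ty], [Int]) -> [Ty]
decode (entries, ixs) = map (entries !!) ixs
```

Repeated values cost one index each in the payload, which is where the
smaller files come from; on reading, all occurrences point at one heap
object, which is where the lower residency comes from.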
- - - - -
36aa7cf1 by Fendor at 2024-05-15T17:14:06-04:00
Add run-time configurability of `.hi` file compression
Introduce the flag `-fwrite-if-compression=<n>`, which configures the
compression level used when writing `.hi` files.
The motivation is that some deduplication operations are too expensive
for the average use case. Hence, we introduce multiple compression
levels with variable impact on performance, but still reduce the
memory residency and `.hi` file size on disk considerably.
We introduce three compression levels:
* `1`: `Normal` mode. This is the least amount of compression.
It deduplicates only `Name` and `FastString`s, and is naturally the
fastest compression mode.
* `2`: `Safe` mode. It has a noticeable impact on .hi file size and is
marginally slower than `Normal` mode. In general, it should be safe to
always use `Safe` mode.
* `3`: `Full` deduplication mode. Deduplicate as much as we can,
resulting in minimal .hi files, but at the cost of additional
compilation time.
Reading .hi files doesn't need to know the initial compression level,
and can always deserialise a `ModIface`, as we write out a byte that
indicates the next value has been deduplicated.
This allows users to experiment with different compression levels for
packages, without recompilation of dependencies.
Note, the deduplication also has the additional side effect of reduced
memory consumption due to implicit sharing of deduplicated elements.
See https://gitlab.haskell.org/ghc/ghc/-/issues/24540 for an example
where that matters.
-------------------------
Metric Decrease:
MultiLayerModulesDefsGhciWithCore
T16875
T21839c
T24471
hard_hole_fits
libdir
-------------------------
- - - - -
1e63a6fb by Matthew Pickering at 2024-05-15T17:14:07-04:00
Introduce regression tests for `.hi` file sizes
Add regression tests to track how `-fwrite-if-compression` levels affect
the size of `.hi` files.
- - - - -
639d742b by M Farkas-Dyck at 2024-05-15T17:14:49-04:00
TTG: ApplicativeStatement exist only in Rn and Tc
Co-Authored-By: romes <rodrigo.m.mesquita at gmail.com>
- - - - -
aa7b336b by Jade at 2024-05-15T23:06:17-04:00
Documentation: Improve documentation for symbols exported from System.IO
- - - - -
c561de8f by Jade at 2024-05-15T23:06:54-04:00
Improve suggestions for language extensions
- When suggesting Language extensions, also suggest Extensions which imply them
- Suggest ExplicitForAll and GADTSyntax instead of more specific
extensions
- Rephrase suggestion to include the term 'Extension'
- Also moves some flag specific definitions out of Session.hs into
Flags.hs (#24478)
Fixes: #24477
Fixes: #24448
Fixes: #10893
- - - - -
4c7ae2a1 by Andreas Klebinger at 2024-05-15T23:07:30-04:00
Testsuite: Check if llvm assembler is available for have_llvm
- - - - -
bc672166 by Torsten Schmits at 2024-05-15T23:08:06-04:00
refactor quadratic search in warnMissingHomeModules
- - - - -
7875e8cb by Torsten Schmits at 2024-05-15T23:08:06-04:00
add test that runs MakeDepend on thousands of modules
- - - - -
b84b91f5 by Adam Gundry at 2024-05-16T15:32:06-04:00
Representation-polymorphic HasField (fixes #22156)
This generalises the HasField class to support representation polymorphism,
so that instead of
type HasField :: forall {k} . k -> Type -> Type -> Constraint
we have
type HasField :: forall {k} {r_rep} {a_rep} . k -> TYPE r_rep -> TYPE a_rep -> Constraint
- - - - -
05285090 by Matthew Pickering at 2024-05-16T15:32:43-04:00
Bump os-string submodule to 2.0.2.2
Closes #24786
- - - - -
886ab43a by Cheng Shao at 2024-05-17T01:34:50-04:00
rts: do not prefetch mark_closure bdescr in non-moving gc when ASSERTS_ENABLED
This commit fixes a small oversight in !12148: the prefetch logic
in non-moving GC may trap in debug RTS because it calls Bdescr() for
mark_closure, which may be a static one. It's fine in non-debug RTS
because even if invalid bdescr addresses are prefetched, they will not
cause segfaults, so this commit implements the most straightforward
fix: don't prefetch mark_closure bdescr when assertions are enabled.
- - - - -
b38dcf39 by Teo Camarasu at 2024-05-17T01:34:50-04:00
rts: Allocate non-moving segments with megablocks
Non-moving segments are 8 blocks long and need to be aligned.
Previously we serviced allocations by grabbing 15 blocks, finding
an aligned 8 block group in it and returning the rest.
This proved to lead to high levels of fragmentation, as de-allocating a
segment caused an 8 block gap to form, and this could not be reused for
allocation.
This patch introduces a segment allocator based around using entire
megablocks to service segment allocations in bulk.
When there are no free segments, we grab an entire megablock and fill it
with aligned segments. As the megablock is free, we can easily guarantee
alignment. Any unused segments are placed on a free list.
It only makes sense to free segments in bulk when all of the segments in
a megablock are freeable. After sweeping, we grab the free list, sort it,
find all groups of segments that cover an entire megablock, and free
them.
This introduces a period of time when free segments are not available to
the mutator, but the risk that this would lead to excessive allocation
is low. Right after sweep, we should have an abundance of partially full
segments, and this pruning step is relatively quick.
In implementing this we drop the logic that kept NONMOVING_MAX_FREE
segments on the free list.
We also introduce an eventlog event to log the number of pruned/retained
free segments.
See Note [Segment allocation strategy]
Resolves #24150
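The pruning step described above (sort the free list, then free only
groups that cover a whole megablock) can be modelled as a pure function.
This is a sketch in Haskell rather than the RTS's C code, and it makes
assumptions that are not in the patch: segments are identified by a
hypothetical integer index, the number of segments per megablock is passed
as a parameter, and the free list contains no duplicates.

```haskell
import Data.Function (on)
import Data.List (groupBy, sort)

-- Hypothetical model: a segment is identified by its index in the heap,
-- and each megablock holds a fixed number of consecutive segments.
type SegmentIx = Int

-- Split a free list into (segments to free in bulk, segments to retain).
-- A megablock is freeable only when all of its segments are on the list.
pruneFreeList :: Int -> [SegmentIx] -> ([SegmentIx], [SegmentIx])
pruneFreeList segsPerMblock freeList = (concat full, concat partial)
  where
    megablockOf seg = seg `div` segsPerMblock
    -- After sorting, segments of the same megablock are adjacent.
    groups = groupBy ((==) `on` megablockOf) (sort freeList)
    full = filter ((== segsPerMblock) . length) groups
    partial = filter ((/= segsPerMblock) . length) groups
```

For example, with 4 segments per megablock, the free list
`[9,0,1,3,2,8,5]` yields `[0,1,2,3]` to free and `[5,8,9]` to retain.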
-------------------------
Metric Decrease:
T13253
T19695
-------------------------
- - - - -
710665bd by Cheng Shao at 2024-05-17T01:35:30-04:00
rts: fix I/O manager compilation errors for win32 target
This patch fixes I/O manager compilation errors for win32 target
discovered when cross-compiling to win32 using recent clang:
```
rts/win32/ThrIOManager.c:117:7: error:
error: call to undeclared function 'is_io_mng_native_p'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
117 | if (is_io_mng_native_p ()) {
| ^
|
117 | if (is_io_mng_native_p ()) {
| ^
1 error generated.
`x86_64-w64-mingw32-clang' failed in phase `C Compiler'. (Exit code: 1)
rts/fs.c:143:28: error:
error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
143 | int setErrNoFromWin32Error () {
| ^
| void
|
143 | int setErrNoFromWin32Error () {
| ^
1 error generated.
`x86_64-w64-mingw32-clang' failed in phase `C Compiler'. (Exit code: 1)
rts/win32/ConsoleHandler.c:227:9: error:
error: call to undeclared function 'interruptIOManagerEvent'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
227 | interruptIOManagerEvent ();
| ^
|
227 | interruptIOManagerEvent ();
| ^
rts/win32/ConsoleHandler.c:227:9: error:
note: did you mean 'getIOManagerEvent'?
|
227 | interruptIOManagerEvent ();
| ^
rts/include/rts/IOInterface.h:27:10: error:
note: 'getIOManagerEvent' declared here
27 | void * getIOManagerEvent (void);
| ^
|
27 | void * getIOManagerEvent (void);
| ^
1 error generated.
`x86_64-w64-mingw32-clang' failed in phase `C Compiler'. (Exit code: 1)
rts/win32/ConsoleHandler.c:196:9: error:
error: call to undeclared function 'setThreadLabel'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
196 | setThreadLabel(cap, t, "signal handler thread");
| ^
|
196 | setThreadLabel(cap, t, "signal handler thread");
| ^
rts/win32/ConsoleHandler.c:196:9: error:
note: did you mean 'postThreadLabel'?
|
196 | setThreadLabel(cap, t, "signal handler thread");
| ^
rts/eventlog/EventLog.h:118:6: error:
note: 'postThreadLabel' declared here
118 | void postThreadLabel(Capability *cap,
| ^
|
118 | void postThreadLabel(Capability *cap,
| ^
1 error generated.
`x86_64-w64-mingw32-clang' failed in phase `C Compiler'. (Exit code: 1)
```
- - - - -
28b9cee0 by Rodrigo Mesquita at 2024-05-17T01:36:05-04:00
configure: Check C99-compat for Cmm preprocessor
Fixes #24815
- - - - -
8927e0c3 by Andreas Klebinger at 2024-05-17T01:36:41-04:00
Ensure `tcHasFixedRuntimeRep (# #)` returns True.
- - - - -
7e394b8b by Torsten Schmits at 2024-05-17T13:09:36+02:00
Parallelize getRootSummary computations in dep analysis downsweep
This reuses the upsweep step's infrastructure to process batches of
modules in parallel.
I benchmarked this by running `ghc -M` on two sets of 10,000 modules;
one with a linear dependency chain and the other with a binary tree.
Comparing different values for the number of modules per thread
suggested an optimum at `length targets `div` (n_cap * 2)`, with results
similar to this one (6 cores, 12 threads):
```
Benchmark 1: linear 1 jobs
Time (mean ± σ): 1.775 s ± 0.026 s [User: 1.377 s, System: 0.399 s]
Range (min … max): 1.757 s … 1.793 s 2 runs
Benchmark 2: linear 6 jobs
Time (mean ± σ): 876.2 ms ± 20.9 ms [User: 1833.2 ms, System: 518.6 ms]
Range (min … max): 856.2 ms … 898.0 ms 3 runs
Benchmark 3: linear 12 jobs
Time (mean ± σ): 793.5 ms ± 23.2 ms [User: 2318.9 ms, System: 718.6 ms]
Range (min … max): 771.9 ms … 818.0 ms 3 runs
```
Results don't differ much when the batch size is reduced to a quarter
of that, but there's significant thread scheduling overhead for a size
of 1:
```
Benchmark 1: linear 1 jobs
Time (mean ± σ): 2.611 s ± 0.029 s [User: 2.851 s, System: 0.783 s]
Range (min … max): 2.591 s … 2.632 s 2 runs
Benchmark 2: linear 6 jobs
Time (mean ± σ): 1.189 s ± 0.007 s [User: 2.707 s, System: 1.103 s]
Range (min … max): 1.184 s … 1.194 s 2 runs
Benchmark 3: linear 12 jobs
Time (mean ± σ): 1.097 s ± 0.006 s [User: 2.938 s, System: 1.300 s]
Range (min … max): 1.093 s … 1.101 s 2 runs
```
Larger batches also slightly worsen performance.
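As a rough sketch of the batching arithmetic only (not the code in
`GHC.Driver.Make`), the batch size formula quoted above can be written as
follows. The names `batchSize`, `chunksOf` and `batches` are hypothetical,
and clamping to a minimum of 1 is an assumption of this sketch so that
short target lists and a zero capability count do not break it.

```haskell
-- Batch size from the commit message: length targets `div` (n_cap * 2).
-- Clamping both the capability count and the result to at least 1 is an
-- assumption of this sketch, not something stated in the patch.
batchSize :: Int -> Int -> Int
batchSize nCap nTargets = max 1 (nTargets `div` (max 1 nCap * 2))

-- Split a list into consecutive chunks of at most n elements (n >= 1).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- Group the targets into batches, one batch per unit of parallel work.
batches :: Int -> [a] -> [[a]]
batches nCap targets = chunksOf (batchSize nCap (length targets)) targets
```

With 12 capabilities and 10,000 targets this gives batches of 416
modules, i.e. roughly two batches per capability.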
- - - - -
73558636 by Torsten Schmits at 2024-05-17T13:09:36+02:00
use thread-safe loggers
- - - - -
52871cd4 by Torsten Schmits at 2024-05-17T13:09:36+02:00
backwards compat
- - - - -
0d09c9df by Torsten Schmits at 2024-05-17T13:09:36+02:00
use wrapAction as well
- - - - -
30 changed files:
- compiler/GHC/Core/Map/Expr.hs
- compiler/GHC/Core/Opt/ConstantFold.hs
- compiler/GHC/Core/TyCo/Rep.hs
- compiler/GHC/Core/TyCon.hs
- compiler/GHC/CoreToIface.hs
- compiler/GHC/Data/FastString.hs
- compiler/GHC/Data/TrieMap.hs
- compiler/GHC/Driver/DynFlags.hs
- compiler/GHC/Driver/Flags.hs
- compiler/GHC/Driver/Main.hs
- compiler/GHC/Driver/Make.hs
- compiler/GHC/Driver/Session.hs
- compiler/GHC/Hs/Expr.hs
- compiler/GHC/Hs/Instances.hs
- compiler/GHC/Hs/Utils.hs
- compiler/GHC/HsToCore/Expr.hs
- compiler/GHC/HsToCore/GuardedRHSs.hs
- compiler/GHC/HsToCore/ListComp.hs
- compiler/GHC/HsToCore/Pmc/Desugar.hs
- compiler/GHC/HsToCore/Ticks.hs
- compiler/GHC/Iface/Binary.hs
- compiler/GHC/Iface/Decl.hs
- compiler/GHC/Iface/Env.hs
- compiler/GHC/Iface/Ext/Ast.hs
- compiler/GHC/Iface/Ext/Binary.hs
- compiler/GHC/Iface/Ext/Utils.hs
- compiler/GHC/Iface/Load.hs
- compiler/GHC/Iface/Recomp.hs
- compiler/GHC/Iface/Recomp/Binary.hs
- compiler/GHC/Iface/Syntax.hs
The diff was not included because it is too large.
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/f037eaf84d4fa1f227f85ef45f75ab609554fd6f...0d09c9df74ff2dbed76a22af4a2250ba13a0c2d7