[Git][ghc/ghc][wip/fendor/ghc-iface-sharing-avoid-reserialisation] 75 commits: EPA: Use EpaLocation for RecFieldsDotDot

Hannes Siebenhandl (@fendor) gitlab at gitlab.haskell.org
Mon Apr 22 09:53:51 UTC 2024



Hannes Siebenhandl pushed to branch wip/fendor/ghc-iface-sharing-avoid-reserialisation at Glasgow Haskell Compiler / GHC


Commits:
19883a23 by Alan Zimmerman at 2024-04-05T16:58:17-04:00
EPA: Use EpaLocation for RecFieldsDotDot

So we can update it to a delta position in makeDeltaAst if needed.

- - - - -
e8724327 by Matthew Pickering at 2024-04-05T16:58:53-04:00
Remove accidentally committed test.hs

- - - - -
88cb3e10 by Fendor at 2024-04-08T09:03:34-04:00
Avoid UArray when indexing is not required

`UnlinkedBCO`'s can occur many times in the heap. Each `UnlinkedBCO`
references two `UArray`'s but never indexes them. They are only needed
to encode the elements into a `ByteArray#`. The three words for
the lower bound, upper bound and number of elements are essentially
unused, thus we replace `UArray` with a wrapper around `ByteArray#`.
This saves us up to three words for each `UnlinkedBCO`.

Further, to avoid re-allocating these words for `ResolvedBCO`, we repeat
the procedure for `ResolvedBCO` and add custom `Binary` and `Show` instances.

For example, agda's repl session has around 360_000 UnlinkedBCO's,
so avoiding these three words is already saving us around 8MB residency.

- - - - -
f2cc1107 by Fendor at 2024-04-08T09:04:11-04:00
Never UNPACK `FastMutInt` for counting z-encoded `FastString`s

In `FastStringTable`, we count the number of z-encoded FastStrings
that exist in a GHC session.
We used to UNPACK the counters to not waste memory, but live retainer
analysis showed that we allocate a lot of `FastMutInt`s, retained by
`mkFastZString`.

We lazily compute the `FastZString`, only incrementing the counter when the `FastZString` is
forced.
The function `mkFastStringWith` calls `mkZFastString` and boxes the
`FastMutInt`, leading to the following core:

    mkFastStringWith
      = \ mk_fs _  ->
             = case stringTable of
                { FastStringTable _ n_zencs segments# _ ->
                    ...
                         case ((mk_fs (I# ...) (FastMutInt n_zencs))
                            `cast` <Co:2> :: ...)
                            ...

Marking this field as `NOUNPACK` avoids this reboxing, eliminating the
allocation of a fresh `FastMutInt` on every `FastString` allocation.

- - - - -
c6def949 by Matthew Pickering at 2024-04-08T16:06:51-04:00
Force in_multi to avoid retaining entire hsc_env

- - - - -
fbb91a63 by Fendor at 2024-04-08T16:06:51-04:00
Eliminate name thunk in declaration fingerprinting

Thunk analysis showed that we have about 100_000 thunks (in agda and
`-fwrite-simplified-core`) pointing to the name of the name decl.
Forcing this thunk fixes this issue.

The thunk created here is retained by the thunk created by forkM, it is
better to eagerly force this because the result (a `Name`) is already
retained indirectly via the `IfaceDecl`.

- - - - -
3b7b0c1c by Alan Zimmerman at 2024-04-08T16:07:27-04:00
EPA: Use EpaLocation in WarningTxt

This allows us to use an EpDelta if needed when using makeDeltaAst.

- - - - -
12b997df by Alan Zimmerman at 2024-04-08T16:07:27-04:00
EPA: Move DeltaPos and EpaLocation' into GHC.Types.SrcLoc

This allows us to use a NoCommentsLocation for the possibly trailing
comma location in a StringLiteral.
This in turn allows us to correctly roundtrip via makeDeltaAst.

- - - - -
868c8a78 by Fendor at 2024-04-09T08:51:50-04:00
Prefer packed representation for CompiledByteCode

As there are many 'CompiledByteCode' objects alive during a GHCi
session, representing its element in a more packed manner improves space
behaviour at a minimal cost.

When running GHCi on the agda codebase, we find around 380 live
'CompiledByteCode' objects. Packing their respective 'UnlinkedByteCode'
can save quite some pointers.

- - - - -
be3bddde by Alan Zimmerman at 2024-04-09T08:52:26-04:00
EPA: Capture all comments in a ClassDecl

Hopefully the final fix needed for #24533

- - - - -
3d0806fc by Jade at 2024-04-10T05:39:53-04:00
Validate -main-is flag using parseIdentifier

Fixes #24368

- - - - -
dd530bb7 by Rodrigo Mesquita at 2024-04-10T05:40:29-04:00
rts: free error message before returning

Fixes a memory leak in rts/linker/PEi386.c

- - - - -
e008a19a by Alexis King at 2024-04-10T05:40:29-04:00
linker: Avoid linear search when looking up Haskell symbols via dlsym

See the primary Note [Looking up symbols in the relevant objects] for a
more in-depth explanation.

When dynamically loading a Haskell symbol (typical when running a splice or
GHCi expression), before this commit we would search for the symbol in
all dynamic libraries that were loaded. However, this could be very
inefficient when too many packages are loaded (which can happen if there are
many package dependencies) because the time to lookup the would be
linear in the number of packages loaded.

This commit drastically improves symbol loading performance by
introducing a mapping from units to the handles of corresponding loaded
dlls. These handles are returned by dlopen when we load a dll, and can
then be used to look up in a specific dynamic library.

Looking up a given Name is now much more precise because we can get
lookup its unit in the mapping and lookup the symbol solely in the
handles of the dynamic libraries loaded for that unit.

In one measurement, the wait time before the expression was executed
went from +-38 seconds down to +-2s.

This commit also includes Note [Symbols may not be found in pkgs_loaded],
explaining the fallback to the old behaviour in case no dll can be found
in the unit mapping for a given Name.

Fixes #23415

Co-authored-by: Rodrigo Mesquita (@alt-romes)

- - - - -
dcfaa190 by Rodrigo Mesquita at 2024-04-10T05:40:29-04:00
rts: Make addDLL a wrapper around loadNativeObj

Rewrite the implementation of `addDLL` as a wrapper around the more
principled `loadNativeObj` rts linker function. The latter should be
preferred while the former is preserved for backwards compatibility.

`loadNativeObj` was previously only available on ELF platforms, so this
commit further refactors the rts linker to transform loadNativeObj_ELF
into loadNativeObj_POSIX, which is available in ELF and MachO platforms.

The refactor made it possible to remove the `dl_mutex` mutex in favour
of always using `linker_mutex` (rather than a combination of both).

Lastly, we implement `loadNativeObj` for Windows too.

- - - - -
12931698 by Rodrigo Mesquita at 2024-04-10T05:40:29-04:00
Use symbol cache in internal interpreter too

This commit makes the symbol cache that was used by the external
interpreter available for the internal interpreter too.

This follows from the analysis in #23415 that suggests the internal
interpreter could benefit from this cache too, and that there is no good
reason not to have the cache for it too. It also makes it a bit more
uniform to have the symbol cache range over both the internal and
external interpreter.

This commit also refactors the cache into a function which is used by
both `lookupSymbol` and also by `lookupSymbolInDLL`, extending the
caching logic to `lookupSymbolInDLL` too.

- - - - -
dccd3ea1 by Ben Gamari at 2024-04-10T05:40:29-04:00
testsuite: Add test for lookupSymbolInNativeObj

- - - - -
1b1a92bd by Alan Zimmerman at 2024-04-10T05:41:05-04:00
EPA: Remove unnecessary XRec in CompleteMatchSig

The XRec for [LIdP pass] is not needed for exact printing, remove it.

- - - - -
6e18ce2b by Ben Gamari at 2024-04-12T08:16:09-04:00
users-guide: Clarify language extension documentation

Over the years the users guide's language extension documentation has
gone through quite a few refactorings. In the process some of the
descriptions have been rendered non-sensical. For instance, the
description of `NoImplicitPrelude` actually describes the semantics of
`ImplicitPrelude`.

To fix this we:

 * ensure that all extensions are named in their "positive" sense (e.g.
   `ImplicitPrelude` rather than `NoImplicitPrelude`).
 * rework the documentation to avoid flag-oriented wording
   like "enable" and "disable"
 * ensure that the polarity of the documentation is consistent with
   reality.

Fixes #23895.

- - - - -
a933aff3 by Zubin Duggal at 2024-04-12T08:16:45-04:00
driver: Make `checkHomeUnitsClosed` faster

The implementation of `checkHomeUnitsClosed` was traversing every single path
in the unit dependency graph - this grows exponentially and quickly grows to be
infeasible on larger unit dependency graphs.

Instead we replace this with a faster implementation which follows from the
specificiation of the closure property - there is a closure error if there are
units which are both are both (transitively) depended upon by home units and
(transitively) depend on home units, but are not themselves home units.

To compute the set of units required for closure, we first compute the closure
of the unit dependency graph, then the transpose of this closure, and find all
units that are reachable from the home units in the transpose of the closure.

- - - - -
23c3e624 by Andreas Klebinger at 2024-04-12T08:17:21-04:00
RTS: Emit warning when -M < -H

Fixes #24487

- - - - -
d23afb8c by Ben Gamari at 2024-04-12T08:17:56-04:00
testsuite: Add broken test for CApiFFI with -fprefer-bytecode

See #24634.

- - - - -
a4bb3a51 by Ben Gamari at 2024-04-12T08:18:32-04:00
base: Deprecate GHC.Pack

As proposed in #21461.

Closes #21540.

- - - - -
55eb8c98 by Ben Gamari at 2024-04-12T08:19:08-04:00
ghc-internal: Fix mentions of ghc-internal in deprecation warnings

Closes #24609.

- - - - -
b0fbd181 by Ben Gamari at 2024-04-12T08:19:44-04:00
rts: Implement set_initial_registers for AArch64

Fixes #23680.

- - - - -
14c9ec62 by Ben Gamari at 2024-04-12T08:20:20-04:00
ghcup-metadata: Use Debian 9 binaries on Ubuntu 16, 17

Closes #24646.

- - - - -
35a1621e by Ben Gamari at 2024-04-12T08:20:55-04:00
Bump unix submodule to 2.8.5.1

Closes #24640.

- - - - -
a1c24df0 by Finley McIlwaine at 2024-04-12T08:21:31-04:00
Correct default -funfolding-use-threshold in docs

- - - - -
0255d03c by Oleg Grenrus at 2024-04-12T08:22:07-04:00
FastString is a __Modified__ UTF-8

- - - - -
c3489547 by Matthew Pickering at 2024-04-12T13:13:44-04:00
rts: Improve tracing message when nursery is resized

It is sometimes more useful to know how much bigger or smaller the
nursery got when it is resized.

In particular I am trying to investigate situations where we end up with
fragmentation due to the nursery (#24577)

- - - - -
5e4f4ba8 by Simon Peyton Jones at 2024-04-12T13:14:20-04:00
Don't generate wrappers for `type data` constructors with StrictData

Previously, the logic for checking if a data constructor needs a wrapper or not
would take into account whether the constructor's fields have explicit
strictness (e.g., `data T = MkT !Int`), but the logic would _not_ take into
account whether `StrictData` was enabled. This meant that something like `type
data T = MkT Int` would incorrectly generate a wrapper for `MkT` if
`StrictData` was enabled, leading to the horrible errors seen in #24620. To fix
this, we disable generating wrappers for `type data` constructors altogether.

Fixes #24620.

Co-authored-by: Ryan Scott <ryan.gl.scott at gmail.com>

- - - - -
dbdf1995 by Alex Mason at 2024-04-15T15:28:26+10:00
Implements MO_S_Mul2 and MO_U_Mul2 using the  UMULH, UMULL and SMULH instructions for AArch64

Also adds a test for MO_S_Mul2

- - - - -
42bd0407 by Teo Camarasu at 2024-04-16T20:06:39-04:00
Make template-haskell a stage1 package

Promoting template-haskell from a stage0 to a stage1 package means that
we can much more easily refactor template-haskell.

We implement this by duplicating the in-tree `template-haskell`.
A new `template-haskell-next` library is autogenerated to mirror `template-haskell`
`stage1:ghc` to depend on the new interface of the library including the
`Binary` instances without adding an explicit dependency on `template-haskell`.

This is controlled by the `bootstrap-th` cabal flag

When building `template-haskell` modules as part of this vendoring we do
not have access to quote syntax, so we cannot use variable quote
notation (`'Just`). So we either replace these with hand-written `Name`s
or hide the code behind CPP.

We can remove the `th_hack` from hadrian, which was required when
building stage0 packages using the in-tree `template-haskell` library.

For more details see Note [Bootstrapping Template Haskell].

Resolves #23536

Co-Authored-By: Sebastian Graf <sgraf1337 at gmail.com>
Co-Authored-By: Matthew Craven <5086-clyring at users.noreply.gitlab.haskell.org>

- - - - -
3d973e47 by Ben Gamari at 2024-04-16T20:07:15-04:00
Bump parsec submodule to 3.1.17.0

- - - - -
9d38bfa0 by Simon Peyton Jones at 2024-04-16T20:07:51-04:00
Clone CoVars in CorePrep

This MR addresses #24463.  It's all explained in the new

   Note [Cloning CoVars and TyVars]

- - - - -
0fe2b410 by Andreas Klebinger at 2024-04-16T20:08:27-04:00
NCG: Fix a bug where we errounously removed a required jump instruction.

Add a new method to the Instruction class to check if we can eliminate a
jump in favour of fallthrough control flow.

Fixes #24507

- - - - -
9f99126a by Teo Camarasu at 2024-04-16T20:09:04-04:00
Fix documentation preview from doc-tarball job

- Include all the .html files and assets in the job artefacts
- Include all the .pdf files in the job artefacts
- Mark the artefact as an "exposed" artefact meaning it turns up in the
  UI.

Resolves #24651

- - - - -
3a0642ea by Ben Gamari at 2024-04-16T20:09:39-04:00
rts: Ignore EINTR while polling in timerfd itimer implementation

While the RTS does attempt to mask signals, it may be that a foreign
library unmasks them. This previously caused benign warnings which we
now ignore.

See #24610.

- - - - -
9a53cd3f by Alan Zimmerman at 2024-04-16T20:10:15-04:00
EPA: Add additional comments field to AnnsModule

This is used in exact printing to store comments coming after the
`where` keyword but before any comments allocated to imports or decls.

It is used in ghc-exactprint, see
https://github.com/alanz/ghc-exactprint/commit/44bbed311fd8f0d053053fef195bf47c17d34fa7

- - - - -
e5c43259 by Bryan Richter at 2024-04-16T20:10:51-04:00
Remove unrunnable FreeBSD CI jobs

FreeBSD runner supply is inelastic. Currently there is only one, and
it's unavailable because of a hardware issue.

- - - - -
914eb49a by Ben Gamari at 2024-04-16T20:11:27-04:00
rel-eng: Fix mktemp usage in recompress-all

We need a temporary directory, not a file.

- - - - -
f30e4984 by Teo Camarasu at 2024-04-16T20:12:03-04:00
Fix ghc API link in docs/index.html

This was missing part of the unit ID meaning it would 404.

Resolves #24674

- - - - -
d7a3d6b5 by Ben Gamari at 2024-04-16T20:12:39-04:00
template-haskell: Declare TH.Lib.Internal as not-home

Rather than `hide`.

Closes #24659.

- - - - -
5eaa46e7 by Matthew Pickering at 2024-04-19T02:14:55-04:00
testsuite: Rename isCross() predicate to needsTargetWrapper()

isCross() was a misnamed because it assumed that all cross targets would
provide a target wrapper, but the two most common cross targets
(javascript, wasm) don't need a target wrapper.

Therefore we rename this predicate to `needsTargetWrapper()` so
situations in the testsuite where we can check whether running
executables requires a target wrapper or not.

- - - - -
55a9d699 by Simon Peyton Jones at 2024-04-19T02:15:32-04:00
Do not float HNFs out of lambdas

This MR adjusts SetLevels so that it is less eager to float a
HNF (lambda or constructor application) out of a lambda, unless
it gets to top level.

Data suggests that this change is a small net win:
 * nofib bytes-allocated falls by -0.09% (but a couple go up)
 * perf/should_compile bytes-allocated falls by -0.5%
 * perf/should_run bytes-allocated falls by -0.1%
See !12410 for more detail.

When fiddling elsewhere, I also found that this patch had a huge
positive effect on the (very delicate) test
  perf/should_run/T21839r
But that improvement doesn't show up in this MR by itself.

Metric Decrease:
    MultiLayerModulesRecomp
    T15703
    parsing001

- - - - -
f0701585 by Alan Zimmerman at 2024-04-19T02:16:08-04:00
EPA: Fix comments in mkListSyntaxTy0

Also extend the test to confirm.

Addresses #24669, 1 of 4

- - - - -
b01c01d4 by Serge S. Gulin at 2024-04-19T02:16:51-04:00
JS: set image `x86_64-linux-deb11-emsdk-closure` for build

- - - - -
c90c6039 by Alan Zimmerman at 2024-04-19T02:17:27-04:00
EPA: Provide correct span for PatBind

And remove unused parameter in checkPatBind

Contributes to #24669

- - - - -
26036f96 by Alan Zimmerman at 2024-04-19T13:11:08-04:00
EPA: Fix span for PatBuilderAppType

Include the location of the prefix @ in the span for InVisPat.

Also removes unnecessary annotations from HsTP.

Contributes to #24669

- - - - -
dba03aab by Matthew Craven at 2024-04-19T13:11:44-04:00
testsuite: Give the pre_cmd for mhu-perf more time

- - - - -
d31fbf6c by Krzysztof Gogolewski at 2024-04-19T21:04:09-04:00
Fix quantification order for a `op` b and a %m -> b

Fixes #23764

Implements https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0640-tyop-quantification-order.rst

Updates haddock submodule.

- - - - -
385cd1c4 by Sebastian Graf at 2024-04-19T21:04:45-04:00
Make `seq#` a magic Id and inline it in CorePrep (#24124)

We can save much code and explanation in Tag Inference and StgToCmm by making
`seq#` a known-key Magic Id in `GHC.Internal.IO` and inline this definition in
CorePrep. See the updated `Note [seq# magic]`.
I also implemented a new `Note [Flatten case-bind]` to get better code for
otherwise nested case scrutinees.

I renamed the contructors of `ArgInfo` to use an `AI` prefix in order to
resolve the clash between `type CpeApp = CoreExpr` and the data constructor of
`ArgInfo`, as well as fixed typos in `Note [CorePrep invariants]`.

Fixes #24252 and #24124.

- - - - -
275e41a9 by Jade at 2024-04-20T11:10:40-04:00
Put the newline after errors instead of before them

This mainly has consequences for GHCi but also slightly alters how the
output of GHC on the commandline looks.

Fixes: #22499

- - - - -
dd339c7a by Teo Camarasu at 2024-04-20T11:11:16-04:00
Remove unecessary stage0 packages

Historically quite a few packages had to be stage0 as they depended on
`template-haskell` and that was stage0. In #23536 we made it so that was
no longer the case. This allows us to remove a bunch of packages from
this list.

A few still remain. A new version of `Win32` is required by
`semaphore-compat`. Including `Win32` in the stage0 set requires also
including `filepath` because otherwise Hadrian's dependency logic gets
confused. Once our boot compiler has a newer version of `Win32` all of
these will be able to be dropped.

Resolves #24652

- - - - -
2f8e3a25 by Alan Zimmerman at 2024-04-20T11:11:52-04:00
EPA: Avoid duplicated comments in splice decls

Contributes to #24669

- - - - -
c70b9ddb by Serge S. Gulin at 2024-04-21T16:33:43+03:00
JS: fix typos and namings (fixes #24602)

You may noted that I've also changed term of

```
, global "h$vt_double" ||= toJExpr IntV
```

See "IntV"

and

```
  WaitReadOp  -> \[] [fd] -> pure $ PRPrimCall $ returnS (app
"h$waidRead" [fd])
```

See "h$waidRead"

- - - - -
3db54f9b by Serge S. Gulin at 2024-04-21T16:33:43+03:00
JS: trivial checks for variable presence (fixes #24602)

- - - - -
777f108f by Serge S. Gulin at 2024-04-21T16:33:43+03:00
JS: fs module imported twice (by emscripten and by ghc-internal). ghc-internal import wrapped
in a closure to prevent conflict with emscripten (fixes #24602)

Better solution is to use some JavaScript module system like AMD, CommonJS or even UMD. It will be investigated at other issues.
At first glance we should try UMD (See https://github.com/umdjs/umd)

- - - - -
a45a5712 by Serge S. Gulin at 2024-04-21T16:33:43+03:00
JS: thread.js requires h$fds and h$fdReady to be declared for static code analysis, minimal
code copied from GHCJS (fixes #24602)

I've just copied some old pieces of GHCJS from publicly available sources (See https://github.com/Taneb/shims/blob/a6dd0202dcdb86ad63201495b8b5d9763483eb35/src/io.js#L607).
Also I didn't put details to h$fds. I took minimal and left only its object initialization: `var h$fds = {};`

- - - - -
ad90bf12 by Serge S. Gulin at 2024-04-21T16:33:43+03:00
JS: heap and stack overflows reporting defined as js hard failure (fixes #24602)

These errors were treated as a hard failure for browser application. The fix is trivial: just throw error.

- - - - -
5962fa52 by Serge S. Gulin at 2024-04-21T16:33:44+03:00
JS: Stubs for code without actual implementation detected by Google Closure Compiler (fixes #24602)

These errors were fixed just by introducing stubbed functions with throw for further implementation.

- - - - -
a0694298 by Serge S. Gulin at 2024-04-21T16:34:07+03:00
JS: Add externs to linker (fixes #24602)

After enabling jsdoc and built-in google closure compiler types I was needed to deal with the following:

1. Define NodeJS-environment types. I've just copied minimal set of externs from semi-official repo (see https://github.com/externs/nodejs/blob/6c6882c73efcdceecf42e7ba11f1e3e5c9c041f0/v8/nodejs.js#L8).
2. Define Emscripten-environment types: `HEAP8`. Emscripten already provides some externs in our code but it supposed to be run in some module system. And its definitions do not work well in plain bundle.
3. We have some functions which purpose is to add to functions some contextual information via function properties. These functions should be marked as `modifies` to let google closure compiler remove calls if these functions are not used actually by call graph. Such functions are: `h$o`, `h$sti`, `h$init_closure`, `h$setObjInfo`.
4. STG primitives such as registries and stuff from `GHC.StgToJS`. `dXX` properties were already present at externs generator function but they are started from `7`, not from `1`. This message is related: `// fixme does closure compiler bite us here?`

- - - - -
e58bb29f by Serge S. Gulin at 2024-04-21T16:34:07+03:00
JS: added both tests: for size and for correctness (fixes #24602)

By some reason MacOS builds add to stderr messages like:

    Ignoring unexpected archive entry:
    __.SYMDEF
    ...

However I left stderr to `/dev/null` for compatibility with linux CI builds.

- - - - -
909f3a9c by Serge S. Gulin at 2024-04-21T16:34:07+03:00
JS: Disable js linker warning for empty symbol table to make js tests running consistent across environments

- - - - -
83eb10da by Serge S. Gulin at 2024-04-21T16:34:07+03:00
JS: Add special preprocessor for js files due of needing to keep jsdoc comments (fixes #24602)

Our js files have defined google closure compiler types at jsdoc entries but these jsdoc entries are removed by cpp preprocessor. I considered that reusing them in javascript-backend would be a nice thing. Right now haskell processor uses `-traditional` option to deal with comments and `//` operators.
But now there are following compiler options: `-C` and `-CC`.
You can read about them at GCC (see https://gcc.gnu.org/onlinedocs/gcc/Preprocessor-Options.html#index-CC) and CLang (see https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-CC).
It seems that `-CC` works better for javascript jsdoc than `-traditional`.
At least it leaves `/* ... */` comments w/o changes.

- - - - -
e1cf8dc2 by brandon s allbery kf8nh at 2024-04-22T03:48:26-04:00
fix link in CODEOWNERS

It seems that our local Gitlab no longer has documentation for the
`CODEOWNERS` file, but the master documentation still does. Use
that instead.

- - - - -
47ad2517 by Fendor at 2024-04-22T10:13:17+02:00
Refactor the Binary serialisation interface

The end goal is to dynamically add deduplication tables for `ModIface`
interface serialisation.

We identify two main points of interest that make this difficult:

1. UserData hardcodes what `Binary` instances can have deduplication
   tables. Moreover, it heavily uses partial functions.
2. GHC.Iface.Binary hardcodes the deduplication tables for 'Name' and
   'FastString', making it difficult to add more deduplication.

Instead of having a single `UserData` record with fields for all the
types that can have deduplication tables, we allow to provide custom
serialisers for any `Typeable`.
These are wrapped in existentials and stored in a `Map` indexed by their
respective `TypeRep`.
The `Binary` instance of the type to deduplicate still needs to
explicitly look up the decoder via `findUserDataReader` and
`findUserDataWriter`, which is no worse than the status-quo.

`Map` was chosen as microbenchmarks indicate it is the fastest for a
small number of keys (< 10).

To generalise the deduplication table serialisation mechanism, we
introduce the types `ReaderTable` and `WriterTable` which provide a
simple interface that is sufficient to implement a general purpose
deduplication mechanism for `writeBinIface` and `readBinIface`.

This allows us to provide a list of deduplication tables for
serialisation that can be extended more easily, for example for
`IfaceTyCon`, see the issue https://gitlab.haskell.org/ghc/ghc/-/issues/24540
for more motivation.

In addition to ths refactoring, we split `UserData` into `ReaderUserData`
and `WriterUserData`, to avoid partial functions and reduce overall
memory usage, as we need fewer mutable variables.

Bump haddock submodule to accomodate for `UserData` split.

-------------------------
Metric Increase:
    MultiLayerModulesTH_Make
    MultiLayerModulesRecomp
    T21839c
-------------------------

- - - - -
c80d115b by Fendor at 2024-04-22T10:13:22+02:00
Split `BinHandle` into `ReadBinHandle` and `WriteBinHandle`

A `BinHandle` contains too much information for reading data.
For example, it needs to keep a `FastMutInt` and a `IORef BinData`,
when the non-mutable variants would suffice.

Additionally, this change has the benefit that anyone can immediately
tell whether the `BinHandle` is used for reading or writing.

Bump haddock submodule BinHandle split.

- - - - -
b2f88df7 by Fendor at 2024-04-22T10:37:29+02:00
Add Eq and Ord instance to `IfaceType`

We add an `Ord` instance so that we can store `IfaceType` in a
`Data.Map` container.
This is required to deduplicate `IfaceType` while writing `.hi` files to
disk. Deduplication has many beneficial consequences to both file size
and memory usage, as the deduplication enables implicit sharing of
values.
See issue #24540 for more motivation.

The `Ord` instance would be unnecessary if we used a `TrieMap` instead
of `Data.Map` for the deduplication process. While in theory this is
clerarly the better option, experiments on the agda code base showed
that a `TrieMap` implementation has worse run-time performance
characteristics.

To the change itself, we mostly derive `Eq` and `Ord`. This requires us
to change occurrences of `FastString` with `LexicalFastString`, since
`FastString` has no `Ord` instance.
We change the definition of `IfLclName` to a newtype of
`LexicalFastString`, to make such changes in the future easier.

Bump haddock submodule for IfLclName changes

- - - - -
25abd579 by Fendor at 2024-04-22T11:27:33+02:00
Break cyclic module dependency

- - - - -
0e8e5bf0 by Fendor at 2024-04-22T11:27:33+02:00
Add deduplication table for `IfaceType`

The type `IfaceType` is a highly redundant, tree-like data structure.
While benchmarking, we realised that the high redundancy of `IfaceType`
causes high memory consumption in GHCi sessions when byte code is
embedded into the `.hi` file via `-fwrite-if-simplified-core` or
`-fbyte-code-and-object-code`.
Loading such `.hi` files from disk introduces many duplicates of
memory expensive values in `IfaceType`, such as `IfaceTyCon`,
`IfaceTyConApp`, `IA_Arg` and many more.

We improve the memory behaviour of GHCi by adding an additional
deduplication table for `IfaceType` to the serialisation of `ModIface`,
similar to how we deduplicate `Name`s and `FastString`s.
When reading the interface file back, the table allows us to automatically
share identical values of `IfaceType`.

To provide some numbers, we evaluated this patch on the agda code base.
We loaded the full library from the `.hi` files, which contained the
embedded core expressions (`-fwrite-if-simplified-core`).

Before this patch:

* Load time: 11.7 s, 2.5 GB maximum residency.

After this patch:

* Load time:  7.3 s, 1.7 GB maximum residency.

This deduplication has the beneficial side effect to additionally reduce
the size of the on-disk interface files tremendously.

For example, on agda, we reduce the size of `.hi` files (with
`-fwrite-if-simplified-core`):

* Before: 101 MB on disk
* Now:     24 MB on disk

This has even a beneficial side effect on the cabal store. We reduce the
size of the store on disk:

* Before: 341 MB on disk
* Now:    310 MB on disk

Note, none of the dependencies have been compiled with
`-fwrite-if-simplified-core`, but `IfaceType` occurs in multiple
locations in a `ModIface`.

We also add IfaceType deduplication table to .hie serialisation and
refactor .hie file serialisation to use the same infrastrucutre as
`putWithTables`.

- - - - -
faf75a40 by Matthew Pickering at 2024-04-22T11:27:33+02:00
Add run-time configurability of .hi file compression

Introduce the flag `-fwrite-if-compression=<n>` which allows to
configure the compression level of writing .hi files.

The motivation is that some deduplication operations are too expensive
for the average use case. Hence, we introduce multiple compression
levels that have a minimal impact on performance, but still reduce the
memory residency and `.hi` file size on disk considerably.

We introduce three compression levels:

* `1`: `Normal` mode. This is the least amount of compression.
    It deduplicates only `Name` and `FastString`s, and is naturally the
    fastest compression mode.
* `2`: `Safe` mode. It has a noticeable impact on .hi file size and is
  marginally slower than `Normal` mode. In general, it should be safe to
  always use `Safe` mode.
* `3`: `Full` deduplication mode. Deduplicate as much as we can,
  resulting in minimal .hi files, but at the cost of additional
  compilation time.

Reading .hi files doesn't need to know the initial compression level,
and can always deserialise a `ModIface`.
This allows users to experiment with different compression levels for
packages, without recompilation of dependencies.

Note, the deduplication also has an additional side effect of reduced
memory consumption to implicit sharing of deduplicated elements.
See https://gitlab.haskell.org/ghc/ghc/-/issues/24540 for example where
that matters.

-------------------------
Metric Decrease:
    T21839c
    T24471
-------------------------

- - - - -
860806e9 by Matthew Pickering at 2024-04-22T11:27:33+02:00
Introduce regression tests for `.hi` file sizes

Add regression tests to track how `-fwrite-if-compression` levels affect
the size of `.hi` files.

- - - - -
a42dcf7a by Fendor at 2024-04-22T11:37:35+02:00
Implement TrieMap for IfaceType

- - - - -
85a7dab9 by Fendor at 2024-04-22T11:38:41+02:00
Share duplicated values in ModIface after generation

This helps keeping down the peak memory usage while compiling in
`--make` mode.

- - - - -
35caa779 by Fendor at 2024-04-22T11:53:41+02:00
Reuse the 'ReadBinMem' after sharing to avoid recomputations

- - - - -


30 changed files:

- .gitignore
- .gitlab-ci.yml
- .gitlab/generate-ci/gen_ci.hs
- .gitlab/jobs.yaml
- .gitlab/rel_eng/mk-ghcup-metadata/mk_ghcup_metadata.py
- .gitlab/rel_eng/recompress-all
- CODEOWNERS
- compiler/GHC.hs
- compiler/GHC/Builtin/Names.hs
- compiler/GHC/Builtin/PrimOps.hs
- compiler/GHC/Builtin/primops.txt.pp
- compiler/GHC/ByteCode/Asm.hs
- compiler/GHC/ByteCode/Linker.hs
- compiler/GHC/ByteCode/Types.hs
- compiler/GHC/CmmToAsm/AArch64.hs
- compiler/GHC/CmmToAsm/AArch64/CodeGen.hs
- compiler/GHC/CmmToAsm/AArch64/Instr.hs
- compiler/GHC/CmmToAsm/AArch64/Ppr.hs
- compiler/GHC/CmmToAsm/BlockLayout.hs
- compiler/GHC/CmmToAsm/Instr.hs
- compiler/GHC/CmmToAsm/Monad.hs
- compiler/GHC/CmmToAsm/PPC.hs
- compiler/GHC/CmmToAsm/PPC/Instr.hs
- compiler/GHC/CmmToAsm/Reg/Liveness.hs
- compiler/GHC/CmmToAsm/X86.hs
- compiler/GHC/CmmToAsm/X86/Instr.hs
- compiler/GHC/Core/Map/Expr.hs
- compiler/GHC/Core/Opt/ConstantFold.hs
- compiler/GHC/Core/Opt/DmdAnal.hs
- compiler/GHC/Core/Opt/SetLevels.hs


The diff was not included because it is too large.


View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/3632e6cd3a9dc352bbf516b58e505f419d7a7832...35caa77917244c34b44c47f534b966c09a9ce351

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/3632e6cd3a9dc352bbf516b58e505f419d7a7832...35caa77917244c34b44c47f534b966c09a9ce351
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240422/2b97cf50/attachment-0001.html>


More information about the ghc-commits mailing list