[Git][ghc/ghc][wip/romes/rts-linker-direct-symbol-lookup] 55 commits: rts: Fix TSAN_ENABLED CPP guard
Rodrigo Mesquita (@alt-romes)
gitlab at gitlab.haskell.org
Thu Apr 4 10:08:23 UTC 2024
Rodrigo Mesquita pushed to branch wip/romes/rts-linker-direct-symbol-lookup at Glasgow Haskell Compiler / GHC
Commits:
c8a4c050 by Ben Gamari at 2024-04-02T12:50:35-04:00
rts: Fix TSAN_ENABLED CPP guard
This should be `#if defined(TSAN_ENABLED)`, not `#if TSAN_ENABLED`,
lest we suffer warnings.
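A minimal sketch of the guard style this commit asks for. It assumes the warnings in question come from evaluating an undefined macro (for example under -Wundef); the commit message does not say which warning fired. `tsan_build` is a hypothetical helper used only for illustration.

```c
#include <stdbool.h>

/* TSAN_ENABLED is the macro named in the commit message; whoever
 * defines it (build system, headers) is outside this sketch.
 * tsan_build() is a hypothetical helper used only for illustration. */
bool tsan_build(void)
{
#if defined(TSAN_ENABLED)   /* well-defined whether or not the macro exists */
    return true;
#else
    return false;           /* `#if TSAN_ENABLED` would trip -Wundef here */
#endif
}
```

Built without -DTSAN_ENABLED, `tsan_build()` returns false and the preprocessor has nothing to complain about.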
- - - - -
e91dad93 by Cheng Shao at 2024-04-02T12:50:35-04:00
rts: fix errors when compiling with TSAN
This commit fixes rts compilation errors when compiling with TSAN:
- xxx_FENCE macros are redefined and trigger CPP warnings.
- Use SIZEOF_W. WORD_SIZE_IN_BITS is provided by MachDeps.h which
Cmm.h doesn't include by default.
- - - - -
a9ab9455 by Cheng Shao at 2024-04-02T12:50:35-04:00
rts: fix clang-specific errors when compiling with TSAN
This commit fixes clang-specific rts compilation errors when compiling
with TSAN:
- clang doesn't have -Wtsan flag
- Fix prototype of ghc_tsan_* helper functions
- __tsan_atomic_* functions aren't clang built-ins and
sanitizer/tsan_interface_atomic.h needs to be included
- On macOS, TSAN runtime library is
libclang_rt.tsan_osx_dynamic.dylib, not libtsan. -fsanitize=thread
as a link-time flag will take care of linking the TSAN runtime
library anyway so remove tsan as an rts extra library
- - - - -
865bd717 by Cheng Shao at 2024-04-02T12:50:35-04:00
compiler: fix github link to __tsan_memory_order in a comment
- - - - -
07cb627c by Cheng Shao at 2024-04-02T12:50:35-04:00
ci: improve TSAN CI jobs
- Run TSAN jobs with +thread_sanitizer_cmm which enables Cmm
instrumentation as well.
- Run TSAN jobs in deb12 which ships gcc-12, a reasonably recent gcc
that @bgamari confirms he's using in #GHC:matrix.org. Ideally we
should be using latest clang release for latest improvements in
sanitizers, though that's left as future work.
- Mark TSAN jobs as manual+allow_failure in validate pipelines. The
purpose is to demonstrate that we have indeed at least fixed
building of TSAN mode in CI without blocking the patch to land, and
once merged other people can begin playing with TSAN using their own
dev setups and feature branches.
- - - - -
a1c18c7b by Andrei Borzenkov at 2024-04-02T12:51:11-04:00
Merge tc_infer_hs_type and tc_hs_type into one function using ExpType philosophy (#24299, #23639)
This patch implements refactoring which is a prerequisite to
updating kind checking of type patterns. This is a huge simplification
of the main worker that checks kind of HsType.
It also fixes the issues caused by previous code duplication, e.g.
that we didn't add module finalizers from splices in inference mode.
- - - - -
817e8936 by Rodrigo Mesquita at 2024-04-02T20:13:05-04:00
th: Hide the Language.Haskell.TH.Lib.Internal module from haddock
Fixes #24562
- - - - -
b36ee57b by Sylvain Henry at 2024-04-02T20:13:46-04:00
JS: reenable h$appendToHsString optimization (#24495)
The optimization introducing h$appendToHsString wasn't kicking in
anymore (while it did in 9.8.1) because of the changes introduced in #23270 (7e0c8b3bab30).
This patch reenables the optimization by matching on case-expression, as
done in Cmm for unpackCString# standard thunks.
The test is T24495, added in the next commit (two commits for ease
of backporting to 9.8).
- - - - -
527616e9 by Sylvain Henry at 2024-04-02T20:13:46-04:00
JS: fix h$appendToHsString implementation (#24495)
h$appendToHsString needs to wrap its argument in an updatable thunk
to behave like unpackAppendCString#. Otherwise if a SingleEntry thunk is
passed, it is stored as-is in a CONS cell, making the resulting list
impossible to deepseq (forcing the thunk doesn't update the contents of
the CONS cell)!
The added test checks that the optimization kicks in and that
h$appendToHsString works as intended.
Fix #24495
- - - - -
faa30b41 by Simon Peyton Jones at 2024-04-02T20:14:22-04:00
Deal with duplicate tyvars in type declarations
GHC was outright crashing before this fix: #24604
- - - - -
e0b0c717 by Simon Peyton Jones at 2024-04-02T20:14:58-04:00
Try using MCoercion in exprIsConApp_maybe
This is just a simple refactor that makes exprIsConApp_maybe
a little bit more direct, simple, and efficient.
Metrics: compile_time/bytes allocated
    geo. mean    -0.1%
    minimum      -2.0%
    maximum      -0.0%
Not a big gain, but worthwhile given that the code is, if anything,
easier to grok.
- - - - -
15f4d867 by Duncan Coutts at 2024-04-03T01:27:17-04:00
Initial ./configure support for selecting I/O managers
In this patch we just define new CPP vars, but don't yet use them
or replace the existing approach. That will follow.
The intention here is that every I/O manager can be enabled/disabled at
GHC build time (subject to some constraints). More than one I/O manager
can be enabled to be built. At least one I/O manager supporting the
threaded RTS must be enabled as well as at least one supporting the
non-threaded RTS. The I/O managers enabled here will become the choices
available at runtime at RTS startup (in later patches). The choice can
be made with RTS flags. There are separate sets of choices for the
threaded and non-threaded RTS ways, because most I/O managers are
specific to these ways. Furthermore we must establish a default I/O
manager for the threaded and non-threaded RTS.
Most I/O managers are platform-specific so there are checks to ensure
each one can be enabled on the platform. Such checks are also where (in
future) any system dependencies (e.g. libraries) can be checked.
The output is a set of CPP flags (in the mk/config.h file), with one
flag per named I/O manager:
* IOMGR_BUILD_<name> : which ones should be built (some)
* IOMGR_DEFAULT_NON_THREADED_<name> : which one is default (exactly one)
* IOMGR_DEFAULT_THREADED_<name> : which one is default (exactly one)
and a set of derived flags in IOManager.h
* IOMGR_ENABLED_<name> : enabled for the current RTS way
Note that IOMGR_BUILD_<name> just says that an I/O manager will be
built for _some_ RTS way (i.e. threaded or non-threaded). The derived
flags IOMGR_ENABLED_<name> in IOManager.h say if each I/O manager is
enabled in the "current" RTS way. These are the ones that can be used
for conditional compilation of the I/O manager code.
Co-authored-by: Pi Delport <pi at well-typed.com>
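The relationship between the configure-time flags and the derived per-way flags can be sketched roughly as below. This is an illustration under assumptions, not the contents of IOManager.h: the flag values are hard-coded where ./configure would emit them, the flags are assumed to be "defined or not" rather than 0/1, and `enabled_iomgr_name` and `iomgr_enabled` are hypothetical helpers.

```c
#include <string.h>

/* Assumed output of ./configure (normally in mk/config.h). */
#define IOMGR_BUILD_SELECT 1
#define IOMGR_BUILD_MIO 1
#define IOMGR_DEFAULT_NON_THREADED_SELECT 1
#define IOMGR_DEFAULT_THREADED_MIO 1

/* Derived flags: an I/O manager is enabled only if it is built AND
 * it supports the RTS way currently being compiled. */
#if defined(THREADED_RTS)
#  if defined(IOMGR_BUILD_MIO)
#    define IOMGR_ENABLED_MIO 1
#  endif
#else
#  if defined(IOMGR_BUILD_SELECT)
#    define IOMGR_ENABLED_SELECT 1
#  endif
#endif

/* Hypothetical helper: names the manager enabled for this way. */
const char *enabled_iomgr_name(void)
{
#if defined(IOMGR_ENABLED_SELECT)
    return "select";
#elif defined(IOMGR_ENABLED_MIO)
    return "mio";
#else
    return "none";
#endif
}

/* Hypothetical helper: is the named manager enabled in this way? */
int iomgr_enabled(const char *name)
{
    return strcmp(name, enabled_iomgr_name()) == 0;
}
```

Compiled without THREADED_RTS, only the select manager is enabled even though both are marked as built.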
- - - - -
85b0f87a by Duncan Coutts at 2024-04-03T01:27:17-04:00
Change the handling of the RTS flag --io-manager=
Now instead of it being just used on Windows to select between the WinIO
vs the MIO or Win32-legacy I/O managers, it is now used on all platforms
for selecting the I/O manager to use.
Right now it remains the case that there is only an actual choice on
Windows, but that will change later.
Document the --io-manager flag in the user guide.
This change is also reflected in the RTS flags types in the base
library. Deprecate the export of IoSubSystem from GHC.RTS.Flags with a
message to import it from GHC.IO.SubSystem.
The way the 'IoSubSystem' is detected also changes. Instead of looking
at the RTS flag, there is now a C bool global var in the RTS which gets
set on startup when the I/O manager is selected. This bool var says
whether the selected I/O manager classifies as "native" on Windows,
which in practice means the WinIO I/O manager has been selected.
Similarly, the is_io_mng_native_p RTS helper function is re-implemented
in terms of the selected I/O manager, rather than based on the RTS
flags.
We do however remove the ./configure --native-io-manager flag because
we're bringing the WinIO/MIO/Win32-legacy choice under the new general
scheme for selecting I/O managers, and that new scheme involves no
./configure time user choices, just runtime RTS flag choices.
- - - - -
1a8f020f by Duncan Coutts at 2024-04-03T01:27:17-04:00
Convert {init,stop,exit}IOManager to switch style
Rather than ad-hoc cpp conditionals on THREADED_RTS and mingw32_HOST_OS,
we use a style where we switch on the I/O manager impl, with cases for
each I/O manager impl.
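As a rough illustration of the switch style, here is a self-contained sketch. The enumeration tags, counters and `init*Sketch` functions are hypothetical; only the idea of switching on `iomgr_type` comes from the commit messages.

```c
#include <assert.h>

/* Hypothetical tags and state; the real RTS definitions may differ. */
typedef enum {
    IOMGR_SELECT_SKETCH,
    IOMGR_MIO_SKETCH,
    IOMGR_WINIO_SKETCH
} IOManagerTypeSketch;

static IOManagerTypeSketch iomgr_type = IOMGR_SELECT_SKETCH;
static int select_inits = 0;
static int mio_inits = 0;

static void initSelectSketch(void) { select_inits++; }
static void initMioSketch(void)    { mio_inits++; }

/* One entry point; each I/O manager gets its own case instead of
 * being picked by scattered #if tests on the RTS way or host OS. */
void initIOManagerSketch(void)
{
    switch (iomgr_type) {
    case IOMGR_SELECT_SKETCH:
        initSelectSketch();
        break;
    case IOMGR_MIO_SKETCH:
        initMioSketch();
        break;
    case IOMGR_WINIO_SKETCH:
        /* nothing to do in this sketch */
        break;
    default:
        assert(!"unknown I/O manager");
    }
}
```

Adding another I/O manager then means adding a case, not another layer of preprocessor conditions.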
- - - - -
a5bad3d2 by Duncan Coutts at 2024-04-03T01:27:17-04:00
Split up the CapIOManager content by I/O manager
Using the new IOMGR_ENABLED_<name> CPP defines.
- - - - -
1d36e609 by Duncan Coutts at 2024-04-03T01:27:17-04:00
Convert initIOManagerAfterFork and wakeupIOManager to switch style
- - - - -
c2f26f36 by Duncan Coutts at 2024-04-03T01:27:18-04:00
Move most of waitRead#/Write# from cmm to C
Moves it into the IOManager.c where we can follow the new pattern of
switching on the selected I/O manager.
- - - - -
457705a8 by Duncan Coutts at 2024-04-03T01:27:18-04:00
Move most of the delay# impl from cmm to C
Moves it into the IOManager.c where we can follow the new pattern of
switching on the selected I/O manager.
Uses a new IOManager API: syncDelay, following the naming convention of
sync* for thread-synchronous I/O & timer/delay operations.
As part of porting from cmm to C, we maintain the rule that the
why_blocked gets accessed using load acquire and store release atomic
memory operations. There was one exception to this rule: in the delay#
primop cmm code on posix (not win32), the why_blocked was being updated
using a store relaxed, not a store release. I've no idea why. In this
conversion I'm playing it safe here and using store release consistently.
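The access rule for why_blocked can be illustrated with C11 atomics. This is a sketch only: `TSOSketch`, the accessor names and the constant values are hypothetical, and the RTS uses its own macros rather than <stdatomic.h> directly.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for the TSO and its blocking reasons. */
typedef struct {
    _Atomic uint32_t why_blocked;
} TSOSketch;

enum { WHY_NOT_BLOCKED = 0, WHY_BLOCKED_ON_DELAY = 1 };

/* Writers publish the reason with release semantics, so earlier
 * writes to the TSO are visible to whoever observes the new value. */
void set_why_blocked(TSOSketch *tso, uint32_t reason)
{
    atomic_store_explicit(&tso->why_blocked, reason, memory_order_release);
}

/* Readers pair that with an acquire load. */
uint32_t get_why_blocked(TSOSketch *tso)
{
    return atomic_load_explicit(&tso->why_blocked, memory_order_acquire);
}
```

A relaxed store in `set_why_blocked` would still update the field, but it would drop the ordering guarantee that the acquire load relies on.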
- - - - -
e93058e0 by Duncan Coutts at 2024-04-03T01:27:18-04:00
insertIntoSleepingQueue is no longer public
No longer defined in IOManager.h, just a private function in
IOManager.c. Since it is no longer called from cmm code, just from
syncDelay. It ought to get moved further into the select() I/O manager
impl, rather than living in IOManager.c.
On the other hand appendToIOBlockedQueue is still called from cmm code
in the win32-legacy I/O manager primops async{Read,Write}#, and it is
also used by the select() I/O manager. Update the CPP and comments to
reflect this.
- - - - -
60ce9910 by Duncan Coutts at 2024-04-03T01:27:18-04:00
Move anyPendingTimeoutsOrIO impl from .h to .c
The implementation is eventually going to need to use more private
things, which will drag in unwanted includes into IOManager.h, so it's
better to move the impl out of the header file and into the .c file, at
the slight cost of it no longer being inline.
At the same time, change to the "switch (iomgr_type)" style.
- - - - -
f70b8108 by Duncan Coutts at 2024-04-03T01:27:18-04:00
Take a simpler approach to gcc warnings in IOManager.c
We have lots of functions with conditional implementations for
different I/O managers. Some functions, for some I/O managers,
naturally have implementations that do nothing or barf. When only one
such I/O manager is enabled, the whole function ends up with an
implementation that does nothing or barfs. This then results in
warnings from gcc that parameters are unused, or that the function
should be marked with attribute noreturn (since barf does not return).
The USED_IF_THREADS trick for fine-grained warning suppression is fine
for just two cases, but an equivalent here would need
USED_IF_THE_ONLY_ENABLED_IOMGR_IS_X_OR_Y which would have combinatorial
blowup. So we take a coarse grained approach and simply disable these
two warnings for the whole file.
So we use a GCC pragma, with its handy push/pop support:
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wsuggest-attribute=noreturn"
#pragma GCC diagnostic ignored "-Wunused-parameter"
...
#pragma GCC diagnostic pop
- - - - -
b48805b9 by Duncan Coutts at 2024-04-03T01:27:18-04:00
Add a new trace class for the iomanager
It makes sense now for it to be separate from the scheduler class of
tracers.
Enabled with +RTS -Do. Document the -Do debug flag in the user guide.
- - - - -
f0c1f862 by Duncan Coutts at 2024-04-03T01:27:18-04:00
Have the throwTo impl go via (new) IOManager APIs
rather than directly operating on the IO manager's data structures.
Specifically, when throwing an async exception to a thread that is
blocked waiting for I/O or waiting for a timer, then we want to cancel
that I/O waiting or cancel the timer. Currently this is done directly in
removeFromQueues() in RaiseAsync.c. We want it to go via proper APIs
both for modularity but also to let us support multiple I/O managers.
So add sync{IO,Delay}Cancel, which is the cancellation for the
corresponding sync{IO,Delay}. The implementations of these use the usual
"switch (iomgr_type)" style.
- - - - -
4f9e9c4e by Duncan Coutts at 2024-04-03T01:27:18-04:00
Move awaitEvent into a proper IOManager API
and have the scheduler use it.
Previously the scheduler calls awaitEvent directly, and awaitEvent is
implemented directly in the RTS I/O managers (select, win32). This
relies on the old scheme where there's a single active I/O manager for
each platform and RTS way.
We want to move that to go via an API in IOManager.{h,c} which can then
call out to the active I/O manager.
Also take the opportunity to split awaitEvent into two. The existing
awaitEvent has a bool wait parameter, to say if the call should be
blocking or non-blocking. We split this into two separate functions:
pollCompletedTimeoutsOrIO and awaitCompletedTimeoutsOrIO. We split them
for a few reasons: they have different post-conditions (specifically the
await version is supposed to guarantee that there are threads runnable
when it completes). Secondly, it is also anticipated that in future I/O
managers the implementations of the two cases will be simpler if they
are separated.
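The difference in post-conditions between the two functions can be sketched with simulated state. Everything here is hypothetical apart from the two function names (suffixed with Sketch): the real functions take RTS arguments and talk to a real I/O manager.

```c
#include <assert.h>

/* Simulated state; the real functions take RTS arguments not shown here. */
static int completed_events = 0;   /* finished I/O or expired timers */
static int runnable_threads = 0;

/* Wake one thread per completed event. */
static void process_completed(void)
{
    runnable_threads += completed_events;
    completed_events = 0;
}

/* Stand-in for blocking in the OS until something completes. */
static void block_until_event(void)
{
    completed_events++;
}

/* Non-blocking: handle whatever has already completed, then return.
 * There may still be no runnable threads afterwards. */
void pollCompletedTimeoutsOrIOSketch(void)
{
    process_completed();
}

/* Blocking: does not return until at least one thread is runnable. */
void awaitCompletedTimeoutsOrIOSketch(void)
{
    process_completed();
    while (runnable_threads == 0) {
        block_until_event();
        process_completed();
    }
    assert(runnable_threads > 0);   /* the stronger post-condition */
}
```

With a single function and a bool parameter, that final assertion would have to be conditional on the flag; splitting the functions lets each state its own guarantee.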
- - - - -
5ad4b30f by Duncan Coutts at 2024-04-03T01:27:18-04:00
Rename awaitEvent in select and win32 I/O managers
These are now just called from IOManager.c and are the per-I/O manager
backend impls (whereas previously awaitEvent was the entry point).
Follow the new naming convention in the IOManager.{h,c} of
awaitCompletedTimeoutsOrIO, with the I/O manager's name as a suffix:
so awaitCompletedTimeoutsOrIO{Select,Win32}.
- - - - -
d30c6bc6 by Duncan Coutts at 2024-04-03T01:27:18-04:00
Tidy up a couple things in Select.{h,c}
Use the standard #include {Begin,End}Private.h style rather than
RTS_PRIVATE on individual decls.
And conditionally build the code for the select I/O manager based on
the new CPP IOMGR_ENABLED_SELECT rather than on THREADED_RTS.
- - - - -
4161f516 by Duncan Coutts at 2024-04-03T01:27:18-04:00
Add an IOManager API for scavenging TSO blocked_info
When the GC scavenges a TSO it needs to scavenge the tso->blocked_info
but the blocked_info is a big union and what lives there depends on the
tso->why_blocked, which for I/O-related reasons is something that in
principle is the responsibility of the I/O manager and not the GC. So
the right thing to do is for the GC to ask the I/O manager to scavenge
the blocked_info if it encounters any I/O-related why_blocked reasons.
So we add scavengeTSOIOManager in IOManager.{h,c} with the usual style.
Now as it happens, right now, there is no special scavenging to do, so
the implementation of scavengeTSOIOManager is a fancy no-op. That's
because the select I/O manager uses only the fd and target members,
which are not GC pointers, and the win32-legacy I/O manager _ought_ to
be using GC-managed heap objects for the StgAsyncIOResult but it is
actually using the C heap, so again no GC pointers. If the win32-legacy
were doing this more sensibly, then scavengeTSOIOManager would be the
right place to do the GC magic.
Future I/O managers will need GC heap objects in the tso->blocked_info
and will make use of this functionality.
- - - - -
94a87d21 by Duncan Coutts at 2024-04-03T01:27:18-04:00
Add I/O manager API notifyIOManagerCapabilitiesChanged
Used in setNumCapabilities.
It only does anything for MIO on Posix.
Previously it always invoked Haskell code, but that code only did
anything on non-Windows (and non-JS), and only threaded. That currently
effectively means the MIO I/O manager on Posix.
So now it only invokes it for the MIO Posix case.
- - - - -
3be6d591 by Duncan Coutts at 2024-04-03T01:27:18-04:00
Select an I/O manager early in RTS startup
We need to select the I/O manager to use during startup before the
per-cap I/O manager initialisation.
- - - - -
aaa294d0 by Duncan Coutts at 2024-04-03T01:27:18-04:00
Make struct CapIOManager be fully opaque
Provide an opaque (forward) definition in Capability.h (since the cap
contains a *CapIOManager) and then only provide a full definition in
a new file IOManagerInternals.h. This new file is only supposed to be
included by the IOManager implementation, not by its users. So that
means IOManager.c and individual I/O manager implementations.
The posix/Signals.c still needs direct access, but that should be
eliminated. Anything that needs direct access either needs to be clearly
part of an I/O manager (e.g. the select() one) or go via a proper API.
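The opaque-struct arrangement can be sketched in a single file, with comments marking what would live in which header. The type and function names carry a Sketch suffix and the field is invented; the actual contents of CapIOManager are not described in this message.

```c
#include <stdlib.h>

/* --- public view (what a header such as Capability.h can expose) --- */
typedef struct CapIOManagerSketch CapIOManagerSketch;   /* opaque */

typedef struct {
    CapIOManagerSketch *iomgr;   /* a pointer needs no full definition */
} CapabilitySketch;

CapIOManagerSketch *newCapIOManagerSketch(void);
int  pendingRequestsSketch(const CapIOManagerSketch *m);
void freeCapIOManagerSketch(CapIOManagerSketch *m);

/* --- private view (what an internals header would define) --- */
struct CapIOManagerSketch {
    int pending_requests;        /* hypothetical field */
};

CapIOManagerSketch *newCapIOManagerSketch(void)
{
    /* calloc zero-initialises, so pending_requests starts at 0 */
    return calloc(1, sizeof(CapIOManagerSketch));
}

int pendingRequestsSketch(const CapIOManagerSketch *m)
{
    return m->pending_requests;
}

void freeCapIOManagerSketch(CapIOManagerSketch *m)
{
    free(m);
}
```

Code that only sees the public part can hold and pass the pointer but cannot read or write the fields, which is what forces users through the API.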
- - - - -
877a2a80 by Duncan Coutts at 2024-04-03T01:27:18-04:00
The select() I/O manager does have some global initialisation
It's just to make sure an exception CAF is a GC root.
- - - - -
9c51473b by Duncan Coutts at 2024-04-03T01:27:18-04:00
Add tracing for the main I/O manager actions
Using the new tracer class.
Note: The unconditional definition of showIOManager should be
compatible with the debugTrace change in 7c7d1f6.
Co-authored-by: Pi Delport <pi at well-typed.com>
- - - - -
c7d3e3a3 by Duncan Coutts at 2024-04-03T01:27:18-04:00
Include the default I/O manager in the +RTS --info output
Document the extra +RTS --info output in the user guide
- - - - -
8023bad4 by Duncan Coutts at 2024-04-03T01:27:18-04:00
waitRead# / waitWrite# do not work for win32-legacy I/O manager
Previously it was unclear that they did not work because the code path
was shared with other I/O managers (in particular select()).
Following the code carefully shows that what actually happens is that
the calling thread would block forever: the thread will be put into the
blocked queue, but no other action is scheduled that will ever result in
it getting unblocked.
It's better to just fail loudly in case anyone accidentally calls it,
also it's less confusing code.
- - - - -
83a74d20 by Duncan Coutts at 2024-04-03T01:27:18-04:00
Conditionally ignore some GCC warnings
Some GCC versions don't know about some warnings, and they complain
that we're ignoring unknown warnings. So we try to ignore the warning
based on the GCC version.
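One way such a version check might look is sketched below. The version bound 5 is a placeholder, not the bound the commit uses, and excluding clang (which also defines __GNUC__) is an assumption of this sketch. `unsupportedSketch` and `warning_silenced` are hypothetical names.

```c
#include <stdlib.h>

/* Only ask GCC to ignore the warning when the compiler is a GCC new
 * enough to know it. The bound below is a placeholder. */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 5
#  pragma GCC diagnostic push
#  pragma GCC diagnostic ignored "-Wsuggest-attribute=noreturn"
#  define SKETCH_WARNING_SILENCED 1
#else
#  define SKETCH_WARNING_SILENCED 0
#endif

/* A backend stub that never returns, similar in spirit to a
 * function whose only action is to barf. */
void unsupportedSketch(void)
{
    abort();
}

#if SKETCH_WARNING_SILENCED
#  pragma GCC diagnostic pop
#endif

/* Reports which branch of the guard was taken. */
int warning_silenced(void)
{
    return SKETCH_WARNING_SILENCED;
}
```

Keeping the push and pop under the same condition avoids an unmatched pop on compilers that skipped the push.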
- - - - -
1adc6fa4 by Duncan Coutts at 2024-04-03T01:27:18-04:00
Accept changes to base-exports
All the changes are in fact not changes at all.
Previously, the IoSubSystem data type was defined in GHC.RTS.Flags and
exported from both GHC.RTS.Flags and GHC.IO.SubSystem. Now, the data
type is defined in GHC.IO.SubSystem and still exported from both
modules.
Therefore, the same exports and same instances are still available from
both modules. But the base-exports records only the defining module, and
so it looks like a change when it is fully compatible.
Related: we do add a deprecation to the export of the type via
GHC.RTS.Flags, telling people to use the export from GHC.IO.SubSystem.
Also the sort order for some unrelated Show instances changed. No idea
why.
The same changes apply in the other versions, with a few more changes
due to sort order weirdness.
- - - - -
8d950968 by Duncan Coutts at 2024-04-03T01:27:18-04:00
Accept metric decrease in T12227
I can't think of any good reason that anything in this MR should have
changed the number of allocations, up or down.
(Yes this is an empty commit.)
Metric Decrease:
T12227
- - - - -
e869605e by Simon Peyton Jones at 2024-04-03T01:27:55-04:00
Several improvements to the handling of coercions
* Make `mkSymCo` and `mkInstCo` smarter
Fixes #23642
* Fix return role of `SelCo` in the coercion optimiser.
Fixes #23617
* Make the coercion optimiser `opt_trans_rule` work better for newtypes
Fixes #23619
- - - - -
1efd0714 by Simon Peyton Jones at 2024-04-03T01:27:55-04:00
FloatOut: improve floating for join point
See the new Note [Floating join point bindings].
* Completely get rid of the complicated join_ceiling nonsense, which
I have never understood.
* Do not float join points at all, except perhaps to top level.
* Some refactoring around wantToFloat, to treat Rec and NonRec more
uniformly
- - - - -
9c00154d by Simon Peyton Jones at 2024-04-03T01:27:55-04:00
Improve eta-expansion through call stacks
See Note [Eta expanding through CallStacks] in GHC.Core.Opt.Arity
This is a one-line change, that fixes an inconsistency
- || isCallStackPredTy ty
+ || isCallStackPredTy ty || isCallStackTy ty
- - - - -
95a9a172 by Simon Peyton Jones at 2024-04-03T01:27:55-04:00
Spelling, layout, pretty-printing only
- - - - -
bdf1660f by Simon Peyton Jones at 2024-04-03T01:27:55-04:00
Improve exprIsConApp_maybe a little
Eliminate a redundant case at birth. This sometimes reduces
Simplifier iterations.
See Note [Case elim in exprIsConApp_maybe].
- - - - -
609cd32c by Simon Peyton Jones at 2024-04-03T01:27:55-04:00
Inline GHC.HsToCore.Pmc.Solver.Types.trvVarInfo
When exploring compile-time regressions after meddling with the Simplifier, I
discovered that GHC.HsToCore.Pmc.Solver.Types.trvVarInfo was very delicately
balanced. It's a small, heavily used, overloaded function and it's important
that it inlines. By a fluke it was before, but at various times in my journey it
stopped doing so. So I just added an INLINE pragma to it; no sense in depending
on a delicately-balanced fluke.
- - - - -
ae24c9bc by Simon Peyton Jones at 2024-04-03T01:27:55-04:00
Slight improvement in WorkWrap
Ensure that WorkWrap preserves lambda binders, in case of join points. Sadly I
have forgotten why I made this change (it was while I was doing a lot of
meddling in the Simplifier), but
* it does no harm,
* it is slightly more efficient, and
* presumably it made something better!
Anyway I have kept it in a separate commit.
- - - - -
e9297181 by Simon Peyton Jones at 2024-04-03T01:27:55-04:00
Use named record fields for the CastIt { ... } data constructor
This is a pure refactor
- - - - -
b4581e23 by Simon Peyton Jones at 2024-04-03T01:27:55-04:00
Remove a long-commented-out line
Pure refactoring
- - - - -
e026bdf2 by Simon Peyton Jones at 2024-04-03T01:27:55-04:00
Simplifier improvements
This MR started as: allow the simplifier to do more in one pass,
arising from places I could see the simplifier taking two iterations
where one would do. But it turned into a larger project, because
these changes unexpectedly made inlining blow up, especially join
points in deeply-nested cases.
The main changes are below. There are also many new or rewritten Notes.
Avoiding simplifying repeatedly
~~~~~~~~~~~~~~~
See Note [Avoiding simplifying repeatedly]
* The SimplEnv now has a seInlineDepth field, which says how deep
in unfoldings we are. See Note [Inline depth] in Simplify.Env.
Currently used only for the next point: avoiding repeatedly
simplifying coercions.
* Avoid repeatedly simplifying coercions.
see Note [Avoid re-simplifying coercions] in Simplify.Iteration
As you'll see from the Note, this makes use of the seInlineDepth.
* Allow Simplify.Iteration.simplAuxBind to inline used-once things.
This is another part of Note [Post-inline for single-use things], and
is really good for reducing simplifier iterations in situations like
case K e of { K x -> blah }
where x is used once in blah.
* Make GHC.Core.SimpleOpt.exprIsConApp_maybe do some simple case
elimination. Note [Case elim in exprIsConApp_maybe]
* Improve the case-merge transformation:
- Move the main code to `GHC.Core.Utils.mergeCaseAlts`, to join `filterAlts`
and friends. See Note [Merge Nested Cases] in GHC.Core.Utils.
- Add a new case for `tagToEnum#`; see wrinkle (MC3).
- Add a new case to look through join points: see wrinkle (MC4)
postInlineUnconditionally
~~~~~~~~~~~~~~~~~~~~~~~~~
* Allow Simplify.Utils.postInlineUnconditionally to inline variables
that are used exactly once. See Note [Post-inline for single-use things].
* Do not postInlineUnconditionally a join point, ever.
Doing so does not reduce allocation, which is the main point,
and with join points that are used a lot it can bloat code.
See point (1) of Note [Duplicating join points] in
GHC.Core.Opt.Simplify.Iteration.
* Do not postInlineUnconditionally a strict (demanded) binding.
It will not allocate a thunk (it'll turn into a case instead)
so again the main point of inlining it doesn't hold. Better
to check per-call-site.
* Improve occurrence analysis for bottoming function calls, to help
postInlineUnconditionally. See Note [Bottoming function calls]
in GHC.Core.Opt.OccurAnal
Inlining generally
~~~~~~~~~~~~~~~~~~
* In GHC.Core.Opt.Simplify.Utils.interestingCallContext,
use RhsCtxt NonRecursive (not BoringCtxt) for a plain-seq case.
See Note [Seq is boring]. Also, wrinkle (SB1), inline in that
`seq` context only for INLINE functions (UnfWhen guidance).
* In GHC.Core.Opt.Simplify.Utils.interestingArg,
- return ValueArg for OtherCon [c1,c2, ...], but
- return NonTrivArg for OtherCon []
This makes a function a little less likely to inline if all we
know is that the argument is evaluated, but nothing else.
* isConLikeUnfolding is no longer true for OtherCon {}.
This propagates to exprIsConLike. Con-like-ness has /positive/
information.
Join points
~~~~~~~~~~~
* Be very careful about inlining join points.
See these two long Notes
Note [Duplicating join points] in GHC.Core.Opt.Simplify.Iteration
Note [Inlining join points] in GHC.Core.Opt.Simplify.Inline
* When making join points, don't do so if the join point is so small
it will immediately be inlined; check uncondInlineJoin.
* In GHC.Core.Opt.Simplify.Inline.tryUnfolding, improve the inlining
heuristics for join points. In general we /do not/ want to inline
join points /even if they are small/. See Note [Duplicating join points] in
GHC.Core.Opt.Simplify.Iteration.
But sometimes we do: see Note [Inlining join points] in
GHC.Core.Opt.Simplify.Inline; and the new `isBetterUnfoldingThan` function.
* Do not add an unfolding to a join point at birth. This is a tricky one
and has a long Note [Do not add unfoldings to join points at birth]
It shows up in two places
- In `mkDupableAlt` do not add an inlining
- (trickier) In `simplLetUnfolding` don't add an unfolding for a
fresh join point
I am not fully satisfied with this, but it works and is well documented.
* In GHC.Core.Unfold.sizeExpr, make jumps small, so that we don't penalise
having a non-inlined join point.
Performance changes
~~~~~~~~~~~~~~~~~~~
* Binary sizes fall by around 2.6%, according to nofib.
* Compile times improve slightly. Here are the figures over 1%.
I investigated the biggest difference in T18304. It's a very small module, just
a few hundred nodes. The large percentage difference is due to a single
function that didn't quite inline before, and does now, making code size a
bit bigger. I decided gains outweighed the losses.
Metrics: compile_time/bytes allocated (changes over +/- 1%)
------------------------------------------------
CoOpt_Singletons(normal)             -9.2% GOOD
LargeRecord(normal)                 -23.5% GOOD
MultiComponentModulesRecomp(normal)  +1.2%
MultiLayerModulesTH_OneShot(normal)  +4.1% BAD
PmSeriesS(normal)                    -3.8%
PmSeriesV(normal)                    -1.5%
T11195(normal)                       -1.3%
T12227(normal)                      -20.4% GOOD
T12545(normal)                       -3.2%
T12707(normal)                       -2.1% GOOD
T13253(normal)                       -1.2%
T13253-spj(normal)                   +8.1% BAD
T13386(normal)                       -3.1% GOOD
T14766(normal)                       -2.6% GOOD
T15164(normal)                       -1.4%
T15304(normal)                       +1.2%
T15630(normal)                       -8.2%
T15630a(normal)                        NEW
T15703(normal)                      -14.7% GOOD
T16577(normal)                       -2.3% GOOD
T17516(normal)                      -39.7% GOOD
T18140(normal)                       +1.2%
T18223(normal)                      -17.1% GOOD
T18282(normal)                       -5.0% GOOD
T18304(normal)                      +10.8% BAD
T18923(normal)                       -2.9% GOOD
T1969(normal)                        +1.0%
T19695(normal)                       -1.5%
T20049(normal)                      -12.7% GOOD
T21839c(normal)                      -4.1% GOOD
T3064(normal)                        -1.5%
T3294(normal)                        +1.2% BAD
T4801(normal)                        +1.2%
T5030(normal)                       -15.2% GOOD
T5321Fun(normal)                     -2.2% GOOD
T6048(optasm)                       -16.8% GOOD
T783(normal)                         -1.2%
T8095(normal)                        -6.0% GOOD
T9630(normal)                        -4.7% GOOD
T9961(normal)                        +1.9% BAD
WWRec(normal)                        -1.4%
info_table_map_perf(normal)          -1.3%
parsing001(normal)                   +1.5%

geo. mean                            -2.0%
minimum                             -39.7%
maximum                             +10.8%
* Runtimes generally improve. In the testsuite perf/should_run gives:
Metrics: runtime/bytes allocated
------------------------------------------
Conversions(normal)        -0.3%
T13536a(optasm)           -41.7% GOOD
T4830(normal)              -0.1%
haddock.Cabal(normal)      -0.1%
haddock.base(normal)       -0.1%
haddock.compiler(normal)   -0.1%

geo. mean                  -0.8%
minimum                   -41.7%
maximum                    +0.0%
* For runtime, nofib is a better test. The news is mostly good.
Here are the numbers more than +/- 0.1%:
# bytes allocated
==========================++==========
imaginary/digits-of-e1    ||   -14.40%
imaginary/digits-of-e2    ||    -4.41%
imaginary/paraffins       ||    -0.17%
imaginary/rfib            ||    -0.15%
imaginary/wheel-sieve2    ||    -0.10%
real/compress             ||    -0.47%
real/fluid                ||    -0.10%
real/fulsom               ||    +0.14%
real/gamteb               ||    -1.47%
real/gg                   ||    -0.20%
real/infer                ||    +0.24%
real/pic                  ||    -0.23%
real/prolog               ||    -0.36%
real/scs                  ||    -0.46%
real/smallpt              ||    +4.03%
shootout/k-nucleotide     ||   -20.23%
shootout/n-body           ||    -0.42%
shootout/spectral-norm    ||    -0.13%
spectral/boyer2           ||    -3.80%
spectral/constraints      ||    -0.27%
spectral/hartel/ida       ||    -0.82%
spectral/mate             ||   -20.34%
spectral/para             ||    +0.46%
spectral/rewrite          ||    +1.30%
spectral/sphere           ||    -0.14%
==========================++==========
geom mean                 ||    -0.59%
real/smallpt has a huge nest of local definitions, and I
could not pin down a reason for a regression. But there are
three big wins!
Metric Decrease:
CoOpt_Singletons
LargeRecord
T12227
T12707
T13386
T13536a
T14766
T15703
T16577
T17516
T18223
T18282
T18923
T21839c
T20049
T5321Fun
T5030
T6048
T8095
T9630
T783
Metric Increase:
MultiLayerModulesTH_OneShot
T13253-spj
T18304
T18698a
T9961
T3294
- - - - -
27db3c5e by Simon Peyton Jones at 2024-04-03T01:27:55-04:00
Testsuite message changes from simplifier improvements
- - - - -
271a7812 by Simon Peyton Jones at 2024-04-03T01:27:55-04:00
Account for bottoming functions in OccurAnal
This fixes #24582, a small but long-standing bug
- - - - -
6a38d7b6 by Rodrigo Mesquita at 2024-04-04T11:08:06+01:00
loader: Note down suggestion for needed_mods
The associated ticket is #24600
- - - - -
a7e985fb by Rodrigo Mesquita at 2024-04-04T11:08:07+01:00
rts: free error message before returning
Fixes a memory leak in rts/linker/PEi386.c
- - - - -
8b985376 by Alexis King at 2024-04-04T11:08:07+01:00
linker: Avoid linear search when looking up Haskell symbols via dlsym
See the primary Note [Looking up symbols in the relevant objects] for a
more in-depth explanation.
When dynamically loading a Haskell symbol (typical when running a splice or
GHCi expression), before this commit we would search for the symbol in
all dynamic libraries that were loaded. However, this could be very
inefficient when too many packages are loaded (which can happen if there are
many package dependencies) because the time to look up the symbol would be
linear in the number of packages loaded.
This commit drastically improves symbol loading performance by
introducing a mapping from units to the handles of corresponding loaded
dlls. These handles are returned by dlopen when we load a dll, and can
then be used to look up in a specific dynamic library.
Looking up a given Name is now much more precise because we can
look up its unit in the mapping and look up the symbol solely in the
handles of the dynamic libraries loaded for that unit.
In one measurement, the wait time before the expression was executed
went from about 38 seconds down to about 2 seconds.
This commit also includes Note [Symbols may not be found in pkgs_loaded],
explaining the fallback to the old behaviour in case no dll can be found
in the unit mapping for a given Name.
Fixes #23415
Co-authored-by: Rodrigo Mesquita (@alt-romes)
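The effect of the unit-to-handle mapping can be illustrated with a small simulation. This is not the GHC implementation: libraries are modelled as tiny in-memory symbol tables rather than real dlopen handles, the mapping is a flat array where a real implementation would use a proper map, and every name below is hypothetical. The `probes` counter stands for the number of dlsym-like calls.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for a loaded library and its handle (all hypothetical). */
typedef struct { const char *name; int addr; } SymSketch;
typedef struct { const SymSketch *syms; size_t n; } HandleSketch;
typedef struct { const char *unit; const HandleSketch *handle; } UnitEntrySketch;

static int probes = 0;   /* counts dlsym-like calls */

/* Stand-in for dlsym(handle, name). */
static const int *lookup_in_handle(const HandleSketch *h, const char *name)
{
    probes++;
    for (size_t i = 0; i < h->n; i++)
        if (strcmp(h->syms[i].name, name) == 0)
            return &h->syms[i].addr;
    return NULL;
}

/* Old behaviour: probe every loaded library until one has the symbol. */
const int *lookup_everywhere(const UnitEntrySketch *loaded, size_t n,
                             const char *name)
{
    for (size_t i = 0; i < n; i++) {
        const int *p = lookup_in_handle(loaded[i].handle, name);
        if (p != NULL)
            return p;
    }
    return NULL;
}

/* New behaviour: probe only the libraries recorded for the symbol's
 * unit, falling back to the old search when the unit has no entry. */
const int *lookup_for_unit(const UnitEntrySketch *loaded, size_t n,
                           const char *unit, const char *name)
{
    int unit_known = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(loaded[i].unit, unit) != 0)
            continue;
        unit_known = 1;
        const int *p = lookup_in_handle(loaded[i].handle, name);
        if (p != NULL)
            return p;
    }
    return unit_known ? NULL : lookup_everywhere(loaded, n, name);
}
```

With three loaded units, finding a symbol that lives in the last one costs three probes via `lookup_everywhere` but one probe via `lookup_for_unit`; the gap grows with the number of loaded packages.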
- - - - -
38a83ce0 by Rodrigo Mesquita at 2024-04-04T11:08:07+01:00
rts: Make addDLL a wrapper around loadNativeObj
Rewrite the implementation of `addDLL` as a wrapper around the more
principled `loadNativeObj` rts linker function. The latter should be
preferred while the former is preserved for backwards compatibility.
`loadNativeObj` was previously only available on ELF platforms, so this
commit further refactors the rts linker to transform loadNativeObj_ELF
into loadNativeObj_POSIX, which is available in ELF and MachO platforms.
The refactor made it possible to remove the `dl_mutex` mutex in favour
of always using `linker_mutex` (rather than a combination of both).
Lastly, we implement `loadNativeObj` for Windows too.
- - - - -
d50ce08d by Rodrigo Mesquita at 2024-04-04T11:08:07+01:00
Use symbol cache in internal interpreter too
This commit makes the symbol cache that was used by the external
interpreter available for the internal interpreter too.
This follows from the analysis in #23415 that suggests the internal
interpreter could benefit from this cache too, and that there is no good
reason not to have the cache for it too. It also makes it a bit more
uniform to have the symbol cache range over both the internal and
external interpreter.
This commit also refactors the cache into a function which is used by
both `lookupSymbol` and also by `lookupSymbolInDLL`, extending the
caching logic to `lookupSymbolInDLL` too.
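The idea of one caching function shared by both lookup paths can be sketched as follows. This says nothing about where or how the real cache is implemented: the fixed-size array, the (dll, name) key, the mock backend and all names with a Sketch suffix are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 32

typedef void *(*SlowLookupSketch)(void *dll, const char *name);
typedef struct { void *dll; const char *name; void *addr; } CacheEntrySketch;

static CacheEntrySketch cache[CACHE_SLOTS];
static size_t cache_used = 0;

/* Shared caching wrapper. Keying on (dll, name) is an assumption of
 * this sketch; dll == NULL stands for "search everywhere". */
void *cached_lookup(void *dll, const char *name, SlowLookupSketch slow)
{
    for (size_t i = 0; i < cache_used; i++)
        if (cache[i].dll == dll && strcmp(cache[i].name, name) == 0)
            return cache[i].addr;          /* hit: no slow lookup */

    void *addr = slow(dll, name);
    if (addr != NULL && cache_used < CACHE_SLOTS) {
        /* The name pointer is stored as-is, so it must outlive the cache. */
        cache[cache_used].dll = dll;
        cache[cache_used].name = name;
        cache[cache_used].addr = addr;
        cache_used++;
    }
    return addr;
}

/* Mock backend standing in for the real symbol resolution. */
static int slow_calls = 0;
static int dummy_symbol;

static void *mock_slow_lookup(void *dll, const char *name)
{
    (void)dll;
    (void)name;
    slow_calls++;
    return &dummy_symbol;
}

/* Both entry points go through the same cache. */
void *lookupSymbolSketch(const char *name)
{
    return cached_lookup(NULL, name, mock_slow_lookup);
}

void *lookupSymbolInDLLSketch(void *dll, const char *name)
{
    return cached_lookup(dll, name, mock_slow_lookup);
}
```

Repeated lookups of the same symbol through either entry point reach the slow backend only once per distinct key.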
- - - - -
1a021a8a by Ben Gamari at 2024-04-04T11:08:07+01:00
testsuite: Add test for lookupSymbolInNativeObj
- - - - -
17 changed files:
- .gitlab/generate-ci/gen_ci.hs
- .gitlab/jobs.yaml
- compiler/GHC.hs
- compiler/GHC/ByteCode/Linker.hs
- compiler/GHC/Cmm/ThreadSanitizer.hs
- compiler/GHC/Core.hs
- compiler/GHC/Core/Coercion.hs
- compiler/GHC/Core/Coercion/Opt.hs
- compiler/GHC/Core/Opt/Arity.hs
- compiler/GHC/Core/Opt/FloatOut.hs
- compiler/GHC/Core/Opt/Monad.hs
- compiler/GHC/Core/Opt/OccurAnal.hs
- compiler/GHC/Core/Opt/Pipeline.hs
- compiler/GHC/Core/Opt/SetLevels.hs
- compiler/GHC/Core/Opt/Simplify.hs
- compiler/GHC/Core/Opt/Simplify/Env.hs
- compiler/GHC/Core/Opt/Simplify/Inline.hs
The diff was not included because it is too large.
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/00f908b07ae52b0ecfea90eb708a309da0257abe...1a021a8a4cfe4ee06b840611d035d99191ad5ac8