[Git][ghc/ghc][wip/ncg-simd] 34 commits: GNU/Hurd: Add getExecutablePath support

Wed Sep 25 08:30:33 UTC 2024

sheaf pushed to branch wip/ncg-simd at Glasgow Haskell Compiler / GHC

Commits:
3939a8bf by Samuel Thibault at 2024-09-16T10:33:44-04:00
GNU/Hurd: Add getExecutablePath support

GNU/Hurd exposes it as /proc/self/exe just like on Linux.
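
For illustration only (this is not part of the patch), the user-facing entry point is `getExecutablePath` from `System.Environment` in `base`. The helper name `describeExecutable` below is made up for this sketch.

```haskell
import System.Environment (getExecutablePath)

-- | Hypothetical helper: report where the running binary lives,
-- using base's getExecutablePath.
describeExecutable :: IO String
describeExecutable = do
  path <- getExecutablePath
  pure ("running from: " ++ path)
```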

- - - - -
d3b19851 by Sylvain Henry at 2024-09-17T11:03:28-04:00
RTS: expose closure_sizeW_ (#25252)

C code using the closure_sizeW macro can't be linked with the RTS linker
without this patch. It fails with:

  ghc-9.11.20240911: Failed to lookup symbol: closure_sizeW_

Fix #25252

Co-authored-by: Hamish Mackenzie <Hamish.K.Mackenzie at gmail.com>
Co-authored-by: Moritz Angermann <moritz.angermann at gmail.com>

- - - - -
137bf74d by Sebastian Graf at 2024-09-17T11:04:05-04:00
HsExpr: Inline `HsWrap` into `WrapExpr`

This nice refactoring was suggested by Simon during review:
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/13261#note_583374

Fixes #25264.

- - - - -
7fd9e5e2 by Sebastian Graf at 2024-09-17T11:04:05-04:00
Pmc: Improve Desugaring of overloaded list patterns (#25257)

This actually makes things simpler.

Fixes #25257.

- - - - -
e4169ba9 by Ben Gamari at 2024-09-18T07:55:28-04:00
configure: Correctly report when subsections-via-symbols is disabled

As noted in #24962, currently subsections-via-symbols is disabled on
AArch64/Darwin due to alleged breakage. However, `configure` reports to
the user that it is enabled. Fix this.

- - - - -
9d20a787 by Mario Blažević at 2024-09-18T07:56:08-04:00
Modified the default export implementation to match the amended spec

- - - - -
35eb4f42 by Sylvain Henry at 2024-09-18T07:57:00-04:00
FFI: don't ppr Id/Var symbols with debug info (#25255)

Even if `-dpp-debug` is enabled we should still generate valid C code.
So we disable debug info printing when rendering with Code style.

- - - - -
9e96dad8 by Sebastian Graf at 2024-09-21T17:47:59-04:00
Demand: Combine examples into Note (#25107)

Just a leftover from !13060.

Fixes #25107.

- - - - -
21aaa34b by sheaf at 2024-09-21T17:48:36-04:00
Use x86_64-unknown-windows-gnu target for LLVM on Windows

- - - - -
992a7624 by sheaf at 2024-09-21T17:48:36-04:00
LLVM: use -relocation-model=pic on Windows

This is necessary to avoid the segfaults reported in #22487.

Fixes #22487

- - - - -
c50d29be by Ryan Hendrickson at 2024-09-21T17:49:15-04:00
compiler: Use type abstractions when deriving

For deriving newtype and deriving via, in order to bring type variables
needed for the coercions into scope, GHC generates type signatures for
derived class methods. As a simplification, drop the type signatures and
instead use type abstractions to bring method type variables into scope.

- - - - -
f04fd0ae by Zubin Duggal at 2024-09-21T17:49:51-04:00
driver: Ensure we run driverPlugin for staticPlugins (#25217)

driverPlugins are only run when the plugin state changes. This meant they were
never run for static plugins, as their state never changes.

We need to keep track of whether a static plugin has been initialised to ensure
we run static driver plugins at least once. This necessitates an additional field
in the `StaticPlugin` constructor: the state has to be bundled with the plugin
itself, because static plugins have no name/identifier we could otherwise use to
reference them.
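
A minimal sketch of the "run at least once" bookkeeping, assuming a much simplified model. The record below and its field names are hypothetical and do not match GHC's actual `StaticPlugin` type; the driver plugin is modelled as a function on a list of flags.

```haskell
-- Hypothetical, simplified stand-in for a static plugin.
data StaticPlugin = StaticPlugin
  { spDriver      :: [String] -> [String]  -- stand-in for the driver plugin action
  , spInitialised :: Bool                  -- has the driver plugin run yet?
  }

-- | Run the driver action of every plugin that has not run yet,
-- and mark each such plugin as initialised.
initStaticPlugins :: [String] -> [StaticPlugin] -> ([String], [StaticPlugin])
initStaticPlugins flags [] = (flags, [])
initStaticPlugins flags (p : ps)
  | spInitialised p =
      let (fs, ps') = initStaticPlugins flags ps
      in (fs, p : ps')
  | otherwise =
      let (fs, ps') = initStaticPlugins (spDriver p flags) ps
      in (fs, p { spInitialised = True } : ps')
```

Calling `initStaticPlugins` a second time on its own output leaves the flags unchanged, because every plugin now carries `spInitialised = True`.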

- - - - -
620becd7 by Andreas Klebinger at 2024-09-21T17:50:27-04:00
Allow unknown fd device types for setNonBlockingMode.

This allows fds with an unknown device type to have blocking mode
set. This happens for example for fds from the inotify subsystem.

Fixes #25199.

- - - - -
c76e25b3 by Hécate Kleidukos at 2024-09-21T17:51:07-04:00
Use Hackage version of Cabal 3.14.0.0 for Hadrian.
We remove the vendored Cabal submodule.

Also update the bootstrap plans

Fixes #25086

- - - - -
6c83fd7f by Zubin Duggal at 2024-09-21T17:51:07-04:00
ci: Ensure we source ci.sh in any jobs that run commands outside of ci.sh

ci.sh sets up the toolchain environment, including paths for the cabal directory, the
toolchain binaries etc. If we run any commands outside of ci.sh, unless we
source ci.sh we will use the wrong values for these environment variables.

In particular, I ran into an issue where the cabal invocation `hadrian/ghci` was
using an old index state despite `ci.sh setup` updating and setting the correct
index state. This is because `ci.sh` sets the `CABAL_DIR` to a different place, which
is where the index was downloaded to, but we were using the default cabal directory
outside ci.sh

The solution is to source `ci.sh` to get the correct environment, using `. ci.sh setup`.

- - - - -
9586998d by Sven Tennie at 2024-09-21T17:51:43-04:00
ghc-toolchain: Set -fuse-ld even for ld.bfd

This reflects the behaviour of the autoconf scripts.

- - - - -
d7016e0d by Sylvain Henry at 2024-09-21T17:52:24-04:00
Parser: be more careful when lexing extended literals (#25258)

Previously we would lex invalid prefixes like "8#Int3" as [8#Int, 3].

A side-effect of this patch is that we now allow negative unsigned
extended literals. They trigger an overflow warning later anyway.
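
The idea of the lexing fix can be sketched with a toy lexer. This is not GHC's Alex lexer: the `Token` type, the suffix table and `lexExtLit` are all made up for illustration, and only decimal literals are handled. After matching a type suffix, the sketch requires that no identifier character follows, so "8#Int3" is rejected instead of being split.

```haskell
import Data.Char (isAlphaNum, isDigit)
import Data.List (isPrefixOf)

-- Hypothetical token type for this sketch only.
data Token = ExtLit Integer String
  deriving (Eq, Show)

-- Longer suffixes come first so that "Int8" is tried before "Int".
suffixes :: [String]
suffixes =
  [ "Int16", "Int32", "Int64", "Int8", "Int"
  , "Word16", "Word32", "Word64", "Word8", "Word" ]

-- | Lex one extended literal such as "8#Int8", returning the rest of the input.
lexExtLit :: String -> Maybe (Token, String)
lexExtLit s = case span isDigit s of
  (ds@(_ : _), '#' : rest) ->
    case [ (ty, drop (length ty) rest) | ty <- suffixes, ty `isPrefixOf` rest ] of
      ((ty, after) : _) | boundary after -> Just (ExtLit (read ds) ty, after)
      _                                  -> Nothing
  _ -> Nothing
  where
    -- The suffix must not run straight into another identifier character.
    boundary (c : _) = not (isAlphaNum c || c == '_' || c == '\'')
    boundary []      = True
```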

- - - - -
ca67d7cb by Zubin Duggal at 2024-09-22T02:34:06-04:00
rts: Ensure we dump new Cost Centres added by freshly loaded objects to the eventlog.

To do this, we keep track of the ID of the last cost centre we dumped in DUMPED_CC_ID,
and call dumpCostCentresToEventLog from refreshProfilingCCSs, which will dump all the new
cost centres up to the one we already dumped in DUMPED_CC_ID.
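
The bookkeeping can be modelled as follows. This is a sketch in Haskell rather than the RTS's C code, and it assumes that the registry lists cost centres newest first with increasing IDs; the `CostCentre` record and `pendingCostCentres` are hypothetical names.

```haskell
-- Hypothetical model of a registered cost centre.
data CostCentre = CostCentre { ccId :: Int, ccLabel :: String }
  deriving (Eq, Show)

-- | Given the ID of the last cost centre already dumped and the registry
-- (assumed newest first), return the cost centres still to be dumped
-- together with the updated "last dumped" ID.
pendingCostCentres :: Int -> [CostCentre] -> ([CostCentre], Int)
pendingCostCentres dumped registry = (new, dumped')
  where
    new     = takeWhile ((> dumped) . ccId) registry
    dumped' = case new of
      []      -> dumped
      (c : _) -> ccId c
```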

Fixes #24148

- - - - -
c0df5aa9 by Alan Zimmerman at 2024-09-22T02:34:42-04:00
EPA: Replace AnnsModule am_main with EpTokens

Working towards removing `AddEpAnn`

- - - - -
61a060eb by sheaf at 2024-09-25T10:29:39+02:00
The X86 SIMD patch.

This commit adds support for 128 bit wide SIMD vectors and vector
operations to GHC's X86 native code generator.

Main changes:

  - Introduction of vector formats (`GHC.CmmToAsm.Format`)
  - Introduction of 128-bit virtual register (`GHC.Platform.Reg`),
    and removal of unused Float virtual register.
  - Refactor of `GHC.Platform.Reg.Class.RegClass`: it now only contains
    two classes, `RcInteger` (for general purpose registers) and `RcFloatOrVector`
    (for registers that can be used for scalar floating point values as well
    as vectors).
  - Modify `GHC.CmmToAsm.X86.Instr.regUsageOfInstr` to keep track
    of which format each register is used at, so that the register
    allocator can know if it needs to spill the entire vector register
    or just the lower 64 bits.
  - Modify spill/load/reg-2-reg code to account for vector registers
    (`GHC.CmmToAsm.X86.Instr.{mkSpillInstr, mkLoadInstr, mkRegRegMoveInstr, takeRegRegMoveInstr}`).
  - Modify the register allocator code (`GHC.CmmToAsm.Reg.*`) to propagate
    the format we are storing in any given register, for instance changing
    `Reg` to `RegFormat` or `GlobalReg` to `GlobalRegUse`.
  - Add logic to lower vector `MachOp`s to X86 assembly
    (see `GHC.CmmToAsm.X86.CodeGen`)
  - Minor cleanups to genprimopcode, to remove the llvm_only attribute
    which is no longer applicable.
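
To illustrate why the register allocator wants to know the format each register is used at, here is a toy model. The `Format` type below is a hypothetical simplification, not the one in `GHC.CmmToAsm.Format`: a register that only ever holds a scalar needs 8 bytes of spill space, while one that is also used as a 128-bit vector needs 16.

```haskell
-- Hypothetical, simplified register formats.
data Format
  = II64          -- 64-bit integer
  | FF64          -- scalar double held in a vector register
  | Vec Int Int   -- Vec lanes bytesPerLane, e.g. Vec 2 8 for 2 x 64 bits
  deriving (Eq, Show)

-- | Size in bytes of a value of the given format.
formatBytes :: Format -> Int
formatBytes II64          = 8
formatBytes FF64          = 8
formatBytes (Vec lanes w) = lanes * w

-- | Bytes to spill for a register, given every format it is used at:
-- the widest use decides.
spillBytes :: [Format] -> Int
spillBytes = foldr (max . formatBytes) 0
```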

Tests for this feature are provided in the "testsuite/tests/simd" directory.

Fixes #7741

Keeping track of register formats adds a small memory overhead to the
register allocator (in particular, regUsageOfInstr now allocates more
to keep track of the `Format` each register is used at). This explains
the following metric increases.

-------------------------
Metric Increase:
    T12707
    T13035
    T13379
    T3294
    T4801
    T5321FD
    T5321Fun
    T783
-------------------------

- - - - -
b05410b3 by sheaf at 2024-09-25T10:29:39+02:00
Use xmm registers in genapply

This commit updates genapply to use xmm, ymm and zmm registers, for
stg_ap_v16/stg_ap_v32/stg_ap_v64, respectively.

It also updates the Cmm lexer and parser to produce Cmm vectors rather
than 128/256/512 bit wide scalars for V16/V32/V64, removing bits128,
bits256 and bits512 in favour of vectors.

The Cmm Lint check is weakened for vectors, as (in practice, e.g. on X86)
it is okay to use a single vector register to hold multiple different
types of data, and we don't know just from seeing e.g. "XMM1" how to
interpret the 128 bits of data within.

Fixes #25062

- - - - -
37d16c80 by sheaf at 2024-09-25T10:29:39+02:00
Add vector fused multiply-add operations

This commit adds fused multiply add operations such as `fmaddDoubleX2#`.
These are handled both in the X86 NCG and the LLVM backends.

- - - - -
699457cf by sheaf at 2024-09-25T10:29:40+02:00
Add vector shuffle primops

This adds vector shuffle primops, such as

```
shuffleFloatX4# :: FloatX4# -> FloatX4# -> (# Int#, Int#, Int#, Int# #) -> FloatX4#
```

which shuffle the components of the two input vectors into the output vector.

NB: the indices must be compile time literals, to match the X86 SHUFPD
instruction immediate and the LLVM shufflevector instruction.
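
A list-based model of the semantics, assuming the LLVM `shufflevector` index convention (indices 0 to 3 select from the first input, 4 to 7 from the second). `shuffle4` is a hypothetical name and works on ordinary lists, not on `FloatX4#`.

```haskell
-- | Model of a 4-lane shuffle on lists: each index selects one element
-- from the concatenation of the two inputs.
shuffle4 :: [a] -> [a] -> (Int, Int, Int, Int) -> [a]
shuffle4 xs ys (i, j, k, l) = map (pool !!) [i, j, k, l]
  where
    pool = xs ++ ys
```

For example, `shuffle4 [1,2,3,4] [5,6,7,8] (0,4,3,7)` yields `[1,5,4,8]`.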

These are handled in the X86 NCG and the LLVM backend.

Tested in simd009.

- - - - -
b386e1ca by sheaf at 2024-09-25T10:29:40+02:00
Add Broadcast MachOps

This adds proper MachOps for broadcast instructions, allowing us to
produce better code for broadcasting a value than simply packing that
value (doing many vector insertions in a row).

These are lowered in the X86 NCG and LLVM backends. In the LLVM backend,
it uses the previously introduced shuffle instructions.
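
The difference between the two strategies can be modelled on lists. This is only a sketch: the function names are hypothetical, and lowering a broadcast as one insertion followed by an all-zero shuffle is a common LLVM idiom that is assumed here rather than taken from the patch.

```haskell
-- | Overwrite lane i of a vector (modelled as a list) with x.
insertLane :: Int -> a -> [a] -> [a]
insertLane i x v = take i v ++ [x] ++ drop (i + 1) v

-- | Broadcast by packing: one insertion per lane.
broadcastByPacking :: Int -> a -> [a] -> [a]
broadcastByPacking n x v0 = foldl (\v i -> insertLane i x v) v0 [0 .. n - 1]

-- | Broadcast by a single insertion into lane 0, followed by a shuffle
-- whose indices are all 0.
broadcastByShuffle :: Int -> a -> [a] -> [a]
broadcastByShuffle n x v0 = map (v !!) (replicate n 0)
  where
    v = insertLane 0 x v0
```

Both produce the same vector; the second needs one insertion instead of one per lane.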

- - - - -
fac72d54 by sheaf at 2024-09-25T10:29:40+02:00
Fix treatment of signed zero in vector negation

This commit fixes the handling of signed zero in floating-point vector
negation.

A slight hack was introduced to work around the fact that Cmm doesn't
currently have a notion of signed floating point literals
(see get_float_broadcast_value_reg). This can be removed once CmmFloat
can express the value -0.0.

The simd006 test has been updated to use a stricter notion of equality
of floating-point values, which ensures the validity of this change.
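
To see why ordinary equality is not enough here, consider this scalar sketch. Computing `0 - x` is one way a negation can lose the sign of zero; whether that was the previous lowering is not stated in this message. The names `negateBySubtraction` and `strictEq` are hypothetical and are not taken from simd006.

```haskell
-- | A negation that mishandles signed zero: 0.0 - 0.0 is +0.0, not -0.0.
negateBySubtraction :: Double -> Double
negateBySubtraction x = 0.0 - x

-- | Equality that also distinguishes 0.0 from -0.0.
-- (NaNs never compare equal under this definition.)
strictEq :: Double -> Double -> Bool
strictEq a b = a == b && isNegativeZero a == isNegativeZero b
```

With these definitions, `negate 0.0 == negateBySubtraction 0.0` is `True`, while `strictEq (negate 0.0) (negateBySubtraction 0.0)` is `False`.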

- - - - -
73657caf by sheaf at 2024-09-25T10:29:41+02:00
Add min/max primops

This commit adds min/max primops, such as

  minDouble# :: Double# -> Double# -> Double#
  minFloatX4# :: FloatX4# -> FloatX4# -> FloatX4#
  minWord16X8# :: Word16X8# -> Word16X8# -> Word16X8#

These are supported in:
  - the X86, AArch64 and PowerPC NCGs,
  - the LLVM backend,
  - the WebAssembly and JavaScript backends.

Fixes #25120

- - - - -
8a80f932 by sheaf at 2024-09-25T10:29:41+02:00
Add test for C calls & SIMD vectors

- - - - -
6798e3b8 by sheaf at 2024-09-25T10:29:41+02:00
Add test for #25169

- - - - -
646c5b63 by sheaf at 2024-09-25T10:30:00+02:00
Fix #25169 using Plan A from the ticket

We now compile certain low-level Cmm functions in the RTS multiple
times, with different levels of vector support. We then dispatch
at runtime in the RTS, based on what instructions are supported.
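
The dispatch idea can be sketched as follows. This is a Haskell model, not the RTS code: the `VecLevel` type and `selectVariant` are hypothetical, and picking the most capable variant that the CPU supports is an assumption made for the sketch.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Hypothetical levels of vector support, from least to most capable.
data VecLevel = NoVec | Vec128 | Vec256 | Vec512
  deriving (Eq, Ord, Show)

-- | Each variant of a routine is tagged with the level it requires.
-- Pick the most capable variant the machine can actually run.
selectVariant :: VecLevel -> [(VecLevel, a)] -> Maybe a
selectVariant supported variants =
  case filter ((<= supported) . fst) variants of
    [] -> Nothing
    ok -> Just (snd (maximumBy (comparing fst) ok))
```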

See Note [realArgRegsCover] in GHC.Cmm.CallConv.

Fixes #25169

-------------------------
Metric Increase:
    T10421
    T12425
    T18730
    T1969
    T9198
-------------------------

- - - - -
3b41ae5e by sheaf at 2024-09-25T10:30:11+02:00
Fix C calls with SIMD vectors

This commit fixes the code generation for C calls, to take into account
the calling convention.

This is particularly tricky on Windows, where all vectors are expected
to be passed by reference. See Note [The Windows X64 C calling convention]
in GHC.CmmToAsm.X86.CodeGen.

- - - - -
f0c2ff0a by sheaf at 2024-09-25T10:30:11+02:00
X86 CodeGen: refactor getRegister CmmLit

This refactors the code dealing with loading literals into registers,
removing duplication and putting all the code in a single place.
It also changes which XOR instruction is used to place a zero value
into a register, so that we use VPXOR for a 128-bit integer vector
when AVX is supported.

- - - - -
e8d68fd1 by sheaf at 2024-09-25T10:30:11+02:00
X86 genCCall: promote arg before calling evalArgs

The job of evalArgs is to ensure each argument is put into a temporary
register, so that it can then be loaded directly into one of the
argument registers for the C call, without the generated code clobbering
any other register used for argument passing.

However, if we promote arguments after calling evalArgs, there is the
possibility that the code used for the promotion will clobber a register,
defeating the work of evalArgs.
To avoid this, we first promote arguments, and only then call evalArgs.

- - - - -
629b174a by sheaf at 2024-09-25T10:30:11+02:00
X86 genCCall64: simplify loadArg code

This commit simplifies the argument loading code by making the
assumption that it is safe to directly load the argument into a register,
because doing so will not clobber any previous assignments.

This assumption is justified by the use of 'evalArgs', which evaluates
any arguments which might necessitate non-trivial code generation into
separate temporary registers.

- - - - -
3fc52e65 by sheaf at 2024-09-25T10:30:11+02:00
LLVM: propagate GlobalRegUse information

This commit ensures we keep track of how any particular global register
is being used in the LLVM backend. This informs the LLVM type
annotations, and avoids type mismatches of the following form:

  argument is not of expected type '<2 x double>'
    call ccc <2 x double> (<2 x double>)
      (<4 x i32> arg)

- - - - -

30 changed files:

- .gitignore
- .gitlab-ci.yml
- .gitlab/ci.sh
- .gitmodules
- compiler/GHC/Builtin/primops.txt.pp
- compiler/GHC/ByteCode/Asm.hs
- compiler/GHC/Cmm.hs
- compiler/GHC/Cmm/CallConv.hs
- compiler/GHC/Cmm/Graph.hs
- compiler/GHC/Cmm/Lexer.x
- compiler/GHC/Cmm/Lint.hs
- compiler/GHC/Cmm/Liveness.hs
- compiler/GHC/Cmm/MachOp.hs
- compiler/GHC/Cmm/Node.hs
- compiler/GHC/Cmm/Opt.hs
- compiler/GHC/Cmm/Parser.y
- compiler/GHC/Cmm/ProcPoint.hs
- compiler/GHC/Cmm/Reg.hs
- compiler/GHC/Cmm/Sink.hs
- compiler/GHC/Cmm/Type.hs
- compiler/GHC/CmmToAsm.hs
- compiler/GHC/CmmToAsm/AArch64.hs
- compiler/GHC/CmmToAsm/AArch64/CodeGen.hs
- compiler/GHC/CmmToAsm/AArch64/Instr.hs
- compiler/GHC/CmmToAsm/AArch64/Ppr.hs
- compiler/GHC/CmmToAsm/AArch64/Regs.hs
- compiler/GHC/CmmToAsm/Config.hs
- compiler/GHC/CmmToAsm/Format.hs
- compiler/GHC/CmmToAsm/Instr.hs
- compiler/GHC/CmmToAsm/PPC.hs

The diff was not included because it is too large.

View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/9d0f68ea956702d09115ff8c8353b9d4eee8bc7d...3fc52e65de1dfdb7e57df8494614e2d3d48b01f4
