[Git][ghc/ghc][wip/ncg-simd] 14 commits: The X86 SIMD patch.
sheaf (@sheaf)
gitlab at gitlab.haskell.org
Fri Sep 20 14:59:27 UTC 2024
sheaf pushed to branch wip/ncg-simd at Glasgow Haskell Compiler / GHC
Commits:
b1681978 by sheaf at 2024-09-20T16:58:37+02:00
The X86 SIMD patch.
This commit adds support for 128 bit wide SIMD vectors and vector
operations to GHC's X86 native code generator.
Main changes:
- Introduction of vector formats (`GHC.CmmToAsm.Format`)
- Introduction of 128-bit virtual register (`GHC.Platform.Reg`),
and removal of unused Float virtual register.
- Refactor of `GHC.Platform.Reg.Class.RegClass`: it now only contains
two classes, `RcInteger` (for general purpose registers) and `RcFloatOrVector`
(for registers that can be used for scalar floating point values as well
as vectors).
- Modify `GHC.CmmToAsm.X86.Instr.regUsageOfInstr` to keep track
of which format each register is used at, so that the register
allocator can know if it needs to spill the entire vector register
or just the lower 64 bits.
- Modify spill/load/reg-2-reg code to account for vector registers
(`GHC.CmmToAsm.X86.Instr.{mkSpillInstr, mkLoadInstr, mkRegRegMoveInstr, takeRegRegMoveInstr}`).
- Modify the register allocator code (`GHC.CmmToAsm.Reg.*`) to propagate
the format we are storing in any given register, for instance changing
`Reg` to `RegFormat` or `GlobalReg` to `GlobalRegUse`.
- Add logic to lower vector `MachOp`s to X86 assembly
(see `GHC.CmmToAsm.X86.CodeGen`)
- Minor cleanups to genprimopcode, to remove the llvm_only attribute
which is no longer applicable.
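
The format-tracking idea in the list above can be illustrated with a toy model. This is only a sketch: `Format`, `RegFormat` and `spillBytes` below are simplified, hypothetical stand-ins, not the definitions in `GHC.CmmToAsm.Format`, and the rule "scalars spill 64 bits, vectors spill their full width" is an assumption read off the description above.

```haskell
-- Hypothetical, simplified stand-ins for the NCG's notion of a format.
-- The real types in GHC.CmmToAsm.Format are richer than this.
data Format
  = II64               -- 64-bit integer
  | FF32               -- scalar single-precision float
  | FF64               -- scalar double-precision float
  | VecFormat Int Int  -- lane count, lane width in bytes
  deriving (Eq, Show)

-- A register together with the format it is used at (illustrative only).
data RegFormat = RegFormat
  { regName   :: String
  , regFormat :: Format
  } deriving (Eq, Show)

-- How many bytes a spill has to save: the whole vector when the register
-- holds a vector, otherwise only the lower 64 bits (8 bytes).
spillBytes :: RegFormat -> Int
spillBytes (RegFormat _ (VecFormat lanes laneBytes)) = lanes * laneBytes
spillBytes _                                         = 8
```

With this model, the same physical register spills 8 bytes when used as `FF64` and 16 bytes when used as `VecFormat 2 8`, which is why the allocator needs to carry the format around rather than only the register.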
Tests for this feature are provided in the "testsuite/tests/simd" directory.
Fixes #7741
Keeping track of register formats adds a small memory overhead to the
register allocator (in particular, regUsageOfInstr now allocates more
to keep track of the `Format` each register is used at). This explains
the following metric increases.
-------------------------
Metric Increase:
T12707
T13035
T13379
T3294
T4801
T5321FD
T5321Fun
T783
-------------------------
- - - - -
d9821a7e by sheaf at 2024-09-20T16:58:40+02:00
Use xmm registers in genapply
This commit updates genapply to use xmm, ymm and zmm registers, for
stg_ap_v16/stg_ap_v32/stg_ap_v64, respectively.
It also updates the Cmm lexer and parser to produce Cmm vectors rather
than 128/256/512 bit wide scalars for V16/V32/V64, removing bits128,
bits256 and bits512 in favour of vectors.
The Cmm Lint check is weakened for vectors, as (in practice, e.g. on X86)
it is okay to use a single vector register to hold multiple different
types of data, and we don't know just from seeing e.g. "XMM1" how to
interpret the 128 bits of data within.
Fixes #25062
- - - - -
5e4e1f64 by sheaf at 2024-09-20T16:58:40+02:00
Add vector fused multiply-add operations
This commit adds fused multiply add operations such as `fmaddDoubleX2#`.
These are handled both in the X86 NCG and the LLVM backends.
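
As a rough reference model of what a lane-wise fused multiply-add computes (x * y + z with a single rounding), the following sketch uses ordinary tuples instead of `DoubleX2#`. The names `fmaddScalar` and `fmaddDoubleX2` are illustrative only, and the model is valid for finite inputs only: it ignores NaN, infinities and the sign of a zero result.

```haskell
-- Plain-tuple stand-in for a two-lane double vector (not the real DoubleX2#).
type DoubleX2 = (Double, Double)

-- Model of a fused multiply-add on one lane: compute x * y + z exactly
-- as a Rational, then round once when converting back to Double.
-- Only meaningful for finite inputs.
fmaddScalar :: Double -> Double -> Double -> Double
fmaddScalar x y z =
  fromRational (toRational x * toRational y + toRational z)

-- Lane-wise application to both components.
fmaddDoubleX2 :: DoubleX2 -> DoubleX2 -> DoubleX2 -> DoubleX2
fmaddDoubleX2 (x0, x1) (y0, y1) (z0, z1) =
  (fmaddScalar x0 y0 z0, fmaddScalar x1 y1 z1)
```

The single rounding matters: with x = 1 + 2^-27 and z = -(1 + 2^-26), the exact value of x * x + z is 2^-54, which the fused form preserves, whereas rounding the product first discards the 2^-54 term.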
- - - - -
95787b4d by sheaf at 2024-09-20T16:58:41+02:00
Add vector shuffle primops
This adds vector shuffle primops, such as
```
shuffleFloatX4# :: FloatX4# -> FloatX4# -> (# Int#, Int#, Int#, Int# #) -> FloatX4#
```
which shuffle the components of the two input vectors into the output vector.
NB: the indices must be compile time literals, to match the X86 SHUFPD
instruction immediate and the LLVM shufflevector instruction.
These are handled in the X86 NCG and the LLVM backend.
Tested in simd009.
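
A small model of the shuffle semantics, assuming the indices address the concatenation of the two inputs (0 to 3 select from the first vector, 4 to 7 from the second), as with LLVM's shufflevector. It uses plain tuples rather than `FloatX4#`, and returns `Maybe` for out-of-range indices, which is a choice of this sketch, not the primop's behaviour.

```haskell
-- Plain-tuple stand-in for a four-lane float vector (not the real FloatX4#).
type FloatX4 = (Float, Float, Float, Float)

-- Each index picks one lane out of the eight lanes of the two inputs
-- laid end to end. Out-of-range indices yield Nothing in this model.
shuffleFloatX4 :: FloatX4 -> FloatX4 -> (Int, Int, Int, Int) -> Maybe FloatX4
shuffleFloatX4 (a0, a1, a2, a3) (b0, b1, b2, b3) (i, j, k, l) =
  (,,,) <$> pick i <*> pick j <*> pick k <*> pick l
  where
    lanes = [a0, a1, a2, a3, b0, b1, b2, b3]
    pick n
      | n >= 0 && n < length lanes = Just (lanes !! n)
      | otherwise                  = Nothing
```

For instance, shuffling `(1, 2, 3, 4)` and `(5, 6, 7, 8)` with indices `(0, 4, 3, 7)` selects lanes 1, 5, 4 and 8 in that order.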
- - - - -
1be7e9fc by sheaf at 2024-09-20T16:58:41+02:00
Add Broadcast MachOps
This adds proper MachOps for broadcast instructions, allowing us to
produce better code for broadcasting a value than simply packing that
value (doing many vector insertions in a row).
These are lowered in the X86 NCG and LLVM backends. In the LLVM backend,
it uses the previously introduced shuffle instructions.
- - - - -
9cacd56a by sheaf at 2024-09-20T16:58:41+02:00
Fix treatment of signed zero in vector negation
This commit fixes the handling of signed zero in floating-point vector
negation.
A slight hack was introduced to work around the fact that Cmm doesn't
currently have a notion of signed floating point literals
(see get_float_broadcast_value_reg). This can be removed once CmmFloat
can express the value -0.0.
The simd006 test has been updated to use a stricter notion of equality
of floating-point values, which ensures the validity of this change.
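
To illustrate why a stricter equality is needed, here is a sketch of the kind of discrepancy involved. It assumes negation computed as `0 - x`, which is an assumption made for illustration; the commit message does not say how the old code negated. `negateViaSub` and `strictEq` are hypothetical names, not taken from simd006.

```haskell
-- Negation by subtracting from zero: gives +0.0 for an input of +0.0,
-- whereas IEEE negation gives -0.0. (Illustrative only.)
negateViaSub :: Double -> Double
negateViaSub x = 0 - x

-- An equality that tells +0.0 and -0.0 apart and treats NaN as equal
-- to NaN, unlike (==) on Double.
strictEq :: Double -> Double -> Bool
strictEq a b
  | isNaN a || isNaN b = isNaN a && isNaN b
  | otherwise          = a == b && isNegativeZero a == isNegativeZero b
```

Under `(==)`, `negateViaSub 0` and `negate 0` compare equal, so a test using `(==)` cannot detect the difference; `strictEq` does distinguish them.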
- - - - -
fdff26e9 by sheaf at 2024-09-20T16:58:41+02:00
Add min/max primops
This commit adds min/max primops, such as
minDouble# :: Double# -> Double# -> Double#
minFloatX4# :: FloatX4# -> FloatX4# -> FloatX4#
minWord16X8# :: Word16X8# -> Word16X8# -> Word16X8#
These are supported in:
- the X86, AArch64 and PowerPC NCGs,
- the LLVM backend,
- the WebAssembly and JavaScript backends.
Fixes #25120
- - - - -
8b6e545a by sheaf at 2024-09-20T16:58:41+02:00
Add test for C calls & SIMD vectors
- - - - -
8035cbeb by sheaf at 2024-09-20T16:58:42+02:00
Add test for #25169
- - - - -
b788180d by sheaf at 2024-09-20T16:58:42+02:00
Fix #25169 using Plan A from the ticket
We now compile certain low-level Cmm functions in the RTS multiple
times, with different levels of vector support. We then dispatch
at runtime in the RTS, based on what instructions are supported.
See Note [realArgRegsCover] in GHC.Cmm.CallConv.
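
The dispatch idea can be modelled abstractly as "pick the most capable compiled variant that the running CPU supports". The sketch below is a toy model in Haskell, not the RTS code; `VecSupport` and `selectVariant` are hypothetical names introduced for illustration.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Hypothetical levels of vector support, ordered from least to most capable.
data VecSupport = NoVectors | V16 | V32 | V64
  deriving (Eq, Ord, Show)

-- Given what the CPU supports and the variants that were compiled, choose
-- the variant with the highest level that does not exceed the CPU's level.
selectVariant :: VecSupport -> [(VecSupport, a)] -> Maybe a
selectVariant cpu variants =
  case [v | v@(level, _) <- variants, level <= cpu] of
    []       -> Nothing
    eligible -> Just (snd (maximumBy (comparing fst) eligible))
```

In this model, a CPU at level `V32` offered variants for `NoVectors`, `V16`, `V32` and `V64` gets the `V32` one, and never a variant that uses registers it lacks.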
Fixes #25169
-------------------------
Metric Increase:
T10421
T12425
T1969
T9198
-------------------------
- - - - -
b17d19ea by sheaf at 2024-09-20T16:59:12+02:00
Fix C calls with SIMD vectors
This commit fixes the code generation for C calls, to take into account
the calling convention.
This is particularly tricky on Windows, where all vectors are expected
to be passed by reference. See Note [The Windows X64 C calling convention]
in GHC.CmmToAsm.X86.CodeGen.
- - - - -
3a08a57e by sheaf at 2024-09-20T16:59:13+02:00
X86 genCCall64: simplify loadArg code
This commit simplifies the argument loading code by making the
assumption that it is safe to directly load the argument into a register,
because doing so will not clobber any previous assignments.
This assumption is justified by the use of 'evalArgs', which evaluates
any arguments which might necessitate non-trivial code generation into
separate temporary registers.
- - - - -
57bd478d by sheaf at 2024-09-20T16:59:13+02:00
LLVM: propagate GlobalRegUse information
This commit ensures we keep track of how any particular global register
is being used in the LLVM backend. This informs the LLVM type
annotations, and avoids type mismatches of the following form:
argument is not of expected type '<2 x double>'
call ccc <2 x double> (<2 x double>)
(<4 x i32> arg)
- - - - -
a34a4dc2 by sheaf at 2024-09-20T16:59:13+02:00
X86 CodeGen: refactor getRegister CmmLit
This refactors the code dealing with loading literals into registers,
removing duplication and putting all the code in a single place.
It also changes which XOR instruction is used to place a zero value
into a register, so that we use VPXOR for a 128-bit integer vector
when AVX is supported.
- - - - -
30 changed files:
- compiler/GHC/Builtin/primops.txt.pp
- compiler/GHC/ByteCode/Asm.hs
- compiler/GHC/Cmm.hs
- compiler/GHC/Cmm/CallConv.hs
- compiler/GHC/Cmm/Graph.hs
- compiler/GHC/Cmm/Lexer.x
- compiler/GHC/Cmm/Lint.hs
- compiler/GHC/Cmm/Liveness.hs
- compiler/GHC/Cmm/MachOp.hs
- compiler/GHC/Cmm/Node.hs
- compiler/GHC/Cmm/Opt.hs
- compiler/GHC/Cmm/Parser.y
- compiler/GHC/Cmm/ProcPoint.hs
- compiler/GHC/Cmm/Reg.hs
- compiler/GHC/Cmm/Sink.hs
- compiler/GHC/Cmm/Type.hs
- compiler/GHC/CmmToAsm.hs
- compiler/GHC/CmmToAsm/AArch64.hs
- compiler/GHC/CmmToAsm/AArch64/CodeGen.hs
- compiler/GHC/CmmToAsm/AArch64/Instr.hs
- compiler/GHC/CmmToAsm/AArch64/Ppr.hs
- compiler/GHC/CmmToAsm/AArch64/Regs.hs
- compiler/GHC/CmmToAsm/Config.hs
- compiler/GHC/CmmToAsm/Format.hs
- compiler/GHC/CmmToAsm/Instr.hs
- compiler/GHC/CmmToAsm/PPC.hs
- compiler/GHC/CmmToAsm/PPC/CodeGen.hs
- compiler/GHC/CmmToAsm/PPC/Instr.hs
- compiler/GHC/CmmToAsm/PPC/Ppr.hs
- compiler/GHC/CmmToAsm/PPC/Regs.hs
The diff was not included because it is too large.
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/57defd651d44ee65be01c7508166a50f4a2c8de7...a34a4dc2269300b31c06173dad5d64f55317ecb4