[Git][ghc/ghc][wip/ncg-simd] 32 commits: ci: Run abi-test on test-abi label
sheaf (@sheaf)
gitlab at gitlab.haskell.org
Wed Sep 25 23:43:09 UTC 2024
sheaf pushed to branch wip/ncg-simd at Glasgow Haskell Compiler / GHC
Commits:
2a551cd5 by Matthew Pickering at 2024-09-24T16:33:50+05:30
ci: Run abi-test on test-abi label
- - - - -
ab4039ac by Rodrigo Mesquita at 2024-09-24T16:33:50+05:30
testsuite: Add a test for object determinism
Extends the abi_test with an object determinism check.
Also includes a standalone test to be run by developers manually when
debugging issues with determinism.
- - - - -
d62c18d8 by Rodrigo Mesquita at 2024-09-24T16:33:50+05:30
determinism: Sampling uniques in the CG
To achieve object determinism, the passes processing Cmm and the rest of
the code generation pipeline mustn't create new uniques which are
non-deterministic.
This commit replaces non-deterministic unique sampling within these code
generation passes with a deterministic sampling strategy, threading a
deterministic incrementing counter through them. The threading is done
implicitly with `UniqDSM` and `UniqDSMT`.
Secondly, the `DUniqSupply` used to run a `UniqDSM` must be threaded
through all passes to guarantee that uniques created in different passes
are distinct from one another. Specifically, the same `DUniqSupply` must be
threaded through the CG Streaming pipeline, starting with Driver.Main
calling `StgToCmm.codeGen`, `cmmPipeline`, `cmmToRawCmm`, and
`codeOutput` in sequence.
To thread resources through the `Stream` abstraction, we use the `UniqDSMT`
transformer on top of `IO` as the Monad underlying the Stream. `UniqDSMT` will
thread the `DUniqSupply` through every pass applied to the `Stream`, for every
element. We use @type CgStream = Stream (UniqDSMT IO)@ for the Stream used in
code generation that carries the deterministic unique supply through.
See Note [Deterministic Uniques in the CG]
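The threading described above can be sketched with a small state-passing
monad. This is only an illustration of the idea: `DetSupply`, `DetM`,
`freshUnique`, `passA` and `passB` are hypothetical stand-ins, not the
actual `DUniqSupply`/`UniqDSM` definitions in GHC.
```haskell
-- Hypothetical, simplified stand-in for a deterministic unique supply:
-- just an incrementing counter.
newtype DetSupply = DetSupply Int

-- Hypothetical stand-in for a monad that threads the supply implicitly.
newtype DetM a = DetM { runDetM :: DetSupply -> (a, DetSupply) }

instance Functor DetM where
  fmap f (DetM g) = DetM $ \s -> let (a, s') = g s in (f a, s')

instance Applicative DetM where
  pure a = DetM $ \s -> (a, s)
  DetM f <*> DetM g = DetM $ \s ->
    let (h, s')  = f s
        (a, s'') = g s'
    in (h a, s'')

instance Monad DetM where
  DetM g >>= k = DetM $ \s -> let (a, s') = g s in runDetM (k a) s'

-- Draw the next unique: return the counter and increment it.
freshUnique :: DetM Int
freshUnique = DetM $ \(DetSupply n) -> (n, DetSupply (n + 1))

-- Two toy "passes" that both need fresh uniques.
passA :: String -> DetM [Int]
passA = mapM (const freshUnique)

passB :: [Int] -> DetM [(Int, Int)]
passB = mapM (\old -> do { new <- freshUnique; pure (old, new) })

-- Running both passes under the same supply keeps their uniques disjoint.
runPipeline :: String -> [(Int, Int)]
runPipeline input = fst (runDetM (passA input >>= passB) (DetSupply 0))
```
In this model `runPipeline "ab"` evaluates to `[(0,2),(1,3)]` on every run:
the second pass continues from where the first one stopped, so no unique is
handed out twice.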
- - - - -
3bbe4af4 by Rodrigo Mesquita at 2024-09-24T16:33:50+05:30
determinism: Cmm unique renaming pass
To achieve object determinism, we need to prevent the non-deterministic
uniques from leaking into the object code. We can do this by
deterministically renaming the non-external uniques in the Cmm groups
that are yielded right after StgToCmm.
The key to deterministic renaming is observing that the order of
declarations, instructions, and data in the Cmm groups is already
deterministic (modulo other determinism bugs), regardless of the
uniques. We traverse the Cmm AST in this deterministic order and
rename the uniques, incrementally, in the order they are found, thus
making them deterministic. This renaming is guarded by
-fobject-determinism which is disabled by default for now.
This is one of the key passes for object determinism. Read about the
overview of object determinism and a more detailed explanation of this
pass in:
* Note [Object determinism]
* Note [Renaming uniques deterministically]
Significantly closes the gap to #12935
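As a rough illustration of renaming in traversal order, the sketch below
renames a flat list of uniques. It is a toy model under stated assumptions:
the real pass walks the Cmm AST and leaves external names alone, and
`Unique`, `renameOne` and `renameInOrder` are hypothetical names.
```haskell
import Data.List (mapAccumL)
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in: a unique is modelled as a plain Int.
type Unique = Int

-- State carried along the traversal: the renaming built so far and the
-- next deterministic unique to hand out.
type RenameState = (Map.Map Unique Unique, Unique)

-- Rename one occurrence: reuse the earlier choice if this unique has been
-- seen before, otherwise assign the next counter value.
renameOne :: RenameState -> Unique -> (RenameState, Unique)
renameOne (seen, next) u =
  case Map.lookup u seen of
    Just u' -> ((seen, next), u')
    Nothing -> ((Map.insert u next seen, next + 1), next)

-- Rename every unique in the order in which it is encountered.
renameInOrder :: [Unique] -> [Unique]
renameInOrder = snd . mapAccumL renameOne (Map.empty, 0)
```
Two inputs with the same shape but different starting uniques, such as
`[907, 12, 907, 55]` and `[4411, 3, 4411, 70]`, both rename to `[0,1,0,2]`.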
- - - - -
8357ed50 by Rodrigo Mesquita at 2024-09-24T16:33:50+05:30
determinism: DCmmGroup vs CmmGroup
Part of our strategy in producing deterministic objects, namely,
renaming all Cmm uniques in order, depends on the object code produced
having a deterministic order (say, A_closure always comes before
B_closure).
However, the use of LabelMaps in the Cmm representation invalidated this
requirement because the LabelMap elements would already be in a
non-deterministic order (due to the original uniques), and the renaming
in sequence wouldn't work because of that non-deterministic order.
Therefore, we now start off with lists in CmmGroup (which preserve the
original order), and convert them into LabelMaps (for performance in the
code generator) after the uniques of the list elements have been
renamed.
See Note [DCmmGroup vs CmmGroup or: Deterministic Info Tables] and #12935.
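The ordering problem can be reproduced in miniature. In this sketch
`Data.Map` stands in for a LabelMap keyed on uniques (an assumption made for
illustration; the real LabelMap is a different type), and `run1`/`run2`
model two compilations that assign different uniques to the same
declarations.
```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in: a unique is modelled as a plain Int.
type Unique = Int

-- The same two declarations, emitted in the same order, but with
-- different (non-deterministic) uniques in each run.
run1, run2 :: [(Unique, String)]
run1 = [(20, "A_closure"), (10, "B_closure")]
run2 = [(10, "A_closure"), (20, "B_closure")]

-- Going through a map keyed on the unique reorders elements by key,
-- so the resulting order depends on the uniques.
viaMap :: [(Unique, String)] -> [String]
viaMap = Map.elems . Map.fromList

-- A list keeps the original emission order.
viaList :: [(Unique, String)] -> [String]
viaList = map snd
```
Here `viaList run1` and `viaList run2` agree, while `viaMap run1` and
`viaMap run2` come out in opposite orders, which is why renaming in sequence
only works if it happens before the conversion to maps.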
Co-authored-by: Matthew Pickering <matthewtpickering at gmail.com>
- - - - -
0e675fb8 by Rodrigo Mesquita at 2024-09-24T16:33:50+05:30
determinism: Don't print unique in pprFullName
This unique was leaking as part of the profiling description in info
tables when profiling was enabled, despite not providing information
relevant to the profile.
- - - - -
340f58b0 by Rodrigo Mesquita at 2024-09-24T16:33:50+05:30
determinism: UDFM for distinct-constructor-tables
In order to produce deterministic objects when compiling with
-distinct-constructor-tables, we also have to update the data
constructor map to be backed by a deterministic unique map (UDFM) rather
than a non-deterministic one (UniqMap).
- - - - -
282f37a0 by Rodrigo Mesquita at 2024-09-24T16:33:50+05:30
determinism: InfoTableMap uniques in generateCgIPEStub
Fixes object determinism when using -finfo-table-map
Make sure to also deterministically rename the IPE map (as per Note
[Renaming uniques deterministically]), and to use a deterministic unique
supply when creating new labels for the IPE information to guarantee
deterministic objects when IPE information is requested.
Note that the Cmm group produced in generateCgIPEStub must /not/ be
renamed because renaming uniques is not idempotent, and the references
to the previously renamed code in the IPE Cmm group would be renamed
twice and become invalid references to non-existent symbols.
We do need to det-rename the InfoTableMap that is created in the
conversion from Core to Stg. This is not a problem since that map won't
refer any already renamed names (since it was created before the
renaming).
- - - - -
7b37afc9 by Zubin Duggal at 2024-09-24T16:33:50+05:30
ci: Allow abi-test to fail.
We are not fully deterministic yet, see #12935 for work that remains to be done.
- - - - -
a63ee33a by Simon Peyton Jones at 2024-09-25T17:08:24-04:00
Add Given injectivity for built-in type families
Ticket #24845 asks (reasonably enough) that if we have
[G] a+b ~ 0
then we also know
[G] a ~ 0, b ~ 0
and similar injectivity-like facts for other built-in type
families. The status quo was that we never generate evidence for
injectivity among Givens -- but it is quite reasonable to do so.
All we need is to have /evidence/ for the new constraints.
This MR implements that goal. I also took the opportunity to
* Address #24978: refactoring UnivCo
* Fix #25248, which was a consequence of the previous formulation of UnivCo
As a result this MR touches a lot of code. The big things are:
* Coercion constructor UnivCo now takes a [Coercion] as argument to
express the coercions on which the UnivCo depends. A nice consequence
is that UnivCoProvenance now has no free variables, which simplifies a
number of places.
* Coercion constructors AxiomInstCo and AxiomRuleCo are combined into
AxiomCo. The new AxiomCo carries a (slightly oddly named)
CoAxiomRule, which itself is a sum type of the various forms of
built-in axiom. See Note [CoAxiomRule] in GHC.Core.Coercion.Axiom
A merit of this is that we can separate the case of open and closed
type families, and eliminate the redundant `BranchIndex` in the former
case.
* Much better representation for data BuiltInSynFamily, which means we
no longer need to enumerate built-in axioms as well as built-in tycons.
* There is a massive refactor in GHC.Builtin.Types.Literals, which contains all
the built-in axioms for type-level operations (arithmetic, append, cons etc).
A big change is that instead of redundantly having (a) a hand-written
matcher, and (b) a template-based "proves" function, which were hard to
keep in sync, the two are derived from one set of human-supplied info.
See GHC.Builtin.Types.Literals.mkRewriteAxiom, and friends.
* Significant changes in GHC.Tc.Solver.Equality to account for the new
opportunity for Given/Given equalities.
Smaller things
* Improve pretty-printing to avoid parens around atomic coercions.
* Do proper eqType in findMatchingIrreds, not `eqTypeNoKindCheck`.
Looks like a bug, Richard agrees.
* coercionLKind and coercionRKind are hot functions. I refactored the
implementation (which I had to change anyway) to increase sharing.
See Note [coercionKind performance] in GHC.Core.Coercion
* I wrote a new Note [Finding orphan names] in GHC.Core.FVs about orphan
names
* I improved the `is_concrete` flag in GHC.Core.Type.buildSynTyCon, to avoid
calling tyConsOfType. I forget exactly why I did this, but it's definitely
better now.
* I moved some code from GHC.Tc.Types.Constraint into GHC.Tc.Types.CtLocEnv
and I renamed the module GHC.Tc.Types.CtLocEnv to GHC.Tc.Types.CtLoc
- - - - -
dd8ef342 by Ryan Scott at 2024-09-25T17:09:01-04:00
Resolve ambiguous method-bound type variables in vanilla defaults and GND
When defining an instance of a class with a "vanilla" default, such as in the
following example (from #14266):
```hs
class A t where
  f :: forall x m. Monoid x => t m -> m
  f = <blah>

instance A []
```
We have to reckon with the fact that the type of `x` (bound by the type
signature for the `f` method) is ambiguous. If we don't deal with the ambiguity
somehow, then when we generate the following code:
```hs
instance A [] where
  f = $dmf @[] -- NB: the type of `x` is still ambiguous
```
Then the generated code will not typecheck. (Issue #25148 is a more recent
example of the same problem.)
To fix this, we bind the type variables from the method's original type
signature using `TypeAbstractions` and instantiate `$dmf` with them using
`TypeApplications`:
```hs
instance A [] where
  f @x @m = $dmf @[] @x @m -- `x` is no longer ambiguous
```
Note that we only do this for vanilla defaults and not for generic defaults
(i.e., defaults using `DefaultSignatures`). For the full details, see `Note
[Default methods in instances] (Wrinkle: Ambiguous types from vanilla method
type signatures)`.
The same problem arose in the code generated by `GeneralizedNewtypeDeriving`,
and we fix it here using the same technique. This time, we can take
advantage of the fact that `GeneralizedNewtypeDeriving`-generated code
_already_ brings method-bound type variables into scope via `TypeAbstractions`
(after !13190), so it is very straightforward to visibly apply the type
variables on the right-hand sides of equations. See `Note [GND and ambiguity]`.
Fixes #14266. Fixes #25148.
- - - - -
0a4da5d2 by ARATA Mizuki at 2024-09-25T17:09:41-04:00
Document primitive string literals and desugaring of string literals
Fixes #17474 and #17974
Co-authored-by: Matthew Craven <5086-clyring at users.noreply.gitlab.haskell.org>
- - - - -
ad0731ad by Zubin Duggal at 2024-09-25T17:10:18-04:00
rts: Fix segfault when using non-moving GC with profiling
`nonMovingCollect()` swaps out the `static_flag` value used as a
sentinel for `gct->scavenged_static_objects`, but the subsequent call
`resetStaticObjectForProfiling()` sees the old value of `static_flag` used as
the sentinel and segfaults. So we must call `resetStaticObjectForProfiling()`
before calling `nonMovingCollect()` as otherwise it looks for the incorrect
sentinel value.
Fixes #25232 and #23958
Also teach the testsuite driver about nonmoving profiling ways
and stop disabling metric collection when nonmoving GC is enabled.
- - - - -
e7a26d7a by Sylvain Henry at 2024-09-25T17:11:00-04:00
Fix interaction between fork and kqueue (#24672)
A kqueue file descriptor isn't inherited by a child created with fork.
As such, we mustn't try to close this file descriptor in the child: doing
so would close an unrelated descriptor, e.g. the one used by timerfd.
Fix #24672
- - - - -
6863503c by Simon Peyton Jones at 2024-09-25T17:11:37-04:00
Improve GHC.Tc.Solver.defaultEquality
This MR improves GHC.Tc.Solver.defaultEquality to solve #25251.
The main change is to use checkTyEqRhs to check the equality, so
that we do promotion properly.
But within that we needed a small enhancement to LC_Promote. See
Note [Defaulting equalites] (DE4) and (DE5)
The tricky case is (alas) hard to trigger, so I have not added a
regression test.
- - - - -
97a6c6c3 by Sylvain Henry at 2024-09-25T17:12:18-04:00
JS: fix h$withCStringOnHeap helper (#25288)
strlen returns the length of the string without the terminating \0 byte,
hence CStrings weren't properly allocated on the heap (the terminating \0
byte was missing).
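The off-by-one can be illustrated outside of the JS helper itself. The
sketch below is not the `h$withCStringOnHeap` code; `cStringBytes` and
`allocationSize` are hypothetical names, and the string is assumed to be
ASCII-only.
```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Bytes of a C string: the characters followed by the terminating 0 byte.
-- Assumes ASCII-only input, so one character maps to one byte.
cStringBytes :: String -> [Word8]
cStringBytes s = map (fromIntegral . ord) s ++ [0]

-- Number of bytes to allocate: the strlen-style length plus one for the
-- terminator.
allocationSize :: String -> Int
allocationSize s = length s + 1
```
For `"hi"` the strlen-style length is 2, but `cStringBytes "hi"` is
`[104,105,0]`, so 3 bytes have to be allocated.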
- - - - -
d6ec6f2a by sheaf at 2024-09-26T00:50:17+02:00
The X86 SIMD patch.
This commit adds support for 128 bit wide SIMD vectors and vector
operations to GHC's X86 native code generator.
Main changes:
- Introduction of vector formats (`GHC.CmmToAsm.Format`)
- Introduction of 128-bit virtual register (`GHC.Platform.Reg`),
and removal of unused Float virtual register.
- Refactor of `GHC.Platform.Reg.Class.RegClass`: it now only contains
two classes, `RcInteger` (for general purpose registers) and `RcFloatOrVector`
(for registers that can be used for scalar floating point values as well
as vectors).
- Modify `GHC.CmmToAsm.X86.Instr.regUsageOfInstr` to keep track
of which format each register is used at, so that the register
allocator can know if it needs to spill the entire vector register
or just the lower 64 bits.
- Modify spill/load/reg-2-reg code to account for vector registers
(`GHC.CmmToAsm.X86.Instr.{mkSpillInstr, mkLoadInstr, mkRegRegMoveInstr, takeRegRegMoveInstr}`).
- Modify the register allocator code (`GHC.CmmToAsm.Reg.*`) to propagate
the format we are storing in any given register, for instance changing
`Reg` to `RegFormat` or `GlobalReg` to `GlobalRegUse`.
- Add logic to lower vector `MachOp`s to X86 assembly
(see `GHC.CmmToAsm.X86.CodeGen`)
- Minor cleanups to genprimopcode, to remove the llvm_only attribute
which is no longer applicable.
Tests for this feature are provided in the "testsuite/tests/simd" directory.
Fixes #7741
Keeping track of register formats adds a small memory overhead to the
register allocator (in particular, regUsageOfInstr now allocates more
to keep track of the `Format` each register is used at). This explains
the following metric increases.
-------------------------
Metric Increase:
T12707
T13035
T13379
T3294
T4801
T5321FD
T5321Fun
T783
-------------------------
- - - - -
8a64f210 by sheaf at 2024-09-26T00:50:21+02:00
Use xmm registers in genapply
This commit updates genapply to use xmm, ymm and zmm registers, for
stg_ap_v16/stg_ap_v32/stg_ap_v64, respectively.
It also updates the Cmm lexer and parser to produce Cmm vectors rather
than 128/256/512 bit wide scalars for V16/V32/V64, removing bits128,
bits256 and bits512 in favour of vectors.
The Cmm Lint check is weakened for vectors, as (in practice, e.g. on X86)
it is okay to use a single vector register to hold multiple different
types of data, and we don't know just from seeing e.g. "XMM1" how to
interpret the 128 bits of data within.
Fixes #25062
- - - - -
5b58a3f1 by sheaf at 2024-09-26T00:50:21+02:00
Add vector fused multiply-add operations
This commit adds fused multiply add operations such as `fmaddDoubleX2#`.
These are handled both in the X86 NCG and the LLVM backends.
- - - - -
d3e3c31c by sheaf at 2024-09-26T00:50:22+02:00
Add vector shuffle primops
This adds vector shuffle primops, such as
```
shuffleFloatX4# :: FloatX4# -> FloatX4# -> (# Int#, Int#, Int#, Int# #) -> FloatX4#
```
which shuffle the components of the two input vectors into the output vector.
NB: the indices must be compile time literals, to match the X86 SHUFPD
instruction immediate and the LLVM shufflevector instruction.
These are handled in the X86 NCG and the LLVM backend.
Tested in simd009.
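A pure model of the shuffle semantics, for intuition only. It assumes that
the indices select from the concatenation of the two inputs (as LLVM's
shufflevector does); `shuffle4` is a hypothetical name working on ordinary
tuples, not the primop itself.
```haskell
-- Model a 4-lane shuffle: each index in 0..7 picks a lane from the
-- concatenation of the first and second input vectors.
shuffle4 :: (a, a, a, a) -> (a, a, a, a) -> (Int, Int, Int, Int) -> (a, a, a, a)
shuffle4 (a0, a1, a2, a3) (b0, b1, b2, b3) (i, j, k, l) =
  (pick i, pick j, pick k, pick l)
  where
    lanes = [a0, a1, a2, a3, b0, b1, b2, b3]
    pick n
      | n >= 0 && n < 8 = lanes !! n
      | otherwise       = error "shuffle4: index out of range"
```
In this model `shuffle4 (1,2,3,4) (5,6,7,8) (0,4,3,7)` gives `(1,5,4,8)`.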
- - - - -
408ede55 by sheaf at 2024-09-26T00:50:51+02:00
Add Broadcast MachOps
This adds proper MachOps for broadcast instructions, allowing us to
produce better code for broadcasting a value than simply packing that
value (doing many vector insertions in a row).
These are lowered in the X86 NCG and LLVM backends. In the LLVM backend,
it uses the previously introduced shuffle instructions.
- - - - -
2150879d by sheaf at 2024-09-26T00:50:53+02:00
Fix treatment of signed zero in vector negation
This commit fixes the handling of signed zero in floating-point vector
negation.
A slight hack was introduced to work around the fact that Cmm doesn't
currently have a notion of signed floating point literals
(see get_float_broadcast_value_reg). This can be removed once CmmFloat
can express the value -0.0.
The simd006 test has been updated to use a stricter notion of equality
of floating-point values, which ensures the validity of this change.
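To see why a stricter equality is needed, note that the usual `==` treats
0.0 and -0.0 as equal. The sketch below shows one possible stricter
comparison; it is not necessarily what simd006 uses, and `bitwiseEq`,
`posZero` and `negZero` are hypothetical names.
```haskell
import GHC.Float (castDoubleToWord64)

posZero, negZero :: Double
posZero = 0
negZero = negate 0

-- Compare the underlying bit patterns, so that 0.0 and -0.0 differ.
bitwiseEq :: Double -> Double -> Bool
bitwiseEq x y = castDoubleToWord64 x == castDoubleToWord64 y
```
With these definitions `posZero == negZero` is `True`, whereas
`bitwiseEq posZero negZero` is `False`, so only the second comparison would
notice a negation that loses the sign of zero.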
- - - - -
1b439775 by sheaf at 2024-09-26T00:50:54+02:00
Add min/max primops
This commit adds min/max primops, such as
minDouble# :: Double# -> Double# -> Double#
minFloatX4# :: FloatX4# -> FloatX4# -> FloatX4#
minWord16X8# :: Word16X8# -> Word16X8# -> Word16X8#
These are supported in:
- the X86, AArch64 and PowerPC NCGs,
- the LLVM backend,
- the WebAssembly and JavaScript backends.
Fixes #25120
- - - - -
a06e9dd0 by sheaf at 2024-09-26T00:50:54+02:00
Add test for C calls & SIMD vectors
- - - - -
5358ce5b by sheaf at 2024-09-26T00:50:54+02:00
Add test for #25169
- - - - -
83f08b00 by sheaf at 2024-09-26T00:50:54+02:00
Fix #25169 using Plan A from the ticket
We now compile certain low-level Cmm functions in the RTS multiple
times, with different levels of vector support. We then dispatch
at runtime in the RTS, based on what instructions are supported.
See Note [realArgRegsCover] in GHC.Cmm.CallConv.
Fixes #25169
-------------------------
Metric Increase:
T10421
T12425
T18730
T1969
T9198
-------------------------
- - - - -
26f68f5b by sheaf at 2024-09-26T00:51:08+02:00
Fix C calls with SIMD vectors
This commit fixes the code generation for C calls, to take into account
the calling convention.
This is particularly tricky on Windows, where all vectors are expected
to be passed by reference. See Note [The Windows X64 C calling convention]
in GHC.CmmToAsm.X86.CodeGen.
- - - - -
2f123d5f by sheaf at 2024-09-26T00:51:08+02:00
X86 CodeGen: refactor getRegister CmmLit
This refactors the code dealing with loading literals into registers,
removing duplication and putting all the code in a single place.
It also changes which XOR instruction is used to place a zero value
into a register, so that we use VPXOR for a 128-bit integer vector
when AVX is supported.
- - - - -
f624595d by sheaf at 2024-09-26T00:51:08+02:00
X86 genCCall: promote arg before calling evalArgs
The job of evalArgs is to ensure each argument is put into a temporary
register, so that it can then be loaded directly into one of the
argument registers for the C call, without the generated code clobbering
any other register used for argument passing.
However, if we promote arguments after calling evalArgs, there is the
possibility that the code used for the promotion will clobber a register,
defeating the work of evalArgs.
To avoid this, we first promote arguments, and only then call evalArgs.
- - - - -
2a552087 by sheaf at 2024-09-26T00:51:08+02:00
X86 genCCall64: simplify loadArg code
This commit simplifies the argument loading code by making the
assumption that it is safe to directly load the argument into a register,
because doing so will not clobber any previous assignments.
This assumption follows from the use of 'evalArgs', which evaluates
any arguments which might necessitate non-trivial code generation into
separate temporary registers.
- - - - -
72ef4f97 by sheaf at 2024-09-26T00:51:08+02:00
LLVM: propagate GlobalRegUse information
This commit ensures we keep track of how any particular global register
is being used in the LLVM backend. This informs the LLVM type
annotations, and avoids type mismatches of the following form:
argument is not of expected type '<2 x double>'
call ccc <2 x double> (<2 x double>)
(<4 x i32> arg)
- - - - -
1df6d224 by sheaf at 2024-09-26T01:01:25+02:00
shuffle fixup
- - - - -
21 changed files:
- .gitlab-ci.yml
- .gitlab/ci.sh
- compiler/GHC/Builtin/Types/Literals.hs
- compiler/GHC/Builtin/primops.txt.pp
- compiler/GHC/ByteCode/Asm.hs
- compiler/GHC/Cmm.hs
- compiler/GHC/Cmm/BlockId.hs
- compiler/GHC/Cmm/CLabel.hs
- compiler/GHC/Cmm/CallConv.hs
- compiler/GHC/Cmm/Dataflow.hs
- compiler/GHC/Cmm/Dataflow/Graph.hs
- compiler/GHC/Cmm/Graph.hs
- compiler/GHC/Cmm/Info.hs
- compiler/GHC/Cmm/Info/Build.hs
- compiler/GHC/Cmm/LayoutStack.hs
- compiler/GHC/Cmm/Lexer.x
- compiler/GHC/Cmm/Lint.hs
- compiler/GHC/Cmm/Liveness.hs
- compiler/GHC/Cmm/MachOp.hs
- compiler/GHC/Cmm/Node.hs
- compiler/GHC/Cmm/Opt.hs
The diff was not included because it is too large.
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/3fc52e65de1dfdb7e57df8494614e2d3d48b01f4...1df6d2247e43ff87049c3822512b40c61b4f4076