[Git][ghc/ghc][wip/ncg-simd] 27 commits: Improve docs for NondecreasingIndentation

Wed Jul 17 15:13:22 UTC 2024

sheaf pushed to branch wip/ncg-simd at Glasgow Haskell Compiler / GHC

Commits:
46ec0a8e by Torsten Schmits at 2024-07-09T13:37:02+02:00
Improve docs for NondecreasingIndentation

The text stated that this affects indentation of layouts nested in do
expressions, while it actually affects that of do layouts nested in any
other.

- - - - -
dddc9dff by Zubin Duggal at 2024-07-12T11:41:24-04:00
compiler: Fingerprint -fwrite-if-simplified-core

We need to recompile if this flag is changed because later modules might depend on the
simplified core for this module if -fprefer-bytecode is enabled.

Fixes #24656

- - - - -
145a6477 by Matthew Pickering at 2024-07-12T11:42:00-04:00
Add support for building profiled dynamic way

The main payload of this change is to hadrian.

* Default settings will produced dynamic profiled objects
* `-fexternal-interpreter` is turned on in some situations when there is
  an incompatibility between host GHC and the way attempting to be
  built.
* Very few changes actually needed to GHC

There are also necessary changes to the bootstrap plans to work with the
vendored Cabal dependency. These changes should ideally be reverted by
the next GHC release.

In hadrian support is added for building profiled dynamic libraries
(nothing too exciting to see there)

Updates hadrian to use a vendored Cabal submodule, it is important that
we replace this usage with a released version of Cabal library before
the 9.12 release.

Fixes #21594

-------------------------
Metric Increase:
    libdir
-------------------------

- - - - -
414a6950 by Matthew Pickering at 2024-07-12T11:42:00-04:00
testsuite: Make find_so regex more precise

The hash contains lowercase [a-z0-9] and crucially not _p which meant we
sometimes matched on `libHS.._p` profiled shared libraries rather than
the normal shared library.

- - - - -
dee035bf by Alex Mason at 2024-07-12T11:42:41-04:00
ncg(aarch64): Add fsqrt instruction, byteSwap primitives [#24956]

Implements the FSQRT machop using native assembly rather than a C call.

Implements MO_BSwap by producing assembly to do the byte swapping
instead of producing a foreign call a C function.

In `tar`, the hot loop for `deserialise` got almost 4x faster by
avoiding the foreign call which caused spilling live variables to the
stack -- this means the loop did 4x more memory read/writing than
necessary in that particular case!

- - - - -
5104ee61 by Sylvain Henry at 2024-07-12T11:43:23-04:00
Linker: use m32 allocator for sections when NEED_PLT (#24432)

Use M32 allocator to avoid fragmentation when allocating ELF sections.
We already did this when NEED_PLT was undefined. Failing to do this led
to relocations impossible to fulfil (#24432).

- - - - -
52d66984 by Sylvain Henry at 2024-07-12T11:43:23-04:00
RTS: allow M32 allocation outside of 4GB range when assuming -fPIC

- - - - -
c34fef56 by Sylvain Henry at 2024-07-12T11:43:23-04:00
Linker: fix stub offset

Remove unjustified +8 offset that leads to memory corruption (cf
discussion in #24432).

- - - - -
280e4bf5 by Simon Peyton Jones at 2024-07-12T11:43:59-04:00
Make type-equality on synonyms a bit faster

This MR make equality fast for (S tys1 `eqType` S tys2),
where S is a non-forgetful type synonym.

It doesn't affect compile-time allocation much, but then comparison doesn't
allocate anyway.  But it seems like a Good Thing anyway.

See Note [Comparing type synonyms] in GHC.Core.TyCo.Compare
and Note [Forgetful type synonyms] in GHC.Core.TyCon

Addresses #25009.

- - - - -
cb83c347 by Alan Zimmerman at 2024-07-12T11:44:35-04:00
EPA: Bring back SrcSpan in EpaDelta

When processing files in ghc-exactprint, the usual workflow is to
first normalise it with makeDeltaAst, and then operate on it.

But we need the original locations to operate on it, in terms of
finding things.

So restore the original SrcSpan for reference in EpaDelta

- - - - -
7bcda869 by Matthew Pickering at 2024-07-12T11:45:11-04:00
Update alpine release job to 3.20

alpine 3.20 was recently released and uses a new python and sphinx
toolchain which could be useful to test.

- - - - -
43aa99b8 by Matthew Pickering at 2024-07-12T11:45:11-04:00
testsuite: workaround bug in python-3.12

There is some unexplained change to binding behaviour in python-3.12
which requires moving this import from the top-level into the scope of
the function.

I didn't feel any particular desire to do a deep investigation as to why
this changed as the code works when modified like this. No one in the
python IRC channel seemed to know what the problem was.

- - - - -
e3914028 by Adam Sandberg Ericsson at 2024-07-12T11:45:47-04:00
initialise mmap_32bit_base during RTS startup #24847
- - - - -
86b8ecee by Hécate Kleidukos at 2024-07-12T11:46:27-04:00
haddock: Only fetch supported languages and extensions once per Interface list

This reduces the number of operations done on each Interface, because
supported languages and extensions are determined from architecture and
operating system of the build host. This information remains stable
across Interfaces, and as such doesn not need to be recovered for each
Interface.

- - - - -
4f85366f by sheaf at 2024-07-13T05:58:14-04:00
Testsuite: use py-cpuinfo to compute CPU features

This replaces the rather hacky logic we had in place for checking
CPU features. In particular, this means that feature availability now
works properly on Windows.

- - - - -
41f1354d by Matthew Pickering at 2024-07-13T05:58:51-04:00
testsuite: Replace $CC with $TEST_CC

The TEST_CC variable should be set based on the test compiler, which may
be different to the compiler which is set to CC on your system (for
example when cross compiling).

Fixes #24946

- - - - -
572fbc44 by sheaf at 2024-07-15T08:30:32-04:00
isIrrefutableHsPat: consider COMPLETE pragmas

This patch ensures we taken into account COMPLETE pragmas when we
compute whether a pattern is irrefutable. In particular, if a pattern
synonym is the sole member of a COMPLETE pragma (without a result TyCon),
then we consider a pattern match on that pattern synonym to be irrefutable.

This affects the desugaring of do blocks, as it ensures we don't use
a "fail" operation.

Fixes #15681 #16618 #22004

- - - - -
84dadea9 by Zubin Duggal at 2024-07-15T08:31:09-04:00
haddock: Handle non-hs files, so that haddock can generate documentation for modules with
foreign imports and template haskell.

Fixes #24964

- - - - -
0b4ff9fa by Zubin Duggal at 2024-07-15T12:12:30-04:00
haddock: Keep track of warnings/deprecations from dependent packages in `InstalledInterface`
and use this to propagate these on items re-exported from dependent packages.

Fixes #25037

- - - - -
b8b4b212 by Zubin Duggal at 2024-07-15T12:12:30-04:00
haddock: Keep track of instance source locations in `InstalledInterface` and use this to add
source locations on out of package instances

Fixes #24929

- - - - -
559a7a7c by Matthew Pickering at 2024-07-15T12:13:05-04:00
ci: Refactor job_groups definition, split up by platform

The groups are now split up so it's easier to see which jobs are
generated for each platform

No change in behaviour, just refactoring.

- - - - -
043b9634 by sheaf at 2024-07-17T17:12:42+02:00
The X86 SIMD patch.

This commit adds support for 128 bit wide SIMD vectors and vector
operations to GHC's X86 native code generator.

Main changes:

  - Introduction of vector formats (`GHC.CmmToAsm.Format`)
  - Introduction of 128-bit virtual register (`GHC.Platform.Reg`),
    and removal of unused Float virtual register.
  - Refactor of `GHC.Platform.Reg.Class.RegClass`: it now only contains
    two classes, `RcInteger` (for general purpose registers) and `RcFloatOrVector`
    (for registers that can be used for scalar floating point values as well
    as vectors).
  - Modify `GHC.CmmToAsm.X86.Instr.regUsageOfInstr` to keep track
    of which format each register is used at, so that the register
    allocator can know if it needs to spill the entire vector register
    or just the lower 64 bits.
  - Modify spill/load/reg-2-reg code to account for vector registers
    (`GHC.CmmToAsm.X86.Instr.{mkSpillInstr, mkLoadInstr, mkRegRegMoveInstr, takeRegRegMoveInstr}`).
  - Modify the register allocator code (`GHC.CmmToAsm.Reg.*`) to propagate
    the format we are storing in any given register, for instance changing
    `Reg` to `RegFormat` or `GlobalReg` to `GlobalRegUse`.
  - Add logic to lower vector `MachOp`s to X86 assembly
    (see `GHC.CmmToAsm.X86.CodeGen`)
  - Minor cleanups to genprimopcode, to remove the llvm_only attribute
    which is no longer applicable.

Tests for this feature are provided in the "testsuite/tests/simd" directory.

Fixes #7741

Keeping track of register formats adds a small memory overhead to the
register allocator (in particular, regUsageOfInstr now allocates more
to keep track of the `Format` each register is used at). This explains
the following metric increases.

-------------------------
Metric Increase:
    T12707
    T13035
    T13379
    T3294
    T4801
    T5321FD
    T5321Fun
    T783
-------------------------

- - - - -
cf58d684 by sheaf at 2024-07-17T17:12:42+02:00
Use xmm registers in genapply

This commit updates genapply to use xmm, ymm and zmm registers, for
stg_ap_v16/stg_ap_v32/stg_ap_v64, respectively.

It also updates the Cmm lexer and parser to produce Cmm vectors rather
than 128/256/512 bit wide scalars for V16/V32/V64, removing bits128,
bits256 and bits512 in favour of vectors.

The Cmm Lint check is weakened for vectors, as (in practice, e.g. on X86)
it is okay to use a single vector register to hold multiple different
types of data, and we don't know just from seeing e.g. "XMM1" how to
interpret the 128 bits of data within.

Fixes #25062

- - - - -
09c877e1 by sheaf at 2024-07-17T17:12:42+02:00
Add vector fused multiply-add operations

This commit adds fused multiply add operations such as `fmaddDoubleX2#`.
These are handled both in the X86 NCG and the LLVM backends.

- - - - -
7e37dfd3 by sheaf at 2024-07-17T17:12:42+02:00
Add vector shuffle primops

This adds vector shuffle primops, such as

```
shuffleFloatX4# :: FloatX4# -> FloatX4# -> (# Int#, Int#, Int#, Int# #) -> FloatX4#
```

which shuffle the components of the input two vectors into the output vector.

NB: the indices must be compile time literals, to match the X86 SHUFPD
instruction immediate and the LLVM shufflevector instruction.

These are handled in the X86 NCG and the LLVM backend.

Tested in simd009.

- - - - -
02959539 by sheaf at 2024-07-17T17:12:42+02:00
Add Broadcast MachOps

This adds proper MachOps for broadcast instructions, allowing us to
produce better code for broadcasting a value than simply packing that
value (doing many vector insertions in a row).

These are lowered in the X86 NCG and LLVM backends. In the LLVM backend,
it uses the previously introduced shuffle instructions.

- - - - -
6c773f34 by sheaf at 2024-07-17T17:12:43+02:00
Fix treatment of signed zero in vector negation

This commit fixes the handling of signed zero in floating-point vector
negation.

A slight hack was introduced to work around the fact that Cmm doesn't
currently have a notion of signed floating point literals
(see get_float_broadcast_value_reg). This can be removed once CmmFloat
can express the value -0.0.

The simd006 test has been updated to use a stricter notion of equality
of floating-point values, which ensure the validity of this change.

- - - - -

30 changed files:

- .gitlab-ci.yml
- .gitlab/generate-ci/gen_ci.hs
- .gitlab/jobs.yaml
- .gitlab/rel_eng/fetch-gitlab-artifacts/fetch_gitlab.py
- .gitlab/rel_eng/mk-ghcup-metadata/mk_ghcup_metadata.py
- .gitmodules
- compiler/GHC.hs
- compiler/GHC/Builtin/primops.txt.pp
- compiler/GHC/ByteCode/Asm.hs
- compiler/GHC/Cmm.hs
- compiler/GHC/Cmm/CallConv.hs
- compiler/GHC/Cmm/Graph.hs
- compiler/GHC/Cmm/Lexer.x
- compiler/GHC/Cmm/Lint.hs
- compiler/GHC/Cmm/Liveness.hs
- compiler/GHC/Cmm/MachOp.hs
- compiler/GHC/Cmm/Node.hs
- compiler/GHC/Cmm/Opt.hs
- compiler/GHC/Cmm/Parser.y
- compiler/GHC/Cmm/ProcPoint.hs
- compiler/GHC/Cmm/Reg.hs
- compiler/GHC/Cmm/Sink.hs
- compiler/GHC/Cmm/Type.hs
- compiler/GHC/CmmToAsm.hs
- compiler/GHC/CmmToAsm/AArch64.hs
- compiler/GHC/CmmToAsm/AArch64/CodeGen.hs
- compiler/GHC/CmmToAsm/AArch64/Instr.hs
- compiler/GHC/CmmToAsm/AArch64/Ppr.hs
- compiler/GHC/CmmToAsm/AArch64/Regs.hs
- compiler/GHC/CmmToAsm/Config.hs

The diff was not included because it is too large.

View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/f13b09c9daa35dab229b81196ae1a5224514a3a7...6c773f341f02dd09e5c13734d96ae192660971fb

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/f13b09c9daa35dab229b81196ae1a5224514a3a7...6c773f341f02dd09e5c13734d96ae192660971fb
You're receiving this email because of your account on gitlab.haskell.org.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240717/b530c4e3/attachment-0001.html>