[Git][ghc/ghc][wip/ncg-simd] 28 commits: testsuite: extend size performance tests with gzip (fixes #25046)

Thu Aug 15 14:20:01 UTC 2024

sheaf pushed to branch wip/ncg-simd at Glasgow Haskell Compiler / GHC

Commits:
eb1cb536 by Serge S. Gulin at 2024-08-06T08:54:55+00:00
testsuite: extend size performance tests with gzip (fixes #25046)

The main purpose is to create tests for minimal app (hello world and its variations, i.e. unicode used) distribution size metric.

Many platforms support distribution in compressed form via gzip. It would be nice to collect information on how much size is taken by the executional bundle for each platform at minimal edge case.

2 groups of tests are added:
1. We extend javascript backend size tests with gzip-enabled versions for all cases where an optimizing compiler is used (for now it is google closure compiler).
2. We add trivial hello world tests with gzip-enabled versions for all other platforms at CI pipeline where no external optimizing compiler is used.

- - - - -
d94410f8 by Rodrigo Mesquita at 2024-08-07T11:49:19-04:00
ghc-internal: @since for backtraceDesired

Fixes point 1 in #25052

- - - - -
bfe600f5 by Rodrigo Mesquita at 2024-08-07T11:49:19-04:00
ghc-internal: No trailing whitespace in exceptions

Fixes #25052

- - - - -
62650d9f by Andreas Klebinger at 2024-08-07T11:49:54-04:00
Add since annotation for -fkeep-auto-rules.

This partially addresses #25082.

- - - - -
5f0e23fd by Andreas Klebinger at 2024-08-07T11:49:54-04:00
Mention `-fkeep-auto-rules` in release notes.

It was added earlier but hadn't appeared in any release notes yet.
Partially addresses #25082.

- - - - -
7446a09a by Sylvain Henry at 2024-08-07T11:50:35-04:00
Cmm: don't perform unsound optimizations on 32-bit compiler hosts

- beef61351b240967b49169d27a9a19565cf3c4af enabled the use of
  MO_Add/MO_Sub for 64-bit operations in the C and LLVM backends
- 6755d833af8c21bbad6585144b10e20ac4a0a1ab did the same for the x86 NCG
  backend

However we store some literal values as `Int` in the compiler. As a
result, some Cmm optimizations transformed target 64-bit literals into
compiler `Int`. If the compiler is 32-bit, this leads to computing with
wrong literals (see #24893 and #24700).

This patch disables these Cmm optimizations for 32-bit compilers. This
is unsatisfying (optimizations shouldn't be compiler-word-size
dependent) but it fixes the bug and it makes the patch easy to backport.
A proper fix would be much more invasive but it shall be implemented in
the future.

Co-authored-by: amesgen <amesgen at amesgen.de>

- - - - -
d59faaf2 by Vladislav Zavialov at 2024-08-07T11:51:11-04:00
docs: Update info on RequiredTypeArguments

Add a section on "types in terms" that were implemented in 8b2f70a202
and remove the now outdated suggestion of using `type` for them.

- - - - -
39fd6714 by Sylvain Henry at 2024-08-07T11:51:52-04:00
JS: fix minor typo in base's jsbits

- - - - -
e7764575 by Sylvain Henry at 2024-08-07T11:51:52-04:00
RTS: remove hack to force old cabal to build a library with only JS sources

Need to extend JSC externs with Emscripten RTS definitions to avoid
JSC_UNDEFINED_VARIABLE errors when linking without the emcc rts.

Fix #25138

Some recompilation avoidance tests now fail. This is tracked with the
other instances of this failure in #23013. My hunch is that they were
working by chance when we used the emcc linker.

Metric Decrease:
    T24602_perf_size

- - - - -
d1a40233 by Brandon Chinn at 2024-08-07T11:53:08-04:00
Support multiline strings in type literals (#25132)

- - - - -
610840eb by Sylvain Henry at 2024-08-07T11:53:50-04:00
JS: fix callback documentation (#24377)

Fix #24377

- - - - -
6ae4b76a by Zubin Duggal at 2024-08-13T13:36:57-04:00
haddock: Build haddock-api and haddock-library using hadrian

We build these two packages as regular boot library dependencies rather
than using the `in-ghc-tree` flag to include the source files into the haddock
executable.

The `in-ghc-tree` flag is moved into haddock-api to ensure that haddock built
from hackage can still find the location of the GHC bindist using `ghc-paths`.

Addresses #24834

This causes a metric decrease under non-release flavours because under these
flavours libraries are compiled with optimisation but executables are not.

Since we move the bulk of the code from the haddock executable to the
haddock-api library, we see a metric decrease on the validate flavours.

Metric Decrease:
    haddock.Cabal
    haddock.base
    haddock.compiler

- - - - -
51ffba5d by Arnaud Spiwack at 2024-08-13T13:37:50-04:00
Add an extension field to HsRecFields

This is the Right Thing to Do™. And it prepares for storing a
multiplicity coercion there.

First step of the plan outlined here and below
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/12947#note_573091

- - - - -
4d2faeeb by Arnaud Spiwack at 2024-08-13T13:37:50-04:00
Add test for #24961

- - - - -
623b4337 by Arnaud Spiwack at 2024-08-13T13:37:50-04:00
Ensures that omitted record fields in pattern have multiplicity Many

Omitted fields were simply ignored in the type checker and produced
incorrect Core code.

Fixes #24961

Metric Increase:
    RecordUpdPerf

- - - - -
c749bdfd by Sylvain Henry at 2024-08-13T13:38:41-04:00
AARCH64 linker: skip NONE relocations

This patch is part of the patches upstreamed from haskell.nix.
See https://github.com/input-output-hk/haskell.nix/pull/1960 for the
original report/patch.

- - - - -
682a6a41 by Brandon Chinn at 2024-08-13T13:39:17-04:00
Support multiline strings in TH

- - - - -
f046a759 by sheaf at 2024-08-15T16:14:00+02:00
The X86 SIMD patch.

This commit adds support for 128 bit wide SIMD vectors and vector
operations to GHC's X86 native code generator.

Main changes:

  - Introduction of vector formats (`GHC.CmmToAsm.Format`)
  - Introduction of 128-bit virtual register (`GHC.Platform.Reg`),
    and removal of unused Float virtual register.
  - Refactor of `GHC.Platform.Reg.Class.RegClass`: it now only contains
    two classes, `RcInteger` (for general purpose registers) and `RcFloatOrVector`
    (for registers that can be used for scalar floating point values as well
    as vectors).
  - Modify `GHC.CmmToAsm.X86.Instr.regUsageOfInstr` to keep track
    of which format each register is used at, so that the register
    allocator can know if it needs to spill the entire vector register
    or just the lower 64 bits.
  - Modify spill/load/reg-2-reg code to account for vector registers
    (`GHC.CmmToAsm.X86.Instr.{mkSpillInstr, mkLoadInstr, mkRegRegMoveInstr, takeRegRegMoveInstr}`).
  - Modify the register allocator code (`GHC.CmmToAsm.Reg.*`) to propagate
    the format we are storing in any given register, for instance changing
    `Reg` to `RegFormat` or `GlobalReg` to `GlobalRegUse`.
  - Add logic to lower vector `MachOp`s to X86 assembly
    (see `GHC.CmmToAsm.X86.CodeGen`)
  - Minor cleanups to genprimopcode, to remove the llvm_only attribute
    which is no longer applicable.

Tests for this feature are provided in the "testsuite/tests/simd" directory.

Fixes #7741

Keeping track of register formats adds a small memory overhead to the
register allocator (in particular, regUsageOfInstr now allocates more
to keep track of the `Format` each register is used at). This explains
the following metric increases.

-------------------------
Metric Increase:
    T12707
    T13035
    T13379
    T3294
    T4801
    T5321FD
    T5321Fun
    T783
-------------------------

- - - - -
fae71b33 by sheaf at 2024-08-15T16:14:00+02:00
Use xmm registers in genapply

This commit updates genapply to use xmm, ymm and zmm registers, for
stg_ap_v16/stg_ap_v32/stg_ap_v64, respectively.

It also updates the Cmm lexer and parser to produce Cmm vectors rather
than 128/256/512 bit wide scalars for V16/V32/V64, removing bits128,
bits256 and bits512 in favour of vectors.

The Cmm Lint check is weakened for vectors, as (in practice, e.g. on X86)
it is okay to use a single vector register to hold multiple different
types of data, and we don't know just from seeing e.g. "XMM1" how to
interpret the 128 bits of data within.

Fixes #25062

- - - - -
92b728cf by sheaf at 2024-08-15T16:14:01+02:00
Add vector fused multiply-add operations

This commit adds fused multiply add operations such as `fmaddDoubleX2#`.
These are handled both in the X86 NCG and the LLVM backends.

- - - - -
f3386a59 by sheaf at 2024-08-15T16:14:01+02:00
Add vector shuffle primops

This adds vector shuffle primops, such as

```
shuffleFloatX4# :: FloatX4# -> FloatX4# -> (# Int#, Int#, Int#, Int# #) -> FloatX4#
```

which shuffle the components of the input two vectors into the output vector.

NB: the indices must be compile time literals, to match the X86 SHUFPD
instruction immediate and the LLVM shufflevector instruction.

These are handled in the X86 NCG and the LLVM backend.

Tested in simd009.

- - - - -
3b3dfb92 by sheaf at 2024-08-15T16:14:01+02:00
Add Broadcast MachOps

This adds proper MachOps for broadcast instructions, allowing us to
produce better code for broadcasting a value than simply packing that
value (doing many vector insertions in a row).

These are lowered in the X86 NCG and LLVM backends. In the LLVM backend,
it uses the previously introduced shuffle instructions.

- - - - -
aa5820fd by sheaf at 2024-08-15T16:14:01+02:00
Fix treatment of signed zero in vector negation

This commit fixes the handling of signed zero in floating-point vector
negation.

A slight hack was introduced to work around the fact that Cmm doesn't
currently have a notion of signed floating point literals
(see get_float_broadcast_value_reg). This can be removed once CmmFloat
can express the value -0.0.

The simd006 test has been updated to use a stricter notion of equality
of floating-point values, which ensure the validity of this change.

- - - - -
cec5908c by sheaf at 2024-08-15T16:14:01+02:00
Add min/max primops

This commit adds min/max primops, such as

  minDouble# :: Double# -> Double# -> Double#
  minFloatX4# :: FloatX4# -> FloatX4# -> FloatX4#
  minWord16X8# :: Word16X8# -> Word16X8# -> Word16X8#

These are supported in:
  - the X86, AArch64 and PowerPC NCGs,
  - the LLVM backend,
  - the WebAssembly and JavaScript backends.

Fixes #25120

- - - - -
22176f8b by sheaf at 2024-08-15T16:14:01+02:00
Modularise RegClass

This commit modularises the RegClass datatype, allowing it to be
used with architectures that have different register architectures, e.g.
RISC-V which has separate floating-point and vector registers.

The two modules GHC.Platform.Reg.Class.Unified and
GHC.Platform.Reg.Class.Separate implement the two register architectures
we currently support (corresponding to the two constructors of the
GHC.Platform.Reg.Class.RegArch datatype).

- - - - -
941add96 by sheaf at 2024-08-15T16:14:01+02:00
Add test for C calls & SIMD vectors

- - - - -
606c72e4 by sheaf at 2024-08-15T16:14:01+02:00
Fix C calls with SIMD vectors

This commit fixes the code generation for C calls, to take into account
the calling convention.

This is particularly tricky on Windows, where all vectors are expected
to be passed by reference. See Note [The Windows X64 C calling convention]
in GHC.CmmToAsm.X86.CodeGen.

- - - - -
7d1a3cc5 by sheaf at 2024-08-15T16:19:46+02:00
GHC calling convention: clarifications

This commit clarifies that the GHC calling convention, on X86_64, uses
xmm1, ..., xmm6 for argument passing. It does not use xmm0, because
that's the convention we asked the LLVM compiler authors to define for
usage with GHC.

This unfortunately means a discrepancy with the C calling convention
(which does use xmm0, for the first argument and for the result).

Fixes #25156

- - - - -

30 changed files:

- compiler/GHC/Builtin/primops.txt.pp
- compiler/GHC/ByteCode/Asm.hs
- compiler/GHC/Cmm.hs
- compiler/GHC/Cmm/CallConv.hs
- compiler/GHC/Cmm/Graph.hs
- compiler/GHC/Cmm/Lexer.x
- compiler/GHC/Cmm/Lint.hs
- compiler/GHC/Cmm/Liveness.hs
- compiler/GHC/Cmm/MachOp.hs
- compiler/GHC/Cmm/Node.hs
- compiler/GHC/Cmm/Opt.hs
- compiler/GHC/Cmm/Parser.y
- compiler/GHC/Cmm/ProcPoint.hs
- compiler/GHC/Cmm/Reg.hs
- compiler/GHC/Cmm/Sink.hs
- compiler/GHC/Cmm/Type.hs
- compiler/GHC/CmmToAsm.hs
- compiler/GHC/CmmToAsm/AArch64.hs
- compiler/GHC/CmmToAsm/AArch64/CodeGen.hs
- compiler/GHC/CmmToAsm/AArch64/Instr.hs
- compiler/GHC/CmmToAsm/AArch64/Ppr.hs
- compiler/GHC/CmmToAsm/AArch64/Regs.hs
- compiler/GHC/CmmToAsm/Config.hs
- compiler/GHC/CmmToAsm/Format.hs
- compiler/GHC/CmmToAsm/Instr.hs
- compiler/GHC/CmmToAsm/PPC.hs
- compiler/GHC/CmmToAsm/PPC/CodeGen.hs
- compiler/GHC/CmmToAsm/PPC/Instr.hs
- compiler/GHC/CmmToAsm/PPC/Ppr.hs
- compiler/GHC/CmmToAsm/PPC/Regs.hs

The diff was not included because it is too large.

View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/f759277e63fa2cd80e66df3ed489b258f249f292...7d1a3cc5a87289cba192c517603e35c86511a06b

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/f759277e63fa2cd80e66df3ed489b258f249f292...7d1a3cc5a87289cba192c517603e35c86511a06b
You're receiving this email because of your account on gitlab.haskell.org.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240815/63bed7f9/attachment-0001.html>