[Git][ghc/ghc][wip/ncg-simd] 24 commits: Define Eq1, Ord1, Show1 and Read1 instances for basic Generic representation types

Fri Aug 2 08:42:49 UTC 2024

sheaf pushed to branch wip/ncg-simd at Glasgow Haskell Compiler / GHC

Commits:
a8362630 by Sergey Vinokurov at 2024-07-24T12:22:45-04:00
Define Eq1, Ord1, Show1 and Read1 instances for basic Generic representation types

This way the Generically1 newtype could be used to derive Eq1 and Ord1
for user types with DerivingVia.

The CLC proposal is https://github.com/haskell/core-libraries-committee/issues/273.

The GHC issue is https://gitlab.haskell.org/ghc/ghc/-/issues/24312.

- - - - -
de5d9852 by Simon Peyton Jones at 2024-07-24T12:23:22-04:00
Address #25055, by disabling case-of-runRW# in Gentle phase

See Note [Case-of-case and full laziness]
in GHC.Driver.Config.Core.Opt.Simplify

- - - - -
3f89ab92 by Andreas Klebinger at 2024-07-25T14:12:54+02:00
Fix -freg-graphs for FP and AARch64 NCG (#24941).

It seems we reserve 8 registers instead of four for global regs
based on the layout in Note [AArch64 Register assignments].

I'm not sure it's neccesary, but for now we just accept this state of
affairs and simple update -fregs-graph to account for this.

- - - - -
f6b4c1c9 by Simon Peyton Jones at 2024-07-27T09:45:44-04:00
Fix nasty bug in occurrence analyser

As #25096 showed, the occurrence analyser was getting one-shot info
flat out wrong.

This commit does two things:

* It fixes the bug and actually makes the code a bit tidier too.
  The work is done in the new function
     GHC.Core.Opt.OccurAnal.mkRhsOccEnv,
  especially the bit that prepares the `occ_one_shots` for the RHS.

  See Note [The OccEnv for a right hand side]

* When floating out a binding we must be conservative about one-shot
  info.  But we were zapping the entire demand info, whereas we only
  really need zap the /top level/ cardinality.

  See Note [Floatifying demand info when floating]
  in GHC.Core.Opt.SetLevels

For some reason there is a 2.2% improvement in compile-time allocation
for CoOpt_Read.  Otherwise nickels and dimes.

Metric Decrease:
    CoOpt_Read

- - - - -
646ee207 by Torsten Schmits at 2024-07-27T09:46:20-04:00
add missing cell in flavours table

- - - - -
ec2eafdb by Ben Gamari at 2024-07-28T20:51:12+02:00
users-guide: Drop mention of dead __PARALLEL_HASKELL__ macro

This has not existed for over a decade.

- - - - -
e2f2a56e by Arnaud Spiwack at 2024-07-28T22:21:07-04:00
Add tests for 25081

- - - - -
23f50640 by Arnaud Spiwack at 2024-07-28T22:21:07-04:00
Scale multiplicity in list comprehension

Fixes #25081

- - - - -
d2648289 by romes at 2024-07-30T01:38:12-04:00
TTG HsCmdArrForm: use Fixity via extension point

Also migrate Fixity from GHC.Hs to Language.Haskell.Syntax
since it no longer uses any GHC-specific data types.

Fixed arrow desugaring bug. (This was dead code before.)
Remove mkOpFormRn, it is also dead code, only used in the arrow
desugaring now removed.

Co-authored-by: Fabian Kirchner <kirchner at posteo.de>
Co-authored-by: Alan Zimmerman <alan.zimm at gmail.com>

- - - - -
e258ad54 by Matthew Pickering at 2024-07-30T01:38:48-04:00
ghcup-metadata: More metadata fixes

* Incorrect version range on the alpine bindists
* Missing underscore in "unknown_versioning"

Fixes #25119

- - - - -
72b54c07 by Rodrigo Mesquita at 2024-08-01T00:47:29-04:00
Deriving-via one-shot strict state Monad instances

A small refactor to use deriving via GHC.Utils.Monad.State.Strict
Monad instances for state Monads with unboxed/strict results which all
re-implemented the one-shot trick in the instance and used unboxed
tuples:

* CmmOptM in GHC.Cmm.GenericOpt
* RegM in GHC.CmmToAsm.Reg.Linear.State
* UniqSM in GHC.Types.Unique.Supply

- - - - -
bfe4b3d3 by doyougnu at 2024-08-01T00:48:06-04:00
Rts linker: add case for pc-rel 64 relocation

part of the upstream haskell.nix patches

- - - - -
5843c7e3 by doyougnu at 2024-08-01T00:48:42-04:00
RTS linker: aarch64: better debug information

Dump better debugging information when a symbol address is null.

Part of the haskell.nix patches upstream project

Co-authored-by: Sylvain Henry <sylvain at haskus.fr>

- - - - -
c2e9c581 by Rodrigo Mesquita at 2024-08-01T00:49:18-04:00
base: Add haddocks to HasExceptionContext

Fixes #25091

- - - - -
f954f428 by Sylvain Henry at 2024-08-01T00:49:59-04:00
Only lookup ghcversion.h file in the RTS include-dirs by default.

The code was introduced in 3549c952b535803270872adaf87262f2df0295a4.
It used `getPackageIncludePath` which name doesn't convey that it looks
into all include paths of the preload units too. So this behavior is
probably unintentional and it should be ok to change it.

Fix #25106

- - - - -
951ce3d5 by Matthew Pickering at 2024-08-01T00:50:35-04:00
driver: Fix -Wmissing-home-modules when multiple units have the same module name

It was assumed that module names were unique but that isn't true with
multiple units.

The fix is quite simple, maintain a set of `(ModuleName, UnitId)` and
query that to see whether the module has been specified.

Fixes #25122

- - - - -
bae1fea4 by sheaf at 2024-08-01T00:51:15-04:00
PMC: suggest in-scope COMPLETE sets when possible

This commit modifies GHC.HsToCore.Pmc.Solver.generateInhabitingPatterns
to prioritise reporting COMPLETE sets in which all of the ConLikes
are in scope. This avoids suggesting out of scope constructors
when displaying an incomplete pattern match warning, e.g. in

  baz :: Ordering -> Int
  baz = \case
    EQ -> 5

we prefer:

  Patterns of type 'Ordering' not matched:
      LT
      GT

over:

  Patterns of type 'Ordering' not matched:
      OutOfScope

Fixes #25115

- - - - -
5862aad3 by sheaf at 2024-08-01T17:08:16+02:00
The X86 SIMD patch.

This commit adds support for 128 bit wide SIMD vectors and vector
operations to GHC's X86 native code generator.

Main changes:

  - Introduction of vector formats (`GHC.CmmToAsm.Format`)
  - Introduction of 128-bit virtual register (`GHC.Platform.Reg`),
    and removal of unused Float virtual register.
  - Refactor of `GHC.Platform.Reg.Class.RegClass`: it now only contains
    two classes, `RcInteger` (for general purpose registers) and `RcFloatOrVector`
    (for registers that can be used for scalar floating point values as well
    as vectors).
  - Modify `GHC.CmmToAsm.X86.Instr.regUsageOfInstr` to keep track
    of which format each register is used at, so that the register
    allocator can know if it needs to spill the entire vector register
    or just the lower 64 bits.
  - Modify spill/load/reg-2-reg code to account for vector registers
    (`GHC.CmmToAsm.X86.Instr.{mkSpillInstr, mkLoadInstr, mkRegRegMoveInstr, takeRegRegMoveInstr}`).
  - Modify the register allocator code (`GHC.CmmToAsm.Reg.*`) to propagate
    the format we are storing in any given register, for instance changing
    `Reg` to `RegFormat` or `GlobalReg` to `GlobalRegUse`.
  - Add logic to lower vector `MachOp`s to X86 assembly
    (see `GHC.CmmToAsm.X86.CodeGen`)
  - Minor cleanups to genprimopcode, to remove the llvm_only attribute
    which is no longer applicable.

Tests for this feature are provided in the "testsuite/tests/simd" directory.

Fixes #7741

Keeping track of register formats adds a small memory overhead to the
register allocator (in particular, regUsageOfInstr now allocates more
to keep track of the `Format` each register is used at). This explains
the following metric increases.

-------------------------
Metric Increase:
    T12707
    T13035
    T13379
    T3294
    T4801
    T5321FD
    T5321Fun
    T783
-------------------------

- - - - -
6b4a07d6 by sheaf at 2024-08-01T17:08:19+02:00
Use xmm registers in genapply

This commit updates genapply to use xmm, ymm and zmm registers, for
stg_ap_v16/stg_ap_v32/stg_ap_v64, respectively.

It also updates the Cmm lexer and parser to produce Cmm vectors rather
than 128/256/512 bit wide scalars for V16/V32/V64, removing bits128,
bits256 and bits512 in favour of vectors.

The Cmm Lint check is weakened for vectors, as (in practice, e.g. on X86)
it is okay to use a single vector register to hold multiple different
types of data, and we don't know just from seeing e.g. "XMM1" how to
interpret the 128 bits of data within.

Fixes #25062

- - - - -
57bd607b by sheaf at 2024-08-01T17:08:19+02:00
Add vector fused multiply-add operations

This commit adds fused multiply add operations such as `fmaddDoubleX2#`.
These are handled both in the X86 NCG and the LLVM backends.

- - - - -
0fb911d7 by sheaf at 2024-08-01T17:08:19+02:00
Add vector shuffle primops

This adds vector shuffle primops, such as

```
shuffleFloatX4# :: FloatX4# -> FloatX4# -> (# Int#, Int#, Int#, Int# #) -> FloatX4#
```

which shuffle the components of the input two vectors into the output vector.

NB: the indices must be compile time literals, to match the X86 SHUFPD
instruction immediate and the LLVM shufflevector instruction.

These are handled in the X86 NCG and the LLVM backend.

Tested in simd009.

- - - - -
4cf68e9d by sheaf at 2024-08-01T17:08:20+02:00
Add Broadcast MachOps

This adds proper MachOps for broadcast instructions, allowing us to
produce better code for broadcasting a value than simply packing that
value (doing many vector insertions in a row).

These are lowered in the X86 NCG and LLVM backends. In the LLVM backend,
it uses the previously introduced shuffle instructions.

- - - - -
69fc289e by sheaf at 2024-08-01T17:08:20+02:00
Fix treatment of signed zero in vector negation

This commit fixes the handling of signed zero in floating-point vector
negation.

A slight hack was introduced to work around the fact that Cmm doesn't
currently have a notion of signed floating point literals
(see get_float_broadcast_value_reg). This can be removed once CmmFloat
can express the value -0.0.

The simd006 test has been updated to use a stricter notion of equality
of floating-point values, which ensure the validity of this change.

- - - - -
3aec009b by sheaf at 2024-08-01T17:08:20+02:00
Add min/max primops

This commit adds min/max primops, such as

  minDouble# :: Double# -> Double# -> Double#
  minFloatX4# :: FloatX4# -> FloatX4# -> FloatX4#
  minWord16X8# :: Word16X8# -> Word16X8# -> Word16X8#

These are supported in:
  - the X86, AArch64 and PowerPC NCGs,
  - the LLVM backend,
  - the WebAssembly and JavaScript backends.

Fixes #25120

- - - - -

30 changed files:

- .gitlab/rel_eng/mk-ghcup-metadata/mk_ghcup_metadata.py
- compiler/GHC/Builtin/primops.txt.pp
- compiler/GHC/ByteCode/Asm.hs
- compiler/GHC/Cmm.hs
- compiler/GHC/Cmm/CallConv.hs
- compiler/GHC/Cmm/GenericOpt.hs
- compiler/GHC/Cmm/Graph.hs
- compiler/GHC/Cmm/Lexer.x
- compiler/GHC/Cmm/Lint.hs
- compiler/GHC/Cmm/Liveness.hs
- compiler/GHC/Cmm/MachOp.hs
- compiler/GHC/Cmm/Node.hs
- compiler/GHC/Cmm/Opt.hs
- compiler/GHC/Cmm/Parser.y
- compiler/GHC/Cmm/ProcPoint.hs
- compiler/GHC/Cmm/Reg.hs
- compiler/GHC/Cmm/Sink.hs
- compiler/GHC/Cmm/Type.hs
- compiler/GHC/CmmToAsm.hs
- compiler/GHC/CmmToAsm/AArch64.hs
- compiler/GHC/CmmToAsm/AArch64/CodeGen.hs
- compiler/GHC/CmmToAsm/AArch64/Instr.hs
- compiler/GHC/CmmToAsm/AArch64/Ppr.hs
- compiler/GHC/CmmToAsm/AArch64/Regs.hs
- compiler/GHC/CmmToAsm/Config.hs
- compiler/GHC/CmmToAsm/Format.hs
- compiler/GHC/CmmToAsm/Instr.hs
- compiler/GHC/CmmToAsm/PPC.hs
- compiler/GHC/CmmToAsm/PPC/CodeGen.hs
- compiler/GHC/CmmToAsm/PPC/Instr.hs

The diff was not included because it is too large.

View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/a57beb9ddd1adaa667adc2708b3244f864d40e0c...3aec009b6f34d80318e2f25739f9efe19c028ae8

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/a57beb9ddd1adaa667adc2708b3244f864d40e0c...3aec009b6f34d80318e2f25739f9efe19c028ae8
You're receiving this email because of your account on gitlab.haskell.org.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240802/b43f0968/attachment-0001.html>