[Git][ghc/ghc][wip/ncg-simd] 62 commits: Make flip representation polymorphic, similar to ($) and (&)

sheaf (@sheaf) gitlab at gitlab.haskell.org
Thu Jun 27 10:36:56 UTC 2024



sheaf pushed to branch wip/ncg-simd at Glasgow Haskell Compiler / GHC


Commits:
e0099721 by Andrew Lelechenko at 2024-06-16T17:57:38-04:00
Make flip representation polymorphic, similar to ($) and (&)

CLC proposal: https://github.com/haskell/core-libraries-committee/issues/245

- - - - -
118a1292 by Alan Zimmerman at 2024-06-16T17:58:15-04:00
EPA: Add location to Match Pats list

So we can freely modify the pats and the following item spacing will
still be valid when exact printing.

Closes #24862

- - - - -
db343324 by Fabricio de Sousa Nascimento at 2024-06-17T10:01:51-04:00
compiler: Rejects RULES whose LHS immediately fails to type-check

Fixes GHC crashing on `decomposeRuleLhs` due to ignoring coercion values. This
happens when we have a RULE that does not type check, and enable
`-fdefer-type-errors`. We prevent this to happen by rejecting RULES with an
immediately LHS type error.

Fixes #24026

- - - - -
e7a95662 by Dylan Thinnes at 2024-06-17T10:02:35-04:00
Add hscTypecheckRenameWithDiagnostics, for HLS (#24996)

Use runHsc' in runHsc so that both functions can't fall out of sync

We're currently copying parts of GHC code to get structured warnings
in HLS, so that we can recreate `hscTypecheckRenameWithDiagnostics`
locally. Once we get this function into GHC we can drop the copied code
in future versions of HLS.

- - - - -
d70abb49 by sheaf at 2024-06-18T18:47:20-04:00
Clarify -XGADTs enables existential quantification

Even though -XGADTs does not turn on -XExistentialQuantification,
it does allow the user of existential quantification syntax, without
needing to use GADT-style syntax.

Fixes #20865

- - - - -
13fdf788 by David Binder at 2024-06-18T18:48:02-04:00
Add RTS flag --read-tix-file (GHC Proposal 612)

This commit introduces the RTS flag `--read-tix-file=<yes|no>` which
controls whether a preexisting .tix file is read in at the beginning
of a program run. The default is currently `--read-tix-file=yes` but
will change to `--read-tix-file=no` in a future release of GHC. For
this reason, whenever a .tix file is read in a warning is emitted to
stderr. This warning can be silenced by explicitly passing the
`--read-tix-file=yes` option. Details can be found in the GHC proposal
cited below.

Users can query whether this flag has been used with the help of the
module `GHC.RTS.Flags`. A new field `readTixFile` was added to the
record `HpcFlags`.

These changes have been discussed and approved in
- GHC proposal 612: https://github.com/ghc-proposals/ghc-proposals/pull/612
- CLC proposal 276: https://github.com/haskell/core-libraries-committee/issues/276

- - - - -
f0e3cb6a by Fendor at 2024-06-18T18:48:38-04:00
Improve sharing of duplicated values in `ModIface`, fixes #24723

As a `ModIface` often contains duplicated values that are not
necessarily shared, we improve sharing by serialising the `ModIface`
to an in-memory byte array. Serialisation uses deduplication tables, and
deserialisation implicitly shares duplicated values.

This helps reducing the peak memory usage while compiling in
`--make` mode. The peak memory usage is especially smaller when
generating interface files with core expressions
(`-fwrite-if-simplified-core`).

On agda, this reduces the peak memory usage:

* `2.2 GB` to `1.9 GB` for a ghci session.

On `lib:Cabal`, we report:

* `570 MB` to `500 MB` for a ghci session
* `790 MB` to `667 MB` for compiling `lib:Cabal` with ghc

There is a small impact on execution time, around 2% on the agda code
base.

- - - - -
1bab7dde by Fendor at 2024-06-18T18:48:38-04:00
Avoid unneccessarily re-serialising the `ModIface`

To reduce memory usage of `ModIface`, we serialise `ModIface` to an
in-memory byte array, which implicitly shares duplicated values.

This serialised byte array can be reused to avoid work when we actually
write the `ModIface` to disk.
We introduce a new field to `ModIface` which allows us to save the byte
array, and write it direclty to disk if the `ModIface` wasn't changed
after the initial serialisation.

This requires us to change absolute offsets, for example to jump to the
deduplication table for `Name` or `FastString` with relative offsets, as
the deduplication byte array doesn't contain header information, such as
fingerprints.
To allow us to dump the binary blob to disk, we need to replace all
absolute offsets with relative ones.

We introduce additional helpers for `ModIface` binary serialisation, which
construct relocatable binary blobs. We say the binary blob is relocatable,
if the binary representation can be moved and does not contain any
absolute offsets.

Further, we introduce new primitives for `Binary` that allow to create
relocatable binaries, such as `forwardGetRel` and `forwardPutRel`.

-------------------------
Metric Decrease:
    MultiLayerModulesDefsGhcWithCore
Metric Increase:
    MultiComponentModules
    MultiLayerModules
    T10421
    T12150
    T12234
    T12425
    T13035
    T13253-spj
    T13701
    T13719
    T14697
    T15703
    T16875
    T18698b
    T18140
    T18304
    T18698a
    T18730
    T18923
    T20049
    T24582
    T5837
    T6048
    T9198
    T9961
    mhu-perf
-------------------------

These metric increases may look bad, but they are all completely benign,
we simply allocate 1 MB per module for `shareIface`. As this allocation
is quite quick, it has a negligible impact on run-time performance.
In fact, the performance difference wasn't measurable on my local
machine. Reducing the size of the pre-allocated 1 MB buffer avoids these
test failures, but also requires us to reallocate the buffer if the
interface file is too big. These reallocations *did* have an impact on
performance, which is why I have opted to accept all these metric
increases, as the number of allocated bytes is merely a guidance.

This 1MB allocation increase causes a lot of tests to fail that
generally have a low allocation number. E.g., increasing from 40MB to
41MB is a 2.5% increase.
In particular, the tests T12150, T13253-spj, T18140, T18304, T18698a,
T18923, T20049, T24582, T5837, T6048, and T9961 only fail on i386-darwin
job, where the number of allocated bytes seems to be lower than in other
jobs.
The tests T16875 and T18698b fail on i386-linux for the same reason.

- - - - -
099992df by Andreas Klebinger at 2024-06-18T18:49:14-04:00
Improve documentation of @Any@ type.

In particular mention possible uses for non-lifted types.

Fixes #23100.

- - - - -
5e75412b by Jakob Bruenker at 2024-06-18T18:49:51-04:00
Update user guide to indicate support for 64-tuples

- - - - -
4f5da595 by Andreas Klebinger at 2024-06-18T18:50:28-04:00
lint notes: Add more info to notes.stdout

When fixing a note reference CI fails with a somewhat confusing diff.
See #21123. This commit adds a line to the output file being compared
which hopefully makes it clear this is the list of broken refs, not all
refs.

Fixes #21123

- - - - -
1eb15c61 by Jakob Bruenker at 2024-06-18T18:51:04-04:00
docs: Update mention of ($) type in user guide

Fixes #24909

- - - - -
1d66c9e3 by Jan Hrček at 2024-06-18T18:51:47-04:00
Remove duplicate Anno instances

- - - - -
8ea0ba95 by Sven Tennie at 2024-06-18T18:52:23-04:00
AArch64: Delete unused RegNos

This has the additional benefit of getting rid of the -1 encoding (real
registers start at 0.)

- - - - -
325422e0 by Sjoerd Visscher at 2024-06-18T18:53:04-04:00
Bump stm submodule to current master

- - - - -
64fba310 by Cheng Shao at 2024-06-18T18:53:40-04:00
testsuite: bump T17572 timeout on wasm32

- - - - -
eb612fbc by Sven Tennie at 2024-06-19T06:46:00-04:00
AArch64: Simplify BL instruction

The BL constructor carried unused data in its third argument.

- - - - -
b0300503 by Alan Zimmerman at 2024-06-19T06:46:36-04:00
TTG: Move SourceText from `Fixity` to `FixitySig`

It is only used there, simplifies the use of `Fixity` in the rest of
the code, and is moved into a TTG extension point.

Precedes !12842, to simplify it

- - - - -
842e119b by Rodrigo Mesquita at 2024-06-19T06:47:13-04:00
base: Deprecate some .Internal modules

Deprecates the following modules according to clc-proposal #217:
https://github.com/haskell/core-libraries-committee/issues/217

* GHC.TypeNats.Internal
* GHC.TypeLits.Internal
* GHC.ExecutionStack.Internal

Closes #24998

- - - - -
24e89c40 by Jacco Krijnen at 2024-06-20T07:21:27-04:00
ttg: Use List instead of Bag in AST for LHsBindsLR

Considering that the parser used to create a Bag of binds using a
cons-based approach, it can be also done using lists. The operations in
the compiler don't really require Bag.

By using lists, there is no dependency on GHC.Data.Bag anymore from the
AST.

Progress towards #21592

- - - - -
04f5bb85 by Simon Peyton Jones at 2024-06-20T07:22:03-04:00
Fix untouchability test

This MR fixes #24938.  The underlying problem was tha the test for
"does this implication bring in scope any equalities" was plain wrong.

See
  Note [Tracking Given equalities] and
  Note [Let-bound skolems]
both in GHC.Tc.Solver.InertSet.

Then
* Test LocalGivenEqs succeeds for a different reason than before;
  see (LBS2) in Note [Let-bound skolems]

* New test T24938a succeeds because of (LBS2), whereas it failed
  before.

* Test LocalGivenEqs2 now fails, as it should.

* Test T224938, the repro from the ticket, fails, as it should.

- - - - -
9a757a27 by Simon Peyton Jones at 2024-06-20T07:22:40-04:00
Fix demand signatures for join points

This MR tackles #24623 and #23113

The main change is to give a clearer notion of "worker/wrapper arity", esp
for join points. See GHC.Core.Opt.DmdAnal
     Note [Worker/wrapper arity and join points]
This Note is a good summary of what this MR does:

(1) The "worker/wrapper arity" of an Id is
    * For non-join-points: idArity
    * The join points: the join arity (Id part only of course)
    This is the number of args we will use in worker/wrapper.
    See `ww_arity` in `dmdAnalRhsSig`, and the function `workWrapArity`.

(2) A join point's demand-signature arity may exceed the Id's worker/wrapper
    arity.  See the `arity_ok` assertion in `mkWwBodies`.

(3) In `finaliseArgBoxities`, do trimBoxity on any argument demands beyond
    the worker/wrapper arity.

(4) In WorkWrap.splitFun, make sure we split based on the worker/wrapper
    arity (re)-computed by workWrapArity.

- - - - -
5e8faaf1 by Jan Hrček at 2024-06-20T07:23:20-04:00
Update haddocks of Import/Export AST types

- - - - -
cd512234 by Hécate Kleidukos at 2024-06-20T07:24:02-04:00
haddock: Update bounds in cabal files and remove allow-newer stanza in cabal.project

- - - - -
8a8ff8f2 by Rodrigo Mesquita at 2024-06-20T07:24:38-04:00
cmm: Don't parse MO_BSwap for W8

Don't support parsing bswap8, since bswap8 is not really an operation
and would have to be implemented as a no-op (and currently is not
implemented at all).

Fixes #25002

- - - - -
5cc472f5 by sheaf at 2024-06-20T07:25:14-04:00
Delete unused testsuite files

These files were committed by mistake in !11902.
This commit simply removes them.

- - - - -
7b079378 by Matthew Pickering at 2024-06-20T07:25:50-04:00
Remove left over debugging pragma from 2016

This pragma was accidentally introduced in 648fd73a7b8fbb7955edc83330e2910428e76147

The top-level cost centres lead to a lack of optimisation when compiling
with profiling.

- - - - -
c872e09b by Hécate Kleidukos at 2024-06-20T19:28:36-04:00
haddock: Remove unused pragmata, qualify usages of Data.List functions, add more sanity checking flags by default

This commit enables some extensions and GHC flags in the cabal file in a way
that allows us to reduce the amount of prologuing on top of each file.

We also prefix the usage of some List functions that removes ambiguity
when they are also exported from the Prelude, like foldl'.
In general, this has the effect of pointing out more explicitly
that a linked list is used.

Metric Increase:
    haddock.Cabal
    haddock.base
    haddock.compiler

- - - - -
8c87d4e1 by Arnaud Spiwack at 2024-06-20T19:29:12-04:00
Add test case for #23586

- - - - -
568de8a5 by Arnaud Spiwack at 2024-06-20T19:29:12-04:00
When matching functions in rewrite rules: ignore multiplicity

When matching a template variable to an expression, we check that it
has the same type as the matched expression. But if the variable `f` has
type `A -> B` while the expression `e` has type `A %1 -> B`, the match was
previously rejected.

A principled solution would have `f` substituted by `\(%Many x) -> e
x` or some other appropriate coercion. But since linearity is not
properly checked in Core, we can be cheeky and simply ignore
multiplicity while matching. Much easier.

This has forced a change in the linter which, when `-dlinear-core-lint`
is off, must consider that `a -> b` and `a %1 -> b` are equal. This is
achieved by adding an argument to configure the behaviour of
`nonDetCmpTypeX` and modify `ensureEqTys` to call to the new behaviour
which ignores multiplicities when comparing two `FunTy`.

Fixes #24725.

- - - - -
c8a8727e by Simon Peyton Jones at 2024-06-20T19:29:12-04:00
Faster type equality

This MR speeds up type equality, triggered by perf regressions that
showed up when fixing #24725 by parameterising type equality over
whether to ignore multiplicity.

The changes are:

* Do not use `nonDetCmpType` for type /equality/. Instead use a specialised
  type-equality function, which we have always had!

  `nonDetCmpType` remains, but I did not invest effort in refactoring
  or optimising it.

* Type equality is parameterised by
    - whether to expand synonyms
    - whether to respect multiplicities
    - whether it has a RnEnv2 environment
  In this MR I systematically specialise it for static values of these
  parameters.  Much more direct and predictable than before.  See
  Note [Specialising type equality]

* We want to avoid comparing kinds if possible.  I refactored how this
  happens, at least for `eqType`.
  See Note [Casts and coercions in type comparison]

* To make Lint fast, we want to avoid allocating a thunk for <msg> in
      ensureEqTypes ty1 ty2 <msg>
  because the test almost always succeeds, and <msg> isn't needed.
  See Note [INLINE ensureEqTys]

Metric Decrease:
    T13386
    T5030

- - - - -
21fc180b by Ryan Hendrickson at 2024-06-22T10:40:55-04:00
base: Add inits1 and tails1 to Data.List

- - - - -
d640a3b6 by Sebastian Graf at 2024-06-22T10:41:32-04:00
Derive previously hand-written `Lift` instances (#14030)

This is possible now that #22229 is fixed.

- - - - -
33fee6a2 by Sebastian Graf at 2024-06-22T10:41:32-04:00
Implement the "Derive Lift instances for data types in template-haskell" proposal (#14030)

After #22229 had been fixed, we can finally derive the `Lift` instance for the
TH AST, as proposed by Ryan Scott in
https://mail.haskell.org/pipermail/libraries/2015-September/026117.html.

Fixes #14030, #14296, #21759 and #24560.

The residency of T24471 increases by 13% because we now load `AnnLookup`
from its interface file, which transitively loads the whole TH AST.
Unavoidable and not terrible, I think.

Metric Increase:
    T24471

- - - - -
383c01a8 by Matthew Pickering at 2024-06-22T10:42:08-04:00
bindist: Use complete relative paths when cding to directories

If a user has configured CDPATH on their system then `cd lib` may change
into an unexpected directory during the installation process.

If you write `cd ./lib` then it will not consult `CDPATH` to determine
what you mean.

I have added a check on ghcup-ci to verify that the bindist installation
works in this situation.

Fixes #24951

- - - - -
5759133f by Hécate Kleidukos at 2024-06-22T10:42:49-04:00
haddock: Use the more precise SDocContext instead of DynFlags

The pervasive usage of DynFlags (the parsed command-line options passed
to ghc) blurs the border between different components of Haddock, and
especially those that focus solely on printing text on the screen.

In order to improve the understanding of the real dependencies of a
function, the pretty-printer options are made concrete earlier in the
pipeline instead of late when pretty-printing happens.

This also has the advantage of clarifying which functions actually
require DynFlags for purposes other than pretty-printing, thus making
the interactions between Haddock and GHC more understandable when
exploring the code base.

See Henry, Ericson, Young. "Modularizing GHC".
https://hsyl20.fr/home/files/papers/2022-ghc-modularity.pdf. 2022

- - - - -
749e089b by Alexander McKenna at 2024-06-22T10:43:24-04:00
Add INLINE [1] pragma to compareInt / compareWord

To allow rules to be written on the concrete implementation of
`compare` for `Int` and `Word`, we need to have an `INLINE [1]`
pragma on these functions, following the
`matching_overloaded_methods_in_rules` note in `GHC.Classes`.

CLC proposal https://github.com/haskell/core-libraries-committee/issues/179

Fixes https://gitlab.haskell.org/ghc/ghc/-/issues/22643

- - - - -
db033639 by Rodrigo Mesquita at 2024-06-24T17:21:15-04:00
ci: Enable strict ghc-toolchain setting for bindists

- - - - -
14308a8f by Rodrigo Mesquita at 2024-06-24T17:21:15-04:00
ghc-toolchain: Improve parse failure error

Improves the error message for when `ghc-toolchain` fails to read a
valid `Target` value from a file (in doFormat mode).

- - - - -
6e7cfff1 by Rodrigo Mesquita at 2024-06-24T17:21:15-04:00
bindist: ghc-toolchain related options in configure

- - - - -
958d6931 by Matthew Pickering at 2024-06-24T17:21:15-04:00
ci: Fail when bindist configure fails when installing bindist

It is better to fail earlier if the configure step fails rather than
carrying on for a more obscure error message.

- - - - -
f48d157d by Rodrigo Mesquita at 2024-06-24T17:21:15-04:00
ghc-toolchain: Fix error logging indentation

- - - - -
f1397104 by Rodrigo Mesquita at 2024-06-24T17:21:15-04:00
bindist: Correct default.target substitution

The substitution on `default.target.in` must be done after
`PREP_TARGET_FILE` is called -- that macro is responsible for
setting the variables that will be effectively substituted in the target
file. Otherwise, the target file is invalid.

Fixes #24792 #24574

- - - - -
665e653e by Rodrigo Mesquita at 2024-06-24T17:21:15-04:00
configure: Prefer tool name over tool path

It is non-obvious whether the toolchain configuration should use
full-paths to tools or simply their names. In addressing #24574, we've
decided to prefer executable names over paths, ultimately, because the
bindist configure script already does this, thus is the default in ghcs
out there.

Updates the in-tree configure script to prefer tool names
(`AC_CHECK_TOOL` rather than `AC_PATH_TOOL`) and `ghc-toolchain` to
ignore the full-path-result of `findExecutable`, which it previously
used over the program name.

This change doesn't undo the fix in bd92182cd56140ffb2f68ec01492e5aa6333a8fc
because `AC_CHECK_TOOL` still takes into account the target triples,
unlike `AC_CHECK_PROG/AC_PATH_PROG`.

- - - - -
463716c2 by Rodrigo Mesquita at 2024-06-24T17:21:15-04:00
dist: Don't forget to configure JavascriptCPP

We introduced a configuration step for the javascript preprocessor, but
only did so for the in-tree configure script.

This commit makes it so that we also configure the javascript
preprocessor in the configure shipped in the compiler bindist.

- - - - -
e99cd73d by Rodrigo Mesquita at 2024-06-24T17:21:15-04:00
distrib: LlvmTarget in distrib/configure

LlvmTarget was being set and substituted in the in-tree configure, but
not in the configure shipped in the bindist.

We want to set the LlvmTarget to the canonical LLVM name of the platform
that GHC is targetting.

Currently, that is going to be the boostrapped llvm target (hence the
code which sets LlvmTarget=bootstrap_llvm_target).

- - - - -
4199aafe by Matthew Pickering at 2024-06-24T17:21:51-04:00
Update bootstrap plans for recent GHC versions (9.6.5, 9.8.2, 9.10.10)

- - - - -
f599d816 by Matthew Pickering at 2024-06-24T17:21:51-04:00
ci: Add 9_10 bootstrap testing job

- - - - -
8f4b799d by Hécate Kleidukos at 2024-06-24T17:22:30-04:00
haddock: Move the usage of mkParserOpts directly to ppHyperlinkedModuleSource in order to avoid passing a whole DynFlags

Follow up to !12931

- - - - -
210cf1cd by Hécate Kleidukos at 2024-06-24T17:22:30-04:00
haddock: Remove cabal file linting rule

This will be reintroduced with a properly ignored commit
when the cabal files are themselves formatted for good.

- - - - -
7fe85b13 by Peter Trommler at 2024-06-24T22:03:41-04:00
PPC NCG: Fix sign hints in C calls

Sign hints for parameters are in the second component of the pair.

Fixes #23034

- - - - -
949a0e0b by Andrew Lelechenko at 2024-06-24T22:04:17-04:00
base: fix missing changelog entries

- - - - -
1bfa9111 by Andreas Klebinger at 2024-06-26T21:49:53-04:00
GHCi interpreter: Tag constructor closures when possible.

When evaluating PUSH_G try to tag the reference we are pushing if it's a
constructor. This is potentially helpful for performance and required to
fix #24870.

- - - - -
caf44a2d by Andrew Lelechenko at 2024-06-26T21:50:30-04:00
Implement Data.List.compareLength and Data.List.NonEmpty.compareLength

`compareLength xs n` is a safer and faster alternative to `compare (length xs) n`.
The latter would force and traverse the entire spine (potentially diverging),
while the former traverses as few elements as possible.

The implementation is carefully designed to maintain as much laziness as possible.

As per https://github.com/haskell/core-libraries-committee/issues/257

- - - - -
f4606ae0 by Serge S. Gulin at 2024-06-26T21:51:05-04:00
Unicode: adding compact version of GeneralCategory (resolves #24789)

The following features are applied:
1. Lookup code like Cmm-switches (draft implementation proposed by Sylvain Henry @hsyl20)
2. Nested ifs (logarithmic search vs linear search) (the idea proposed by Sylvain Henry @hsyl20)

-------------------------
Metric Decrease:
    size_hello_artifact
    size_hello_unicode
-------------------------

- - - - -
0e424304 by Hécate Kleidukos at 2024-06-26T21:51:44-04:00
haddock: Restructure import statements

This commit removes idiosyncrasies that have accumulated with the years
in how import statements were laid out, and defines clear but simple
guidelines in the CONTRIBUTING.md file.

- - - - -
9b8ddaaf by Arnaud Spiwack at 2024-06-26T21:52:23-04:00
Rename test for #24725

I must have fumbled my tabs when I copy/pasted the issue number in
8c87d4e1136ae6d28e92b8af31d78ed66224ee16.

- - - - -
b0944623 by Arnaud Spiwack at 2024-06-26T21:52:23-04:00
Add original reproducer for #24725

- - - - -
2dcd8946 by sheaf at 2024-06-27T10:36:48+00:00
The X86 SIMD patch.

This commit adds support for 128 bit wide SIMD vectors and vector
operations to GHC's X86 native code generator.

Main changes:

  - Introduction of vector formats (`GHC.CmmToAsm.Format`)
  - Introduction of 128-bit virtual register (`GHC.Platform.Reg`),
    and removal of unused Float virtual register.
  - Refactor of `GHC.Platform.Reg.Class.RegClass`: it now only contains
    two classes, `RcInteger` (for general purpose registers) and `RcFloatOrVector`
    (for registers that can be used for scalar floating point values as well
    as vectors).
  - Modify `GHC.CmmToAsm.X86.Instr.regUsageOfInstr` to keep track
    of which format each register is used at, so that the register
    allocator can know if it needs to spill the entire vector register
    or just the lower 64 bits.
  - Modify spill/load/reg-2-reg code to account for vector registers
    (`GHC.CmmToAsm.X86.Instr.{mkSpillInstr, mkLoadInstr, mkRegRegMoveInstr, takeRegRegMoveInstr}`).
  - Modify the register allocator code (`GHC.CmmToAsm.Reg.*`) to propagate
    the format we are storing in any given register, for instance changing
    `Reg` to `RegFormat` or `GlobalReg` to `GlobalRegUse`.
  - Add logic to lower vector `MachOp`s to X86 assembly
    (see `GHC.CmmToAsm.X86.CodeGen`)
  - Minor cleanups to genprimopcode, to remove the llvm_only attribute
    which is no longer applicable.

Tests for this feature are provided in the "testsuite/tests/simd" directory.

Fixes #7741

Keeping track of register formats adds a small memory overhead to the
register allocator (in particular, regUsageOfInstr now allocates more
to keep track of the `Format` each register is used at). This explains
the following metric increases.

-------------------------
Metric Increase:
    T12707
    T13035
    T13379
    T3294
    T4801
    T5321FD
    T5321Fun
    T783
-------------------------

- - - - -
5777efaf by sheaf at 2024-06-27T10:36:48+00:00
Add vector fused multiply-add operations

This commit adds fused multiply add operations such as `fmaddDoubleX2#`.
These are handled both in the X86 NCG and the LLVM backends.

- - - - -
f61d06cb by sheaf at 2024-06-27T10:36:48+00:00
Add vector shuffle primops

This adds vector shuffle primops, such as

```
shuffleFloatX4# :: FloatX4# -> FloatX4# -> (# Int#, Int#, Int#, Int# #) -> FloatX4#
```

which shuffle the components of the input two vectors into the output vector.

NB: the indices must be compile time literals, to match the X86 SHUFPD
instruction immediate and the LLVM shufflevector instruction.

These are handled in the X86 NCG and the LLVM backend.

Tested in simd009.

- - - - -
2d35ec4b by sheaf at 2024-06-27T10:36:48+00:00
Add Broadcast MachOps

This adds proper MachOps for broadcast instructions, allowing us to
produce better code for broadcasting a value than simply packing that
value (doing many vector insertions in a row).

These are lowered in the X86 NCG and LLVM backends. In the LLVM backend,
it uses the previously introduced shuffle instructions.

- - - - -


30 changed files:

- .gitlab-ci.yml
- .gitlab/ci.sh
- .gitlab/generate-ci/gen_ci.hs
- .gitlab/jobs.yaml
- compiler/GHC.hs
- compiler/GHC/Builtin/PrimOps.hs
- compiler/GHC/Builtin/Types.hs
- compiler/GHC/Builtin/primops.txt.pp
- compiler/GHC/ByteCode/Instr.hs
- compiler/GHC/Cmm.hs
- compiler/GHC/Cmm/CallConv.hs
- compiler/GHC/Cmm/Graph.hs
- compiler/GHC/Cmm/Liveness.hs
- compiler/GHC/Cmm/MachOp.hs
- compiler/GHC/Cmm/Node.hs
- compiler/GHC/Cmm/Opt.hs
- compiler/GHC/Cmm/Parser.y
- compiler/GHC/Cmm/ProcPoint.hs
- compiler/GHC/Cmm/Reg.hs
- compiler/GHC/Cmm/Sink.hs
- compiler/GHC/CmmToAsm.hs
- compiler/GHC/CmmToAsm/AArch64.hs
- compiler/GHC/CmmToAsm/AArch64/CodeGen.hs
- compiler/GHC/CmmToAsm/AArch64/Instr.hs
- compiler/GHC/CmmToAsm/AArch64/Ppr.hs
- compiler/GHC/CmmToAsm/AArch64/Regs.hs
- compiler/GHC/CmmToAsm/Config.hs
- compiler/GHC/CmmToAsm/Format.hs
- compiler/GHC/CmmToAsm/Instr.hs
- compiler/GHC/CmmToAsm/PPC.hs


The diff was not included because it is too large.


View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/08de0594d1d03cbf37328f551de0c9e57842bdcb...2d35ec4bd1da99dcb0eb4bd82164bb825d78c463

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/08de0594d1d03cbf37328f551de0c9e57842bdcb...2d35ec4bd1da99dcb0eb4bd82164bb825d78c463
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240627/108245b5/attachment-0001.html>


More information about the ghc-commits mailing list