The Curious Case of T6084 -or- Register Confusion with LLVM
Simon Peyton Jones
simonpj at microsoft.com
Thu Sep 21 07:44:57 UTC 2017
| One way to make this happen would be for C-- call nodes to carry information
| about the calling convention of the target (e.g. how many arguments of each
| type the function expects; in the same way identifiers in Core carry their
| type).
That'd be entirely possible for "known" calls, where the target is known, but not for "unknown" (i.e. higher-order) ones, where the target of the call varies.
The "Making a fast curry" paper goes into this in some detail. I think we already have different entry points for these two cases. So maybe they could have different entry conventions...
Simon
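
To make the known/unknown distinction above concrete, here is a minimal Haskell sketch. It is not taken from GHC's code generator: ArgClass, CallConv, CallTarget, genericConv and convFor are hypothetical names invented for illustration, and the 12-slot generic convention is only an assumption based on the six overlapping float/double registers discussed in the quoted thread.

```haskell
-- Hypothetical argument classes; not GHC's actual GlobalReg type.
data ArgClass = FloatArg | DoubleArg
  deriving (Eq, Show)

-- Hypothetical summary of the floating-point arguments a callee
-- expects to receive in registers, in order.
newtype CallConv = CallConv [ArgClass]
  deriving (Eq, Show)

-- A call node either names its target, so a precise convention can be
-- attached to it, or goes through a closure whose code is only known
-- at run time.
data CallTarget
  = Known String CallConv
  | Unknown
  deriving (Eq, Show)

-- Assumed fallback for unknown calls: every F and D slot is treated
-- as potentially in use, giving the full 12-entry signature.
genericConv :: CallConv
genericConv = CallConv (concat (replicate 6 [FloatArg, DoubleArg]))

-- Known calls use the callee's own convention; unknown (higher-order)
-- calls have nothing better than the generic one.
convFor :: CallTarget -> CallConv
convFor (Known _ conv) = conv
convFor Unknown        = genericConv

-- Number of register slots a convention occupies.
slotCount :: CallConv -> Int
slotCount (CallConv args) = length args
```

For example, convFor (Known "f" (CallConv [DoubleArg])) yields the one-argument convention, while convFor Unknown yields the generic one, which is where separate entry points with separate conventions would come in.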
| -----Original Message-----
| From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Ben Gamari
| Sent: 20 September 2017 16:54
| To: Moritz Angermann <moritz.angermann at gmail.com>; GHC developers <ghc-
| devs at haskell.org>
| Subject: Re: The Curious Case of T6084 -or- Register Confusion with LLVM
|
| Moritz Angermann <moritz.angermann at gmail.com> writes:
|
| [snip]
| >
| > I should not have the YMM*, and ZMM* registers as I don’t have any AVX
| > nor AVX512; that looks like only a patch away. However we try to
| > optimize our register usage, such that we can pass up to six doubles or six
| > floats or any combination of both if needed in registers, without
| > having to allocate them on the stack, by assuming overlapping registers
| > (See Note [Overlapping global registers]).
| >
| > And as such a full function signature in LLVM, as opposed to one
| > that’s based on the “live” registers as we have right now, would
| > consist of 12 float/double registers, and LLVM only maps 6. My
| > current idea is to pass only the explicit F1,D1,…,F3,D3 and try to
| > disable the register overlapping for LLVM. This would probably force
| > more floating values to be stack allocated rather than passed via
| > registers, but would likely guarantee that the registers match up.
| > The other option I can think of is to define some virtual generic
| > floating registers in the llvm code gen: V1,…,V6 and then perform
| > something like
| >
| > F1 <- V1 as float
| > D1 <- V1 as double
| >
| > in the body of the function, while trying to use the `live`
| > information at the call site to decide which of F1 or D1 to pass as V1.
| >
| Arguably the fundamental problem here is the assumption that all STG entry-
| points have the same machine-level calling convention. As you point out, our
| calling conventions in fact change due to things like register overlap.
| Ideally the LLVM we produce would reflect this.
|
| One way to make this happen would be for C-- call nodes to carry information
| about the calling convention of the target (e.g. how many arguments of each
| type the function expects; in the same way identifiers in Core carry their
| type). Unfortunately a brief look at the code generator suggests that this
| may require a fair amount of plumbing.
|
| It's important to note though that this overlap problem is something that
| will need to be addressed eventually if we are to have proper SIMD
| support (due to overlap between XMM, YMM, and ZMM).
|
| Cheers,
|
| - Ben
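
The virtual-register idea in the quoted message (V1,…,V6, with the `live` information at the call site deciding whether F1 or D1 is passed as V1) can be sketched as below. This is only an illustration under stated assumptions: StgReg, VReg, Width, assignSlot and assignAll are hypothetical names, not part of GHC's LLVM backend, and the sketch assumes that Fn and Dn are never both live at the same call site, reporting an error if they are.

```haskell
-- How a virtual slot is to be interpreted by the callee.
data Width = AsFloat | AsDouble
  deriving (Eq, Show)

-- Hypothetical stand-ins for the STG registers F1..F6 and D1..D6.
data StgReg = F Int | D Int
  deriving (Eq, Show)

-- Hypothetical virtual register Vn shared by Fn and Dn.
newtype VReg = V Int
  deriving (Eq, Show)

-- Decide what to pass in slot n given the registers live at the call
-- site. Nothing means the slot is unused and can carry an undefined
-- value. Both Fn and Dn being live contradicts the overlap
-- assumption, so it is reported rather than resolved.
assignSlot :: [StgReg] -> Int -> Either String (VReg, Maybe Width)
assignSlot live n =
  case (F n `elem` live, D n `elem` live) of
    (True, True)   -> Left ("F" ++ show n ++ " and D" ++ show n ++ " are both live")
    (True, False)  -> Right (V n, Just AsFloat)
    (False, True)  -> Right (V n, Just AsDouble)
    (False, False) -> Right (V n, Nothing)

-- Build the full six-slot signature for one call site.
assignAll :: [StgReg] -> Either String [(VReg, Maybe Width)]
assignAll live = mapM (assignSlot live) [1 .. 6]
```

With this shape every call passes exactly six floating-point slots, so caller and callee signatures always match; the function prologue would then perform the "F1 <- V1 as float" or "D1 <- V1 as double" step for the slots it actually uses.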