The Curious Case of T6084 -or- Register Confusion with LLVM

Thu Sep 21 07:44:57 UTC 2017

|  One way to make this happen would be for C-- call nodes to carry information
|  about the calling convention of the target (e.g. how many arguments of each
|  type the function expects; in the same way identifiers in Core carry their
|  type).

That'd be entirely possible for "known" calls, where the target is known, but not for "unknown" (i.e. higher-order) ones, where the target of the call varies.

The "Making a fast curry" paper goes into this in some detail.  I think we already have different entry points for these two cases.  So maybe they could have different entry conventions...
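
To make the known/unknown distinction concrete, here is a rough sketch (in Python, purely for illustration; it is not GHC code). All names in it (`CallConv`, `Call`, `fp_signature`, `MAX_FP_SLOTS`) are hypothetical, and the fallback of six generic floating-point slots is an assumption taken from the "six doubles or six floats" figure quoted below. A known call carries the convention of its target; an unknown call has nothing to carry, so it uses one uniform convention.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: six overlapping floating-point argument slots, as in the quoted mail.
MAX_FP_SLOTS = 6


@dataclass(frozen=True)
class CallConv:
    """Hypothetical record of how many FP arguments of each type a callee expects."""
    n_floats: int
    n_doubles: int


@dataclass(frozen=True)
class Call:
    """Hypothetical call node. target=None models an unknown (higher-order) call."""
    target: Optional[str]
    conv: Optional[CallConv] = None


def fp_signature(call: Call) -> Tuple[str, ...]:
    """Return the floating-point parameter types to emit for this call."""
    if call.target is not None and call.conv is not None:
        # Known call: the node carries the callee's convention, so be precise.
        return ("float",) * call.conv.n_floats + ("double",) * call.conv.n_doubles
    # Unknown call: the target varies, so fall back to one uniform convention.
    return ("fp_slot",) * MAX_FP_SLOTS


known = Call(target="f_entry", conv=CallConv(n_floats=1, n_doubles=2))
unknown = Call(target=None)
print(fp_signature(known))
print(fp_signature(unknown))
```

Whether the uniform fallback and the precise signature can coexist depends on the two kinds of call really going through different entry points, which is the open question here.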

Simon

|  -----Original Message-----
|  From: ghc-devs [mailto:ghc-devs-bounces at haskell.org] On Behalf Of Ben Gamari
|  Sent: 20 September 2017 16:54
|  To: Moritz Angermann <moritz.angermann at gmail.com>; GHC developers <ghc-
|  devs at haskell.org>
|  Subject: Re: The Curious Case of T6084 -or- Register Confusion with LLVM
|  
|  Moritz Angermann <moritz.angermann at gmail.com> writes:
|  
|  [snip]
|  >
|  > I should not have the YMM*, and ZMM* registers as I don’t have any AVX
|  > nor AVX512; that looks like only a patch away.  However we try to
|  > optimize our register, such that we can pass up to six doubles or six
|  > floats or any combination of both if needed in registers, without
|  > having to allocate them on the stack, by assuming overlapping registers
|  > (See Note [Overlapping global registers]).
|  >
|  > And as such a full function signature in LLVM, as opposed to one
|  > that’s based on the “live” registers as we have right now, would
|  > consist of 12 float/double registers, and LLVM only maps 6.  My
|  > current idea is to pass only the explicit F1,D1,…,F3,D3 and try to
|  > disable the register overlapping for LLVM.  This would probably force
|  > more floating values to be stack allocated rather than passed via
|  > registers, but would likely guarantee that the registers match up.
|  > The other option I can think of is to define some virtual generic
|  > floating registers in the llvm code gen: V1,…,V6 and then perform
|  > something like
|  >
|  >   F1 <- V1 as float
|  >   D1 <- V1 as double
|  >
|  > in the body of the function, while trying to use the `live`
|  > information at the call site to decide which of F1 or D1 to pass as V1.
|  >
|  Arguably the fundamental problem here is the assumption that all STG entry-
|  points have the same machine-level calling convention. As you point out, our
|  calling conventions in fact change due to things like register overlap.
|  Ideally the LLVM we produce would reflect this.
|  
|  One way to make this happen would be for C-- call nodes to carry information
|  about the calling convention of the target (e.g. how many arguments of each
|  type the function expects; in the same way identifiers in Core carry their
|  type). Unfortunately a brief look at the code generator suggests that this
|  may require a fair amount of plumbing.
|  
|  It's important to note though that this overlap problem is something that
|  will need to be addressed eventually if we are to have proper SIMD
|  support (due to overlap between XMM, YMM, and ZMM).
|  
|  Cheers,
|  
|  - Ben