simd branch ready for review
Simon Marlow
marlowsd at gmail.com
Thu Jan 31 13:56:05 CET 2013
On 31/01/13 11:38, Geoffrey Mainland wrote:
> I've pushed my simd branch to darcs.haskell.org. Everything has been
> rebased against HEAD. Simon PJ and I looked over the changes together
> already, but I wanted to give you (and everyone on ghc-devs) the
> opportunity to look things over before I merge to HEAD. Simon PJ and I
> came up with a few questions/notes for you, but hopefully nothing that
> should delay a merge.
I'm happy for these to go in - we've already discussed the design a few
times, and you've incorporated changes we agreed before, so as far as
I'm concerned it's all good. Go for it!
> * Win32 issues
>
> Modern 32-bit x86 *NIX systems align the stack to 16 bytes, but Win32
> aligns only to 4 bytes. LLVM does not assume 16-byte stack
> alignment. Instead, on platforms where 16-byte stack alignment is not
> guaranteed, it 1) always outputs a function prologue that 2) aligns
> the stack to a 16-byte boundary with an "and" instruction, and it
> also 3) disables tail calls. Because LLVM aligns the stack for a
> function that has SSE register spills, it also generates movaps
> instructions (aligned SSE moves) for the spills.
I must be misunderstanding your use of "always" above, because that
would imply that the LLVM backend doesn't work on Win32 at all. Maybe
LLVM only aligns the stack when it needs to store SSE values?
> This makes SSE support on Win32 difficult, and in my opinion not
> worth worrying about.
>
> The alternative is to 1) patch LLVM to disable the stack-alignment
> code so that we recover the ability to use tail calls and so that ebp
> is not scribbled over by the prologue and 2) patch the mangler to rewrite
> LLVM's movaps (move aligned) instructions to movups (move unaligned)
> instructions. I have these patches, but they are not included in the
> simd branch.
I don't have an opinion here - maybe ask David T what he'd prefer.
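
For readers following along, the mangler rewrite described in option 2 can be sketched roughly as below. This is only an illustration, not the patch mentioned above: `rewriteLine` and `mangle` are hypothetical names, the input is assumed to be AT&T-syntax assembly with one instruction per line, and every movaps is rewritten regardless of its operands.

```haskell
import Data.Char (isSpace)

-- Hypothetical helper: rewrite one line of assembly, turning an aligned
-- SSE move (movaps) into its unaligned counterpart (movups).
-- Only the mnemonic is changed; indentation and operands are preserved.
rewriteLine :: String -> String
rewriteLine line =
  case span isSpace line of
    (indent, rest)
      | (mnemonic, operands) <- break isSpace rest
      , mnemonic == "movaps" -> indent ++ "movups" ++ operands
    _ -> line

-- Hypothetical driver: apply the rewrite to a whole assembly file.
mangle :: String -> String
mangle = unlines . map rewriteLine . lines
```

Matching on the whole mnemonic rather than a prefix keeps other instructions that merely start with the same letters untouched.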
> * How hard would it be to dump ArgRep for PrimRep? It looks
> straightforward. Is it worth doing?
ArgRep makes fewer distinctions than PrimRep, in particular it collapses
IntRep/WordRep/AddrRep into N and Int64Rep/Word64Rep into L.
I doubt it would improve things to get rid of ArgRep. It's only used in
a very few places, but those places would get more complicated if they
had to use PrimRep instead, because instead of pattern-matching on N you
would need a guard. I think ArgRep is ok, because it matches the
different ways we pass arguments to functions.
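
To make the pattern-match versus guard point concrete, here is a small sketch. The types are simplified stand-ins that model only the constructors named above, not GHC's actual definitions, and `argWords` / `argWordsPrim` are hypothetical functions (slot counts assume a 32-bit target).

```haskell
-- Simplified stand-in: only the representations mentioned above.
data PrimRep = IntRep | WordRep | AddrRep | Int64Rep | Word64Rep
  deriving (Eq, Show)

-- Simplified stand-in: N for word-sized non-pointers, L for 64-bit values.
data ArgRep = N | L
  deriving (Eq, Show)

-- The collapsing described above.
toArgRep :: PrimRep -> ArgRep
toArgRep IntRep    = N
toArgRep WordRep   = N
toArgRep AddrRep   = N
toArgRep Int64Rep  = L
toArgRep Word64Rep = L

-- With ArgRep, one pattern per calling-convention class.
-- (Hypothetical: words occupied by an argument on a 32-bit target.)
argWords :: ArgRep -> Int
argWords N = 1
argWords L = 2

-- With PrimRep, the same function needs a guard (or repeated equations).
argWordsPrim :: PrimRep -> Int
argWordsPrim rep
  | rep `elem` [IntRep, WordRep, AddrRep] = 1
  | otherwise                             = 2
```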
> * How hard would it be to track bit width in PrimRep? I recall chatting
> with you once about adding explicit support for, e.g., 8- and 16-bit
> Word/Int primops instead of relying on narrowing. Since SIMD vectors
> need to know the exact bit-width of their elements, I've had to create
> a PrimElemRep data type in compiler/types/TyCon.lhs, but I'd really
> like to be able to re-use PrimRep instead.
This is something we really should do, but it's a big job. Feel free to
have a go in your spare time!
> * If we replaced all old-style C-- code, could we get rid of the
> explicit STG registers completely? Simon PJ suggested that we use real
> machine registers directly, so, for example, GlobalReg's constructors
> would have FastString fields instead of Int fields.
It's difficult to get rid of *all* the old-style C--. In some places
I kept explicit-stack code because I was being lazy, but others are
really hard to write in new C-- (or at least are hard to write in new
C-- that compiles to good code).
I don't think that renaming R1 to %rbx (etc.) achieves a lot. It would
make things a bit more difficult for the LLVM backend, which has to
reverse the mapping. You do need a platform-independent name for R1 in
some places, like codeGen for example.
I have thought about whether you could remove R1 and co altogether (not
just rename them to machine registers) by extending Cmm to include
information about incoming parameters. e.g. for a function f(x,y,z), we
generate
  f:
     x = R1 -- %rbx
     y = R2 -- %r14
     z = R3 -- %rsi
     ... body of f ...
and here's the tricky bit: we almost never want to move those
assignments, because they generate no code. The register allocator
remembers that x is in %rbx and moves on; x can be spilled and %rbx can
be reused at any time. But if you had a misguided optimisation pass
that sinks these assignments down into the code somewhere, *then* the
code is worse, because now %rbx is live from the beginning of the
function until its use, and we have fewer registers to play with.
With this in mind, it seems natural to represent the code as
  f(x = %rbx, y = %r14, z = %rsi):
     ... body of f ...
i.e. statically prevent the motion of the copy-in assignments by
explicitly including them in the representation of a function. This is
a cool idea, but then we have to do the same for return points. Instead
of just plain labels, we have labels with register assignments.
Furthermore, we sometimes like to jump directly to a return point from
another code path (a join point), and load up the registers explicitly.
This is where being able to write an assignment to R1 comes in handy.
So I decided not to pursue this. It might still be a good idea, I'm
not sure.
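
As a rough illustration of the representation discussed above, the sketch below moves the leading copy-in assignments of a function into its header, where a pass that only rewrites the body cannot sink them. All of the types and the `hoistCopyIns` function are hypothetical and much simpler than real Cmm; return points and join points are not modelled.

```haskell
type Var = String
type Reg = String

-- Hypothetical statement type: a copy-in such as "x = R1", or anything else.
data Stmt
  = CopyIn Var Reg   -- binds a local variable to an incoming register
  | Other String     -- any other statement, left opaque here
  deriving (Eq, Show)

-- Hypothetical function representation with the copy-ins in the header,
-- mirroring "f(x = %rbx, y = %r14, z = %rsi): ... body of f ...".
data Proc = Proc
  { procName   :: String
  , procParams :: [(Var, Reg)]
  , procBody   :: [Stmt]
  } deriving (Eq, Show)

-- Take the leading run of copy-in assignments out of the body and
-- record them as parameters; the rest of the body is left unchanged.
hoistCopyIns :: String -> [Stmt] -> Proc
hoistCopyIns name stmts =
  Proc name [ (v, r) | CopyIn v r <- leading ] rest
  where
    (leading, rest) = span isCopyIn stmts
    isCopyIn (CopyIn _ _) = True
    isCopyIn _            = False
```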
> * Could we add a CmmType field to GlobalReg's constructors? You'll see
> that I added a new XmmReg constructor to GlobalReg, but because I
> don't know the type of an XmmReg, I have to bitcast everywhere in the
> generated LLVM code because LLVM wants to know not just that a value
> is a 16-byte vector, but that it is, e.g., a 16-byte vector containing
> 2 64-bit doubles. Having a CmmType attached to a GlobalReg---or
> pairing a GlobalReg with a CmmType when assigning registers---would
> let me avoid all these casts.
We already have a function
  globalRegType :: DynFlags -> GlobalReg -> CmmType
so I see that you're guessing in the case of XmmReg. Why not just add
the necessary information to XmmReg so that you don't have to guess in
globalRegType?
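
A minimal sketch of that suggestion, using simplified stand-in types rather than GHC's real ones: the extra fields on `XmmReg` (lane count and element type) are assumptions about what "the necessary information" might look like, and a plain word size in bits stands in for DynFlags.

```haskell
-- Simplified stand-in for CmmType.
data CmmType
  = BitsTy Int          -- integer of the given bit width
  | FloatTy Int         -- float of the given bit width
  | VecTy Int CmmType   -- vector: lane count and lane type
  deriving (Eq, Show)

-- Simplified stand-in for GlobalReg. The last two XmmReg fields are
-- the assumed additions; the first field is the register number.
data GlobalReg
  = VanillaReg Int
  | XmmReg Int Int CmmType
  deriving (Eq, Show)

-- With the element information stored in the register itself,
-- no guessing is needed for the XmmReg case.
globalRegType :: Int -> GlobalReg -> CmmType
globalRegType wordBits (VanillaReg _)          = BitsTy wordBits
globalRegType _        (XmmReg _ lanes elemTy) = VecTy lanes elemTy
```

For example, `globalRegType 64 (XmmReg 1 2 (FloatTy 64))` yields the type of a 16-byte vector of two 64-bit doubles, so the LLVM backend would not need a bitcast to recover it.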
Cheers,
Simon