simd branch ready for review

David Terei davidterei at
Thu Jan 31 20:10:47 CET 2013

On 31 January 2013 09:52, Geoffrey Mainland <mainland at> wrote:
> On 01/31/2013 12:56 PM, Simon Marlow wrote:
>> On 31/01/13 11:38, Geoffrey Mainland wrote:
>>> I've pushed my simd branch to Everything has been
>>> rebased against HEAD. Simon PJ and I looked over the changes together
>>> already, but I wanted to give you (and everyone on ghc-devs) the
>>> opportunity to look things over before I merge to HEAD. Simon PJ and I
>>> came up with a few questions/notes for you, but hopefully nothing that
>>> should delay a merge.
>> I'm happy for these to go in - we've already discussed the design a
>> few times, and you've incorporated changes we agreed before, so as far
>> as I'm concerned it's all good. Go for it!
> Cool.
>>> * Win32 issues
>>> Modern 32-bit x86 *NIX systems align the stack to 16 bytes, but Win32
>>> aligns only to 4 bytes. LLVM does not assume 16-byte stack
>>> alignment. Instead, on platforms where 16-byte stack alignment is not
>>> guaranteed, it 1) always outputs a function prologue that 2) aligns
>>> the stack to a 16-byte boundary with an "and" instruction, and it
>>> also 3) disables tail calls. Because LLVM aligns the stack for a
>>> function that has SSE register spills, it also generates movaps
>>> instructions (aligned SSE moves) for the spills.
>> I must be misunderstanding your use of "always" above, because that
>> would imply that the LLVM backend doesn't work on Win32 at all. Maybe
>> LLVM only aligns the stack when it needs to store SSE values?
> You are correct---the stack-aligning prologue is only added by LLVM when
> SSE values are written to the stack, so this wasn't a problem before we
> had SSE support.
>>> This makes SSE support on Win32 difficult, and in my opinion not
>>> worth worrying about.
>>> The alternative is to 1) patch LLVM to disable the stack-alignment
>>> code so that we recover the ability to use tail calls and so that ebp
>>> is not scribbled over by the prologue and 2) patch the mangler to rewrite
>>> LLVM's movaps (move aligned) instructions to movups (move unaligned)
>>> instructions. I have these patches, but they are not included in the
>>> simd branch.
>> I don't have an opinion here - maybe ask David T what he'd prefer.
> Requiring an LLVM hack seems pretty bad, and David yelled when I changed
> the mangler since he wants to get rid of it eventually. My patches are
> still around, so if we decide Win32 support is important, I can always
> add the changes.

Not supporting Win32 sucks but yes, I want to move to just requiring
LLVM un-patched and no mangler. How ugly are the patches for LLVM? I'd
be supportive of it if the plan is to get them merged upstream.
Otherwise, I don't think it is worth the effort of having to carry
around our own patched LLVM for installation on Windows.
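
For anyone following along, the mangler half of that alternative is
just a textual rewrite of the assembly LLVM emits. A minimal sketch,
assuming line-oriented assembly text; rewriteLine and
rewriteAlignedMoves are hypothetical names, and this is not the code
from Geoff's patches or from the real mangler:

```haskell
import Data.List (isPrefixOf)

-- Hypothetical sketch, not GHC's actual mangler: turn aligned SSE moves
-- (movaps) into unaligned ones (movups) in a single line of assembly.
rewriteLine :: String -> String
rewriteLine line =
  let (indent, rest) = span (`elem` " \t") line
  in if "movaps" `isPrefixOf` rest && boundary (drop 6 rest)
       then indent ++ "movups" ++ drop 6 rest
       else line
  where
    -- Only rewrite when "movaps" is the whole mnemonic, not a prefix
    -- of some longer token.
    boundary []    = True
    boundary (c:_) = c `elem` " \t"

-- Apply the rewrite to every line of an assembly file's contents.
rewriteAlignedMoves :: String -> String
rewriteAlignedMoves = unlines . map rewriteLine . lines
```

Note that this rewrites every movaps, not only the spill code, so it
may be broader than what is strictly needed.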


>>> * Could we add a CmmType field to GlobalReg's constructors? You'll see
>>> that I added a new XmmReg constructor to GlobalReg, but because I
>>> don't know the type of an XmmReg, I have to bitcast everywhere in the
>>> generated LLVM code because LLVM wants to know not just that a value
>>> is a 16-byte vector, but that it is, e.g., a 16-byte vector containing
>>> 2 64-bit doubles. Having a CmmType attached to a GlobalReg---or
>>> pairing a GlobalReg with a CmmType when assigning registers---would
>>> let me avoid all these casts.
>> We already have a function
>> globalRegType :: DynFlags -> GlobalReg -> CmmType
>> so I see that you're guessing in the case of XmmReg. Why not just add
>> the necessary information to XmmReg so that you don't have to guess in
>> globalRegType?
> There doesn't seem to be a clear best choice for this extra info. A
> CmmType seems reasonable, and if I'm adding a CmmType to XmmReg, why not
> add it everywhere and simplify globalRegType? I'll go ahead and stick
> with what I have now.
> Thanks for all your answers.
> Geoff

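On the GlobalReg question, Simon's suggestion of carrying the type in
XmmReg could look roughly like the following. This is a sketch with
simplified stand-in types, not GHC's real definitions: the constructor
set and the Width and CmmType shapes here are assumptions for
illustration, and the DynFlags argument of the real globalRegType is
dropped.

```haskell
-- Simplified stand-ins for GHC's types; everything here is illustrative.
data Width = W32 | W64 | W128
  deriving (Eq, Show)

data CmmType
  = CmmBits Width
  | CmmFloat Width
  | CmmVec Int CmmType   -- lane count and element type
  deriving (Eq, Show)

data GlobalReg
  = VanillaReg Int
  | DoubleReg Int
  | XmmReg Int CmmType   -- register number plus the type it holds
  deriving (Eq, Show)

-- With the type stored in XmmReg, no guessing is needed for vectors.
globalRegType :: GlobalReg -> CmmType
globalRegType (VanillaReg _) = CmmBits W32
globalRegType (DoubleReg _)  = CmmFloat W64
globalRegType (XmmReg _ ty)  = ty
```

With something like this, the LLVM backend could read off that an
XmmReg holds, e.g., CmmVec 2 (CmmFloat W64) and skip the bitcasts.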