simd branch ready for review

Thu Jan 31 21:30:23 CET 2013

On 01/31/2013 07:10 PM, David Terei wrote:
> On 31 January 2013 09:52, Geoffrey Mainland <mainland at apeiron.net> wrote:
>> On 01/31/2013 12:56 PM, Simon Marlow wrote:
>>> On 31/01/13 11:38, Geoffrey Mainland wrote:
>>>> * Win32 issues
>>>>
>>>> Modern 32-bit x86 *NIX systems align the stack to 16 bytes, but Win32
>>>> aligns only to 4 bytes. LLVM does not assume 16-byte stack
>>>> alignment. Instead, on platforms where 16-byte stack alignment is not
>>>> guaranteed, it 1) always outputs a function prologue that 2) aligns
>>>> the stack to a 16-byte boundary with an "and" instruction, and it
>>>> also 3) disables tail calls. Because LLVM aligns the stack for a
>>>> function that has SSE register spills, it also generates movaps
>>>> instructions (aligned SSE moves) for the spills.
>>>
>>> I must be misunderstanding your use of "always" above, because that
>>> would imply that the LLVM backend doesn't work on Win32 at all. Maybe
>>> LLVM only aligns the stack when it needs to store SSE values?
>>
>> You are correct---the stack-aligning prologue is only added by LLVM when
>> SSE values are written to the stack, so this wasn't a problem before we
>> had SSE support.
>>
>>>> This makes SSE support on Win32 difficult, and in my opinion not
>>>> worth worrying about.
>>>>
>>>> The alternative is to 1) patch LLVM to disable the stack-alignment
>>>> code so that we recover the ability to use tail calls and so that ebp
>>>> is not scribbled over by the prologue and 2) patch the mangler to rewrite
>>>> LLVM's movaps (move aligned) instructions to movups (move unaligned)
>>>> instructions. I have these patches, but they are not included in the
>>>> simd branch.
>>>
>>> I don't have an opinion here - maybe ask David T what he'd prefer.
>>
>> Requiring an LLVM hack seems pretty bad, and David yelled when I changed
>> the mangler since he wants to get rid of it eventually. My patches are
>> still around, so if we decide Win32 support is important, I can always
>> add the changes.
>
> Not supporting Win32 sucks but yes, I want to move to just requiring
> LLVM un-patched and no mangler. How ugly are the patches for LLVM? I'd
> be supportive of it if the plan is to get them merged upstream.
> Otherwise, I don't think it is worth the effort of having to carry
> around our own patched LLVM for installation on windows.

The patch against LLVM 3.0 is here:

https://github.com/mainland/ghc-simd-tests/blob/master/patches/llvm-3.0.patch

If you were to look, you'd see that it's not appropriate for upstream
integration. Please don't look :)

Since we have support for Win64 as of GHC 7.6, I vote that we forget
about Win32 support for SSE.
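
For anyone following along, the alignment step in that prologue is just
a mask on the stack pointer. Here is a minimal sketch in C of the
arithmetic only, not of LLVM's actual code; align_down and is_aligned
are names made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a multiple of alignment, which must be a power of
   two. The stack grows downwards on x86, so rounding down is what an
   "and" of the stack pointer with -16 achieves. */
uintptr_t align_down(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);
}

/* Nonzero if addr is already a multiple of alignment. */
int is_aligned(uintptr_t addr, uintptr_t alignment)
{
    return align_down(addr, alignment) == addr;
}
```

An address such as 0x1004 satisfies the 4-byte guarantee but not the
16-byte one, and an aligned move like movaps faults on an address that
is not 16-byte aligned, hence the choice between realigning the stack
and rewriting to movups.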

Simon, this reminds me of two other issues...

1) SSE vector values are only passed in registers on x86-64 anyway right
now. MAX_REAL_FLOAT_REG and MAX_REAL_DOUBLE_REG are both #defined to 0
on x86-32 in includes/stg/MachRegs.h. Are floats and doubles not passed
in registers on x86-32? I'm confused as to how this works. The GHC
calling convention in LLVM certainly says they are passed in registers.

2) SSE support is processor and platform dependent. What is the proper
way for the programmer to know what SSE primitives are available? A CPP
define? If so, what should it be called?

Right now one can look at the TARGET_* and __GLASGOW_HASKELL_LLVM__ CPP
macros and make a decision as to whether or not SSE primitives are
available, but that's not a great solution. Also, what happens when we
want to add AVX support? How do we control the inclusion of AVX support
when building GHC, and how do we let the programmer know that the AVX
primops/primtypes are available for use?
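
As a point of comparison only, GCC and Clang answer the same question
for C code with predefined macros such as __SSE2__ and __AVX__, which
are set according to the target flags. A rough sketch of that style of
check follows; the enum and function names are made up for illustration,
and nothing here is a proposal for what the GHC macro should be called:

```c
/* Widest vector extension the compiler was told it may use. */
enum simd_level { SIMD_NONE = 0, SIMD_SSE2 = 1, SIMD_AVX = 2 };

/* __AVX__ and __SSE2__ are predefined by GCC and Clang when the target
   flags (or the target's defaults) enable those instruction sets. */
enum simd_level compiled_simd_level(void)
{
#if defined(__AVX__)
    return SIMD_AVX;
#elif defined(__SSE2__)
    return SIMD_SSE2;
#else
    return SIMD_NONE;
#endif
}

/* Vector register width in bytes: 32 for AVX (YMM), 16 for SSE2 (XMM),
   0 when no vector extension is enabled. */
unsigned simd_vector_bytes(enum simd_level level)
{
    switch (level) {
    case SIMD_AVX:  return 32;
    case SIMD_SSE2: return 16;
    default:        return 0;
    }
}
```

One macro per instruction set extends naturally when a wider extension
such as AVX is added, which is the property I would want from whatever
we settle on.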

Geoff