SIMD/SSE support & alignment

Tue Mar 12 21:08:38 CET 2013

Hey,

On Tue, 2013-03-12 at 14:09 +0000, Geoffrey Mainland wrote:
> On 03/10/2013 09:52 PM, Nicolas Trangez wrote:
> > ...
> 
> Hi Nicolas,
> 
> Have you read our paper about the SIMD work? It's available here:
> 
> https://research.microsoft.com/en-us/um/people/simonpj/papers/ndp/haskell-beats-C.pdf

I hadn't read that one before (I had read other stream-fusion-related
papers), but I have now. I had already picked up most of it while reading
the vector SIMD branch commits. The benchmark results look very nice!

I'm afraid I didn't 'get' how the framework would allow both AVX and
SSE instructions to work on streams, since it seems to assume Multis
are always a fixed number of bytes wide (in this case 16, for SSE).

> The paper describes the issues involved with integrating SIMD
> instructions into the vector fusion framework.
> 
> There are two primary issues with alignment: stack alignment and heap
> alignment.
> 
> We cannot rely on the stack being properly aligned for AVX spills on any
> platform, and LLVM's stack fixup code does not play well with GHC, so we
> *rewrite* all AVX spill instructions to their unaligned counterparts. On
> Win32 we must do the same for SSE.

Does this imply stack values are always 16-byte aligned?
I haven't worked with AVX yet (my CPU doesn't support it).

> Unboxed vectors are allocated by GHC, and it does not align memory on
> 16-byte boundaries, so our first cut at SSE intrinsics simply used
> unaligned accesses. Obviously with ForeignPtrs we can control alignment
> and potentially use the aligned variants of SSE instructions, but this
> will almost double the number of primops. One could imagine extending
> our fusion framework to transition to aligned move instructions.

Right. I created the patch in #7067
(http://hackage.haskell.org/trac/ghc/ticket/7067) for vector-simd
purposes back then (it adds mallocForeignPtrAlignedBytes and
mallocPlainForeignPtrAlignedBytes).
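For illustration, a minimal sketch of how such an allocation could look, assuming a base version whose GHC.ForeignPtr exports mallocPlainForeignPtrAlignedBytes (size in bytes, then alignment in bytes). The helpers mallocDoublesAligned16 and isAlignedTo are hypothetical names, not part of any library:

```haskell
import Foreign.ForeignPtr (ForeignPtr, withForeignPtr)
import Foreign.Ptr (Ptr, ptrToWordPtr)
import Foreign.Storable (sizeOf)
import GHC.ForeignPtr (mallocPlainForeignPtrAlignedBytes)

-- Hypothetical helper: allocate a pinned buffer of n Doubles whose start
-- address is a multiple of 16 bytes (the SSE register width).
mallocDoublesAligned16 :: Int -> IO (ForeignPtr Double)
mallocDoublesAligned16 n =
  mallocPlainForeignPtrAlignedBytes (n * sizeOf (undefined :: Double)) 16

-- Hypothetical helper: True if the pointer's address is a multiple of
-- the given alignment (in bytes).
isAlignedTo :: Int -> Ptr a -> Bool
isAlignedTo align p = ptrToWordPtr p `mod` fromIntegral align == 0

main :: IO ()
main = do
  fp <- mallocDoublesAligned16 1024
  ok <- withForeignPtr fp (return . isAlignedTo 16)
  print ok
```

Memory obtained this way is the kind of "known aligned" memory on which aligned move primops could safely be used.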

> Finally, LLVM 3.2 does not work with GHC. This means we cannot yet take
> advantage of its new vectorization optimizations, which is a shame.
> 
> So, four projects for you or anyone else who is interested, in rough
> dependency order:
> 
> 1) Get LLVM 3.2 working with GHC's LLVM back end.

According to other mails in this thread, this should now be fixed. I'll
give it a go.

> 2) Fix the stack alignment issue with LLVM. This will likely require a
> patch to LLVM.

I'm afraid that's a bit out of my league for now :-)

> 3) Add support for aligned move primops.

I looked into this before; I might give it a stab.

> 4) Extend the current SIMD fusion framework to handle transitioning to
> aligned move instructions. As an alternative, only use aligned move
> instructions on memory that we know is aligned.

This is why I sent my previous mail in the first place: is there any plan
for how to approach the 'memory that we know is aligned' bit? Would it
make sense to have a more general 'alignment restriction' framework for
arbitrary values, not only unboxed vectors (if there are any other
use-cases)?
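For the 'transitioning' variant, a common approach is loop peeling: handle a few leading elements with scalar (or unaligned) operations until the address reaches a vector boundary, run aligned vector moves over the middle, and finish the tail with scalar code. The sketch below only computes that split; splitForAlignment is a hypothetical helper, not code from the vector-simd branch, and it assumes the start address is a multiple of the element size. Taking the vector width as a parameter is also one way the same logic could serve both SSE (16 bytes) and AVX (32 bytes):

```haskell
-- Hypothetical helper (loop-peeling sketch): for a loop over n elements of
-- elemSize bytes starting at byte address addr, return
--   (scalar prologue length, number of full aligned vectors, scalar epilogue length)
-- so that the middle part could use aligned moves of vecBytes bytes.
-- Assumes addr and vecBytes are both multiples of elemSize.
splitForAlignment :: Int -> Int -> Int -> Int -> (Int, Int, Int)
splitForAlignment vecBytes elemSize addr n = (pro, vecs, epi)
  where
    misalign  = addr `mod` vecBytes
    peelBytes = if misalign == 0 then 0 else vecBytes - misalign
    pro       = min n (peelBytes `div` elemSize)  -- elements before first boundary
    perVec    = vecBytes `div` elemSize           -- lanes per vector register
    vecs      = (n - pro) `div` perVec            -- full aligned vectors
    epi       = n - pro - vecs * perVec           -- leftover tail elements

main :: IO ()
main = do
  print (splitForAlignment 16 8 0x1008 10)  -- SSE, Doubles: prints (1,4,1)
  print (splitForAlignment 32 8 0x1008 10)  -- AVX, Doubles: prints (3,1,3)
```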

> These are all on my todo list, but my plate is quite full at the moment.

Heh, sounds familiar ;-)

Thanks,

Nicolas