[Haskell-cafe] vector, alignment and SIMD through FFI
nicolas at incubaid.com
Fri Jul 6 22:06:47 CEST 2012
Recently I've been playing with the implementation of an algorithm, for
which we already have highly-optimized implementations available (in
plain C/C++ as well as OCaml with calls to C through FFI).
The algorithm works on buffers/arrays/vectors/whatever you want to call
it, which needs to be combined in certain ways. This can be highly
optimized by using SIMD instructions (like the ones provides by several
I'd like to get a to Haskell version which is comparable in efficiency
as the existing versions, whilst remaining as 'functional' as possible.
I don't mind jumping into some low-level C glue and FFI (using ccall or
custom primops), but this should be limited.
Currently I have something working (still highly unoptimized) using
(unboxed) vectors from the vector package, using mutable versions within
a well-contained ST environment in some places.
One hot zone of the current version is combining several vectors, and
the performance of this operation could be greatly improved by using
SIMD instructions. There's one catch though: when using these, memory
should be aligned on certain boundaries (16 byte in this case).
First and foremost, to be able to pass my vectors to some C functions, I
should change my code into using Storable vectors (which should be fine,
I guess I can expect similar performance characteristics?). I couldn't
find any information on alignment guarantees of these vectors though...
Which is how I get to my question: are there any such guarantees? If
not, are there any pointers to how to proceed with this? I guess
tracking alignment at the type level should be the goal, you can find
some code trying to explain my reasoning up to now at the end of this
I have some issues with this:
- I'd need to re-implement almost all vector operations, which seems
- It doesn't actually work right now ;-)
- It'd be nice to be able to encode 'compatible' alignment: as an
example, a 16 byte aligned buffer is also 8 byte aligned...
I hope the above explains somewhat my goal. Any thoughts & help on this
would be very welcome!
module Data.Vector.SIMD (
import qualified Data.Vector.Storable as SV
import Foreign.Storable (Storable, sizeOf)
import Foreign.Ptr (Ptr, FunPtr)
import Foreign.ForeignPtr (ForeignPtr, newForeignPtr)
import System.IO.Unsafe (unsafePerformIO)
class Alignment a where
alignment :: a -> Int
instance Alignment A8Byte where
alignment _ = 8
instance Alignment A16Byte where
alignment _ = 16
newtype Alignment a => SIMDVector a b = V (SV.Vector b)
replicate :: (Alignment a, Storable b) => a -> Int -> b -> SIMDVector a
replicate a n b = V v
ptr = unsafePerformIO $ do
v <- _mm_malloc n (alignment a)
-- memset etc
v = SV.unsafeFromForeignPtr0 ptr n
-- These are 2 _stub versions of the procedures since xmminstr.h (or
-- when using GCC) contains them as inline procedures which are not
-- as-is in a library. There should be some C module which exports
-- _mm_malloc_stub and _mm_free_stub, which simply includes xmminstr.h
-- the underlying procedures.
foreign import ccall "_mm_malloc_stub" _mm_malloc_stub :: Int -> Int ->
IO (Ptr a)
foreign import ccall "_mm_free_stub" _mm_free_stub :: FunPtr (Ptr a ->
_mm_malloc :: Storable a => Int -> Int -> IO (ForeignPtr a)
_mm_malloc s l = do
-- This fails:
-- Ambiguous type variable `a0' in the constraint:
-- (Storable a0) arising from a use of `sizeOf'
-- v <- c_mm_malloc (s * sizeOf (undefined :: a)) l
newForeignPtr _mm_free_stub undefined
-- This allocates a 16 byte aligned output buffer, takes 2 existing ones
-- calls some FFI function to perform some magic.
-- The implementation could run inside ST, if the FFI import (which e.g.
-- on a mutable buffer and returns IO ()) is lifted into ST using
mySIMDFun :: SIMDVector A16Byte a -> SIMDVector A16Byte a -> SIMDVector
mySIMDFun a b = undefined
More information about the Haskell-Cafe