[Haskell-cafe] Efficient matrix multiply using accelerate

Wed Sep 4 13:53:31 CEST 2013

I've been trying to get some speed out of the accelerate library today.
What I want to implement is something as simple as a matrix multiply.
I'd like it to be fast and memory efficient.
Given the equation
C = AB

where
  A is nxr
  B is rxm
  C is nxm

it seems reasonable to allocate three arrays on the GPU with n*r, r*m
and n*m elements respectively.
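
To pin down the shapes, here is a minimal reference sketch in plain
Haskell on lists of rows. It does not use accelerate and is only meant
as a specification of C = AB; the names matMulRef and elementCounts are
made up for this illustration.

```haskell
import Data.List (transpose)

-- | Reference product on lists of rows (hypothetical helper).
-- An n x r matrix times an r x m matrix yields an n x m matrix:
-- entry (i, j) is the dot product of row i of A and column j of B.
matMulRef :: Num e => [[e]] -> [[e]] -> [[e]]
matMulRef a b =
  [ [ sum (zipWith (*) row col) | col <- transpose b ] | row <- a ]

-- | Element counts of A, B and C for the dimensions n, r and m
-- (hypothetical helper).
elementCounts :: Int -> Int -> Int -> (Int, Int, Int)
elementCounts n r m = (n * r, r * m, n * m)

main :: IO ()
main = do
  -- A is 2x3 and B is 3x2, so C is 2x2.
  print (matMulRef [[1, 2, 3], [4, 5, 6]] [[7, 8], [9, 10], [11, 12]] :: [[Int]])
  print (elementCounts 2 3 2)
```

With n = 2, r = 3 and m = 2 the three arrays hold 6, 6 and 4 elements.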

Anyone know how to achieve this with accelerate? My first thought was
to use the generate function to create the new C array, but I didn't
manage to wrap my head around all the fancy type features that pop up
when you want to return an array C that has dimensions dependent on
the dimensions of its inputs, A and B.

I've searched around a bit and found this [1] example implementation but
it is just as slow as a simple sequential algorithm in C. I would be
very thankful for any advice for working with accelerate!

Here's a snippet of what I have tried to make. There are several
errors in there. Maybe I'm approaching the problem from the wrong
angle.

matMul' arr brr =
  let dotProd shp =
        let (Z :. rowsA :. _)     = unlift (shape arr) :: (Z :. Exp Int :. Exp Int)
            (Z :. _     :. colsB) = unlift (shape brr) :: (Z :. Exp Int :. Exp Int)
            (Z :. i :. j)         = unlift shp         :: (Z :. Exp Int :. Exp Int)
            rs = lift (Z :. All :.) (unlift i)
            cs = (lift (Z :.) (unlift j)) (:. All)
        in the $ A.fold1All (+) $ A.zipWith (+) (flatten (slice arr rs)) (flatten (slice brr cs))
  in A.generate (lift (Z :. rowsA :. colsB)) dotProd
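
For comparison, here is a sketch of the replicate/zipWith/fold
formulation, which avoids calling a collective operation (fold1All)
from inside the function passed to generate. It assumes the accelerate
1.x API (the A.Num constraint; older releases spell it differently) and
the reference interpreter backend that ships with the accelerate
package. The names matMul, exampleA and exampleB are made up for this
illustration. Whether the replicated arrays are ever materialised
depends on the fusion in the accelerate version and backend, so treat
this as a starting point rather than a tuned implementation.

```haskell
{-# LANGUAGE TypeOperators #-}

import Data.Array.Accelerate as A
import qualified Data.Array.Accelerate.Interpreter as I

-- | C = AB for an (n x r) matrix A and an (r x m) matrix B.
-- Both inputs are conceptually extended to an n x m x r cube,
-- multiplied elementwise, and summed along the innermost dimension.
matMul :: A.Num e
       => Acc (Array DIM2 e) -> Acc (Array DIM2 e) -> Acc (Array DIM2 e)
matMul arr brr = A.fold (+) 0 $ A.zipWith (*) arrRepl brrRepl
  where
    Z :. rowsA :. _     = unlift (shape arr) :: Z :. Exp Int :. Exp Int
    Z :. _     :. colsB = unlift (shape brr) :: Z :. Exp Int :. Exp Int
    -- arrRepl (i, j, k) = arr (i, k)
    arrRepl = A.replicate (lift $ Z :. All   :. colsB :. All) arr
    -- brrRepl (i, j, k) = brr (k, j), via the transpose of brr
    brrRepl = A.replicate (lift $ Z :. rowsA :. All   :. All) (A.transpose brr)

-- Small test inputs: A is 2x3, B is 3x2.
exampleA, exampleB :: Array DIM2 Float
exampleA = A.fromList (Z :. 2 :. 3) [1, 2, 3, 4, 5, 6]
exampleB = A.fromList (Z :. 3 :. 2) [7, 8, 9, 10, 11, 12]

main :: IO ()
main =
  -- Run on the interpreter and print the elements in row-major order.
  print (A.toList (I.run (matMul (A.use exampleA) (A.use exampleB))))
```

For these inputs the product is the 2x2 matrix with rows (58, 64) and
(139, 154). To run on a GPU, I.run would be swapped for the run
function of whichever GPU backend is installed.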

[1] http://www.mail-archive.com/haskell-cafe@haskell.org/msg102782.html