[Haskell-cafe] Efficient matrix multiply using accelerate
Morten Olsen Lysgaard
morten at lysgaard.no
Wed Sep 4 13:53:31 CEST 2013
I've been trying to get some speed out of the accelerate library today.
What I want to implement is something as simple as a matrix multiply.
I'd like it to be fast and memory efficient.
Given the equation
C = AB
where
A is nxr
B is rxm
C is nxm
it seems reasonable to allocate three arrays on the GPU with n*r, r*m
and n*m elements respectively.
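To pin down the indexing, here is a minimal CPU-only reference sketch using plain lists, not accelerate. Each entry of C is the dot product of a row of A with a column of B, so C has as many rows as A and as many columns as B. The names Matrix, matMulRef and dims are hypothetical helpers introduced only for this illustration.

```haskell
import Data.List (transpose)

-- Hypothetical representation for illustration: a matrix is a list of rows.
type Matrix a = [[a]]

-- c_ij = sum over k of a_ik * b_kj
-- Assumes both inputs are rectangular and that the number of columns
-- of the first equals the number of rows of the second.
matMulRef :: Num a => Matrix a -> Matrix a -> Matrix a
matMulRef a b = [[sum (zipWith (*) row col) | col <- transpose b] | row <- a]

-- (rows, columns) of a rectangular matrix.
dims :: Matrix a -> (Int, Int)
dims []        = (0, 0)
dims m@(r : _) = (length m, length r)
```

For an n x r input and an r x m input, dims of the result is (n, m), which is the shape any GPU version has to produce as well.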
Anyone know how to achieve this with accelerate? My first thought was
to use the generate function to create the new C array, but I didn't
manage to wrap my head around all the fancy type features that pop up
when you want to return an array C that has dimensions dependent on
the dimensions of its inputs, A and B.
I've searched around a bit and found this [1] example implementation but
it is just as slow as a simple sequential algorithm in C. I would be
very thankful for any advice for working with accelerate!
Here's a snippet of what I have tried to make. There are several
errors in there. Maybe I'm approaching the problem from the wrong
angle.
matMul' arr brr =
  let dotProd shp =
        let (Z :. rowsA :. _) = unlift (shape arr) :: (Z :. Exp Int :. Exp Int)
            (Z :. _ :. colsB) = unlift (shape brr) :: (Z :. Exp Int :. Exp Int)
            (Z :. i :. j) = unlift shp :: (Z :. Exp Int :. Exp Int)
            rs = lift (Z :. All :.) (unlift i)
            cs = (lift (Z :.) (unlift j)) (:. All)
        in the $ A.fold1All (+) $ A.zipWith (+) (flatten (slice arr rs)) (flatten (slice brr cs))
  in A.generate (lift (Z :. rowsA :. colsB)) dotProd
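For comparison, here is a sketch of a commonly seen formulation that avoids running an array-level fold inside the function passed to generate. Accelerate does not support nested data parallelism, so a per-element fold1All of that kind is a likely source of trouble. This version replicates both inputs into a common three-dimensional index space, multiplies them elementwise, and folds away the innermost dimension. It assumes the accelerate 1.x API (the A.Num constraint) and uses the reference interpreter backend only to have something to run. The name example and the test data are made up for illustration. Whether a given backend fuses the replicate steps away, rather than materialising n*m*r elements, is not something this sketch verifies.

```haskell
{-# LANGUAGE FlexibleContexts, TypeOperators #-}

import Data.Array.Accelerate (Acc, All (..), Array, DIM2, Exp, Z (..), (:.) (..))
import qualified Data.Array.Accelerate as A
import qualified Data.Array.Accelerate.Interpreter as I

-- Result dimensions are taken from the input shapes at the Exp level:
-- rows of the first matrix, columns of the second.
matMul :: A.Num e => Acc (Array DIM2 e) -> Acc (Array DIM2 e) -> Acc (Array DIM2 e)
matMul arr brr = A.fold (+) 0 (A.zipWith (*) arrRepl brrRepl)
  where
    Z :. rowsA :. _     = A.unlift (A.shape arr) :: Z :. Exp Int :. Exp Int
    Z :. _     :. colsB = A.unlift (A.shape brr) :: Z :. Exp Int :. Exp Int

    -- Both arrays are indexed (i, j, k) after replication; the fold above
    -- sums over the innermost index k.
    arrRepl = A.replicate (A.lift (Z :. All   :. colsB :. All)) arr
    brrRepl = A.replicate (A.lift (Z :. rowsA :. All   :. All)) (A.transpose brr)

-- Usage sketch (assumption: the interpreter backend that ships with accelerate).
example :: Array DIM2 Int
example = I.run (matMul (A.use a) (A.use b))
  where
    a, b :: Array DIM2 Int
    a = A.fromList (Z :. 2 :. 3) [1 .. 6]   -- 2 x 3, row-major
    b = A.fromList (Z :. 3 :. 2) [7 .. 12]  -- 3 x 2, row-major
```

Swapping I.run for the run function of a GPU backend should be the only change needed to execute the same expression elsewhere, assuming such a backend is installed.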
[1] http://www.mail-archive.com/haskell-cafe@haskell.org/msg102782.html