# [Haskell-cafe] Efficient matrix multiply using accelerate

Morten Olsen Lysgaard morten at lysgaard.no
Wed Sep 4 13:53:31 CEST 2013

```I've been trying to get some speed out of the accelerate library today.
What I want to implement is something as simple as a matrix multiply.
I'd like it to be fast and memory efficient.
Given the equation
C = AB

where
A is nxr
B is rxm
C is nxm

it seems reasonable to allocate three arrays on the GPU with n*r, r*m
and n*m elements respectively.

Anyone know how to achieve this with accelerate? My first thought was
to use the generate function to create the new C array, but I didn't
manage to wrap my head around all the fancy type features that pop up
when you want to return an array C that has dimensions dependent on
the dimensions of its inputs, A and B.

I've searched around a bit and found this [1] example implementation but
it is just as slow as a simple sequential algorithm in C. I would be
very thankful for any advice for working with accelerate!

Here's a snippet of what I have tried to make. There are several
errors in there. Maybe I'm approaching the problem from the wrong
angle.

matMul' arr brr =
  let dotProd shp =
        let (Z :. rowsA :. _)     = unlift (shape arr) :: (Z :. Exp Int :. Exp Int)
            (Z :. _     :. colsB) = unlift (shape brr) :: (Z :. Exp Int :. Exp Int)
            (Z :. i :. j)         = unlift shp         :: (Z :. Exp Int :. Exp Int)
            rs = lift (Z :. All :.) (unlift i)
            cs = (lift (Z :.) (unlift j)) (:. All)
        in the $ A.fold1All (+) $ A.zipWith (+) (flatten (slice arr rs)) (flatten (slice brr cs))
  in A.generate (lift (Z :. rowsA :. colsB)) dotProd
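
-- For comparison, a minimal sketch of one way C = AB is commonly written
-- in accelerate. It is not the generate-based attempt above: it swaps in
-- the replicate/zipWith/fold formulation, which sidesteps building slice
-- indices by hand. Assumptions: a recent accelerate 1.x release (the
-- A.Num constraint; 2013-era releases, I believe, spelled it
-- (IsNum e, Elt e)), and the name matMul is purely illustrative.

{-# LANGUAGE TypeOperators #-}

import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Acc, Array, DIM2, Exp, Z(..), (:.)(..), All(..))

matMul :: A.Num e => Acc (Array DIM2 e) -> Acc (Array DIM2 e) -> Acc (Array DIM2 e)
matMul arr brr = A.fold (+) 0 (A.zipWith (*) arrRepl brrRepl)
  where
    -- Only n (rows of A) and m (columns of B) are needed from the shapes.
    Z :. rowsA :. _     = A.unlift (A.shape arr) :: Z :. Exp Int :. Exp Int
    Z :. _     :. colsB = A.unlift (A.shape brr) :: Z :. Exp Int :. Exp Int

    -- Conceptual n x m x r array with arrRepl[i,j,k] = arr[i,k].
    arrRepl = A.replicate (A.lift (Z :. All :. colsB :. All)) arr

    -- Conceptual n x m x r array with brrRepl[i,j,k] = brr[k,j].
    brrRepl = A.replicate (A.lift (Z :. rowsA :. All :. All)) (A.transpose brr)

-- The fold sums over the innermost index k, so the result has shape n x m
-- and C[i,j] = sum over k of A[i,k] * B[k,j]. The two n x m x r arrays are
-- meant to be fused into the fold rather than allocated, but whether that
-- actually happens depends on the accelerate version in use, so it should
-- be checked rather than assumed. To try it, wrap the inputs with A.use and
-- pass the result to a backend's run function, for example the one in
-- Data.Array.Accelerate.Interpreter.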