[Haskell-cafe] Naive matrix multiplication with Accelerate
Trevor L. McDonell
tmcdonell at cse.unsw.edu.au
Tue Dec 4 06:21:35 CET 2012
As far as I am aware, the only description is in the Repa paper. I you are right, it really should be explained properly somewhere…
At a simpler example, here is the outer product of two vectors [1].
vvProd :: (IsNum e, Elt e) => Acc (Vector e) -> Acc (Vector e) -> Acc (Matrix e)
vvProd xs ys = A.zipWith (*) xsRepl ysRepl
where
n = A.size xs
m = A.size ys
xsRepl = A.replicate (lift (Z :. All :. m )) xs
ysRepl = A.replicate (lift (Z :. n :. All)) ys
If we then `A.fold (+) 0` the matrix, it would reduce along each row producing a vector. So the first element of that vector is going to be calculated as (xs[0] * ys[0] + xs[0] * ys[1] + … xs[0] * ys[m-1]). That's the idea we want for our matrix multiplication … but I agree, it is difficult for me to visualise as well.
I do the same sort of trick with the n-body demo to get all n^2 particle interactions.
-Trev
[1]: http://en.wikipedia.org/wiki/Outer_product#Vector_multiplication
On 04/12/2012, at 3:41 AM, Clark Gaebel <cgaebel at uwaterloo.ca> wrote:
> Ah. I see now. Silly Haskell making inefficient algorithms hard to write and efficient ones easy. It's actually kind of annoying when learning, but probably for the best.
>
> Is there a good write-up of the algorithm you're using somewhere? The Repa paper was very brief in its explaination, and I'm having trouble visualizing the mapping of the 2D matricies into 3 dimensions.
>
> - Clark
>
>
> On Mon, Dec 3, 2012 at 2:06 AM, Trevor L. McDonell <tmcdonell at cse.unsw.edu.au> wrote:
> Hi Clark,
>
> The trick is that most accelerate operations work over multidimensional arrays, so you can still get around the fact that we are limited to flat data-parallelism only.
>
> Here is matrix multiplication in Accelerate, lifted from the first Repa paper [1].
>
>
> import Data.Array.Accelerate as A
>
> type Matrix a = Array DIM2 a
>
> matMul :: (IsNum e, Elt e) => Acc (Matrix e) -> Acc (Matrix e) -> Acc (Matrix e)
> matMul arr brr
> = A.fold (+) 0
> $ A.zipWith (*) arrRepl brrRepl
> where
> Z :. rowsA :. _ = unlift (shape arr) :: Z :. Exp Int :. Exp Int
> Z :. _ :. colsB = unlift (shape brr) :: Z :. Exp Int :. Exp Int
>
> arrRepl = A.replicate (lift $ Z :. All :. colsB :. All) arr
> brrRepl = A.replicate (lift $ Z :. rowsA :. All :. All) (A.transpose brr)
>
>
> If you use github sources rather than the hackage package, those intermediate replicates will get fused away.
>
>
> Cheers,
> -Trev
>
> [1] http://www.cse.unsw.edu.au/~chak/papers/KCLPL10.html
>
>
>
>
> On 03/12/2012, at 5:07 PM, Clark Gaebel <cgaebel at uwaterloo.ca> wrote:
>
>> Hello cafe,
>>
>> I've recently started learning about cuda and hetrogenous programming, and have been using accelerate [1] to help me out. Right now, I'm running into trouble in that I can't call parallel code from sequential code. Turns out GPUs aren't exactly like Repa =P.
>>
>> Here's what I have so far:
>>
>> import qualified Data.Array.Accelerate as A
>> import Data.Array.Accelerate ( (:.)(..)
>> , Acc
>> , Vector
>> , Scalar
>> , Elt
>> , fold
>> , slice
>> , constant
>> , Array
>> , Z(..), DIM1, DIM2
>> , fromList
>> , All(..)
>> , generate
>> , lift, unlift
>> , shape
>> )
>> import Data.Array.Accelerate.Interpreter ( run )
>>
>> dotP :: (Num a, Elt a) => Acc (Vector a) -> Acc (Vector a) -> Acc (Scalar a)
>> dotP xs ys = fold (+) 0 $ A.zipWith (*) xs ys
>>
>> type Matrix a = Array DIM2 a
>>
>> getRow :: Elt a => Int -> Acc (Matrix a) -> Acc (Vector a)
>> getRow n mat = slice mat . constant $ Z :. n :. All
>>
>> -- Naive matrix multiplication:
>> --
>> -- index (i, j) is equal to the ith row of 'a' `dot` the jth row of 'b'
>> matMul :: A.Acc (Matrix Double) -> A.Acc (Matrix Double) -> A.Acc (Matrix Double)
>> matMul a b' = A.generate (constant $ Z :. nrows :. ncols) $
>> \ix ->
>> let (Z :. i :. j) = unlift ix
>> in getRow i a `dotP` getRow j b
>> where
>> b = A.transpose b' -- I assume row indexing is faster than column indexing...
>> (Z :. nrows :. _ ) = unlift $ shape a
>> (Z :. _ :. ncols) = unlift $ shape b
>>
>>
>> This, of course, gives me errors right now because I'm calling getRow and dotP from within the generation function, which expects Exp[ression]s, not Acc[elerated computation]s.
>>
>> So maybe I need to replace that line with an inner for loop? Is there an easy way to do that with Accelerate?
>>
>> Thanks for your help,
>> - Clark
>>
>> [1] http://hackage.haskell.org/package/accelerate
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20121204/ea9a3d9f/attachment.htm>
More information about the Haskell-Cafe
mailing list