[Haskell-cafe] Naive matrix multiplication with Accelerate

Tue Dec 4 06:21:35 CET 2012

As far as I am aware, the only description is in the Repa paper. I you are right, it really should be explained properly somewhere… 

At a simpler example, here is the outer product of two vectors [1].

vvProd :: (IsNum e, Elt e) => Acc (Vector e) -> Acc (Vector e) -> Acc (Matrix e)
vvProd xs ys = A.zipWith (*) xsRepl ysRepl
  where
    n           = A.size xs
    m           = A.size ys

    xsRepl      = A.replicate (lift (Z :. All :. m  )) xs
    ysRepl      = A.replicate (lift (Z :. n   :. All)) ys

If we then `A.fold (+) 0` the matrix, it would reduce along each row producing a vector. So the first element of that vector is going to be calculated as (xs[0] * ys[0] + xs[0] * ys[1] +  … xs[0] * ys[m-1]). That's the idea we want for our matrix multiplication … but I agree, it is difficult for me to visualise as well.

I do the same sort of trick with the n-body demo to get all n^2 particle interactions.

-Trev

 [1]: http://en.wikipedia.org/wiki/Outer_product#Vector_multiplication

On 04/12/2012, at 3:41 AM, Clark Gaebel <cgaebel at uwaterloo.ca> wrote:

> Ah. I see now. Silly Haskell making inefficient algorithms hard to write and efficient ones easy. It's actually kind of annoying when learning, but probably for the best.
> 
> Is there a good write-up of the algorithm you're using somewhere? The Repa paper was very brief in its explaination, and I'm having trouble visualizing the mapping of the 2D matricies into 3 dimensions.
> 
>   - Clark
> 
> 
> On Mon, Dec 3, 2012 at 2:06 AM, Trevor L. McDonell <tmcdonell at cse.unsw.edu.au> wrote:
> Hi Clark,
> 
> The trick is that most accelerate operations work over multidimensional arrays, so you can still get around the fact that we are limited to flat data-parallelism only.
> 
> Here is matrix multiplication in Accelerate, lifted from the first Repa paper [1].
> 
> 
> import Data.Array.Accelerate as A
> 
> type Matrix a = Array DIM2 a
> 
> matMul :: (IsNum e, Elt e) => Acc (Matrix e) -> Acc (Matrix e) -> Acc (Matrix e)
> matMul arr brr
>   = A.fold (+) 0
>   $ A.zipWith (*) arrRepl brrRepl
>   where
>     Z :. rowsA :. _     = unlift (shape arr)    :: Z :. Exp Int :. Exp Int
>     Z :. _     :. colsB = unlift (shape brr)    :: Z :. Exp Int :. Exp Int
> 
>     arrRepl             = A.replicate (lift $ Z :. All   :. colsB :. All) arr
>     brrRepl             = A.replicate (lift $ Z :. rowsA :. All   :. All) (A.transpose brr)
> 
> 
> If you use github sources rather than the hackage package, those intermediate replicates will get fused away.
> 
> 
> Cheers,
> -Trev
> 
>  [1] http://www.cse.unsw.edu.au/~chak/papers/KCLPL10.html
> 
> 
> 
> 
> On 03/12/2012, at 5:07 PM, Clark Gaebel <cgaebel at uwaterloo.ca> wrote:
> 
>> Hello cafe,
>> 
>> I've recently started learning about cuda and hetrogenous programming, and have been using accelerate [1] to help me out. Right now, I'm running into trouble in that I can't call parallel code from sequential code. Turns out GPUs aren't exactly like Repa =P.
>> 
>> Here's what I have so far:
>> 
>> import qualified Data.Array.Accelerate as A
>> import Data.Array.Accelerate ( (:.)(..)
>>                              , Acc
>>                              , Vector
>>                              , Scalar
>>                              , Elt
>>                              , fold
>>                              , slice
>>                              , constant
>>                              , Array
>>                              , Z(..), DIM1, DIM2
>>                              , fromList
>>                              , All(..)
>>                              , generate
>>                              , lift, unlift
>>                              , shape
>>                              )
>> import Data.Array.Accelerate.Interpreter ( run )
>> 
>> dotP :: (Num a, Elt a) => Acc (Vector a) -> Acc (Vector a) -> Acc (Scalar a)
>> dotP xs ys = fold (+) 0 $ A.zipWith (*) xs ys
>> 
>> type Matrix a = Array DIM2 a
>> 
>> getRow :: Elt a => Int -> Acc (Matrix a) -> Acc (Vector a)
>> getRow n mat = slice mat . constant $ Z :. n :. All
>> 
>> -- Naive matrix multiplication:
>> --
>> -- index (i, j) is equal to the ith row of 'a' `dot` the jth row of 'b'
>> matMul :: A.Acc (Matrix Double) -> A.Acc (Matrix Double) -> A.Acc (Matrix Double)
>> matMul a b' = A.generate (constant $ Z :. nrows :. ncols) $
>>                 \ix ->
>>                   let (Z :. i :. j) = unlift ix
>>                    in getRow i a `dotP` getRow j b
>>     where
>>         b = A.transpose b' -- I assume row indexing is faster than column indexing...
>>         (Z :. nrows :.   _  ) = unlift $ shape a
>>         (Z :.   _   :. ncols) = unlift $ shape b
>> 
>> 
>> This, of course, gives me errors right now because I'm calling getRow and dotP from within the generation function, which expects Exp[ression]s, not Acc[elerated computation]s.
>> 
>> So maybe I need to replace that line with an inner for loop? Is there an easy way to do that with Accelerate?
>> 
>> Thanks for your help,
>>   - Clark
>> 
>> [1] http://hackage.haskell.org/package/accelerate
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
> 
> 
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20121204/ea9a3d9f/attachment.htm>