<p dir="ltr">Read what I linked.  <br>

You are benchmarking repa on exactly the kind of workload it is worst at. </p>

<p dir="ltr">Repa is for pointwise parallel and local convolution parallel programs.  The way repa can express matrix multiplication is exactly the worst way to implement a parallel matrix multiply: each output entry recomputes its own dot product, which is close to pessimal with respect to memory traffic / communication complexity.  </p>
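<p dir="ltr">To make the memory traffic point concrete, here is a rough cost model, not repa or BLIS code. It is written in plain Python only to keep it self-contained. The names <code>naive_matmul</code>, <code>tiled_matmul</code> and the <code>loads</code> counter are made up for illustration, and the model assumes the per-entry version gets no cache reuse while a pair of tiles fits in fast memory.</p>

```python
def naive_matmul(a, b):
    """Per-entry formulation: every output entry recomputes its own dot product."""
    n = len(a)
    loads = 0  # assumed cost model: elements fetched from slow memory
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[i][k] * b[k][j]
                loads += 2  # one element of a and one of b, no reuse assumed
            c[i][j] = acc
    return c, loads


def tiled_matmul(a, b, tile):
    """Blocked formulation: fetch a pair of tiles once, reuse it for a whole output tile."""
    n = len(a)
    assert n % tile == 0  # keep the sketch simple: tile must divide n
    loads = 0
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                # One tile of a and one tile of b enter fast memory here.
                loads += 2 * tile * tile
                for i in range(i0, i0 + tile):
                    for j in range(j0, j0 + tile):
                        acc = c[i][j]  # mutable update of a partial sum
                        for k in range(k0, k0 + tile):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c, loads
```

<p dir="ltr">Under these assumptions, with n = 8 and tile = 4 the per-entry version counts 2 * 8^3 = 1024 loads and the tiled version counts 256, while both return the same product. The traffic drops by a factor of the tile size, which is the kind of sharing of sub computations that the per-entry formulation cannot express.</p>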

<p dir="ltr">Benchmark something like image blur algorithms and repa will really shine.  </p>

<p dir="ltr">Right now your benchmark is the repa equivalent of noticing that random access on singly linked lists is slow :)  </p>

<div class="gmail_quote">On Mar 15, 2015 2:44 PM, "Anatoly Yakovenko" <<a href="mailto:aeyakovenko@gmail.com">aeyakovenko@gmail.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I am not really focusing on matrix multiply specifically.  So the real<br>

problem is that the implementation using parallelized functions is<br>

slower than the sequential one, and adding more threads makes it<br>

barely as fast as the sequential one.<br>

<br>

So why would I ever use the parallelized versions?<br>

<br>

<br>

On Sat, Mar 14, 2015 at 9:24 AM, Carter Schonwald<br>

<<a href="mailto:carter.schonwald@gmail.com">carter.schonwald@gmail.com</a>> wrote:<br>

> <a href="http://www.cs.utexas.edu/users/flame/pubs/blis3_ipdps14.pdf" target="_blank">http://www.cs.utexas.edu/users/flame/pubs/blis3_ipdps14.pdf</a> this paper<br>

> (among many others by the blis project) articulates some of the ideas i<br>

> allude to pretty well (with pictures!)<br>

><br>

> On Sat, Mar 14, 2015 at 12:21 PM, Carter Schonwald<br>

> <<a href="mailto:carter.schonwald@gmail.com">carter.schonwald@gmail.com</a>> wrote:<br>

>><br>

>> dense matrix product is not an algorithm that makes sense in repa's<br>

>> execution model,<br>

>> in square matrix multiply of two N x N matrices, each result entry depends<br>

>> on 2N values total across the two input matrices.<br>

>> even then, that's actually the wrong way to parallelize dense matrix<br>

>> product! it's worth reading the papers about goto blas and the more recent<br>

>> blis project. a high performance dense matrix multiply winds up needing to do<br>

>> some nested array parallelism with mutable updates to have efficient sharing<br>

>> of sub computations!<br>

>><br>

>><br>

>><br>

>> On Fri, Mar 13, 2015 at 9:03 PM, Anatoly Yakovenko <<a href="mailto:aeyakovenko@gmail.com">aeyakovenko@gmail.com</a>><br>

>> wrote:<br>

>>><br>

>>> you think the backend would make any difference?  this seems like a<br>

>>> runtime issue to me, how are the threads scheduled by the ghc runtime?<br>

>>><br>

>>> On Fri, Mar 13, 2015 at 4:58 PM, KC <<a href="mailto:kc1956@gmail.com">kc1956@gmail.com</a>> wrote:<br>

>>> > How is the LLVM?<br>

>>> ><br>

>>> > --<br>

>>> > --<br>

>>> ><br>

>>> > Sent from an expensive device which will be obsolete in a few months!<br>

>>> > :D<br>

>>> ><br>

>>> > Casey<br>

>>> ><br>

>>> ><br>

>>> > On Mar 13, 2015 10:24 AM, "Anatoly Yakovenko" <<a href="mailto:aeyakovenko@gmail.com">aeyakovenko@gmail.com</a>><br>

>>> > wrote:<br>

>>> >><br>

>>> >> <a href="https://gist.github.com/aeyakovenko/bf558697a0b3f377f9e8" target="_blank">https://gist.github.com/aeyakovenko/bf558697a0b3f377f9e8</a><br>

>>> >><br>

>>> >><br>

>>> >> so i am seeing basically results with N4 that are as good as using<br>

>>> >> sequential computation on my macbook for the matrix multiply<br>

>>> >> algorithm.  any idea why?<br>

>>> >><br>

>>> >> Thanks,<br>

>>> >> Anatoly<br>

>>> >> _______________________________________________<br>

>>> >> Haskell-Cafe mailing list<br>

>>> >> <a href="mailto:Haskell-Cafe@haskell.org">Haskell-Cafe@haskell.org</a><br>

>>> >> <a href="http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe" target="_blank">http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe</a><br>


>><br>

>><br>

><br>

</blockquote></div>