[Haskell-cafe] repa parallelization results

Anatoly Yakovenko aeyakovenko at gmail.com
Sun Mar 15 18:44:00 UTC 2015


I am not really focusing on matrix multiply specifically.  So the real
problem is that the implementation using parallelized functions is
slower then the sequential one, and adding more threads makes it
barely as fast as the sequential one.

So why would i ever use the parallelized versions?


On Sat, Mar 14, 2015 at 9:24 AM, Carter Schonwald
<carter.schonwald at gmail.com> wrote:
> http://www.cs.utexas.edu/users/flame/pubs/blis3_ipdps14.pdf this paper
> (among many others by the blis project) articulates some of the ideas i
> allude to pretty well (with pictures!)
>
> On Sat, Mar 14, 2015 at 12:21 PM, Carter Schonwald
> <carter.schonwald at gmail.com> wrote:
>>
>> dense matrix product is not an algorithm that makes sense in repa's
>> execution model,
>> in square matrix multiply of two N x N matrices, each result entry depends
>> on 2n values total across the  two input matrices.
>> even then, thats actually the wrong way to parallelize dense matrix
>> product! its worth reading the papers about goto blas and the more recent
>> blis project. a high performance dense matrix multipy winds up needing to do
>> some nested array parallelism with mutable updates to have efficient sharing
>> of sub computations!
>>
>>
>>
>> On Fri, Mar 13, 2015 at 9:03 PM, Anatoly Yakovenko <aeyakovenko at gmail.com>
>> wrote:
>>>
>>> you think the backed would make any difference?  this seems like a
>>> runtime issue to me, how are the threads scheduled by the ghc runtime?
>>>
>>> On Fri, Mar 13, 2015 at 4:58 PM, KC <kc1956 at gmail.com> wrote:
>>> > How is the LLVM?
>>> >
>>> > --
>>> > --
>>> >
>>> > Sent from an expensive device which will be obsolete in a few months!
>>> > :D
>>> >
>>> > Casey
>>> >
>>> >
>>> > On Mar 13, 2015 10:24 AM, "Anatoly Yakovenko" <aeyakovenko at gmail.com>
>>> > wrote:
>>> >>
>>> >> https://gist.github.com/aeyakovenko/bf558697a0b3f377f9e8
>>> >>
>>> >>
>>> >> so i am seeing basically results with N4 that are as good as using
>>> >> sequential computation on my macbook for the matrix multiply
>>> >> algorithm.  any idea why?
>>> >>
>>> >> Thanks,
>>> >> Anatoly
>>> >> _______________________________________________
>>> >> Haskell-Cafe mailing list
>>> >> Haskell-Cafe at haskell.org
>>> >> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>>> _______________________________________________
>>> Haskell-Cafe mailing list
>>> Haskell-Cafe at haskell.org
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>>
>>
>


More information about the Haskell-Cafe mailing list