[Haskell-cafe] repa parallelization results

Anatoly Yakovenko aeyakovenko at gmail.com
Thu Jan 14 20:57:12 UTC 2016


I'm not sure what changed, but after rerunning the benchmark I get the expected results:

anatolys-MacBook:rbm anatolyy$  dist/build/proto/proto +RTS -N2
benchmarking P
time                 1.791 s    (1.443 s .. 2.304 s)
                     0.991 R²   (0.974 R² .. 1.000 R²)
mean                 1.803 s    (1.750 s .. 1.855 s)
std dev              90.06 ms   (0.0 s .. 90.90 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking S
time                 3.225 s    (2.685 s .. 3.837 s)
                     0.996 R²   (0.985 R² .. 1.000 R²)
mean                 3.033 s    (2.857 s .. 3.142 s)
std dev              165.0 ms   (0.0 s .. 188.7 ms)
variance introduced by outliers: 19% (moderately inflated)

perf log written to dist/perf-mmult.html
anatolys-MacBook:rbm anatolyy$  dist/build/proto/proto +RTS -N4
benchmarking P
time                 1.851 s    (1.326 s .. 2.316 s)
                     0.990 R²   (0.964 R² .. 1.000 R²)
mean                 1.784 s    (1.693 s .. 1.901 s)
std dev              106.3 ms   (0.0 s .. 119.8 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking S
time                 3.329 s    (3.041 s .. 3.944 s)
                     0.996 R²   (0.993 R² .. 1.000 R²)
mean                 3.173 s    (3.100 s .. 3.244 s)
std dev              119.6 ms   (0.0 s .. 121.9 ms)
variance introduced by outliers: 19% (moderately inflated)

perf log written to dist/perf-mmult.html
anatolys-MacBook:rbm anatolyy$  dist/build/proto/proto +RTS -N
benchmarking P
time                 1.717 s    (1.654 s .. 1.830 s)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 1.717 s    (1.701 s .. 1.728 s)
std dev              16.64 ms   (0.0 s .. 19.20 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking S
time                 3.127 s    (3.079 s .. 3.222 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 3.105 s    (3.094 s .. 3.116 s)
std dev              18.12 ms   (543.9 as .. 18.50 ms)
variance introduced by outliers: 19% (moderately inflated)

perf log written to dist/perf-mmult.html
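
As a rough sanity check on the numbers above, here is a minimal sketch that computes the speedup of P over S from the reported means. It is not part of the benchmark itself; the names `speedup` and `runs` are made up for illustration, and the figures are copied from the criterion output above. The ratios come out at roughly 1.7x to 1.8x for all three settings.

```haskell
-- Speedup of the parallel (P) benchmark over the sequential (S) one,
-- using the mean times reported by criterion in the runs above.

-- | Hypothetical helper: sequential time divided by parallel time.
speedup :: Double -> Double -> Double
speedup seqTime parTime = seqTime / parTime

-- | (RTS flag, mean of S in seconds, mean of P in seconds),
-- copied from the three runs above.
runs :: [(String, Double, Double)]
runs =
  [ ("-N2", 3.033, 1.803)
  , ("-N4", 3.173, 1.784)
  , ("-N",  3.105, 1.717)
  ]

main :: IO ()
main = mapM_ report runs
  where
    -- Print one line per RTS setting with the S/P ratio.
    report (flag, s, p) =
      putStrLn (flag ++ ": speedup = " ++ show (speedup s p))
```

Going from -N2 to -N4 barely changes the ratio, which fits the advice quoted below about not counting hyperthreads as extra cores.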



On Thu, Jan 14, 2016 at 11:22 AM Thomas Miedema <thomasmiedema at gmail.com>
wrote:

> To avoid any confusion, this was a reply to the following email:
>
>
> On Fri, Mar 13, 2015 at 6:23 PM, Anatoly Yakovenko <aeyakovenko at gmail.com>
>  wrote:
>
>> https://gist.github.com/aeyakovenko/bf558697a0b3f377f9e8
>>
>>
>> So I am seeing results with -N4 that are basically no better than
>> the sequential computation on my MacBook for the matrix multiply
>> algorithm. Any idea why?
>>
>> Thanks,
>> Anatoly
>>
>
> On Thu, Jan 14, 2016 at 8:19 PM, Thomas Miedema <thomasmiedema at gmail.com>
> wrote:
>
>> Anatoly: I also ran your benchmark, and cannot reproduce your findings.
>>
>> Note that GHC does not make effective use of hyperthreads (
>> https://ghc.haskell.org/trac/ghc/ticket/9221#comment:12). So don't use
>> -N4 when you have only a dual core machine. Maybe that's why you were
>> getting bad results? I also notice a `NaN` in one of your timing results. I
>> don't know how that is possible, or if it affected your results. Could you
>> try running your benchmark again, but this time with -N2?
>>
>> On Sat, Mar 14, 2015 at 5:21 PM, Carter Schonwald <
>> carter.schonwald at gmail.com> wrote:
>>
>>> dense matrix product is not an algorithm that makes sense in repa's
>>> execution model,
>>>
>>
>> Matrix multiplication is the first example in the first repa paper:
>> http://benl.ouroborus.net/papers/repa/repa-icfp2010.pdf. Look at figures
>> 2 and 7.
>>
>>     "we measured very good absolute speedup, ×7.2 for 8 cores, on
>> multicore hardware"
>>
>> Doing a quick experiment with 2 threads (my laptop doesn't have more
>> cores):
>>
>> $ cabal install repa-examples    # I did not bother with `-fllvm`
>> ...
>>
>> $ ~/.cabal/bin/repa-mmult -random 1024 1024 -random 1024 1204
>> elapsedTimeMS   = 6491
>>
>> $ ~/.cabal/bin/repa-mmult -random 1024 1024 -random 1024 1204 +RTS -N2
>> elapsedTimeMS   = 3393
>>
>> This is with GHC 7.10.3 and repa-3.4.0.1 (and dependencies from
>> http://www.stackage.org/snapshot/lts-3.22)
>>
>>
>>
>