[Haskell-cafe] A Performance Puzzle

Sat Aug 4 15:34:00 UTC 2018

Hi Claude,

Building with LLVM is an excellent idea.  I see the same 4x performance
improvement that you noted.  I also tried building the benchmarks with
and without the '-threaded' option and saw no difference in run times.
Perhaps the threaded GC issues are behind us.

I'll incorporate your changes into the repo on GitHub.  Thank you.

If the obvious deficiencies have been fixed by building with LLVM, then
the next place to look for improvement is the matrix multiplication.  I
profiled the test last night and have not yet digested the results, but
matrixMultiply still stands out as taking a lot of time.

Best Wishes,

Greg

On 8/2/18 11:29 PM, Claude Heiland-Allen wrote:
> Hi Gregory,
>
> On 03/08/18 01:16, Gregory Wright wrote:
>> That's an interesting point.  Could the generation of the random
>> matrix be that slow?  Something to check.
> It's not that it's slow by itself; I think the problem is that the CAF
> mVals :: [Double] is retained, taking ~40MB of heap, which slows down
> GC.
>
> Using criterion's `env` isn't so hard, and gets a much nicer looking
> heap profile graph.  See new benchmark code attached.
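
For anyone following along, the `env` pattern looks roughly like the
sketch below.  This is not the attached benchmark: mkVals and sumSquares
are hypothetical stand-ins for the random matrix data and for luFactor,
and the generator is a toy linear congruential one.  Whether the input
really stops being retained depends on what the optimizer does with the
constant expression, so it is worth confirming with a heap profile.

```haskell
module Main (main) where

import Criterion.Main (bench, bgroup, defaultMain, env, nf)
import Data.List (foldl')

-- Hypothetical stand-in for the random matrix entries (the role played
-- by mVals in this thread): n*n pseudo-random Doubles in [0, 1) from a
-- toy linear congruential generator.
mkVals :: Int -> [Double]
mkVals n = take (n * n) (map toUnit (tail (iterate step 12345)))
  where
    step :: Int -> Int
    step s = (1103515245 * s + 12345) `mod` 2147483648

    toUnit :: Int -> Double
    toUnit s = fromIntegral s / 2147483648

-- Hypothetical stand-in for the function under test (luFactor in the
-- real benchmark); any pure function of the input works here.
sumSquares :: [Double] -> Double
sumSquares = foldl' (\acc x -> acc + x * x) 0

main :: IO ()
main = defaultMain
  [ -- env builds the input in IO and fully evaluates it before timing,
    -- so generation cost is excluded from the measurement.  The data is
    -- meant to be scoped to this group rather than held by a top-level
    -- CAF for the whole run.
    env (return (mkVals 1000)) $ \vals ->
      bgroup "LUSolve"
        [ bench "stand-in 1000 x 1000" $ nf sumSquares vals ]
  ]
```

The function passed to `env` takes a plain variable here; if the
environment were a tuple, criterion asks for a lazy pattern match on it.
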
>
> Graphs:
> https://mathr.co.uk/tmp/luSolve-bench.svg
> https://mathr.co.uk/tmp/luSolve-bench-env.svg
>
>>
>> On 8/2/18 7:47 PM, Vanessa McHale wrote:
>>> Looking at your benchmarks you may be benchmarking the wrong thing.
>>> The function you are benchmarking is runLUFactor, which generates
>>> random matrices in addition to factoring them.
>>>
>>> On 08/02/2018 05:27 PM, Gregory Wright wrote:
>>>> benchmarking LUSolve/luFactor 1000 x 1000 matrix
>>>>
>>>> time                 1.940 s    (1.685 s .. 2.139 s)
>>>>                      0.998 R²   (0.993 R² .. 1.000 R²)
>>>> mean                 1.826 s    (1.696 s .. 1.880 s)
>>>>
> I started at mean 1.50s with your code.
>>>>
>>>> std dev              93.63 ms   (5.802 ms .. 117.8 ms)
>>>> variance introduced by outliers: 19% (moderately inflated)
>>>>
>
> Making the `env` change and compiling with -fllvm (as suggested in
> #haskell on irc.freenode.net, for a 4x speed boost) brought my time
> for that benchmark to mean 257ms. +RTS -s tells me productivity is
> 99.1%, which is pretty high.
> I compiled the benchmark by hand for best speed, as cabal seems to add
> -prof which slows the bench down slightly.
> I also compiled without -threaded, because the code isn't parallelized
> afaict, and parallel GC can be a bottleneck (is this still true?).
>
>
> Claude
