[Haskell-cafe] Help optimising a Haskell program
david at drmaciver.com
Tue Mar 22 09:11:07 CET 2011
On 22 March 2011 02:00, Jesper Louis Andersen
<jesper.louis.andersen at gmail.com> wrote:
> On Tue, Mar 22, 2011 at 00:59, David MacIver <david at drmaciver.com> wrote:
>> It's for rank aggregation - taking a bunch of partial rankings of some
>> items from users and turning them into an overall ranking (aka "That
>> thing that Hammer Principle does").
> Two questions immediately suggest themselves:
> * Can we go parallel? :P
Maybe. A lot of this is inherently sequential. Some bits are
parallelisable, but my initial attempts at exploiting that made very
little performance difference. I'd rather exhaust what I can from
single-core performance first.
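The thread doesn't show how the program actually aggregates rankings. For reference only, here is a minimal sketch of one simple rank-aggregation scheme: a Copeland-style net pairwise-wins score. It assumes each partial ranking is a list of items ordered best first. The names `Ranking`, `pairsOf`, `netScores` and `aggregate` are hypothetical, and this is not claimed to be the algorithm used by the program or by Hammer Principle.

```haskell
import Data.List (sortOn, tails)
import Data.Ord (Down (..))
import qualified Data.Map.Strict as Map

-- Hypothetical representation: one user's partial ranking, best item first.
type Ranking a = [a]

-- Every (winner, loser) pair implied by a single ranking:
-- each item beats every item listed after it.
pairsOf :: Ranking a -> [(a, a)]
pairsOf r = [ (x, y) | (x : rest) <- tails r, y <- rest ]

-- Net pairwise score per item: +1 per win, -1 per loss.
-- Every ranked item gets an entry, starting from 0.
netScores :: Ord a => [Ranking a] -> Map.Map a Int
netScores rs = Map.fromListWith (+) (zeros ++ results)
  where
    zeros   = [ (x, 0) | r <- rs, x <- r ]
    results = concat [ [(w, 1), (l, -1)] | r <- rs, (w, l) <- pairsOf r ]

-- Overall ranking: highest net score first; ties are broken by
-- the items' own Ord instance so the result is deterministic.
aggregate :: Ord a => [Ranking a] -> [a]
aggregate rs =
  map fst (sortOn (\(x, s) -> (Down s, x)) (Map.toList (netScores rs)))
```

In GHCi, `aggregate [["a","b","c"],["b","c"],["a","c"]]` evaluates to `["a","b","c"]` (net scores 3, 1 and -4). Counting pairwise wins handles partial rankings naturally, since items a user didn't rank simply contribute nothing. The trade-off is that a ranking of n items produces n(n-1)/2 pairs, so the cost grows quadratically with ranking length.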
> * What does +RTS -s -RTS say? Specifically, what is the current
./rank +RTS -s
   3,466,696,368 bytes allocated in the heap
     212,888,240 bytes copied during GC
      51,949,568 bytes maximum residency (10 sample(s))
       5,477,016 bytes maximum slop
             105 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:  6546 collections,     0 parallel,  0.93s,  0.93s elapsed
  Generation 1:    10 collections,     0 parallel,  0.32s,  0.32s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    7.11s  (  7.12s elapsed)
  GC    time    1.25s  (  1.25s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    8.37s  (  8.37s elapsed)

  %GC time      15.0%  (15.0% elapsed)

  Alloc rate    487,319,292 bytes per MUT second

  Productivity  85.0% of total user, 85.0% of total elapsed
So if I'm reading this right, my hypothesis that allocation was most
of the cost seems to be wrong? GC accounts for only 1.25s of the 8.37s
total (15%). I don't know how much of the MUT time is spent on
allocation itself, but I'd expect it to be less than the GC time.
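As a quick cross-check, the ratios below simply recompute GHC's own summary lines (%GC time, Productivity and Alloc rate) from the times and byte counts in the output above. In GHC the cost of performing an allocation (advancing the heap pointer and initialising the new object) is charged to MUT time, while GC time covers only the collections, so these figures alone cannot separate allocation cost from the rest of the mutator's work.

```latex
\[
\frac{T_{\mathrm{GC}}}{T_{\mathrm{total}}} = \frac{1.25\,\mathrm{s}}{8.37\,\mathrm{s}} \approx 0.149,
\qquad
\frac{T_{\mathrm{MUT}}}{T_{\mathrm{total}}} = \frac{7.11\,\mathrm{s}}{8.37\,\mathrm{s}} \approx 0.849,
\qquad
\frac{3{,}466{,}696{,}368\ \mathrm{B}}{7.11\,\mathrm{s}} \approx 4.88 \times 10^{8}\ \mathrm{B/s}
\]
```

The last figure agrees with the reported 487,319,292 bytes per MUT second to within rounding, since the MUT time is printed to only two decimal places.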
> Do we get an improvement with +RTS -A2m -H128m -RTS ?
> (Force the heap to be somewhat up there from day one, perhaps try
This seems to consistently give about a 0.4s improvement (8.4s ->
8.0s), which isn't nothing, but it isn't a particularly interesting
chunk of the total. Setting it to 256M makes no further difference.