[Haskell-cafe] Help optimising a Haskell program

Tue Mar 22 16:49:01 CET 2011

On Tue, Mar 22, 2011 at 1:11 AM, David MacIver <david at drmaciver.com> wrote:

> On 22 March 2011 02:00, Jesper Louis Andersen
> <jesper.louis.andersen at gmail.com> wrote:
> > On Tue, Mar 22, 2011 at 00:59, David MacIver <david at drmaciver.com>
> wrote:
> >
> >> It's for rank aggregation - taking a bunch of partial rankings of some
> >> items from users and turning them into an overall ranking (aka "That
> >> thing that Hammer Principle does").
> >
> > Two questions immediately begs themselves:
> >
> > * Can we go parallel? :P
>
> Maybe. A lot of this is inherently sequential. Some bits are
> parallelisable, but my initial attempts at exploiting that made very
> little performance difference. I'd rather exhaust what I can from
> single-core performance first.
>
> > * What does +RTS -s -RTS say? Specifically, what is the current
> > productivity?
>
> ./rank +RTS -s
>   3,466,696,368 bytes allocated in the heap
>     212,888,240 bytes copied during GC
>      51,949,568 bytes maximum residency (10 sample(s))
>       5,477,016 bytes maximum slop
>             105 MB total memory in use (0 MB lost due to fragmentation)
>
>  Generation 0:  6546 collections,     0 parallel,  0.93s,  0.93s elapsed
>  Generation 1:    10 collections,     0 parallel,  0.32s,  0.32s elapsed
>
>  INIT  time    0.00s  (  0.00s elapsed)
>  MUT   time    7.11s  (  7.12s elapsed)
>  GC    time    1.25s  (  1.25s elapsed)
>  EXIT  time    0.00s  (  0.00s elapsed)
>  Total time    8.37s  (  8.37s elapsed)
>
>  %GC time      15.0%  (15.0% elapsed)
>
>  Alloc rate    487,319,292 bytes per MUT second
>
>  Productivity  85.0% of total user, 85.0% of total elapsed
>
> So if I'm reading this right, my hypothesis that allocation was most
> of the cost seems to be wrong? I don't know how much of that MUT time
> is allocation, but I'd expect it to be < GC time.
>
> > Do we get an improvement with +RTS -A2m -H128m -RTS ?
> > (Force the heap to be somewhat up there from day one, perhaps try
> > -H256m.
>
> This seems to consistently give about a 0.4s improvement, which isn't
> nothing but isn't a particularly interesting chunck of 8s (actually
> it's 8.4s -> 8s). Setting it to 256M doesn't make any difference.
>

You should use criterion to make sure that your assessments of the
performance difference are grounded in statistically robust reasoning :)
 Progression may also help.

GHC 7 has a new rts option, +RTS -H -RTS that gives you a good high default
for these memory numbers.  You might give it ago, but I doubt it's going to
help much here.

One thing I noticed was that you calculate the length of elts more than once
inside aggregate.  It's probably not going to account for a big boost, but
I'd recommend doing it only once.  Something like:
  let lenelts = length elts

Then use lenelts everywhere.

Also, it seems like you use Data.Set to sort and uniquify the input then you
throw away the Set, that's potentially costly.  There are probably better
types for aggregate than [[a]] -> [a].  You use Arrays, but it's possible
that a different vector library such as Data.Vector gives better performance
here.  Or, perhaps you want that instead of lists everywhere.  For example,
you build an index and a reverse index.  It seems like with an array or
vector you could eliminate at least one index and have O(1) lookup time
instead of Data.Map's O(log n) lookup time.

I hope that helps,
Jason
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20110322/b5b608f5/attachment.htm>