Low-level array performance
Dan Doel
dan.doel at gmail.com
Mon Jun 16 15:51:31 EDT 2008
Greetings,
Recently, due to scattered complaints I'd seen on the internet, I set about to
rewrite the fannkuch [1] benchmark on the Great Computer Language Shootout.
The current entry uses Ptr/Addr#, malloc, etc. so it's not particularly
representative of code one would actually write in Haskell these days. Over
the past few days, I've written several versions of the benchmark, and
collaborated a bit with local speed guru Don Stewart, but an entry that bests
the current entry does not seem to be in the cards currently. So, I thought
I'd write about some of the issues, and hopefully get some encouraging news
about the situation.
Issue 1: STUArrays aren't optimized as fully as one might ideally expect.
This isn't so much of an issue, I suppose, except possibly for an environment
like the shootout. I wrote versions of the benchmark (and particular pieces
of the overall benchmark, as well) for both STUArrays, and MUArrays from
dons' new uvector [2] library. The STUArray versions are consistently slower,
and a look at the core reveals that STUArray code contains significantly more
boxing than uvector. This isn't a big deal, as there's no reason not to use
uvector, aside from it not being blessed by being in the GHC distribution.
However, I thought I'd bring it up in case anyone hasn't heard of the library
yet.
Issue 2: Reading from/writing to a MutableByteArray# is slower than an Addr#
This is, I think, the crux of the issue. The main content of the benchmark is
reversing/shifting items in an array. To get a somewhat easier look at the
core, I boiled things down to a benchmark that just reverses a small array
many times. In the interest of further reducing things, I wrote a version of
the benchmark that uses raw Addr#s, and a version that uses raw
MutableByteArray#s. I've attached both versions.
The fannkuch benchmark (input 11) with byte arrays runs in around 12 seconds
on my machine. To get the reversal benchmark to around that time, I can tell
it to, for instance, reverse a 10-element array 250 million times. Those same
inputs to the Addr# version of the reverse benchmark result in a runtime of 7
to 8 seconds, so there's a significant discrepancy at the level the benchmark
actually runs at (the current fannkuch entry that uses Addr#/Ptr takes around
9 seconds, so there's other stuff going on in the benchmark evening them out,
but this is likely the source of the difference between the two).
uvector performance on the reversal benchmark is comparable with
MutableByteArray#, so the practical slowdown using a library for actual
public consumption seems to be in the fact that it's based on byte arrays
(STUArray again falls further behind by retaining too many boxes).
So, to wrap things up: dons and I were rather curious about the performance
differences between MutableByteArray# and Addr#, as one might expect them to
be comparable, both being low-level, raw pointer type structures; the code for
using the two is nearly identical.
(If one were to raise an Issue 3, it'd likely be that, even with Addr# and
malloc, which is currently the fastest Haskell solution, we're not that fast.
A glance at that shootout list reveals that we're beat by a myriad of
Java-based languages, Python (Psyco), Ada, Ocaml, Eiffel, Lisp (SBCL), Clean,
and of course C++, C, Fortran, etc. Is there any hope of getting performance
improvements that will push us up in that list? I've heard that people are
working on improved code generation, but I don't know if that can help with
this sort of thing (or if it's more of a runtime system area).
While writing this mail, it came to mind that this is also a potential source
of slowness in the sorting 'library' I've been writing for uvector. I already
have some routines that are competitive with/beat qsort from glibc, but GNU's
C++ STL introsort implementation has remained untouched. However, if
reads/writes simply take twice as much time in Haskell as they do in C++, my
results aren't as surprising.)
Anyhow, thanks for any light that can be shed on the subject.
1: http://shootout.alioth.debian.org/gp4/benchmark.php?test=fannkuch
2: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/uvector
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ByteArr.hs
Type: text/x-hssrc
Size: 1413 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20080616/4aea9bc1/ByteArr.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Ptr.hs
Type: text/x-hssrc
Size: 867 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20080616/4aea9bc1/Ptr.bin
More information about the Glasgow-haskell-users
mailing list