[Haskell] Programming language shootout (completing the Haskell entry)
Duncan Coutts
duncan at coutts.uklinux.net
Tue Mar 30 12:55:21 EST 2004
On Tue, 2004-03-30 at 11:51, Simon Marlow wrote:
> I've done some cache profiling of GHC's code myself, and Nick Nethercote
> did some very detailed measurements a while back (see his recent post
> for details).
>
> The upshot of what he found is that we could benefit from some
> prefetching, perhaps on the order of 10-20%. Particularly prefetching
> in the allocation area during evaluation, to ensure that memory about to
> be written to is in the cache, and similar techniques during GC could
> help. However, actually taking advantage of this is quite hard -
> prefetching instructions aren't standard, and even when they are, getting
> any benefit can depend on cache architecture and other effects which
> vary between processor families. Getting things wrong often results in
> a slowdown. It's just too brittle.
When compiling via gcc, there's the __builtin_prefetch function:
http://gcc.gnu.org/onlinedocs/gcc-3.3.3/gcc/Other-Builtins.html
(about two-thirds of the way down the page)
It provides semi-portable prefetching on supported targets, that is, CPUs
that support prefetch instructions with some sane common semantics
(non-faulting, etc.). It has optional parameters to control read/write and
expected locality.
See also:
http://gcc.gnu.org/projects/prefetch.html
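The optional read/write and locality parameters look roughly like this in use. This is a minimal sketch assuming GCC (or a compiler that accepts GCC's builtins): rw is 0 for read (the default) or 1 for write, and locality runs from 0 (no temporal locality) to 3 (keep in all cache levels, the default); both must be compile-time constants. The function sum_with_prefetch and the prefetch distance of 16 elements are illustrative assumptions, not tuned values.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in elements. The best value depends
   on the CPU and memory system; 16 is only a placeholder guess. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting that the element PREFETCH_DISTANCE slots
   ahead will be read soon. */
long sum_with_prefetch(const long *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only form addresses that lie inside the array. */
        if (i + PREFETCH_DISTANCE < n)
            /* rw = 0: prefetch for a read.
               locality = 0: each element is used once, so it need not
               stay in the cache afterwards. */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 0);
        total += a[i];
    }
    return total;
}
```

On targets without a prefetch instruction, GCC evaluates any side effects of the address expression but otherwise emits no code, so the loop still compiles and runs; it just gets no hint.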
I didn't read it very carefully, but it's not clear whether, on x86 CPUs,
there is a prefetch instruction common to the AMD and Intel flavours,
i.e. whether a GHC binary using prefetch would be portable between
P3/P4/Athlon.
Of course, for GHC's native code generator, you're on your own. :-(
On the other hand, it might not be necessary to generate prefetch
instructions at all; much of the speedup might be obtainable just by adding
them in the allocator and GC parts of the RTS. Since the RTS is written in
GNU C, you could use gcc's prefetch function.
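To make the allocator idea a bit more concrete, here is a rough sketch of prefetching for write just ahead of a bump allocator's heap pointer. Everything in it (alloc_area, alloc_words, PREFETCH_AHEAD_WORDS and its value) is hypothetical and is not GHC's actual RTS code; it only shows the shape of the technique.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical look-ahead distance in words; a placeholder, not tuned. */
#define PREFETCH_AHEAD_WORDS 64

/* Hypothetical allocation area: a contiguous block handed out by
   bumping a pointer. */
typedef struct {
    uintptr_t *hp;     /* next free word */
    uintptr_t *limit;  /* one past the last word of the area */
} alloc_area;

/* Allocate n words, or return NULL when the area is exhausted
   (a real runtime would start a GC at that point). */
uintptr_t *alloc_words(alloc_area *area, size_t n)
{
    if ((size_t)(area->limit - area->hp) < n)
        return NULL;

    uintptr_t *obj = area->hp;
    area->hp += n;

    /* Hint that memory a little beyond the new heap pointer will be
       written soon (rw = 1), with locality 3 (the default). The check
       keeps the prefetch address strictly inside the area. */
    size_t room = (size_t)(area->limit - area->hp);
    if (room > PREFETCH_AHEAD_WORDS)
        __builtin_prefetch(area->hp + PREFETCH_AHEAD_WORDS, 1, 3);

    return obj;
}
```

Whether a prefetch like this pays off, and how far ahead to look, depends on the cache architecture, which is exactly the brittleness Simon describes, so any real change would need measuring on each processor family.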
Duncan