[Haskell] Programming language shootout (completing the Haskell entry)

Duncan Coutts duncan at coutts.uklinux.net
Tue Mar 30 12:55:21 EST 2004

On Tue, 2004-03-30 at 11:51, Simon Marlow wrote:
> I've done some cache profiling of GHC's code myself, and Nick Nethercote
> did some very detailed measurements a while back (see his recent post
> for details).
> The upshot of what he found is that we could benefit from some
> prefetching, perhaps on the order of 10-20%.  Particularly prefetching
> in the allocation area during evaluation, to ensure that memory about to
> be written to is in the cache, and similar techniques during GC could
> help.  However, actually taking advantage of this is quite hard -
> prefetching instructions aren't standard, and even when they are getting
> any benefit can depend on cache architecture and other effects which
> vary between processor families.  Getting things wrong often results in
> a slowdown.  It's just too brittle.

When compiling via gcc, there's the __builtin_prefetch function,
documented in the GCC manual's section on other builtins
(about 2/3rds of the way down the page).

It provides semi-portable prefetching on supported targets, that is,
CPUs that support prefetch instructions with some sane common semantics
(non-faulting etc.). It has optional parameters to control read/write and
expected locality.
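A minimal sketch of how the builtin might be used from GNU C follows. The
function name sum_with_prefetch and the PREFETCH_AHEAD distance are
hypothetical, chosen only for illustration; the second argument selects
read (0) or write (1), and the third is the expected temporal locality
from 0 (none) to 3 (high).

```c
#include <stddef.h>

/* Hypothetical prefetch distance in elements; a real value would need
 * tuning per cache architecture. */
#define PREFETCH_AHEAD 16

/* Sum an array, hinting that an element a little further along will be
 * read soon. The prefetch is only a hint and does not change the result. */
long sum_with_prefetch(const long *xs, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* rw = 0: prefetch for reading; locality = 3: keep in cache */
            __builtin_prefetch(&xs[i + PREFETCH_AHEAD], 0, 3);
        }
        total += xs[i];
    }
    return total;
}
```

On targets without a suitable instruction, gcc still evaluates the address
expression but emits no prefetch, so the code above should remain valid
there.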

see also:

I didn't read it very carefully, but it's not clear (on x86 CPUs) whether
there is a prefetch instruction that is common to the AMD & Intel
flavours, i.e. would a ghc binary with prefetch be portable between

Of course for ghc's native code generator, you're on your own. :-(

On the other hand, it might not be necessary to generate prefetch
instructions in compiled code; much of the speedup might be obtainable
just by adding them in the allocator & gc parts of the rts. Since the rts
is written in GNU C, you could use gcc's prefetch function.
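To illustrate the idea only, here is a sketch of a bump allocator that
prefetches for writing a little ahead of the allocation pointer. This is
not the rts's actual allocator: BumpArea, bump_alloc and
PREFETCH_BYTES_AHEAD are hypothetical names, and the distance is an
untuned guess.

```c
#include <stddef.h>

/* Hypothetical distance, in bytes, to prefetch ahead of the allocation
 * pointer; an assumption that would need measuring. */
#define PREFETCH_BYTES_AHEAD 256

/* Hypothetical allocation area: a contiguous block filled front to back. */
typedef struct {
    char *next;   /* next free byte */
    char *limit;  /* one past the last usable byte */
} BumpArea;

/* Allocate `bytes` from the area, or return NULL if it does not fit. */
void *bump_alloc(BumpArea *area, size_t bytes)
{
    if ((size_t)(area->limit - area->next) < bytes) {
        return NULL;
    }
    void *obj = area->next;
    area->next += bytes;

    /* Hint that memory just ahead will be written soon. Only form the
     * address when it stays inside the area. */
    if ((size_t)(area->limit - area->next) > PREFETCH_BYTES_AHEAD) {
        /* rw = 1: prefetch for writing; locality = 3: keep in cache */
        __builtin_prefetch(area->next + PREFETCH_BYTES_AHEAD, 1, 3);
    }
    return obj;
}
```

Whether this helps at all would have to be measured per processor family,
for the reasons given in the quoted message.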

