[Haskell] Programming language shootout (completing the Haskell
entry)
Jan-Willem Maessen - Sun Labs East
Janwillem.Maessen at Sun.COM
Tue Mar 30 09:49:16 EST 2004
As another data point, when I was writing the run-time system both for
pH (on an 8-way Sun) and later for Eager Haskell (on x86) I
experimented with simple prefetching of various sorts. Two things
seemed to be moderately effective:
* The heap was organized into "chunks" which were parceled out among
running threads. By writing into the pages of a chunk when it is
allocated, the necessary TLB entries can be pulled in early. This
seemed to be a universal win on both architectures. On a system with
full software TLB miss handling it would be a wash, I expect.
* Writing a line or two ahead in the heap. The idea here is to get
the appropriate lines into the cache (and dirty) before they actually
saw heavy use. Alas, this can cause fenceposting problems when you
get near the end of a chunk (it's not safe to write past a chunk
boundary). A processor which buffers writes doesn't benefit
enormously, either; we can buffer the last cache line in the heap
until it has been completely written, and never fetch from memory at
all. There were no big wins here in my experience.
One has to be careful about the semantics of prefetch instructions, by
the way. At least on SPARC they probably don't do what you want; if
my memory serves me correctly, there is a separate prefetch cache
that's connected to the floating-point load/store unit, and what you
actually want for most Haskell-ish code would be a non-blocking load
instruction instead. On x86 the prefetch instructions are part of the
SIMD floating-point instruction set, and might (or might not) have
similar caveats.
The prefetch builtins for gcc are a pretty recent innovation, by the
way. I would have loved to have them back when I was experimenting.
Of course, you'd want to make sure you got an appropriate instruction
out of the compiler.
Andrew Cheadle wrote:
> Personally I think looking at 'cache-concious' layout of specific data
> structures within the allocator and depth first vs breadth first copying
> within the collector provide more realistic optimisation opportunities. I
> believe there's been a fair amount of work on this, although specific papers
> elude me at the mo.
I would agree with this---though the conclusion from the GC literature
seems to be that no particular copying strategy is best for all
applications.
-Jan-Willem Maessen
More information about the Haskell
mailing list