help wrt semantics / primops for pure prefetches

Edward Kmett ekmett at
Thu Nov 27 10:20:49 UTC 2014

My general experience with prefetching is that it is almost never a win
when done just on trees, as in the usual mark-sweep or copy-collection
garbage collector walk. Why? Because the interval between issuing the
prefetch and using the data is too variable. Stack disciplines and prefetch
don't mix nicely.

If you want to see a win out of it you have to free up some of the ordering
of your walk, and tweak your whole application to support it. For example,
if you want to use prefetching in garbage collection, the way to do it is to
switch from a strict stack discipline to using a small fixed-size queue on
the output of the stack, then issue the prefetch as each item enters the
queue rather than as you walk the stack. That paid off for me as a 10-15%
speedup last time I used it, after factoring in the overhead of the extra
queue. Not too bad for a weekend project. =)
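
The stack-plus-queue arrangement described above can be sketched roughly as
follows. This is only an illustration in C, not the code from that project:
the Node layout, the QUEUE_SIZE and STACK_SIZE values, and the function name
are all assumptions made up for the example, and __builtin_prefetch is the
GCC/Clang builtin. Pointers leave the stack, get prefetched as they enter a
small ring buffer, and are only dereferenced when they come out the other
end, so a roughly fixed number of other objects are processed in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 2    /* hypothetical fan-out of a heap object */
#define QUEUE_SIZE   8    /* hypothetical prefetch distance; needs tuning */
#define STACK_SIZE   1024 /* hypothetical bound; a real GC would grow this */

typedef struct Node {
    struct Node *child[MAX_CHILDREN];
    bool marked;
} Node;

/* Marks every node reachable from root and returns how many were marked. */
size_t mark_with_prefetch_queue(Node *root)
{
    Node *stack[STACK_SIZE];   /* pending pointers, not yet prefetched */
    Node *queue[QUEUE_SIZE];   /* ring buffer of prefetched pointers */
    size_t sp = 0, head = 0, count = 0, marked = 0;

    if (root != NULL)
        stack[sp++] = root;

    while (sp > 0 || count > 0) {
        /* Refill the queue from the stack, prefetching on the way in.
           The pointer is not dereferenced here. */
        while (sp > 0 && count < QUEUE_SIZE) {
            Node *n = stack[--sp];
            __builtin_prefetch(n, 1, 3);  /* rw=1: we will write the mark */
            queue[(head + count) % QUEUE_SIZE] = n;
            count++;
        }

        /* Take the oldest entry; once the queue has filled up, its
           prefetch was issued several iterations ago. */
        Node *n = queue[head];
        head = (head + 1) % QUEUE_SIZE;
        count--;

        if (n->marked)
            continue;
        n->marked = true;
        marked++;

        /* Push children without touching them; reading a child's mark
           bit here would be exactly the cache miss we are hiding. */
        for (size_t i = 0; i < MAX_CHILDREN; i++) {
            if (n->child[i] != NULL) {
                assert(sp < STACK_SIZE);
                stack[sp++] = n->child[i];
            }
        }
    }
    return marked;
}
```

Note that the queue only gives a useful lead once the stack holds enough
pending work to keep it full; near the start and end of the walk objects are
still used almost immediately after being prefetched. The walk order is also
no longer strictly depth-first, which is the "freeing up the ordering" part.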

Without that sort of known lead-in time, it works out that prefetching is
usually a net loss or vanishes into the noise.

As for the array ops, davean has a couple of cases with those for which the
prefetching operations are a 20-25% speedup, which is what motivated Carter
to start playing around with these again. I don't know offhand how easily
those can be turned into public test cases though.
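
For a sense of why array code is the friendlier case, here is a generic
sketch in C of an indexed gather with a fixed prefetch distance. It is not
davean's code (which isn't shown in this thread) and it does not use the GHC
primops; the function name and PREFETCH_DISTANCE are assumptions for the
example. The point is only that the index array tells you which element you
will need a known number of iterations ahead, which is the known lead-in
time that a tree walk lacks.

```c
#include <stddef.h>

#define PREFETCH_DISTANCE 16  /* hypothetical lead; must be tuned per machine */

/* Sums data[idx[i]] for i in [0, n). Each iteration prefetches the element
   that will be read PREFETCH_DISTANCE iterations later. */
long gather_sum(const long *data, const size_t *idx, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&data[idx[i + PREFETCH_DISTANCE]], 0, 1);
        sum += data[idx[i]];
    }
    return sum;
}
```

Whether this helps at all depends on the access pattern and the processor;
for sequential or cache-resident data the hardware prefetcher usually does
the job already and the extra instruction is pure overhead.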


On Thu, Nov 27, 2014 at 4:36 AM, Simon Marlow <marlowsd at> wrote:

> I haven't been watching this, but I have one question: does prefetching
> actually *work*?  Do you have benchmarks (or better still, actual
> library/application code) that show some improvement?  I admit to being
> slightly sceptical - when I've tried using prefetching in the GC it has
> always been a struggle to get something that shows an improvement, and even
> when I get things tuned on one machine it typically makes things slower on
> a different processor.  And that's in the GC; doing it at the Haskell level
> should be even harder.
> Cheers,
> Simon
> On 22/11/2014 05:43, Carter Schonwald wrote:
>> Hey Everyone,
>> in
>> and
>> is some preliminary work, laid out and prototyped, to fix up how the pure
>> versions of the prefetch primops work.
>> However, while it nominally fixes up some of the problems with how the
>> current pure prefetch APIs are fundamentally broken, the simple design
>> in D350 isn't quite ideal, and I sketch out some other ideas in the
>> associated ticket #9353.
>> I'd like to make sure pure prefetch in 7.10 is slightly less broken
>> than in 7.8, but either way, it's pretty clear that working out the right
>> fixed-up design won't happen till 7.12. I.e., whatever makes 7.10, there
>> WILL have to be breaking changes to fix those primops for 7.12.
>> Thanks, and any feedback / thoughts appreciated.
>> -Carter
>> _______________________________________________
>> ghc-devs mailing list
>> ghc-devs at