help wrt semantics / primops for pure prefetches

Edward Kmett ekmett at
Fri Nov 28 16:39:20 UTC 2014

The main takeaway from my work with prefetching was this: if you can shove
things into a fixed-size queue and prefetch on the way into the queue,
rather than prefetching just to kickstart the next element of a tree
traversal (which will be demanded too soon to hide the latency), then you
can smooth out a lot of the cross system

It is just incredibly invasive. =(
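
A minimal sketch of the queue-on-the-output-of-the-stack idea, in C. This is
not GHC's collector or the code I actually used; the Node layout, the
PF_QUEUE_SIZE value and the function name are all made up for illustration,
and __builtin_prefetch assumes GCC or Clang. Pointers leave the mark stack,
get prefetched, and sit in a small ring buffer before they are touched, so
each prefetch gets a roughly constant lead of PF_QUEUE_SIZE nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PF_QUEUE_SIZE 8   /* hypothetical tuning knob: prefetch lead, in nodes */
#define MAX_CHILDREN  2   /* hypothetical fixed fan-out for this sketch */

typedef struct Node {
    bool marked;
    struct Node *child[MAX_CHILDREN];
} Node;

/* Marks every node reachable from root and returns how many were marked.
 * The caller supplies the mark stack; a capacity of
 * 1 + MAX_CHILDREN * (number of nodes) is always sufficient. */
size_t mark_with_prefetch(Node *root, Node **stack, size_t stack_cap)
{
    Node *queue[PF_QUEUE_SIZE];      /* small fixed-size FIFO (ring buffer) */
    size_t head = 0, count = 0;
    size_t sp = 0, marked = 0;

    if (root == NULL || stack_cap == 0)
        return 0;
    stack[sp++] = root;

    while (sp > 0 || count > 0) {
        /* Move pointers from the stack into the queue, prefetching each one
         * on the way in. The node itself is not read here. */
        while (sp > 0 && count < PF_QUEUE_SIZE) {
            Node *n = stack[--sp];
            __builtin_prefetch(n, 0, 3);   /* read access, high locality */
            queue[(head + count) % PF_QUEUE_SIZE] = n;
            count++;
        }

        /* Take the oldest entry: its prefetch was issued longest ago. */
        Node *n = queue[head];
        head = (head + 1) % PF_QUEUE_SIZE;
        count--;

        if (n->marked)
            continue;                      /* already visited (shared or cyclic) */
        n->marked = true;
        marked++;

        /* Push children without dereferencing them. */
        for (int i = 0; i < MAX_CHILDREN; i++) {
            if (n->child[i] != NULL) {
                assert(sp < stack_cap);
                stack[sp++] = n->child[i];
            }
        }
    }
    return marked;
}
```

The invasive part is visible even here: the walk is no longer strictly
depth-first, so anything that relied on the stack order has to change too.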

Re: doing prefetching in the mark phase, I just skimmed and found
takes which appears to take a similar approach.


On Fri, Nov 28, 2014 at 3:42 AM, Simon Marlow <marlowsd at> wrote:

> Thanks for this.  In the copying GC I was using prefetching during the
> scan phase, where you do have a pretty good tunable knob for how far ahead
> you want to prefetch.  The only variable is the size of the objects being
> copied, but most tend to be in the 2-4 words range.  I did manage to get
> 10-15% speedups with optimal tuning, but it was a slowdown on a different
> machine or with wrong tuning, which is why GHC doesn't have any of this
> right now.
> Glad to hear this can actually be used to get real speedups in Haskell, I
> will be less sceptical from now on :)
> Cheers,
> Simon
> On 27/11/2014 10:20, Edward Kmett wrote:
>> My general experience with prefetching is that it is almost never a win
>> when done just on trees, as in the usual mark-sweep or copy-collection
>> garbage collector walk. Why? Because the interval between issuing a
>> prefetch and using the data is too variable. Stack disciplines and
>> prefetch don't mix nicely.
>> If you want to see a win out of it you have to free up some of the
>> ordering of your walk, and tweak your whole application to support it.
>> e.g. if you want to use prefetching in garbage collection, the way to do
>> it is to switch from a strict stack discipline to using a small
>> fixed-sized queue on the output of the stack, then feed prefetch on the
>> way into the queue rather than as you walk the stack. That paid off for
>> me as a 10-15% speedup last time I used it, after factoring in the
>> overhead of the extra queue. Not too bad for a weekend project. =)
>> Without that sort of known lead-in time, it works out that prefetching
>> is usually a net loss or vanishes into the noise.
>> As for the array ops, davean has a couple of cases w/ those for which
>> the prefetching operations are a 20-25% speedup, which is what motivated
>> Carter to start playing around with these again. I don't know off hand
>> how easily those can be turned into public test cases though.
>> -Edward
>> On Thu, Nov 27, 2014 at 4:36 AM, Simon Marlow <marlowsd at> wrote:
>>     I haven't been watching this, but I have one question: does
>>     prefetching actually *work*?  Do you have benchmarks (or better
>>     still, actual library/application code) that show some improvement?
>>     I admit to being slightly sceptical - when I've tried using
>>     prefetching in the GC it has always been a struggle to get something
>>     that shows an improvement, and even when I get things tuned on one
>>     machine it typically makes things slower on a different processor.
>>     And that's in the GC; doing it at the Haskell level should be even
>>     harder.
>>     Cheers,
>>     Simon
>>     On 22/11/2014 05:43, Carter Schonwald wrote:
>>         Hey Everyone,
>>         in
>>         <>
>>         and
>>         https://phabricator.haskell.org/D350
>>         is some preliminary work, laid out and prototyped, to fix up how
>>         the pure versions of the prefetch primops work.
>>         However, while it nominally fixes some of the problems that make
>>         the current pure prefetch APIs fundamentally broken, the simple
>>         design in D350 isn't quite ideal, and I sketch out some other
>>         ideas in the associated ticket #9353.
>>         I'd like to make sure pure prefetch in 7.10 is slightly less
>>         broken than in 7.8, but either way, it's pretty clear that
>>         working out the right fixed-up design won't happen till 7.12.
>>         I.e., whatever makes 7.10, there WILL have to be breaking changes
>>         to fix those primops for 7.12.
>>         Thanks, and any feedback / thoughts appreciated
>>         -Carter
>>         _________________________________________________
>>         ghc-devs mailing list
>>         ghc-devs at