help wrt semantics / primops for pure prefetches

Edward Kmett ekmett at gmail.com
Fri Nov 28 16:39:20 UTC 2014


The main takeaway from my work with prefetching was this: if you can shove
things into a fixed-size queue and prefetch on the way into the queue, you
can smooth out a lot of the cross-system variance. Prefetching just to
kickstart the next element of a tree traversal does not work as well,
because that element is demanded too soon for the prefetch to hide the
full latency.

It is just incredibly invasive. =(
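
To make the queue idea concrete, here is a minimal sketch in C of a mark
loop that drains its stack through a small FIFO and issues the prefetch as
each pointer enters the FIFO. This is an illustration under assumptions, not
the code I used and not anything in GHC's RTS: the Node layout, the Marker
struct, QUEUE_SIZE, STACK_SIZE and the mark function are all hypothetical
names, and __builtin_prefetch is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap object: a binary node with a mark bit. */
typedef struct Node {
    struct Node *left, *right;
    int marked;
} Node;

enum {
    QUEUE_SIZE = 8,    /* prefetch distance; an assumed value that needs tuning */
    STACK_SIZE = 1024  /* fixed for brevity; a real marker must handle overflow */
};

typedef struct {
    Node *stack[STACK_SIZE];
    size_t sp;
    Node *queue[QUEUE_SIZE];  /* ring buffer between the stack and the visit */
    size_t head, count;
} Marker;

static void push(Marker *m, Node *n) {
    if (n != NULL) {
        assert(m->sp < STACK_SIZE);
        m->stack[m->sp++] = n;
    }
}

/* Move a pointer into the queue, prefetching it on the way in. */
static void enqueue(Marker *m, Node *n) {
    __builtin_prefetch(n, 1, 3);  /* rw = 1: the mark bit will be written */
    m->queue[(m->head + m->count) % QUEUE_SIZE] = n;
    m->count++;
}

static Node *dequeue(Marker *m) {
    Node *n = m->queue[m->head];
    m->head = (m->head + 1) % QUEUE_SIZE;
    m->count--;
    return n;
}

/* Marks everything reachable from root; returns the number of nodes marked. */
size_t mark(Node *root) {
    Marker m = {0};
    size_t marked = 0;
    push(&m, root);
    while (m.sp > 0 || m.count > 0) {
        /* Top up the queue from the stack so prefetches are issued early. */
        while (m.count < QUEUE_SIZE && m.sp > 0)
            enqueue(&m, m.stack[--m.sp]);
        /* Visit the oldest entry: its prefetch has had the longest lead time. */
        Node *n = dequeue(&m);
        if (n->marked)
            continue;
        n->marked = 1;
        marked++;
        push(&m, n->left);
        push(&m, n->right);
    }
    return marked;
}
```

When the queue is full, each object is touched roughly QUEUE_SIZE visits
after its prefetch was issued, which is the known lead-in time that a plain
stack walk lacks. The lead time shrinks while the queue fills or drains, and
the traversal order is no longer strictly depth-first, which is part of why
the change is invasive.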

Re: doing prefetching in the mark phase, I just skimmed and found
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.9090&rep=rep1&type=pdf
which appears to take a similar approach.

-Edward

On Fri, Nov 28, 2014 at 3:42 AM, Simon Marlow <marlowsd at gmail.com> wrote:

> Thanks for this.  In the copying GC I was using prefetching during the
> scan phase, where you do have a pretty good tunable knob for how far ahead
> you want to prefetch.  The only variable is the size of the objects being
> copied, but most tend to be in the 2-4 words range.  I did manage to get
> 10-15% speedups with optimal tuning, but it was a slowdown on a different
> machine or with wrong tuning, which is why GHC doesn't have any of this
> right now.
>
> Glad to hear this can actually be used to get real speedups in Haskell, I
> will be less sceptical from now on :)
>
> Cheers,
> Simon
>
> On 27/11/2014 10:20, Edward Kmett wrote:
>
>> My general experience with prefetching is that it is almost never a win
>> when done just on trees, as in the usual mark-sweep or copy-collection
>> garbage collector walk. Why? Because the interval between issuing a
>> prefetch and using the data is too variable. Stack disciplines and
>> prefetch don't mix nicely.
>>
>> If you want to see a win out of it you have to free up some of the
>> ordering of your walk, and tweak your whole application to support it.
>> e.g. if you want to use prefetching in garbage collection, the way to do
>> it is to switch from a strict stack discipline to using a small
>> fixed-sized queue on the output of the stack, then feed prefetch on the
>> way into the queue rather than as you walk the stack. That paid out for
>> me as a 10-15% speedup last time I used it after factoring in the
>> overhead of the extra queue. Not too bad for a weekend project. =)
>>
>> Without that sort of known lead-in time, it works out that prefetching
>> is usually a net loss or vanishes into the noise.
>>
>> As for the array ops, davean has a couple of cases w/ those for which
>> the prefetching operations are a 20-25% speedup, which is what motivated
>> Carter to start playing around with these again. I don't know off hand
>> how easily those can be turned into public test cases though.
>>
>> -Edward
>>
>> On Thu, Nov 27, 2014 at 4:36 AM, Simon Marlow <marlowsd at gmail.com> wrote:
>>
>>     I haven't been watching this, but I have one question: does
>>     prefetching actually *work*?  Do you have benchmarks (or better
>>     still, actual library/application code) that show some improvement?
>>     I admit to being slightly sceptical - when I've tried using
>>     prefetching in the GC it has always been a struggle to get something
>>     that shows an improvement, and even when I get things tuned on one
>>     machine it typically makes things slower on a different processor.
>>     And that's in the GC, doing it at the Haskell level should be even
>>     harder.
>>
>>     Cheers,
>>     Simon
>>
>>
>>     On 22/11/2014 05:43, Carter Schonwald wrote:
>>
>>         Hey Everyone,
>>         in
>>         https://ghc.haskell.org/trac/ghc/ticket/9353
>>         and
>>         https://phabricator.haskell.org/D350
>>
>>         some preliminary work to fix up how the pure versions of the
>>         prefetch primops work is laid out and prototyped.
>>
>>         However, while it nominally fixes some of the ways in which the
>>         current pure prefetch APIs are fundamentally broken, the simple
>>         design in D350 isn't quite ideal, and I sketch out some other
>>         ideas in the associated ticket #9353.
>>
>>         I'd like to make sure pure prefetch in 7.10 is slightly less
>>         broken than in 7.8, but either way, it's pretty clear that
>>         working out the right fixed-up design won't happen until 7.12.
>>         I.e., whatever makes 7.10, there WILL have to be breaking
>>         changes to fix those primops for 7.12.
>>
>>         thanks and any feedback / thoughts appreciated
>>         -Carter
>>
>>
>>         _________________________________________________
>>         ghc-devs mailing list
>>         ghc-devs at haskell.org
>>         http://www.haskell.org/mailman/listinfo/ghc-devs
>>