help wrt semantics / primops for pure prefetches
Edward Kmett
ekmett at gmail.com
Fri Nov 28 16:39:20 UTC 2014
The main takeaway I had from my work with prefetching was that you can
smooth out a lot of the cross-system variance if you shove things into a
fixed-size queue and prefetch on the way into the queue. The alternative,
prefetching just to kickstart the next element during a tree traversal,
doesn't help much, because that element is going to be demanded too soon
for the prefetch to hide the latency.
It is just incredibly invasive. =(
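A minimal sketch of the queue idea, in C since that is where a collector
would do it. This is an illustration only, not code from GHC's runtime or
from my project: Node, Marker, QUEUE_SIZE, STACK_SIZE and mark_from are
hypothetical names, the queue length of 8 is an untuned guess, and
__builtin_prefetch assumes GCC or Clang. Objects still leave the mark stack
in stack order, but each one sits in a small FIFO for a few iterations after
its prefetch is issued before it is actually touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical heap object: a binary node with a mark bit. */
typedef struct Node {
    struct Node *left, *right;
    bool marked;
} Node;

#define QUEUE_SIZE 8     /* prefetch distance; an assumed, untuned value */
#define STACK_SIZE 1024  /* fixed for the sketch; a real collector must grow it */

typedef struct {
    Node  *stack[STACK_SIZE];  /* ordinary mark stack */
    size_t sp;
    Node  *queue[QUEUE_SIZE];  /* small ring buffer fed from the stack */
    size_t head, count;
} Marker;

static void push(Marker *m, Node *n) {
    if (n == NULL) return;
    assert(m->sp < STACK_SIZE);
    m->stack[m->sp++] = n;
}

/* Move entries from the stack into the queue, prefetching each one
 * on the way in rather than when it is popped for processing. */
static void refill(Marker *m) {
    while (m->count < QUEUE_SIZE && m->sp > 0) {
        Node *n = m->stack[--m->sp];
        __builtin_prefetch(n, 1, 3);  /* rw=1: we will write the mark bit */
        m->queue[(m->head + m->count) % QUEUE_SIZE] = n;
        m->count++;
    }
}

/* Mark everything reachable from root; returns how many nodes were newly marked. */
size_t mark_from(Node *root) {
    Marker m;
    size_t marked = 0;
    m.sp = 0;
    m.head = 0;
    m.count = 0;

    push(&m, root);
    refill(&m);
    while (m.count > 0) {
        /* The oldest queued node has had the longest time to arrive in cache. */
        Node *n = m.queue[m.head];
        m.head = (m.head + 1) % QUEUE_SIZE;
        m.count--;

        if (!n->marked) {
            n->marked = true;
            marked++;
            push(&m, n->left);
            push(&m, n->right);
        }
        refill(&m);
    }
    return marked;
}
```

The traversal order is no longer strictly depth-first, which is the
"invasive" part: anything that relied on the stack discipline has to
tolerate objects being visited up to QUEUE_SIZE steps late.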
Re: doing prefetching in the mark phase, I just skimmed and found
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.9090&rep=rep1&type=pdf
which appears to take a similar approach.
-Edward
On Fri, Nov 28, 2014 at 3:42 AM, Simon Marlow <marlowsd at gmail.com> wrote:
> Thanks for this. In the copying GC I was using prefetching during the
> scan phase, where you do have a pretty good tunable knob for how far ahead
> you want to prefetch. The only variable is the size of the objects being
> copied, but most tend to be in the 2-4 words range. I did manage to get
> 10-15% speedups with optimal tuning, but it was a slowdown on a different
> machine or with wrong tuning, which is why GHC doesn't have any of this
> right now.
>
> Glad to hear this can actually be used to get real speedups in Haskell, I
> will be less sceptical from now on :)
>
> Cheers,
> Simon
>
> On 27/11/2014 10:20, Edward Kmett wrote:
>
>> My general experience with prefetching is that it is almost never a win
>> when done just on trees, as in the usual mark-sweep or copy-collection
>> garbage collector walk. Why? Because the interval between issuing the
>> prefetch and using the data is too variable. Stack disciplines and
>> prefetch don't mix nicely.
>>
>> If you want to see a win out of it you have to free up some of the
>> ordering of your walk, and tweak your whole application to support it.
>> e.g. if you want to use prefetching in garbage collection, the way to do
>> it is to switch from a strict stack discipline to using a small
>> fixed-sized queue on the output of the stack, then feed prefetch on the
>> way into the queue rather than as you walk the stack. That paid out for
>> me as a 10-15% speedup last time I used it after factoring in the
>> overhead of the extra queue. Not too bad for a weekend project. =)
>>
>> Without that sort of known lead-in time, it works out that prefetching
>> is usually a net loss or vanishes into the noise.
>>
>> As for the array ops, davean has a couple of cases w/ those for which
>> the prefetching operations are a 20-25% speedup, which is what motivated
>> Carter to start playing around with these again. I don't know off hand
>> how easily those can be turned into public test cases though.
>>
>> -Edward
>>
>> On Thu, Nov 27, 2014 at 4:36 AM, Simon Marlow <marlowsd at gmail.com> wrote:
>>
>> I haven't been watching this, but I have one question: does
>> prefetching actually *work*? Do you have benchmarks (or better
>> still, actual library/application code) that show some improvement?
>> I admit to being slightly sceptical - when I've tried using
>> prefetching in the GC it has always been a struggle to get something
>> that shows an improvement, and even when I get things tuned on one
>> machine it typically makes things slower on a different processor.
>> And that's in the GC, doing it at the Haskell level should be even
>> harder.
>>
>> Cheers,
>> Simon
>>
>>
>> On 22/11/2014 05:43, Carter Schonwald wrote:
>>
>> Hey Everyone,
>> in https://ghc.haskell.org/trac/ghc/ticket/9353 and
>> https://phabricator.haskell.org/D350 some preliminary work to fix up
>> how the pure versions of the prefetch primops work is laid out and
>> prototyped.
>>
>> However, while it nominally fixes up some of the problems with how the
>> current pure prefetch APIs are fundamentally broken, the simple design
>> in D350 isn't quite ideal, and I sketch out some other ideas in the
>> associated ticket #9353.
>>
>> I'd like to make sure pure prefetch in 7.10 is slightly less broken
>> than in 7.8, but either way, it's pretty clear that working out the
>> right fixed-up design won't happen till 7.12. I.e., whatever makes 7.10,
>> there WILL have to be breaking changes to fix those primops for 7.12.
>>
>> thanks and any feedback / thoughts appreciated
>> -Carter
>>
>>
>> _________________________________________________
>> ghc-devs mailing list
>> ghc-devs at haskell.org
>> http://www.haskell.org/mailman/listinfo/ghc-devs
>>