[Haskell-cafe] DpH/repa cache locality
benl at ouroborus.net
Sun Jul 18 05:58:57 EDT 2010
The difficulty with optimising for cache effects is that you're effectively introducing sequential dependencies between elements of the array you're processing.
To say this another way: if you can evaluate the array elements in any order, then you can evaluate them in parallel. Adding restrictions like "element i must be processed before element i+1" can improve cache usage, but it also restricts the evaluation order, and makes it less obvious how to parallelise the code.
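As a toy illustration of that dependency distinction (plain Haskell lists, not Repa code): in the first function each element depends only on its own index, so the elements could be computed in any order (e.g. with parMap from Control.Parallel.Strategies); in the second, each element needs its left neighbour, which forces a sequential left-to-right order.

```haskell
-- Order-independent: element i depends only on i, so the map could
-- be evaluated in parallel (e.g. replaced by parMap rdeepseq from
-- Control.Parallel.Strategies) with no change in the result.
independent :: Int -> [Int]
independent n = map (\i -> i * i) [0 .. n - 1]

-- Order-dependent: element i is built from element i-1 (a prefix
-- sum), so the evaluation order is forced to be sequential.
dependent :: Int -> [Int]
dependent n = scanl1 (+) [0 .. n - 1]
```

The hypothetical `independent`/`dependent` names are just for this sketch; the point is that only the first shape parallelises for free.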
For Repa, I think we'll end up providing primitive operators for doing things like 2D image convolutions. Moving from a linear image convolution, where all the pixels in one row are processed before moving on to the next, to a block-based convolution is really a change in algorithm -- and not something I'd expect a general purpose compiler optimisation to do.
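To make the "change in algorithm" concrete, here is a sketch (again plain Haskell, not Repa) of just the traversal orders involved. Both functions visit exactly the same set of pixel coordinates; the blocked version walks the image tile by tile, so the pixels a stencil touches stay resident in cache, while the linear version streams whole rows. The names and the block size parameter are invented for this example.

```haskell
-- Linear ("row-major") traversal: finish each row before the next.
rowMajor :: Int -> Int -> [(Int, Int)]
rowMajor h w = [ (y, x) | y <- [0 .. h - 1], x <- [0 .. w - 1] ]

-- Blocked traversal: visit the image in b-by-b tiles, so each
-- tile's pixels (and the stencil rows they touch) stay in cache.
blocked :: Int -> Int -> Int -> [(Int, Int)]
blocked h w b =
  [ (y, x)
  | by <- [0, b .. h - 1]
  , bx <- [0, b .. w - 1]
  , y  <- [by .. min (by + b) h - 1]
  , x  <- [bx .. min (bx + b) w - 1] ]
```

Since the two traversals produce the same pixel set in a different order, the convolution result is unchanged -- only the memory access pattern (and hence cache behaviour) differs.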
On 13/07/2010, at 9:49 , Gregory Crosswhite wrote:
> Hey everyone,
> Just out of curiosity, what work is being done in the data parallel
> haskell / repa projects regarding cache locality? The reason I am
> asking is because, as I understand it, the biggest bottleneck on today's
> processors is cache misses, and the reason why optimized
> platform-specific linear algebra libraries perform well is because they
> divide the data into chunks that are optimally sized for the cache in
> order to maximize the number of operations performed per memory access.
> I wouldn't expect data parallel haskell/repa to automatically know what
> the perfect chunking strategy should be on each platform, but are there
> any plans being made at all to do something like this?
> (To be explicit, this isn't meant as a criticism; I'm just curious and
> am interested in seeing discussion on this topic by those more
> knowledgeable than I. :-) )