[Haskell-cafe] faster faster faster but not uglier (how to make nice code AND nice core)?

Joachim Durchholz jo at durchholz.org
Wed May 19 21:43:49 UTC 2021


> * and not low-level enough: How do I tell GHC to pack (coerce?)
> `data Pos` into `Word64`? (It's not
> https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/pragmas.html#unpack-pragma
> ?)
> 
> 
> And would it help? Or is it even needed?
> If I have the data spread over several words,
> it could still be fine - as long as it's kept in registers?

Actually, as long as it is kept in a CPU cache line.

Cf. Ulrich Drepper: What every programmer should know about memory, 
https://www.akkadia.org/drepper/cpumemory.pdf

The paper tells me that data locality wrt. cache lines (i.e. keeping 
data accessed together in a single cache line) can have an 
order-of-magnitude effect.
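
On the packing question quoted at the top: as far as I know, UNPACK only
flattens strict fields into the enclosing constructor; it does not squeeze
several fields into one machine word. One way to get a single Word64 is to
do the bit-twiddling by hand behind a newtype. A minimal sketch, assuming
(hypothetically, since the original `data Pos` was not shown) two
coordinates that each fit in 32 bits; the names mkPos, posX and posY are
made up for illustration:

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Int (Int32)
import Data.Word (Word32, Word64)

-- Hypothetical layout (an assumption, not the original Pos):
-- x lives in the high 32 bits, y in the low 32 bits.
newtype Pos = Pos Word64
  deriving (Eq, Ord, Show)

-- Pack two signed 32-bit coordinates into one 64-bit word.
mkPos :: Int32 -> Int32 -> Pos
mkPos x y = Pos ((widen x `shiftL` 32) .|. widen y)
  where
    -- Go through Word32 first so that negative values are not
    -- sign-extended into the upper half of the word.
    widen :: Int32 -> Word64
    widen = fromIntegral . (fromIntegral :: Int32 -> Word32)

-- Unpack the coordinates again; fromIntegral truncates to 32 bits.
posX, posY :: Pos -> Int32
posX (Pos p) = fromIntegral (p `shiftR` 32)
posY (Pos p) = fromIntegral (p .&. 0xFFFFFFFF)
```

Since Pos is a newtype over Word64, it has no runtime wrapper of its own,
so a strict, unpacked field of this type should take one word. Whether
that actually beats two separate unpacked fields is something to measure
rather than assume.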

(It also talks about multithreading, where the effect can reach two 
orders of magnitude. That's not relevant to vector optimization, though.)

It's quite possible that the speedups from using a CPU's vector 
operations are mostly due to better cache line locality, since the 
vector operations enforce data locality - though vector operations 
probably give you a nice boost on top of that.

Does GHC do memory locality analysis?
It would need to find out what data items are going to be accessed 
at roughly the same time, and make sure they're close together in memory.
Deforestation and such will help with locality as a nice side effect 
(because you get rid of list spines and such, so the data stretches 
across fewer cache lines anyway), but is there any analysis on top of that?
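
To illustrate what I mean by list spines versus data that sits together,
here is a small sketch that computes the same sum over a boxed list and
over an unboxed array. It uses UArray from the array package that ships
with GHC; the function names are made up, and I am making no claim about
actual timings, only about the memory layout:

```haskell
import Data.Array.Unboxed (UArray, bounds, listArray, (!))
import Data.List (foldl')

-- Boxed linked list: unless the list is fused away, every cons cell
-- is a separate heap object, and the elements are reached via pointers.
sumList :: [Double] -> Double
sumList = foldl' (+) 0

-- Unboxed array: the Doubles are stored next to each other in one
-- block of memory, so a traversal walks through adjacent addresses.
sumArray :: UArray Int Double -> Double
sumArray arr = go lo 0
  where
    (lo, hi) = bounds arr
    go i acc
      | i > hi    = acc
      | otherwise =
          let acc' = acc + arr ! i
          in acc' `seq` go (i + 1) acc'  -- keep the accumulator evaluated

-- Helper for building the array from a list (0-based indices).
fromList :: [Double] -> UArray Int Double
fromList xs = listArray (0, length xs - 1) xs
```

Both functions return the same result for the same elements; the
difference is only in how many separate heap objects, and therefore
potentially how many cache lines, the traversal has to touch.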

Regards,
Jo

