[Haskell-cafe] Glome.hs-0.3 and other things
Andrew Coppin
andrewcoppin at btinternet.com
Mon Apr 21 14:54:24 EDT 2008
Jim Snow wrote:
> Useful references: "What Every Programmer Needs to Know About Memory"
> http://lwn.net/Articles/250967/
Thank you for monopolising my entire afternoon. This is probably the
single most interesting thing I've read in ages! :-D
Thinking about it, maybe this explains my sorting benchmark - sorting
Word32 values probably involves more shifting data from place to place
than actual computation, so it's probably limited more by the bandwidth
of the FSB than by the CPU's ability to perform number crunching.
Running it in multiple threads is probably just thrashing the L2 cache
rather than actually helping...
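For what it's worth, here's a minimal sketch of the kind of workload I mean (not my actual benchmark - the multiplier constant and sizes are just made up for illustration):

```haskell
import Data.List (sort)
import Data.Word (Word32)

-- Sorting a plain Haskell list of Word32 values. Every element is a
-- separate heap-allocated box reached through a pointer, so the sort
-- spends most of its time chasing and shuffling pointers rather than
-- doing arithmetic - exactly the memory-bound behaviour described above.
sortWords :: [Word32] -> [Word32]
sortWords = sort

main :: IO ()
main = do
  -- A pseudo-random-looking list of Word32s (fromIntegral wraps modulo 2^32).
  let xs = [ fromIntegral (i * 2654435761) | i <- [1 .. 100000 :: Int] ] :: [Word32]
  print (take 5 (sortWords xs))
```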
After studying all this material, I do find myself feeling slightly
concerned. The article shows how in C / C++ / assembly you can arrange
your data and order your computations to make maximum use of the various
caches and avoid certain bottlenecks in the system. And the *vast*
performance difference it can yield. But what happens if you happen to
be writing your program in Smalltalk or Java or... Haskell. Up here,
you're so far from the hardware that it would seem that you have *no
hope* of using the CPU caches effectively. Think about it - data
scattered all over the heap in an essentially random pattern, getting
moved around periodically by the GC. [Oh, the GC! That sounds like it
must nail the hell out of the cache. And even if it doesn't, it more or
less negates the cache's usefulness, because all the "hot" data ends up
at a different address.] Just think about trying to process a Haskell list -
all those multiple levels of pointer indirection! The CPU has no hope of
prefetching that stuff... Damn, it seems as if there's simply no way of
ever making Haskell fast. :-(
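Although, thinking about it some more, you do have *some* control: the
array package that ships with GHC provides unboxed arrays, which store
primitive values contiguously with no per-element pointer. A tiny sketch
(just to illustrate the idea, nothing more):

```haskell
import Data.Array.Unboxed (UArray, listArray, elems)
import Data.Word (Word32)

-- An unboxed array lays its Word32 elements out contiguously in memory,
-- with no per-element pointer, so a sequential traversal is exactly the
-- access pattern the CPU's prefetcher can handle.
sumContiguous :: UArray Int Word32 -> Word32
sumContiguous = sum . elems

main :: IO ()
main = do
  let arr = listArray (0, 9) [1 .. 10] :: UArray Int Word32
  print (sumContiguous arr)  -- prints 55
```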
I suppose the idea is that Haskell is supposed to help you work at a
higher level of abstraction, so you can concentrate on building better
*algorithms* which require less work in the first place. Surely an
algorithm in a suitably superior complexity class could more than make
up for the performance lost to poor cache locality. But people who
work in C and C++ know how to build efficient algorithms too, so... hmm,
we seem to be in trouble here.