[Haskell-cafe] optimising for vector units

Tue Jul 27 10:31:20 EDT 2004

Ketil Malde wrote:
> Jan-Willem Maessen - Sun Labs East <Janwillem.Maessen at Sun.COM> writes:
>>There are, I believe, a couple of major challenges:
>>   * It's easy to identify very small pieces of parallel work, but much
>>     harder to identify large, yet finite, pieces of work.  Only the
>>     latter are really worth parallelizing.
> 
> By the former, are you thinking of so small grain that it is handled
> by out-of-order execution units in the CPU?  And/or the C compiler?

I'm thinking of so small grain that the cost of fine-grained thread 
creation (tens of instructions) is comparable to the cost of running 
the thread itself (hundreds of instructions).
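
To make the granularity point concrete, here is a minimal sketch (not 
taken from any existing system) of a divide-and-conquer sum that forks 
a thread only when a piece of work is larger than a cutoff.  The names 
grainSize and parSumRange, and the cutoff value itself, are 
assumptions made up for illustration; a real cutoff would have to be 
measured.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Data.List (foldl')

-- Hypothetical cutoff: at or below this many elements we assume that
-- forking would cost more than it saves, so the work stays sequential.
grainSize :: Int
grainSize = 10000

-- Sum the integers in the half-open range [lo, hi).
-- Large ranges are split in two and the left half runs in a new thread;
-- small ranges are summed directly in the current thread.
parSumRange :: Int -> Int -> IO Int
parSumRange lo hi
  | hi - lo <= grainSize = evaluate (foldl' (+) 0 [lo .. hi - 1])
  | otherwise = do
      let mid = lo + (hi - lo) `div` 2
      leftVar <- newEmptyMVar
      _ <- forkIO (parSumRange lo mid >>= putMVar leftVar)
      right <- parSumRange mid hi   -- right half in the current thread
      left <- takeMVar leftVar      -- wait for the forked left half
      return (left + right)
```

Calling parSumRange 0 100000 forks only a handful of threads rather 
than one per element.  The threads can only run on several processors 
if the program is linked against GHC's threaded runtime and started 
with more than one capability.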

ILP is a bit of a sideshow here.  Usually there's plenty of integer 
ILP hanging about already.

>>   * If you don't compute speculatively, you'll never find enough work
>>     to do.
> 
> 
> Although I'm not familiar with the issues, my point is that the number
> of CPUs available, even in common household pee cees, is already more
> than one (P4 hyper-threading), and could be something like eight in
> the not-so-distant future.  It no longer matters (much) if you waste
> cycles, cycles are cheap.  (The next next IA64, Montecito is 1.7G
> transistors, including 24MB on-chip cache.  The P4 is big, but you
> could fit thirty of them in that space.  No way Montecito is going to
> have anywhere near 30x the performance)
> 
> So speculative execution, even if you end up throwing away 50% of the
> work you do, could in theory make your program faster anyway.  This is
> a headache for C programs; my hope would be that a functional language
> would make it easier.

Be careful.  A P4 will slow down if you get it hot enough.  So 
"throwing away" that bit of extra performance may actually make things 
slower...  And who gets to use the throwaway performance, anyhow?  You 
want it, the OS wants it, other applications on your machine want 
it---you have to be able to adjust to varying amounts of "extra" 
compute power hanging around.  [Actually, speaking from experience, 
half the compute power on my SMP ends up going to the GUI, the OS, or 
my browser---and it's great to have those things not slow down 
uniprocessor compute-bound tasks.]

>>   * If you compute speculatively, you need some way to *stop* working
>>     on useless, yet infinite computations.
> 
> 
> And you need to choose which computations to start working on, I guess.
> Predicting the future never was easy :-)

Indeed.
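
For what it's worth, the "stop working" half can at least be sketched 
with ordinary threads: evaluate both arms of a conditional 
speculatively, then kill whichever one turns out to be unneeded.  
This is only an illustration under simplifying assumptions, not a 
description of how any implementation does it; speculate, abandon and 
specIf are hypothetical names.

```haskell
import Control.Concurrent (ThreadId, forkIO, killThread)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)
import Control.Exception (evaluate)

-- Start evaluating a value to weak head normal form in the background.
speculate :: a -> IO (ThreadId, MVar a)
speculate x = do
  box <- newEmptyMVar
  tid <- forkIO (evaluate x >>= putMVar box)
  return (tid, box)

-- Give up on a speculation whose result is no longer wanted.
abandon :: (ThreadId, MVar a) -> IO ()
abandon (tid, _) = killThread tid

-- Evaluate both arms while the condition is being decided, then kill
-- the losing arm, even if it would never have terminated.
-- Simplification: if the winning arm throws, readMVar never returns a value.
specIf :: IO Bool -> a -> a -> IO a
specIf decide thenVal elseVal = do
  t <- speculate thenVal
  e <- speculate elseVal
  cond <- decide
  let (winner, loser) = if cond then (t, e) else (e, t)
  abandon loser
  readMVar (snd winner)
```

For example, specIf (return True) 42 (sum [1 ..]) yields 42 even 
though the other arm is an infinite computation.  One caveat: GHC 
delivers the kill only when the target thread reaches an allocation 
point, so a tight loop that never allocates may not stop.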

> [perhaps getting off-topic, but hey, this is -cafe]

That's why I subscribe to -cafe at all!

-Jan-Willem Maessen

> 
> -kzm