[Haskell-cafe] optimising for vector units

Wed Jul 28 04:47:47 EDT 2004

>That was me.  I think you're underestimating the cost of starting
>threads even in this very lightweight world.

Maybe... Perhaps Haskell could be made to resemble dataflow instructions
more... If, when a computation completes, we insert the result directly
into the data structure which represents the function, we can in fact pick any
function for execution with _no_ overhead. (In a true dataflow system any
instruction from any thread can be executed with no overhead.)

The point is that at any time we have N functions ready for execution (a function
is ready for execution when all its arguments are ready)... we can pick and
execute any of these (or all of them, if enough execution units are available).
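A minimal sketch of the ready-set idea, as an illustration only and not a
description of how any Haskell implementation works. The names (Node, Graph,
ready, step, run, example) are hypothetical, and it assumes the containers
package for Data.Map. Each node is either a finished value or a pending
function waiting on other nodes; a result is written straight back into the
graph, and every node returned by ready could be fired in any order, or all
at once on separate execution units.

```haskell
import qualified Data.Map.Strict as M

type NodeId = String

-- A node is either a finished value or a function waiting on other nodes.
data Node
  = Done Int
  | Pending ([Int] -> Int) [NodeId]

type Graph = M.Map NodeId Node

-- The value of a node, if it has already been computed.
valueOf :: Graph -> NodeId -> Maybe Int
valueOf g n = case M.lookup n g of
  Just (Done v) -> Just v
  _             -> Nothing

-- Every pending node whose arguments are all available, paired with its
-- result. These are independent, so they could run in any order.
ready :: Graph -> [(NodeId, Int)]
ready g =
  [ (n, f vs)
  | (n, Pending f args) <- M.toList g
  , Just vs <- [mapM (valueOf g) args]
  ]

-- Fire all ready nodes and insert each result directly into the graph.
step :: Graph -> Graph
step g = foldr (\(n, v) -> M.insert n (Done v)) g (ready g)

-- Keep stepping until nothing is ready any more.
run :: Graph -> Graph
run g
  | null (ready g) = g
  | otherwise      = run (step g)

-- Two-argument subtraction for the example graph.
sub :: [Int] -> Int
sub [x, y] = x - y
sub _      = error "sub expects exactly two arguments"

-- (a + b) * (c - d) with a = 1, b = 2, c = 10, d = 4
example :: Graph
example = M.fromList
  [ ("a", Done 1), ("b", Done 2), ("c", Done 10), ("d", Done 4)
  , ("sum",  Pending sum     ["a", "b"])
  , ("diff", Pending sub     ["c", "d"])
  , ("prod", Pending product ["sum", "diff"])
  ]
```

In the example graph, "sum" and "diff" are both ready at the start and do not
depend on each other, while "prod" only becomes ready once both have been
written back. Evaluating valueOf (run example) "prod" in GHCi gives Just 18.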

The suggestion I guess is to only use instructions that get their arguments
from main memory. This way any instruction can be sequenced with no overhead
on any CPU. With modern on-die core-speed caches this can be almost as fast
as registers (with good cache access patterns) ... Note that I am only suggesting
interleaving instructions at the function level, so registers can be used within
functions... of course as things get more and more parallel we may see
hardware with no registers, just pipelined high speed cache access. (The 
hardware may well use registers to pre-fetch cache values, but that can
be made transparent to the software)...

Hardware manufacturers have hit the limit for pure sequential execution speed,
so more parallelism is the only way forward (see Intel's revised roadmap: they
abandoned the Pentium 4 and 5 and have focused on an updated _low_power_
Pentium M, derived from the Pentium III, and are planning multi-core versions
for more speed).

C and other imperative languages focus too much on the how and not the what
to be easy to use in such multi-CPU environments... A language which
abstracts and hides the parallelism could well take off in a big way.

Keean.