[Haskell-cafe] CPU with Haskell support
Joachim Durchholz
jo at durchholz.org
Wed Jan 20 12:43:32 UTC 2016
On 20.01.2016 at 13:16, Serguey Zefirov wrote:
> You are unnecessarily pessimistic; let me show you some things you
> probably have not thought of or heard about.
Okaaaay...
> A demonstration from the industry, albeit not quite hardware industry:
>
> http://www.disneyanimation.com/technology/innovations/hyperion - "Hyperion
> handles several million light rays at a time by sorting and bundling them
> together according to their directions. When the rays are grouped in this
> way, many of the rays in a bundle hit the same object in the same region of
> space. This similarity of ray hits then allows us – and the computer – to
> optimize the calculations for the objects hit."
Sure. If you have a few gazillion identical computations, you can
parallelize across them. That's the reason 3D cards took off in the
first place: the graphics pipeline grew processing capabilities and
evolved into the (rather restricted) GPU core model. So it's not
necessarily impossible to build something useful, merely very unlikely.
> Then, let me bring up an old idea of mine:
> https://mail.haskell.org/pipermail/haskell-cafe/2009-August/065327.html
>
> Basically, we can group identical closures into vectors, ready for SIMD
> instructions to operate over them. The "vectors" should work just like
> Data.Vector.Unboxed - instead of vector of tuple of arguments there should
> be a tuple of vectors with individual arguments (and results to update for
> lazy evaluation).
>
> Combine this with sorting of addresses in case of references and you can
> get a lot of speedup by doing... not much.
Heh. Such stuff could work - *provided* that you can really make the
case that there's enough similar work to bundle.
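For what it's worth, here's a minimal sketch of how I read the
tuple-of-vectors layout, using Data.Vector.Unboxed. The PendingAdds
type and its fields are my own illustrative names, not anything from
your original post:

import qualified Data.Vector.Unboxed as U

-- A bundle of pending applications of one known function (here: (+)),
-- stored as a tuple of unboxed argument vectors (structure of arrays)
-- instead of a boxed vector of argument tuples.
data PendingAdds = PendingAdds
  { argXs :: U.Vector Double   -- first arguments of all bundled calls
  , argYs :: U.Vector Double   -- second arguments of all bundled calls
  }

-- Forcing the whole bundle is one tight loop over contiguous memory,
-- which GHC/LLVM can turn into SIMD code far more readily than a
-- traversal over individually boxed thunks.
forceAll :: PendingAdds -> U.Vector Double
forceAll (PendingAdds xs ys) = U.zipWith (+) xs ys

main :: IO ()
main = print (forceAll (PendingAdds (U.fromList [1, 2, 3])
                                    (U.fromList [10, 20, 30])))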
Still, I'd work on making a model of that on GPGPU hardware first.
Two advantages:
1) No hardware investment.
2) You can see where the low-hanging fruit is and get a rough first
idea of how much parallelization really buys you.
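A sketch of what such a model could look like, assuming the accelerate
package with its CUDA backend (accelerate-cuda); the dot product here
is just a stand-in kernel, not the closure-bundling scheme itself:

import Data.Array.Accelerate as A
import Data.Array.Accelerate.CUDA (run)   -- GPU backend

-- Describe the computation as an Acc program; the backend compiles
-- it to GPU code and runs it.
dotp :: Vector Float -> Vector Float -> Scalar Float
dotp xs ys = run $ A.fold (+) 0 (A.zipWith (*) (A.use xs) (A.use ys))

main :: IO ()
main = print (dotp (A.fromList (Z :. 3) [1, 2, 3])
                   (A.fromList (Z :. 3) [4, 5, 6]))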
The other approach: see what you can get out of a Xeon with a really
high core count (14 cores, or even more).
Compare the single-GPGPU vs. multi-GPGPU speedup with the
single-CPU-core vs. multi-CPU-core speedup. That might provide insight
into how much the interconnects and cache-coherence protocols interfere
with the multicore speedup.
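The CPU side of that measurement can be an ordinary threaded Haskell
program run with varying core counts; a minimal sketch using
Control.Parallel.Strategies (the workload is an arbitrary stand-in):

import Control.Parallel.Strategies (parListChunk, rdeepseq, using)

-- Compile with: ghc -O2 -threaded scaling.hs
-- Run with +RTS -N1, -N2, ... -N14 and compare wall-clock times to
-- see where the speedup curve flattens as cores contend for the
-- memory system.
expensive :: Int -> Double
expensive n = sum [sin (fromIntegral (n * k)) | k <- [1 .. 10000]]

main :: IO ()
main = print (sum (map expensive [1 .. 2000]
                     `using` parListChunk 50 rdeepseq))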
Why am I so focused on multicore? Because that's where hardware is
going: clock rates aren't going to rise much further, but people will
still want to improve performance.
Actually, I think that single-core improvements aren't going to be very
important. First on my list would be exploiting multicore, second cache
locality. There's more to be gained from those than from specialized
hardware, IMVHO.
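To illustrate the cache-locality point with a toy example (the names
are mine): contiguous unboxed data vs. pointer-chasing a boxed list.

import qualified Data.Vector.Unboxed as U

-- Cache-friendly: one contiguous unboxed buffer, traversed
-- sequentially, so the prefetcher keeps the ALU fed.
sumContiguous :: U.Vector Double -> Double
sumContiguous = U.sum

-- Cache-hostile: cons cells and boxed Doubles scattered across the
-- heap, one pointer chase (and likely cache miss) per element.
sumScattered :: [Double] -> Double
sumScattered = sum

main :: IO ()
main = print ( sumContiguous (U.generate 1000000 fromIntegral)
             , sumScattered [1 .. 1000000] )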
Regards,
Jo