[Haskell-cafe] CPU with Haskell support
Serguey Zefirov
sergueyz at gmail.com
Wed Jan 20 12:16:41 UTC 2016
You are unnecessarily pessimistic; let me show you some things you probably
have not thought of or heard about.
2016-01-20 10:51 GMT+03:00 Joachim Durchholz <jo at durchholz.org>:
> On 19.01.2016 at 23:12, Henning Thielemann wrote:
>
>>
>> Fortunately, there are
>> processors that are designed for custom instruction set extensions:
>> https://en.wikipedia.org/wiki/Xtensa
>>
>
> Unfortunately, the WP article does not say anything that couldn't be said
> about, say, an ARM core. Other than that Xtensa core being some VLIW design.
>
>> Would it be sensible to create a processor based on such a design?
>
> Very, very unlikely, for multiple reasons.
>
> Special-purpose CPUs have been built, most notably for LISP, less notably
> for Java, and probably for other purposes that I haven't heard of.
> Invariably, their architectural advantages were obsoleted by economy of
> scale: Mainstream CPUs are being produced in such huge numbers that Intel
> etc. could afford more engineers to optimize every nook and cranny, more
> engineers to optimize the structure downscaling, and larger fabs that could
> produce more chips on equipment that is expensive up front but cheap per
> piece. In the end, the special-purpose chips were slower and more expensive.
> It is extremely strong competition you are facing if you try this.
>
> Also, it is very easy to misidentify the actual bottlenecks and make
> instructions for the wrong ones.
> If caching is the main bottleneck (which it usually is), no amount of CPU
> improvement will help you and you'll simply need a larger cache. Or,
> probably, a compiler that knows enough about the program and its data flow
> to arrange the data in a cache-line-friendly fashion.
>
A demonstration from industry, albeit not quite the hardware industry:
http://www.disneyanimation.com/technology/innovations/hyperion - "Hyperion
handles several million light rays at a time by sorting and bundling them
together according to their directions. When the rays are grouped in this
way, many of the rays in a bundle hit the same object in the same region of
space. This similarity of ray hits then allows us – and the computer – to
optimize the calculations for the objects hit."
Then, let me bring up an old idea of mine:
https://mail.haskell.org/pipermail/haskell-cafe/2009-August/065327.html
Basically, we can group identical closures into vectors, ready for SIMD
instructions to operate over them. These "vectors" should work just like
Data.Vector.Unboxed: instead of a vector of tuples of arguments, there should
be a tuple of vectors, one per argument (plus result slots to update for lazy
evaluation).
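
To make the layout concrete, here is a minimal Haskell sketch of that
structure-of-arrays idea, assuming a hypothetical two-argument closure over
Doubles (GroupedClosure, forceGroup and the field names are invented for
illustration, not part of any existing RTS):

import qualified Data.Vector.Unboxed as U

-- Hypothetical structure-of-arrays layout for many pending applications
-- of one and the same two-argument closure over Doubles: one unboxed
-- vector per argument instead of a boxed vector of argument tuples.
data GroupedClosure = GroupedClosure
  { arg1    :: U.Vector Double  -- first argument of each pending application
  , arg2    :: U.Vector Double  -- second argument of each pending application
  , results :: U.Vector Double  -- result slots to update when forced
  }

-- Forcing the whole group becomes a tight loop over unboxed data, which
-- the compiler or a SIMD unit can vectorise, instead of chasing one heap
-- closure at a time.
forceGroup :: (Double -> Double -> Double) -> GroupedClosure -> GroupedClosure
forceGroup f g = g { results = U.zipWith f (arg1 g) (arg2 g) }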
Combine this with sorting of addresses when the arguments are references, and
you can get a lot of speedup by doing... not much.
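
A sketch of the address-sorting part, assuming hypothetical Addr and Pending
types standing in for whatever the RTS would actually hand us:

import Data.List (sortOn)

-- Hypothetical stand-in for a heap address; in reality this would come
-- from the runtime system, not from user code.
type Addr = Int

-- A pending piece of work that has to touch the heap object at 'target'.
data Pending a = Pending { target :: Addr, payload :: a }

-- Sorting pending work by target address turns scattered heap accesses
-- into a mostly sequential sweep, which is much friendlier to the cache
-- (the same trick Hyperion plays with its ray bundles).
sortByAddress :: [Pending a] -> [Pending a]
sortByAddress = sortOn target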
>
> I do not think this is going to be a long-term problem though. Pure
> languages have huge advantages for fine-grained parallel processing, and
> CPU technology is pushing towards multiple cores, so that's a natural
> match. As pure languages come into more widespread use, the engineers at
> Intel, AMD etc. will look at what the pure languages need, and add
> optimizations for these.
>
> Just my 2 cents.
> Jo
>