[jhc] Grin.

Sun Dec 7 05:43:58 EST 2008

On Sun, Dec 7, 2008 at 11:26 AM, John Meacham <john at repetae.net> wrote:
> On Sat, Dec 06, 2008 at 08:23:48AM +0100, Lemmih wrote:
>> I've been looking at some grin output and it seems that tags have been
>> dropped in favour of indirect calls. That is, 'eval' now does an
>> indirect call instead of inspecting the node tag. Is there a reason
>> for this?
>
> There were a couple reasons that functionality was added, Mainly it came
> down to flexibility of implementation. It provides a straightforward
> path for separate compilation. It is not intended that indirect calls be
> used always for evaluation, it just provides a universal fall-back case
> when direct execution is not appropriate for various reasons. Although
> jhc is a whole program optimizer, some support for separate compilation
> would be very convinient, if nothing else it will be needed for dynamic
> code loading at run-time. Another reason had to do with the points-to
> analysis, I found myself "chasing my tail" a lot of the time, in the
> face of features such as arbitrary impredicativity and GADTs, it became
> harder and harder to ensure the points-to analysis would produce an
> accurate and useful result in all possible cases, being able to fall
> back on an indirect evaluation in those cases simplified things
> dramatically. improving the front end, and improving the accuracy of the
> point-to analysis could be decoupled and no longer had to be done in
> lock-step. The dynamic nature of haskell programs also played a part,
> when looking at a histogram of the number of targets a given heap
> location could evaluate to it was very tail-heavy, I found that by far,
> exactly 1 was the most common possibility. Furthermore, after a point,
> the size of the jump table in the cache killed the advantage of the
> direct call over an indirect one. So all in all, it made sense to
> concentrate on a couple cases, the direct case, when you can reduce the
> number of possibilities to just a couple choices, and the fallback case
> where an indirect call is used. Note that where the "sweet spot" lies
> for optimal performance in terms of direct vs indirect execution is
> rather CPU architecture dependent.
>
> http://www.complang.tuwien.ac.at/forth/threading/
>
> contains some test code for testing various threading methods in a forth
> implementation, which is somewhat relevant at least in demonstrating how
> different CPUs behave.

What about the missed optimization opportunities? In the GRIN paper,
Boquist shows how 'foldl (+) 0 [1..n]' could be compiled to a strict
loop with no allocations. Without points-to analysis (which seems to
be completely disable at this point) most of the important
optimizations are blocked and the resulting program is far less
efficient than it should be.

-- 
Cheers,
  Lemmih