CMM-to-ASM: Register allocation weirdness

Thu Jun 16 10:53:12 UTC 2016

On 16 June 2016 at 13:59, Ben Gamari <ben at smart-cactus.org> wrote:
>
> It actually came to my attention while researching this that the
> -fregs-graph flag is currently silently ignored [2]. Unfortunately this
> means you'll need to build a new compiler if you want to try using it.

Yes, I tried both -fregs-graph and -fregs-iterative. To find out why nothing
changed, I had to compare the executables produced with and without the
flags, and found them identical. A note in the manual could have saved me
some time, since that's the first place to go for help. I was wondering
whether I was making a mistake in the build and the code was not being
rebuilt properly. Your note confirms my observation: it indeed does not
change anything.

> All-in-all, the graph coloring allocator is in great need of some love;
> Harendra, perhaps you'd like to have a try at dusting it off and perhaps
> look into why it regresses in compiler performance? It would be great if
> we could use it by default.

Yes, I can try that. In fact I was heading in that direction and then stopped
to look at what LLVM does. LLVM gave me impressive results in some cases,
though not so great in others. I compared the code generated by LLVM, and it
perhaps did a better job in theory (it used fewer instructions), but due to
more spilling the end result was pretty similar.

But I found a few interesting optimizations that LLVM did. For example,
there was a heap adjustment and check in the looping path that was
redundant: the heap pointer was readjusted in the loop itself without being
used. LLVM either removed the redundant _adjustments_ in the loop or moved
them out of the loop, but it did not remove the corresponding heap _checks_.
That makes me wonder whether the redundant heap checks can also be moved or
removed. Perhaps we can do some sort of loop analysis at the CMM level itself
to avoid or remove the redundant heap adjustments as well as the checks, or
at least float them out of the loop wherever possible. That sort of
optimization can make a significant difference, in my case at least. Since
we are explicitly aware of the heap at the CMM level, there may be an
opportunity to do better than LLVM if we optimize the generated CMM or the
generation of CMM itself.
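
To make the idea concrete, here is a toy sketch in Python of the kind of
rewrite I have in mind. It is not GHC's Cmm representation: the instruction
types (Adjust, HeapCheck, Store, Other) and the pass name are made up for
illustration, and it assumes an "Other" instruction never reads or writes Hp.
It also ignores that a real heap check may serve other purposes (for example
as a safe point for the runtime), so treat it as an illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjust:
    """Hp = Hp + delta (delta may be negative). Hypothetical toy instruction."""
    delta: int


@dataclass(frozen=True)
class HeapCheck:
    """if Hp > HpLim then call the GC. Hypothetical toy instruction."""


@dataclass(frozen=True)
class Store:
    """Write to memory at Hp + offset, i.e. a real use of the allocation."""
    offset: int


@dataclass(frozen=True)
class Other:
    """Any instruction assumed not to touch Hp."""
    text: str


def drop_unused_heap_bumps(body):
    """Remove `Adjust(+n) ... Adjust(-n)` pairs whose allocation is never used.

    A pair qualifies only if nothing but heap checks and Hp-independent
    instructions sits between the bump and its undo. The heap checks in
    between are dropped too, since they only guard the removed bump.
    """
    out = []
    i = 0
    while i < len(body):
        ins = body[i]
        if isinstance(ins, Adjust) and ins.delta > 0:
            # Scan forward past instructions that do not use the allocation.
            j = i + 1
            while j < len(body) and isinstance(body[j], (HeapCheck, Other)):
                j += 1
            if j < len(body) and body[j] == Adjust(-ins.delta):
                # The bump is undone without use: keep only the unrelated code.
                out.extend(x for x in body[i + 1:j] if isinstance(x, Other))
                i = j + 1
                continue
        out.append(ins)
        i += 1
    return out


loop_body = [
    Other("x = x + 1"),
    Adjust(16),
    HeapCheck(),
    Adjust(-16),
    Other("goto loop"),
]
optimized = drop_unused_heap_bumps(loop_body)
```

A loop body that actually stores through Hp between the two adjustments is
left untouched by this pass, which is the conservative behaviour one would
want from any real version of it.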

A thought that came to my mind was whether we should focus on getting
better code out of the LLVM backend or the native code generator. LLVM
seems pretty good at the specialized task of code generation and low-level
optimization; it is well funded, widely used, and has strong community
support. That allows us to leverage that huge effort and take advantage of
new developments. Does it make sense to outsource the code generation
and low-level optimization tasks to LLVM and have GHC focus on higher-level
optimizations, which are harder to do at the LLVM level? What are the
downsides of using LLVM exclusively in the future?

-harendra