<div dir="ltr"><br><div class="gmail_extra">Thanks Ben! I have my responses inline below.</div><div class="gmail_extra"><br><div class="gmail_quote">On 16 June 2016 at 18:07, Ben Gamari <span dir="ltr"><<a href="mailto:ben@smart-cactus.org" target="_blank">ben@smart-cactus.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><span class=""><br>
</span>Indeed; I've opened D2335 [1] to reenable -fregs-graph and add an<br>
appropriate note to the users guide.<br></blockquote><div> </div><div>Thanks! That was quick.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">For the record, I have also struggled with register spilling issues in<br>
the past. See, for instance, #10012, which describes a behavior which<br>
arises from the C-- sinking pass's unwillingness to duplicate code<br>
across branches. While in general it's good to avoid the code bloat that<br>
this duplication implies, in the case shown in that ticket duplicating<br>
the computation would be significantly less code than the bloat from<br>
spilling the needed results.<br></blockquote><div><br></div><div>I am not sure whether this is feasible, but when in doubt we could try both, compare whether duplication produces significantly more code than spilling, and decide based on that. That would slow down compilation, though. Perhaps such slower passes could be bundled under something like -O3, with the understanding that it will be slow and may or may not produce better results?</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
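</blockquote><div><br></div><div>To make the "try both and compare" idea concrete, here is a rough Haskell sketch. Nothing here exists in GHC; the names and the sizes are hypothetical stand-ins for estimated instruction counts of the two lowerings.</div>

```haskell
-- A sketch of the "try both and compare" heuristic, assuming we could
-- estimate the code size of each alternative lowering.  All names here
-- are hypothetical; none of this is actual GHC code.

data Lowering = Duplicate | Spill
  deriving (Eq, Show)

-- | Pick whichever lowering is estimated to produce less code.
chooseLowering
  :: Int       -- ^ estimated size if the computation is duplicated per branch
  -> Int       -- ^ estimated size of the spill/reload sequences
  -> Lowering
chooseLowering dupSize spillSize
  | dupSize <= spillSize = Duplicate
  | otherwise            = Spill

main :: IO ()
main = do
  -- A #10012-style case: duplicating is far smaller than spilling.
  print (chooseLowering 12 40)
  -- The usual case, where duplication would bloat the code.
  print (chooseLowering 50 20)
```

In practice the hard part would be producing trustworthy size estimates without actually running both lowerings, which is exactly the compile-time cost mentioned above.

<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">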
<span class=""><br>
> But I found a few interesting optimizations that llvm did. For example,<br>
> there was a heap adjustment and check in the looping path which was<br>
> redundant and was readjusted in the loop itself without use. LLVM either<br>
> removed the redundant _adjustments_ in the loop or moved them out of the<br>
> loop. But it did not remove the corresponding heap _checks_. That makes me<br>
> wonder if the redundant heap checks can also be moved or removed. If we can<br>
> do some sort of loop analysis at the CMM level itself and avoid or remove<br>
> the redundant heap adjustments as well as checks or at least float them out<br>
> of the cycle wherever possible. That sort of optimization can make a<br>
> significant difference to my case at least. Since we are explicitly aware<br>
> of the heap at the CMM level there may be an opportunity to do better than<br>
> llvm if we optimize the generated CMM or the generation of CMM itself.<br>
><br>
</span>Very interesting, thanks for writing this down! Indeed if these checks<br>
really are redundant then we should try to avoid them. Do you have any<br>
code you could share that demonstrates this?<br></blockquote><div><br></div><div>The gist that I provided earlier in this thread demonstrates it. Here it is again:</div><div><br></div><div><a href="https://gist.github.com/harendra-kumar/7d34c6745f604a15a872768e57cd2447">https://gist.github.com/harendra-kumar/7d34c6745f604a15a872768e57cd2447</a><br></div><div><br></div><div>If you look at the Cmm trace in the gist, start at label c4ic, where we allocate space on the heap (+48). There are many possible paths from that point on; some of them use the heap and some don't. Below, the blocks that use the heap are marked with curly braces; the rest do not touch it at all.</div><div>1) c4ic (allocate) -> c4mw -> {c4pv} -> ...</div><div>2) c4ic (allocate) -> c4mw -> c4pw -> ((c4pr -> ({c4pe} -> ... | c4ph -> ...)) | cp4ps -> ...)</div><div><br></div><div>If we could place this allocation at both c4pv and c4pe instead of in the common parent, we could spare the fast path this check. The same applies to the allocation at label c4jd.</div><div><br></div><div>I have the code that produces this Cmm; I can commit it on a branch and leave it in the GitHub repository so that we can use it while fixing this.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
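</blockquote><div><br></div><div>To illustrate the transformation I have in mind, here is a Cmm-style sketch. Only the label names come from the gist; the block contents are simplified and made up for illustration.</div>

```
// Today: the allocation and its heap check sit in the common
// ancestor, so every path out of c4mw pays for them, including
// the paths that never touch the heap.
c4ic:
    Hp = Hp + 48;
    if (Hp > HpLim) goto gc;   // heap check on every path
    goto c4mw;

// Proposed: sink the allocation and the check into the branches
// that actually use the heap ({c4pv} and {c4pe}); the other paths
// out of c4mw would then run with no check at all.
c4pv:
    Hp = Hp + 48;
    if (Hp > HpLim) goto gc;
    ...
c4pe:
    Hp = Hp + 48;
    if (Hp > HpLim) goto gc;
    ...
```

<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">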
<br>
It would be great to open Trac tickets to track some of the optimization<br></blockquote><div><br></div><div>Will do.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">There is indeed a question of where we wish to focus our optimization<br>
efforts. However, I think using LLVM exclusively would be a mistake.<br>
LLVM is a rather large dependency that has in the past been rather<br>
difficult to track (this is why we now only target one LLVM release in a<br>
given GHC release). Moreover, it's significantly slower than our<br>
existing native code generator. There are a number of reasons for this,<br>
some of which are fixable. For instance, we currently make no effort to tell<br>
LLVM which passes are worth running and which we've handled; this is<br>
something which should be fixed but will require a rather significant<br>
investment by someone to determine how GHC's and LLVM's passes overlap,<br>
how they interact, and generally which are helpful (see GHC #11295).<br>
<br>
Furthermore, there are a few annoying impedance mismatches between Cmm<br>
and LLVM's representation. This can be seen in our treatment of proc<br>
points: when we need to take the address of a block within a function<br>
LLVM requires that we break the block into a separate procedure, hiding<br>
many potential optimizations from the optimizer. This was discussed<br>
further on this list earlier this year [2]. It would be great to<br>
eliminate proc-point splitting but doing so will almost certainly<br>
require cooperation from LLVM.<br></blockquote><div><br></div><div>It sounds like we need to continue with both backends for now and see how the LLVM option pans out. There is clearly no case for a decisive tilt towards LLVM in the near future.</div><div><br></div><div>-harendra </div></div><br></div></div>