[GHC] #8971: Native Code Generator 7.8.1 RC2 is not as optimized as 7.6.3...

Fri Apr 25 06:23:36 UTC 2014

#8971: Native Code Generator 7.8.1 RC2 is not as optimized as 7.6.3...
--------------------------------------------+------------------------------
        Reporter:  GordonBGood              |            Owner:
            Type:  bug                      |           Status:  new
        Priority:  normal                   |        Milestone:
       Component:  Compiler (NCG)           |          Version:  7.8.1-rc2
      Resolution:                           |         Keywords:
Operating System:  Unknown/Multiple         |     Architecture:
 Type of failure:  Runtime performance bug  |  Unknown/Multiple
       Test Case:                           |       Difficulty:  Unknown
        Blocking:                           |       Blocked By:
                                            |  Related Tickets:
--------------------------------------------+------------------------------

Comment (by GordonBGood):

 Replying to [comment:11 jstolarek]:
 > I know nothing about 7.6 Cmm generation, but I worked on 7.8 Cmm
 pipeline this summer so I can offer some guidance. First I'd like to
 clarify a few things:

 Thanks for your explanations and clarifications, some of which I have dug
 out for myself in the last few minutes - I had already added an edit to
 the post to which you are replying:

 > > both NCG and LLVM will share the same C-- output
 >
 > No, they will not. LLVM backend requires "proc-point splitting". This
 means we need to turn every Cmm block that is a successor of more than one
 block into a separate procedure (at least that is my understanding). See
 [https://github.com/ghc/ghc/blob/f8e12e2b396e0c475e1403ab8ac3fc4d63c1681e/compiler/cmm/CmmPipeline.hs#L104
 here] and
 [https://github.com/ghc/ghc/blob/f8e12e2b396e0c475e1403ab8ac3fc4d63c1681e/compiler/cmm/CmmPipeline.hs#L77
 here] to see source of differences between Cmm generated for both
 backends.
 [https://github.com/ghc/ghc/blob/f8e12e2b396e0c475e1403ab8ac3fc4d63c1681e/compiler/cmm/CmmProcPoint.hs#L35
 Here] you'll find more on proc-points.

 Learned something new here.  OK, the end result is that my argument was
 invalid: the LLVM results cannot be used as proof that the problem is in
 the NCG, because the two backends do not use the same CMM code.  So it
 was worth checking the CMM code.
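
 As a rough illustration of the rule quoted above (a block that is a
 successor of more than one block becomes its own procedure), here is a
 minimal Haskell sketch over a toy control-flow graph.  The types and the
 names `CFG`, `predecessors`, `joinPoints` and `example` are hypothetical
 and are not taken from GHC; the real analysis in CmmProcPoint.hs is more
 involved, so this only shows the idea as stated in the quote.

```haskell
import Data.List (nub, sort)

-- Hypothetical toy representation: each block label is paired with the
-- labels of its successor blocks.  This is NOT GHC's Cmm graph type.
type Label = String
type CFG   = [(Label, [Label])]

-- All distinct blocks that branch to the given label.
predecessors :: CFG -> Label -> [Label]
predecessors cfg l = nub [ b | (b, succs) <- cfg, l `elem` succs ]

-- Blocks that are a successor of more than one block.  Under the rule
-- quoted above, these are the blocks that would be split out into
-- separate procedures for the LLVM backend.
joinPoints :: CFG -> [Label]
joinPoints cfg =
  sort [ l | l <- nub (concatMap snd cfg)
           , length (predecessors cfg l) > 1 ]

-- A diamond: "entry" branches to "then" and "else", which both jump
-- to "join".
example :: CFG
example =
  [ ("entry", ["then", "else"])
  , ("then",  ["join"])
  , ("else",  ["join"])
  , ("join",  [])
  ]

main :: IO ()
main = print (joinPoints example)
```

 In the diamond, only "join" has two predecessors, so it is the only block
 this sketch would split out; the NCG path would keep all four blocks in
 one procedure, which is why the two backends see different Cmm.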

 > > straight cmm dump has regressed to have lost even the basic
 optimizations that were there with the older version.
 >
 > I'm not sure what was the philosophy behind 7.6 Cmm generation but in
 7.8 we just generate Cmm from STG in the simplest possible way and then
 optimize that Cmm. This is similar to generating Core from Haskell and
 then doing a series of core-to-core transformations. So you need to look
 at the final Cmm, not the one that comes out from the STG->Cmm pass.

 OK, I dug that out for myself and now see that after the final pass the
 CMM code from the two versions is almost identical for the code that
 triggers the symptom (the only difference is that constants for 32-bit
 registers are now recorded as 32 bits wide, where they used to be
 recorded as 64-bit even though only 32 bits were used).

 > > It may be that the NCG is using the non-optimized version of CMM as a
 source
 >
 > It is using the optimized version. You can see for yourself in the
 [https://github.com/ghc/ghc/blob/f8e12e2b396e0c475e1403ab8ac3fc4d63c1681e/compiler/main/HscMain.hs#L1237
 tryNewCodeGen function] and its
 [https://github.com/ghc/ghc/blob/f8e12e2b396e0c475e1403ab8ac3fc4d63c1681e/compiler/main/HscMain.hs#L1159
 call site].

 OK, so given (almost) identical final CMM code, the problem is (likely)
 not in the CMM code but in the NCG.

 > >  Anticipating your next request to look at the STG output
 >
 > Wrong anticipation :-) Look at generated assembly. Only this can tell
 you what is the real difference between generated code. I wouldn't be
 surprised to see something along the lines of #8048.
 >
 > > Thus, the bug/regression appears to go further back than just the
 new NCG (which is likely using the non-optimized CMM code as input) but
 also to the CMM code generator in that it is producing much less efficient
 CMM code.

 I had already posted the inner loop from the assembly as part of the
 original bug report; someone requested that I post the CMM output, which
 shows that the CMM is (likely) not the problem, so we are back to the
 NCG, as you confirm here.

 > Let me repeat: a) NCG is using optimized Cmm (look at the code); b)
 don't look at the un-optimized Cmm - it's irrelevant.

 Yup, you have made that clear:  I never want to look at CMM code ever
 again.

 > Finally, if you want to learn more about Cmm pipeline and Cmm debugging
 see [wiki:Commentary/Compiler/CodeGen].
 >
 > Hope that helps.

 Yes, I found that link on my own, which is how I came to a fuller
 understanding of the contents of the new CMM file and its relationship to
 the final opt-cmm file.  But you have been most helpful in explaining that
 LLVM and NCG do not (cannot) use the same CMM source.

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8971#comment:14>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler