[GHC] #8275: Loopification breaks profiling

Thu Sep 19 17:56:59 CEST 2013

#8275: Loopification breaks profiling
----------------------------------------+----------------------------------
        Reporter:  jstolarek            |            Owner:  jstolarek
            Type:  bug                  |           Status:  new
        Priority:  highest              |        Milestone:
       Component:  Profiling            |          Version:  7.7
      Resolution:                       |         Keywords:
Operating System:  Unknown/Multiple     |     Architecture:
 Type of failure:  Building GHC failed  |  Unknown/Multiple
       Test Case:                       |       Difficulty:  Unknown
        Blocking:  8298                 |       Blocked By:
                                        |  Related Tickets:
----------------------------------------+----------------------------------

Comment (by jstolarek):

 Edward, there is some progress. There is a single test in the testsuite
 that fails when loopification is turned on but passes when it is off:
 T1735. Assuming that you built GHC with profiling libraries, you can run
 this test from `testsuite/tests` with:
 {{{
 make WAY=normal EXTRA_HC_OPTS="-prof -fprof-auto -rtsopts" TEST=T1735
 }}}
 and it should pass. Adding `-floopification` to the list of extra
 options should make the test fail. Examining the Cmm dumps is a pain
 (they are over 200k lines), but I found a single place where loopification
 occurs. Attached are the dumps of this single function with and without
 loopification. The `-input` suffix denotes code produced by the code
 generator; the `-output` suffix denotes code after it has gone through all
 optimisations in the Cmm pipeline.

 If you look at the output code you will notice one incorrect
 transformation in the loopified version: the stack check gets duplicated
 and inserted at the end of the function, right before the tail call. There
 are two issues here:

   1) Correctness. The Cmm pipeline performs an incorrect transformation on
 a valid Cmm input program. It duplicates the stack check at the end of the
 function, which violates the assumption that stack checks are made only at
 the entry to a function (see !CmmLayoutStack, line 208). A stack check at
 the end is invalid because the required amount of stack is set to a fixed
 value. As a result, at the end of the function we check for more stack
 than we actually need (because we have already moved the stack pointer).
 Simon proposes to fix that by referring to `<old + 0>` instead of `Sp`
 (before stack layout). It is not clear to me whether this stack check is
 the direct cause of a segfault, though it certainly is not correct.

   2) Performance. Putting a stack check right before a call makes no
 sense. This is probably a direct result of the control flow optimisation
 pass, but there is a deeper problem here. When making a loopified tail
 call we want to avoid redoing the stack check, because it is not
 necessary, but we still want to perform the heap check. So a loopified
 tail call should jump to a point between those two checks. Currently the
 generation of stack and heap checks is tightly coupled in the code
 generator and it will probably take some work to separate them.
 Surprisingly, sometimes we get good code (loopification jumps between the
 stack and heap checks) and sometimes we don't. Neither Simon nor I know
 why this happens - I need to look at the code and solve that mystery.
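
 The arithmetic behind issue 1 can be sketched with a toy Haskell model.
 This is not GHC code and not taken from the attached dumps. It assumes a
 stack that grows downwards and a check that fails when `sp - n` would drop
 below the limit. The names `StackState`, `checkFails`, `pushFrame` and
 `frameSize`, and all the numbers, are hypothetical.

```haskell
-- Toy model of a fixed-size stack check. This is NOT GHC code.
-- Assumption: the stack grows downwards, so claiming n bytes moves the
-- stack pointer from sp to sp - n.

data StackState = StackState
  { sp    :: Int  -- current stack pointer (hypothetical address)
  , spLim :: Int  -- lowest address the stack is allowed to reach
  } deriving (Show, Eq)

-- True when a request for n more bytes does not fit below sp.
checkFails :: Int -> StackState -> Bool
checkFails n st = sp st - n < spLim st

-- Claim a frame of n bytes by moving the stack pointer down.
pushFrame :: Int -> StackState -> StackState
pushFrame n st = st { sp = sp st - n }

-- Hypothetical fixed amount of stack the function needs in total.
frameSize :: Int
frameSize = 32

-- Hypothetical state on entry to the function: 40 bytes are free.
atEntry :: StackState
atEntry = StackState { sp = 1040, spLim = 1000 }

-- Check at function entry: 1040 - 32 = 1008, not below 1000, so it passes.
entryCheckFails :: Bool
entryCheckFails = checkFails frameSize atEntry

-- The duplicated check runs after the stack pointer has already moved,
-- but still asks for the full fixed amount: 1008 - 32 = 976 < 1000.
duplicatedCheckFails :: Bool
duplicatedCheckFails = checkFails frameSize (pushFrame frameSize atEntry)
```

 In this model `entryCheckFails` is `False` and `duplicatedCheckFails` is
 `True`: the same fixed request succeeds at entry and fails once the stack
 pointer has moved, even though no further stack is needed.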

 There is one thing I wasn't able to figure out. The loopified tail call
 omits one block of code related to profiling (the one with magic numbers).
 I don't know whether this is correct or not. Edward, are you able to tell
 whether omitting it is correct?

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8275#comment:11>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler