[GHC] #14827: Recognize when inlining would create a join point

GHC ghc-devs at haskell.org
Wed Feb 21 16:17:50 UTC 2018


#14827: Recognize when inlining would create a join point
-------------------------------------+-------------------------------------
        Reporter:  ersetzen          |                Owner:  (none)
            Type:  feature request   |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.2
      Resolution:                    |             Keywords:  JoinPoints
Operating System:  Unknown/Multiple  |         Architecture:  Unknown/Multiple
 Type of failure:  Runtime           |            Test Case:
                   performance bug   |
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by ersetzen):

 [https://gist.github.com/Tarmean/f97a6463aaad8069416cc6810e8ba4e5 Here are
 both versions and the corresponding dump-simpl output] (only the last line
 is changed).

 I had to rewrite it somewhat because the original produced a 10k-line Core
 function. This version produced ~450 lines of Core when I compiled it with
 {{{-O2 -ddump-simpl -ddump-stg -dsuppress-uniques -dsuppress-all -fno-liberate-case
 -ddump-to-file -fforce-recomp -fno-spec-constr -ticky -ticky-LNE}}}, which
 admittedly is a bit of a mouthful.

 I think starting with the ticky output is probably simplest. Without the INLINE pragma:


 {{{
 **************************************************

     Entries      Alloc    Alloc'd  Non-void Arguments      STG Name
 --------------------------------------------------------------------------------
       15847     380328          0   0                      lvl2{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4DX
      302632   24210560          0   0                      lvl5{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4ER
    63931922          0          0   1 i                    $wcandidateMatch{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4ER
   135874515          0          0   0                      $j{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4ER
   136224692          0          0   1 i                    $wscan{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in s4Eu
    21810147   36418408          0   3 iwi                  $wbuildTable{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4E5
       63392          0          0   2 SC                   snoc'{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun)
      366024          0          0   1 L                    checkAll{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4DX
      142632    4057088          0   1 L                    go1{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in s4DR
       16029    1014272          0   1 L                    go{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in rj
         182          0          0   2 LS                   go1{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in rj
           1         16          0   1 L                    longestCommonSubstring{v} (fun)
           4         96          0   0                      main1{v} (fun)
           1          0          0   0                      main4{v} (fun)
           1          0          0   0                      main{v} (fun)

 **************************************************
 }}}

 With the INLINE pragma:
 {{{
 **************************************************

     Entries      Alloc    Alloc'd  Non-void Arguments      STG Name
 --------------------------------------------------------------------------------
       15847     380328          0   0                      lvl2{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4Ho
      302632   24210560          0   0                      lvl3{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4Ih
    63931922          0          0   1 i                    $wcandidateMatch{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4Ih
   135874515          0          0   0                      $j{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4Ih
  8551263604          0          0   3 iwi                  $wbuildTable{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in s4Hw
   136224692          0          0   1 i                    $wscan{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in s4Hw
       63392          0          0   2 SC                   snoc'{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun)
      366024   47624072          0   1 L                    checkAll{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4Ho
      142632    4057088          0   1 L                    go1{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in s4Hi
       16029    1014272          0   1 L                    go{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in rj
         182          0          0   2 LS                   go1{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in rj
           1         16          0   1 L                    longestCommonSubstring{v} (fun)
           4         96          0   0                      main1{v} (fun)
           1          0          0   0                      main4{v} (fun)
           1          0          0   0                      main{v} (fun)

 **************************************************
 }}}
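
 For reference, the Alloc columns of the two ticky tables can be summed per build. These sums only cover what ticky attributes to the listed closures, so they are not the program's total heap allocation. The difference between the two totals equals the gap between checkAll's allocation with the INLINE pragma and $wbuildTable's allocation without it:

```latex
\begin{aligned}
\text{without INLINE} &: 380328 + 24210560 + 36418408 + 4057088 + 1014272 + 16 + 96 = 66080768 \\
\text{with INLINE}    &: 380328 + 24210560 + 47624072 + 4057088 + 1014272 + 16 + 96 = 77286432 \\
\text{difference}     &: 77286432 - 66080768 = 47624072 - 36418408 = 11205664
\end{aligned}
```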

 Removing the INLINE pragma moves the result allocation from $wscan to
 $wbuildTable. We also no longer have to allocate a closure for $wbuildTable,
 because it becomes a join point (marked LNE, let-no-escape, in the ticky
 output).
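
 Here is a minimal, hypothetical sketch of the join point vs. closure distinction. It is not the code from the gist; sumTo, go, adders and addK are made-up names. A local function that is only ever called in tail position, like go, can be compiled by GHC as a join point. Such a function is shown as LNE in STG/ticky output and needs no heap closure. A local function that escapes, like addK, must be allocated as a closure. Whether GHC actually produces a join point depends on the optimisation flags and the simplifier's other decisions, so confirm with -ddump-simpl or -ddump-stg.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical sketch; not the code from the ticket's gist.

-- `go` captures the free variable `hi` and is only ever called in tail
-- position (from the body of sumTo and from itself), so with -O GHC can
-- compile it as a join point: no closure is allocated for it, and it
-- appears as let-no-escape (LNE) in -ddump-stg / ticky output.
sumTo :: Int -> Int -> Int
sumTo lo hi = go lo 0
  where
    go :: Int -> Int -> Int
    go !i !acc
      | i > hi    = acc
      | otherwise = go (i + 1) (acc + i)

-- `addK` escapes: it is returned inside a list, so it cannot be a join
-- point and has to be allocated as a heap closure capturing `k`.
adders :: Int -> [Int -> Int]
adders k = [addK, addK . addK]
  where
    addK :: Int -> Int
    addK x = x + k
```

 For example, sumTo 1 10 evaluates to 55 and map ($ 1) (adders 2) to [3,5].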

 More drastically, the number of $wbuildTable entries drops from 8551263604
 to 21810147, roughly a 390-fold reduction! Profiling with perf also shows
 that, in the INLINE version, the shiftLeft in $wbuildTable is the hottest
 instruction by a wide margin.

 Finally, [https://gist.github.com/Tarmean/0afe4d3a515c7d47cc526698180d1578 here
 is a diff between the two dump-simpl outputs]. Notably, all values that are
 floated out are unlifted, so floating them doesn't save any heap allocation.
 Of those, only {{{lvl4 = +# dt2 1#}}} and the $wbuildTable result are used
 more than once.
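
 Here is a small hypothetical illustration of why floating unlifted values saves no heap allocation. squareOfSucc is a made-up function modelled loosely on the lvl4 binding above; it is not taken from the gist. A let-bound Int# is evaluated strictly and held as a raw machine integer, never as a heap thunk. Sharing it only saves recomputing the addition.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int (I#), (*#), (+#))

-- Hypothetical example in the spirit of `lvl4 = +# dt2 1#`.
-- `lvl4` has the unlifted type Int#, so this let-binding is evaluated
-- strictly and never becomes a heap-allocated thunk; reusing it twice
-- avoids a second addition but saves no heap allocation.
squareOfSucc :: Int -> Int
squareOfSucc (I# dt2) =
  let lvl4 = dt2 +# 1#   -- unboxed machine integer, not a heap object
  in I# (lvl4 *# lvl4)   -- the I# box is the only heap object the source asks for
```

 For example, squareOfSucc 3 evaluates to 16.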

 Sorry that this got a bit long.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14827#comment:11>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list