[GHC] #14827: Recognize when inlining would create a join point
GHC
ghc-devs at haskell.org
Wed Feb 21 16:17:50 UTC 2018
#14827: Recognize when inlining would create a join point
-------------------------------------+-------------------------------------
Reporter: ersetzen | Owner: (none)
Type: feature request | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.2.2
Resolution: | Keywords: JoinPoints
Operating System: Unknown/Multiple | Architecture: Unknown/Multiple
Type of failure: Runtime performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by ersetzen):
[https://gist.github.com/Tarmean/f97a6463aaad8069416cc6810e8ba4e5 Here are
both versions and the corresponding dump-simpl output] (only the last line
is changed).
I had to rewrite it somewhat because the original produced a single 10k-line
Core function. This version produces ~450 lines of Core when compiled with
{{{-O2 -ddump-simpl -ddump-stg -dsuppress-uniques -dsuppress-all -fno-liberate-case -ddump-to-file -fforce-recomp -fno-spec-constr -ticky -ticky-LNE}}},
which admittedly is a bit of a mouthful.
The ticky output is probably the simplest place to start. Without the INLINE pragma:
{{{
**************************************************
   Entries      Alloc  Alloc'd  Non-void Arguments  STG Name
--------------------------------------------------------------------------------
     15847     380328        0  0                   lvl2{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4DX
    302632   24210560        0  0                   lvl5{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4ER
  63931922          0        0  1 i                 $wcandidateMatch{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4ER
 135874515          0        0  0                   $j{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4ER
 136224692          0        0  1 i                 $wscan{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in s4Eu
  21810147   36418408        0  3 iwi               $wbuildTable{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4E5
     63392          0        0  2 SC                snoc'{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun)
    366024          0        0  1 L                 checkAll{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4DX
    142632    4057088        0  1 L                 go1{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in s4DR
     16029    1014272        0  1 L                 go{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in rj
       182          0        0  2 LS                go1{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in rj
         1         16        0  1 L                 longestCommonSubstring{v} (fun)
         4         96        0  0                   main1{v} (fun)
         1          0        0  0                   main4{v} (fun)
         1          0        0  0                   main{v} (fun)
**************************************************
}}}
With the INLINE pragma:
{{{
**************************************************
   Entries      Alloc  Alloc'd  Non-void Arguments  STG Name
--------------------------------------------------------------------------------
     15847     380328        0  0                   lvl2{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4Ho
    302632   24210560        0  0                   lvl3{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4Ih
  63931922          0        0  1 i                 $wcandidateMatch{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4Ih
 135874515          0        0  0                   $j{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4Ih
8551263604          0        0  3 iwi               $wbuildTable{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in s4Hw
 136224692          0        0  1 i                 $wscan{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in s4Hw
     63392          0        0  2 SC                snoc'{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun)
    366024   47624072        0  1 L                 checkAll{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4Ho
    142632    4057088        0  1 L                 go1{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in s4Hi
     16029    1014272        0  1 L                 go{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in rj
       182          0        0  2 LS                go1{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in rj
         1         16        0  1 L                 longestCommonSubstring{v} (fun)
         4         96        0  0                   main1{v} (fun)
         1          0        0  0                   main4{v} (fun)
         1          0        0  0                   main{v} (fun)
**************************************************
}}}
Removing the INLINE pragma moves the result allocation from $wscan to
$wbuildTable, and the $wbuildTable closure no longer has to be allocated
because $wbuildTable becomes a join point (it shows up as LNE rather than
fun in the ticky output).
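For readers less familiar with the term, here is a minimal sketch of what makes a local binding eligible to be a join point. It is not taken from the gist: `classify`, `classify'`, `finish` and `step` are hypothetical names, and whether GHC actually keeps such a small binding or simply inlines it depends on the optimisation settings.

```haskell
module Main (main) where

-- Hypothetical example, unrelated to the code in the gist.
-- Every call to `finish` is a saturated tail call, so GHC is free to
-- treat it as a join point: a jump target that needs no closure.
classify :: Int -> Int
classify n =
  let finish k = k * 2 + 1
  in if even n
       then finish (n `div` 2)
       else finish (n + 1)

-- In the first branch `step` is called in non-tail position (its
-- results are fed to (+)), so it cannot be a join point and has to be
-- an ordinary local function if it is not inlined.
classify' :: Int -> Int
classify' n =
  let step k = k * 2 + 1
  in if even n
       then step (n `div` 2) + step n
       else step (n + 1)

main :: IO ()
main = do
  print (classify 10)   -- finish 5 = 11
  print (classify' 10)  -- step 5 + step 10 = 11 + 21 = 32
```

In ticky output compiled with `-ticky-LNE`, bindings of the first kind are the ones reported as (LNE), while the second kind are reported as (fun).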
More drastically, the number of $wbuildTable entries drops from 8551263604
to 21810147, roughly a 390-fold reduction! perf also shows that in the
INLINE version the shiftLeft in $wbuildTable is the hottest instruction by
quite some margin.
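To put the entry counts in proportion, the following sketch just divides the two $wbuildTable figures from the ticky tables. The constant names are made up for illustration.

```haskell
module Main (main) where

-- Entry counts for $wbuildTable, copied from the two ticky tables.
-- The names are illustrative only.
entriesWithInline, entriesWithoutInline :: Integer
entriesWithInline    = 8551263604
entriesWithoutInline = 21810147

-- Whole-number ratio between the two entry counts.
entryRatio :: Integer
entryRatio = entriesWithInline `div` entriesWithoutInline

main :: IO ()
main = putStrLn ("entry ratio: about " ++ show entryRatio ++ "x")
-- prints "entry ratio: about 392x"
```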
[https://gist.github.com/Tarmean/0afe4d3a515c7d47cc526698180d1578 Finally,
a diff between the two dump-simpl outputs]. Notably, all values that are
floated out are unlifted, so floating them doesn't save any heap
allocations. Of those, only {{{ lvl4 = +# dt2 1# }}} and the $wbuildTable
result are used multiple times.
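As a reminder of what "unlifted" means here, this is a small sketch of a binding shaped like {{{ lvl4 = +# dt2 1# }}}. The function `bump` and the literal `41#` are hypothetical; only the `+#` expression mirrors the Core quoted above.

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import GHC.Exts (Int (I#), Int#, (+#))

-- Int# is an unlifted, unboxed machine integer: it is never a thunk
-- and has no heap object of its own. Sharing the result of `bump`
-- therefore saves a machine addition, not an allocation.
bump :: Int# -> Int#
bump dt2 = dt2 +# 1#

main :: IO ()
main = print (I# (bump 41#))  -- boxes the result only to print it: 42
```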
Sorry that this got a bit long.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14827#comment:11>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler