[GHC] #13623: join points produce bad code for stream fusion
GHC
ghc-devs at haskell.org
Thu Apr 27 23:10:34 UTC 2017
#13623: join points produce bad code for stream fusion
-------------------------------------+-------------------------------------
Reporter: choenerzs | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.2.1-rc1
Keywords: | Operating System: Unknown/Multiple
Architecture: | Type of failure: Runtime
Unknown/Multiple | performance bug
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Rev(s): | Wiki Page:
-------------------------------------+-------------------------------------
Below, I am generating to stream fusion streams xs and ys. Both
parameterized on k l. The two streams are then concatenated. Finally I do
a strict left fold.
This example needs the 'vector' package but nothing else.
{{{
module Test where
import Data.Vector.Fusion.Stream.Monadic as S
foo :: Int -> Int -> IO Int
foo = \i j -> S.foldl' (+) 0 $ xs i j S.++ ys i j
where xs k l = S.enumFromStepN k l 2
ys k l = S.enumFromStepN k l 3
{-# Inline xs #-}
{-# Inline ys #-}
{-# Inline foo #-}
}}}
With ghc-8.0.1 I get nice core:
{{{
$wfoo_r1Ai
$wfoo_r1Ai =
\ ww_s1q5 ww1_s1q9 w_s1q2 ->
letrec {
$s$wfoldlM'_loop_s1xc
$s$wfoldlM'_loop_s1xc =
\ sc_s1x7 sc1_s1x5 sc2_s1x6 sc3_s1x4 ->
case tagToEnum# (># sc2_s1x6 0#) of _ {
False -> (# sc_s1x7, I# sc3_s1x4 #);
True ->
$s$wfoldlM'_loop_s1xc
sc_s1x7
(+# sc1_s1x5 ww1_s1q9)
(-# sc2_s1x6 1#)
(+# sc3_s1x4 sc1_s1x5)
}; } in
letrec {
$s$wfoldlM'_loop1_s1x3
$s$wfoldlM'_loop1_s1x3 =
\ sc_s1x2 sc1_s1x0 sc2_s1x1 sc3_s1wZ ->
case tagToEnum# (># sc2_s1x1 0#) of _ {
False -> $s$wfoldlM'_loop_s1xc sc_s1x2 ww_s1q5 3# sc3_s1wZ;
True ->
$s$wfoldlM'_loop1_s1x3
sc_s1x2
(+# sc1_s1x0 ww1_s1q9)
(-# sc2_s1x1 1#)
(+# sc3_s1wZ sc1_s1x0)
}; } in
$s$wfoldlM'_loop1_s1x3 w_s1q2 ww_s1q5 2# 0#
}}}
Now the same with ghc-8.2-rc1. Here,
[https://github.com/haskell/vector/blob/master/Data/Vector/Fusion/Stream/Monadic.hs
Stream.++] function is not fully optimized away (Left and Right
constructors!). Instead we have a join point that executes either of the
two parts (xs or ys) based on a case w2_s1U2 of {Left -> ; Right ->}.
{{{
$wfoo_r23R
$wfoo_r23R
= \ ww_s1Ue ww1_s1Ui w_s1Ub ->
let {
x1_a1tj
x1_a1tj = I# ww_s1Ue } in
let {
tb_a1wC
tb_a1wC = (x1_a1tj, lvl1_r23Q) } in
let {
lvl2_s1Yh
lvl2_s1Yh = Right tb_a1wC } in
joinrec {
$wfoldlM'_loop_s1U8
$wfoldlM'_loop_s1U8 w1_s1U0 ww2_s1U6 w2_s1U2 w3_s1U3
= case w1_s1U0 of { __DEFAULT ->
case w2_s1U2 of {
Left sa_a1yP ->
case sa_a1yP of { (w4_a1zr, m1_a1zs) ->
case m1_a1zs of { I# x2_a1zw ->
case tagToEnum# (># x2_a1zw 0#) of {
False -> jump $wfoldlM'_loop_s1U8 SPEC ww2_s1U6
lvl2_s1Yh w3_s1U3;
True ->
case w4_a1zr of { I# y_a1xT ->
jump $wfoldlM'_loop_s1U8
SPEC
(+# ww2_s1U6 y_a1xT)
(Left (I# (+# y_a1xT ww1_s1Ui), I# (-# x2_a1zw 1#)))
w3_s1U3
}
}
}
};
Right sb_a1z3 ->
case sb_a1z3 of { (w4_a1zr, m1_a1zs) ->
case m1_a1zs of { I# x2_a1zw ->
case tagToEnum# (># x2_a1zw 0#) of {
False -> (# w3_s1U3, I# ww2_s1U6 #);
True ->
case w4_a1zr of { I# y_a1xT ->
jump $wfoldlM'_loop_s1U8
SPEC
(+# ww2_s1U6 y_a1xT)
(Right (I# (+# y_a1xT ww1_s1Ui), I# (-# x2_a1zw
1#)))
w3_s1U3
}
}
}
}
}
}; } in
jump $wfoldlM'_loop_s1U8 SPEC 0# (Left (x1_a1tj, lvl_r23P)) w_s1Ub
}}}
For my stream-fusion heavy code, this yields a slowdown of approximately
x4 (10 seconds with ghc-8.2-rc1, 2.5 seconds with ghc-8.0.1).
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/13623>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list