[Take 2] Unexpected duplicate join points in "Core" output?
Viktor Dukhovni
ietf-dane at dukhovni.org
Sat Nov 20 18:54:36 UTC 2021
On Sat, Nov 20, 2021 at 12:49:08PM +0100, Andreas Klebinger wrote:
> For the assembly I opened a ticket:
> https://gitlab.haskell.org/ghc/ghc/-/issues/20714
Thanks, much appreciated. Understood re redundant join points, though
in the non-toy context the redundnat point code is noticeably larger.
join {
exit4
:: Addr# -> Word# -> State# RealWorld -> Maybe (Int64, ByteString)
exit4 (ww4 :: Addr#) (ww5 :: Word#) (ipv :: State# RealWorld)
= case touch# dt1 ipv of { __DEFAULT ->
let {
dt3 :: Int#
dt3 = minusAddr# ww4 dt } in
case ==# dt3 dt2 of {
__DEFAULT -> jump exit1 ww2 wild dt dt1 dt2 cs dt3 ww5;
1# -> jump $wconsume cs (orI# ww2 dt3) ww5
}
} } in
join {
exit5
:: Addr# -> Word# -> State# RealWorld -> Maybe (Int64, ByteString)
exit5 (ww4 :: Addr#) (ww5 :: Word#) (w1 :: State# RealWorld)
= case touch# dt1 w1 of { __DEFAULT ->
let {
dt3 :: Int#
dt3 = minusAddr# ww4 dt } in
case ==# dt3 dt2 of {
__DEFAULT -> jump exit1 ww2 wild dt dt1 dt2 cs dt3 ww5;
1# -> jump $wconsume cs (orI# ww2 dt3) ww5
}
} } in
FWIW, these don't appear to be deduplicated, both result from the same
conditional: `acc < q || acc == q && d < 5`. I need some way to make
this compute a single boolean value without forking the continuation.
There's a another source of code bloat that I'd like to run by you...
In the WIP code for Lazy ByteString 'readInt', I started with:
readInt !q !r =
\ !s -> consume s False 0
where
-- All done
consume s at Empty !valid !acc
= if valid then convert acc s else Nothing
-- skip empty chunk
consume (Chunk (BI.BS _ 0) cs) !valid !acc
-- Recurse
= consume cs valid acc
-- process non-empty chunk
consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc
= case _digits q r c acc of
Result used acc'
| used <= 0 -- No more digits present
-> if valid then convert acc' s else Nothing
| used < len -- valid input not entirely digits
-> let !c' = BU.unsafeDrop used c
in convert acc' $ Chunk c' cs
| otherwise -- try to read more digits
-- Recurse
-> consume cs True acc'
Overflow -> Nothing
Now _digits is the I/O loop I shared before, and the calling code gets
inlined into that recursive loop with various join points. But the loop
gets forked into multiple copies which are compiled separately, because
there are two different recursive calls into "consume" that got compiled
into separate "joinrec { ... }".
So I tried instead:
readInt !q !r =
\ !s -> consume s False 0
where
-- All done
consume s at Empty !valid !acc
= if valid then convert acc s else Nothing
consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc
= case _digits q r c acc of
Result used acc'
| used == len -- try to read more digits
-- Recurse
-> consume cs (valid || used > 0) acc'
| used > 0 -- valid input not entirely digits
-> let !c' = BU.unsafeDrop used c
in convert acc' $ Chunk c' cs
| otherwise -- No more digits present
-> if valid then convert acc' s else Nothing
Overflow -> Nothing
But was slightly surprised to find even more duplication (3 copies
instead of tw) of the I/O loop, because in the call:
consume cs (valid || used > 0) acc'
the boolean argument got floated out, giving:
case valid of
True -> consume cs True acc'
_ -> case used > 0 of
True -> consume cs True acc'
_ -> consume cs False acc'
and each of these then generates essentially the same code. To get the
code to be emitted just once, I had to switch from a Bool "valid" to a
bitwise "valid":
readInt !q !r =
\ !s -> consume s 0 0
where
-- All done
consume s at Empty !valid !acc
= if valid /= 0 then convert acc s else Nothing
consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc
= case _digits q r c acc of
Result used acc'
| used == len -- try to read more digits
-- Recurse
-> consume cs (valid .|. used) acc'
| used > 0 -- valid input not entirely digits
-> let !c' = BU.unsafeDrop used c
in convert acc' $ Chunk c' cs
| otherwise -- No more digits present
-> if valid /= 0 then convert acc' s else Nothing
Overflow -> Nothing
Is there some way for GHC to figure out to not float out such cheap
computations? The 'Result' constructor is strict, so there's no cost to
evaluating `used > 0`, and cloning the entire computation is I think
the more unfortunate choice...
Adding redundant BangPatterns on `Result !used !acc'` didn't make a
difference. Switching to bitwise logical "or" finally produces just one
copy of the loop.
--
Viktor.
More information about the ghc-devs
mailing list