[Take 2] Unexpected duplicate join points in "Core" output?

Andreas Klebinger klebinger.andreas at gmx.at
Sat Nov 20 20:47:36 UTC 2021


At this point I think it would be good if you could put your problem
into a ghc-ticket.

I can't look in detail into this in greater detail atm because of time
constraints.
And without a ticket it's likely to fall by the wayside eventually.
But it does seem like something where we maybe could do better.

And having good examples for the problematic behaviour is always
immensely helpful
to solve these kinds of problems.

Cheers
Andreas

Am 20/11/2021 um 19:54 schrieb Viktor Dukhovni:
> On Sat, Nov 20, 2021 at 12:49:08PM +0100, Andreas Klebinger wrote:
>
>> For the assembly I opened a ticket:
>> https://gitlab.haskell.org/ghc/ghc/-/issues/20714
> Thanks, much appreciated.  Understood re redundant join points, though
> in the non-toy context the redundnat point code is noticeably larger.
>
>                  join {
>                    exit4
>                      :: Addr# -> Word# -> State# RealWorld -> Maybe (Int64, ByteString)
>                    exit4 (ww4 :: Addr#) (ww5 :: Word#) (ipv :: State# RealWorld)
>                      = case touch# dt1 ipv of { __DEFAULT ->
>                        let {
>                          dt3 :: Int#
>                          dt3 = minusAddr# ww4 dt } in
>                        case ==# dt3 dt2 of {
>                          __DEFAULT -> jump exit1 ww2 wild dt dt1 dt2 cs dt3 ww5;
>                          1# -> jump $wconsume cs (orI# ww2 dt3) ww5
>                        }
>                        } } in
>                  join {
>                    exit5
>                      :: Addr# -> Word# -> State# RealWorld -> Maybe (Int64, ByteString)
>                    exit5 (ww4 :: Addr#) (ww5 :: Word#) (w1 :: State# RealWorld)
>                      = case touch# dt1 w1 of { __DEFAULT ->
>                        let {
>                          dt3 :: Int#
>                          dt3 = minusAddr# ww4 dt } in
>                        case ==# dt3 dt2 of {
>                          __DEFAULT -> jump exit1 ww2 wild dt dt1 dt2 cs dt3 ww5;
>                          1# -> jump $wconsume cs (orI# ww2 dt3) ww5
>                        }
>                        } } in
>
> FWIW, these don't appear to be deduplicated, both result from the same
> conditional: `acc < q || acc == q && d < 5`.  I need some way to make
> this compute a single boolean value without forking the continuation.
>
> There's a another source of code bloat that I'd like to run by you...
> In the WIP code for Lazy ByteString 'readInt', I started with:
>
>        readInt !q !r =
>          \ !s -> consume s False 0
>        where
>          -- All done
>          consume s at Empty !valid !acc
>              = if valid then convert acc s else Nothing
>          -- skip empty chunk
>          consume (Chunk (BI.BS _ 0) cs) !valid !acc
> -- Recurse
>              = consume cs valid acc
>          -- process non-empty chunk
>          consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc
>              = case _digits q r c acc of
>                  Result used acc'
>                      | used <= 0 -- No more digits present
>                        -> if valid then convert acc' s else Nothing
>                      | used < len -- valid input not entirely digits
>                        -> let !c' = BU.unsafeDrop used c
>                            in convert acc' $ Chunk c' cs
>                      | otherwise -- try to read more digits
> -- Recurse
>                        -> consume cs True acc'
>                  Overflow -> Nothing
>
> Now _digits is the I/O loop I shared before, and the calling code gets
> inlined into that recursive loop with various join points.  But the loop
> gets forked into multiple copies which are compiled separately, because
> there are two different recursive calls into "consume" that got compiled
> into separate "joinrec { ... }".
>
> So I tried instead:
>
>        readInt !q !r =
>          \ !s -> consume s False 0
>        where
>          -- All done
>          consume s at Empty !valid !acc
>              = if valid then convert acc s else Nothing
>          consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc
>              = case _digits q r c acc of
>                  Result used acc'
>                      | used == len -- try to read more digits
> -- Recurse
>                        -> consume cs (valid || used > 0) acc'
>                      | used > 0 -- valid input not entirely digits
>                        -> let !c' = BU.unsafeDrop used c
>                            in convert acc' $ Chunk c' cs
>                      | otherwise -- No more digits present
>                        -> if valid then convert acc' s else Nothing
>                  Overflow -> Nothing
>
> But was slightly surprised to find even more duplication (3 copies
> instead of tw) of the I/O loop, because in the call:
>
>      consume cs (valid || used > 0) acc'
>
> the boolean argument got floated out, giving:
>
>      case valid of
>          True -> consume cs True acc'
>          _ -> case used > 0 of
>              True -> consume cs True acc'
>              _    -> consume cs False acc'
>
> and each of these then generates essentially the same code.  To get the
> code to be emitted just once, I had to switch from a Bool "valid" to a
> bitwise "valid":
>
>        readInt !q !r =
>          \ !s -> consume s 0 0
>        where
>          -- All done
>          consume s at Empty !valid !acc
>              = if valid /= 0 then convert acc s else Nothing
>          consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc
>              = case _digits q r c acc of
>                  Result used acc'
>                      | used == len -- try to read more digits
> -- Recurse
>                        -> consume cs (valid .|. used) acc'
>                      | used > 0 -- valid input not entirely digits
>                        -> let !c' = BU.unsafeDrop used c
>                            in convert acc' $ Chunk c' cs
>                      | otherwise -- No more digits present
>                        -> if valid /= 0 then convert acc' s else Nothing
>                  Overflow -> Nothing
>
> Is there some way for GHC to figure out to not float out such cheap
> computations?  The 'Result' constructor is strict, so there's no cost to
> evaluating `used > 0`, and cloning the entire computation is I think
> the more unfortunate choice...
>
> Adding redundant BangPatterns on `Result !used !acc'` didn't make a
> difference.  Switching to bitwise logical "or" finally produces just one
> copy of the loop.
>


More information about the ghc-devs mailing list