[GHC] #15560: Full laziness destroys opportunities for join points

Mon Aug 27 10:38:21 UTC 2018

#15560: Full laziness destroys opportunities for join points
-------------------------------------+-------------------------------------
        Reporter:  AndreasK          |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.6.1
       Component:  Compiler          |              Version:  8.4.3
  (CodeGen)                          |
      Resolution:                    |             Keywords:  JoinPoints
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #14287 #13286     |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Changes (by AndreasK):

 * related:  #14287 => #14287 #13286

Comment:

 It seems this was changed deliberately in #13286.

 That ticket mentions examples that improved when join points became
 top-level functions, but it doesn't contain any performance metrics to
 judge the impact.

 I can see how it might be beneficial for exported functions, but I'm not
 yet convinced that this is better in general.

 I've also looked at the Core output of the two testsuite files, and at
 least at -O2 the same amount of floating seems to happen with 8.0.2 and
 8.4.

 I also don't think we will get much benefit out of the original example.
 Looking at the floated code:

 {{{
 $wg_s6x5 [InlPrag=[0], Occ=LoopBreaker]
   :: Int# -> Int# -> Int# -> Int#
 $wg_s6x5 (ww5_s6wV :: Int#) (ww6_s6wZ :: Int#) (ww7_s6x3 :: Int#) =
   case remInt# ww6_s6wZ 2# of {
     __DEFAULT ->
       case ww6_s6wZ of wild5_Xen {
         __DEFAULT ->
           jump $wg_s6x5
             (*# ww5_s6wV ww5_s6wV)
             (quotInt# (-# wild5_Xen 1#) 2#)
             (*# ww5_s6wV ww7_s6x3);
         1# -> *# ww5_s6wV ww7_s6x3
       };
     0# ->
       $wg_s6x5 (*# ww5_s6wV ww5_s6wV) (quotInt# ww6_s6wZ 2#) ww7_s6x3
   }

 $wf_s6xg [InlPrag=[0], Occ=LoopBreaker] :: Int# -> Int# -> Int#
 $wf_s6xg (ww3_X6Hi :: Int#) (ww4_s6xe :: Int#) =
   case remInt# ww4_s6xe 2# of {
     __DEFAULT ->
       case ww4_s6xe of wild3_Xe3 {
         __DEFAULT ->
           $wg_s6x5
             (*# ww3_X6Hi ww3_X6Hi)
             (quotInt# (-# wild3_Xe3 1#) 2#)
             ww3_X6Hi;
         1# -> ww3_X6Hi
       };
     0# -> $wf_s6xg (*# ww3_X6Hi ww3_X6Hi) (quotInt# ww4_s6xe 2#)
   }

 GHC.Real.$w$s^1 [InlPrag=[0]] :: Int -> Int# -> Int#
 GHC.Real.$w$s^1 =
   \ (w_s6xh :: Int) (ww_s6xl :: Int#) ->
     case tagToEnum# @ Bool (<# ww_s6xl 0#) of {
       False ->
         case ww_s6xl of wild1_XdK {
           __DEFAULT ->
             case w_s6xh of { I# ww2_s6xa -> $wf_s6xg ww2_s6xa wild1_XdK };
           0# -> 1#
         };
       True -> case GHC.Real.^2 of wild1_00 { }
     }
 }}}

 * For exponent < 0 we throw an exception, so we can probably ignore that
   case when it comes to performance.
 * For exponent == 0 there is an advantage IF `GHC.Real.$w$s^1` gets
   inlined. But an exponent of zero seems like an unlikely case to me.
 * For exponents > 0 it depends heavily on how much gets inlined.
   * If everything gets inlined (essentially undoing the floating) we save
     nothing, as the unfloated variant would have been inlined as well.
   * If we inline `GHC.Real.$w$s^1` and `$wf_s6xg` we save at best one call
     overhead if we don't jump into `$wg_s6x5`; otherwise we save nothing.
   * If nothing gets inlined we are worse off, as we now have at least as
     much overhead, if not more, should we jump into the floated functions.

 On the bright side, the floated bindings work only on unboxed ints, so
 they might not cause an additional stack check.
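
 For orientation, here is a rough source-level sketch of the shape the Core
 above seems to correspond to. It is reconstructed by hand from the dump, not
 copied from `GHC.Real`; the name `powSketch` and the plain `error` call are
 assumptions for illustration. The local workers `f` and `g` play the roles
 of `$wf_s6xg` and `$wg_s6x5` before floating.

```haskell
-- Hypothetical stand-in for the Int-specialised (^), reconstructed from the
-- Core dump above. Names and the error message are illustrative only.
powSketch :: Int -> Int -> Int
powSketch x0 y0
  | y0 < 0    = error "Negative exponent"  -- the `True ->` branch in the Core
  | y0 == 0   = 1                          -- the `0# -> 1#` branch
  | otherwise = f x0 y0
  where
    -- Mirrors $wf_s6xg. Invariant: x0 ^ y0 == x ^ y.
    f :: Int -> Int -> Int
    f x y
      | y `rem` 2 == 0 = f (x * x) (y `quot` 2)
      | y == 1         = x
      | otherwise      = g (x * x) ((y - 1) `quot` 2) x

    -- Mirrors $wg_s6x5. Invariant: x0 ^ y0 == (x ^ y) * z.
    g :: Int -> Int -> Int -> Int
    g x y z
      | y `rem` 2 == 0 = g (x * x) (y `quot` 2) z
      | y == 1         = x * z
      | otherwise      = g (x * x) ((y - 1) `quot` 2) (x * z)
```

 In this shape `g` is only entered when the exponent is not a power of two
 (for example 3, 5 or 6), so the "don't jump into `$wg_s6x5`" case in the
 list above covers exponents such as 1, 2, 4 and 8.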

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/15560#comment:10>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler