[GHC] #16040: Unboxing-Related Performance Issue with Polymorphic Functions

GHC ghc-devs at haskell.org
Thu Dec 13 09:02:15 UTC 2018


#16040: Unboxing-Related Performance Issue with Polymorphic Functions
-------------------------------------+-------------------------------------
        Reporter:  _recursion        |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.6.2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by simonpj):

 Interesting.  For `test1` (the fast one) we get
 {{{
 Rec {
 $wgo_r2xK :: GHC.Prim.Int# -> GHC.Prim.Int#
 $wgo_r2xK = \ (ww_s2w4 :: GHC.Prim.Int#) ->
             case GHC.Prim.># ww_s2w4 0# of {
               __DEFAULT -> ww_s2w4;
               1# -> $wgo_r2xK (GHC.Prim.-# ww_s2w4 1#) }
 end Rec }

 T16040.$wtest1 :: GHC.Prim.Int# -> GHC.Prim.Int#
 T16040.$wtest1 = \ (ww_s2wd :: GHC.Prim.Int#) -> $wgo_r2xK ww_s2wd

 test1 :: Int -> Int
 test1 = \ (w_s2wa :: Int) ->
         case w_s2wa of { GHC.Types.I# ww1_s2wd ->
         case T16040.$wtest1 ww1_s2wd of ww2_s2wh { __DEFAULT ->
         GHC.Types.I# ww2_s2wh  } }
 }}}
 Notice that loop `$wgo` does not allocation at all.

 For `test2` we get
 {{{
 Rec {
 $wgo1_r2xL :: GHC.Prim.Int# -> (# Int #)
 $wgo1_r2xL = \ (ww_s2wm :: GHC.Prim.Int#) ->
              case GHC.Prim.># ww_s2wm 0# of {
                __DEFAULT -> (# GHC.Types.I# ww_s2wm #);
                1#        -> $wgo1_r2xL (GHC.Prim.-# ww_s2wm 1#) }
 end Rec }

 T16040.$wtest2 :: GHC.Prim.Int# -> Int
 T16040.$wtest2 = \ (ww_s2wv :: GHC.Prim.Int#) ->
                  case $wgo1_r2xL ww_s2wv of { (# ww2_s2wy #) -> ww2_s2wy }

 test2 :: Int -> Int
 test2 = \ (w_s2ws :: Int) ->
         case w_s2ws of { GHC.Types.I# ww1_s2wv -> T16040.$wtest2 ww1_s2wv
 }
 }}}
 Here the `$wgo1` loop does no allocation on its hot path, but does
 allocate an
 `I# ww` box as it returns.

 I think that `$wgo1` is doing a heap-check on ''every'' iteration, at the
 start
 of the function.  It would be better to do the check only on the ''last''
 iteration,
 in the `DEFAULT` branch. I bet these redundant heap checks are what is
 taking
 the time.

 We have long-standing tickets about this; see the summary in comment:2 of
 Trac #14791.

 I would love someone to work on this.  To catch cases like this should
 not be very hard.  By "like this" I mean

 * A primitive case
 * Which has no allocation "upstream" (i.e. before it)
 * And at least one alternative does no allocation.

 ----------
 Matthew is right that Nested CPR would also fix this. The trouble is
 that the recursive `go` from `test2` returns an `X Int`, whose
 representation has two levels of box; eg `X (I# 3#)`.  The current CPR
 transform can't optimise that.  Nested-CPR can, and would make ''neither''
 branch of `$wgo2` allocate

 Tantalizingly, we have a nested-cpr patch nearly ready to go. (See Trac
 #1600.)
 But it needs someone to pay it some sustained attention.


 ----------
 I don't see an obvious band-aid.  This is a long-standing issue, not a
 regression.

-- 
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/16040#comment:5>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler


More information about the ghc-tickets mailing list