[GHC] #16040: Unboxing-Related Performance Issue with Polymorphic Functions
GHC
ghc-devs at haskell.org
Thu Dec 13 09:02:15 UTC 2018
#16040: Unboxing-Related Performance Issue with Polymorphic Functions
-------------------------------------+-------------------------------------
Reporter: _recursion | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.6.2
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by simonpj):
Interesting. For `test1` (the fast one) we get
{{{
Rec {
$wgo_r2xK :: GHC.Prim.Int# -> GHC.Prim.Int#
$wgo_r2xK = \ (ww_s2w4 :: GHC.Prim.Int#) ->
case GHC.Prim.># ww_s2w4 0# of {
__DEFAULT -> ww_s2w4;
1# -> $wgo_r2xK (GHC.Prim.-# ww_s2w4 1#) }
end Rec }
T16040.$wtest1 :: GHC.Prim.Int# -> GHC.Prim.Int#
T16040.$wtest1 = \ (ww_s2wd :: GHC.Prim.Int#) -> $wgo_r2xK ww_s2wd
test1 :: Int -> Int
test1 = \ (w_s2wa :: Int) ->
case w_s2wa of { GHC.Types.I# ww1_s2wd ->
case T16040.$wtest1 ww1_s2wd of ww2_s2wh { __DEFAULT ->
GHC.Types.I# ww2_s2wh } }
}}}
Notice that loop `$wgo` does not allocation at all.
For `test2` we get
{{{
Rec {
$wgo1_r2xL :: GHC.Prim.Int# -> (# Int #)
$wgo1_r2xL = \ (ww_s2wm :: GHC.Prim.Int#) ->
case GHC.Prim.># ww_s2wm 0# of {
__DEFAULT -> (# GHC.Types.I# ww_s2wm #);
1# -> $wgo1_r2xL (GHC.Prim.-# ww_s2wm 1#) }
end Rec }
T16040.$wtest2 :: GHC.Prim.Int# -> Int
T16040.$wtest2 = \ (ww_s2wv :: GHC.Prim.Int#) ->
case $wgo1_r2xL ww_s2wv of { (# ww2_s2wy #) -> ww2_s2wy }
test2 :: Int -> Int
test2 = \ (w_s2ws :: Int) ->
case w_s2ws of { GHC.Types.I# ww1_s2wv -> T16040.$wtest2 ww1_s2wv
}
}}}
Here the `$wgo1` loop does no allocation on its hot path, but does
allocate an
`I# ww` box as it returns.
I think that `$wgo1` is doing a heap-check on ''every'' iteration, at the
start
of the function. It would be better to do the check only on the ''last''
iteration,
in the `DEFAULT` branch. I bet these redundant heap checks are what is
taking
the time.
We have long-standing tickets about this; see the summary in comment:2 of
Trac #14791.
I would love someone to work on this. To catch cases like this should
not be very hard. By "like this" I mean
* A primitive case
* Which has no allocation "upstream" (i.e. before it)
* And at least one alternative does no allocation.
----------
Matthew is right that Nested CPR would also fix this. The trouble is
that the recursive `go` from `test2` returns an `X Int`, whose
representation has two levels of box; eg `X (I# 3#)`. The current CPR
transform can't optimise that. Nested-CPR can, and would make ''neither''
branch of `$wgo2` allocate
Tantalizingly, we have a nested-cpr patch nearly ready to go. (See Trac
#1600.)
But it needs someone to pay it some sustained attention.
----------
I don't see an obvious band-aid. This is a long-standing issue, not a
regression.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/16040#comment:5>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list