[GHC] #12150: Compile time performance degradation on code that uses undefined/error with CallStacks
GHC
ghc-devs at haskell.org
Sun Jul 31 18:28:48 UTC 2016
#12150: Compile time performance degradation on code that uses undefined/error with
CallStacks
-------------------------------------+-------------------------------------
Reporter: thomie | Owner:
Type: bug | Status: new
Priority: high | Milestone: 8.0.2
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Compile-time | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #10844 | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by gridaphobe):
So, although my no-inline patch is now producing better results for the
original CallStack-inlining issue, it doesn't seem to be doing much for
this issue. Upon closer inspection of the Core I noticed something quite
interesting. When we remove the CallStacks from `undefined`, GHC manages
to optimize away the entire set of guards! In stark contrast, leaving the
CallStacks gives us the following Core
{{{
-- RHS size: {terms: 4, types: 7, coercions: 0}
Test.$fFunctorResult_$cfmap [InlPrag=INLINE (sat-args=0)]
:: forall a_aFQ b_aFR.
(a_aFQ -> b_aFR) -> Result a_aFQ -> Result b_aFR
[GblId,
Str=b,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=False, ConLike=False,
WorkFree=False, Expandable=False,
Guidance=ALWAYS_IF(arity=0,unsat_ok=False,boring_ok=False)
Tmpl= \ (@ a_aFS) (@ b_aFT) ->
let {
$dIP_s1QH :: GHC.Stack.Types.CallStack
[LclId]
$dIP_s1QH =
GHC.Stack.Types.pushCallStack
(Test.$fFunctorResult10, Test.$fFunctorResult8)
GHC.Stack.Types.emptyCallStack } in
let {
bool1_aqS :: forall a1_a1Dw. a1_a1Dw
[LclId]
bool1_aqS =
\ (@ a1_a1Dw) ->
undefined
@ 'GHC.Types.PtrRepLifted
@ a1_a1Dw
($dIP_s1QH
`cast` (Sym
(GHC.Classes.N:IP[0]
<"callStack">_N
<GHC.Stack.Types.CallStack>_N)
:: (GHC.Stack.Types.CallStack :: *)
~R#
((?callStack::GHC.Stack.Types.CallStack) :: Constraint))) } in
case bool1_aqS @ Bool of {
False ->
case bool1_aqS @ Bool of {
False ->
case bool1_aqS @ Bool of {
False ->
case bool1_aqS @ Bool of {
False ->
case bool1_aqS @ Bool of {
False ->
case bool1_aqS @ Bool of {
False ->
case bool1_aqS @ Bool of {
False ->
case bool1_aqS @ Bool of {
...]
Test.$fFunctorResult_$cfmap =
\ (@ a_aFS) (@ b_aFT) -> case bool1_aqS of wild_00 { }
}}}
So GHC is optimizing the actual definition, but not the unfolding, whereas
with the old `undefined` it also optimized the unfolding.
I also noticed that I can reproduce both behaviors with a simple wrapper
around `undefined`, by choosing whether the wrapper should take a
CallStack.
{{{
-- myUndef :: HasCallStack => a
-- myUndef :: a
myUndef = undefined
}}}
The HasCallStack variant is slow, but the CallStack-free variant behaves
just like the old undefined. I suppose this is to be expected, but it
seems to suggest that the issue is connected to the fact that a use of
`undefined` is now really an application `undefined $call_stack` rather
than a simple variable. Indeed, if I make one final tweak to the program
and use the **old `error`** instead of `undefined`, I get a similar
behavior to the **new** `undefined`.
{{{
N clauses : time (s)
10 : 0.44
20 : 0.48
40 : 0.54
80 : 0.76
160 : 1.13
320 : 1.93
640 : 3.52
1280 : 6.97
}}}
I'm not really sure where to go from here. It seems like the solution
would be to convince GHC to rewrite the unfolding for the new undefined
(or error for that matter), but rewriting the unfolding is not something
we're supposed to do, right? Which makes me wonder, why was GHC rewriting
the unfolding for the old undefined in the first place?
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/12150#comment:7>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list