[GHC] #8326: Place heap checks common in case alternatives before the case
GHC
ghc-devs at haskell.org
Wed Sep 18 13:58:16 CEST 2013
#8326: Place heap checks common in case alternatives before the case
------------------------------------+-------------------------------------
Reporter: jstolarek | Owner:
Type: task | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.7
Keywords: | Operating System: Unknown/Multiple
Architecture: Unknown/Multiple | Type of failure: None/Unknown
Difficulty: Unknown | Test Case:
Blocked By: | Blocking: 8317
Related Tickets: #1498 |
------------------------------------+-------------------------------------
We would like to have functions that check whether an `Int#` is a valid
tag to represent `Bool` (see Note [Optimizing isTrue#] in ghc-prim):
{{{
isTrue# :: Int# -> Bool
isTrue# 1# = True
isTrue# _ = False
isFalse# :: Int# -> Bool
isFalse# 0# = True
isFalse# _ = False
}}}
We could use them with comparison primops like this:
{{{
f :: Int# -> Int
f x | isTrue# (x ># 0#) = I# x
| otherwise = -(I# x)
}}}
`isTrue#` is optimized away at the Core level:
{{{
A.f =
\ (x_aqM :: GHC.Prim.Int#) ->
case GHC.Prim.># x_aqM 0 of _ {
__DEFAULT -> GHC.Types.I# (GHC.Prim.negateInt# x_aqM);
1 -> GHC.Types.I# x_aqM
}
}}}
but the code genrator produces very bad Cmm code, because it pushes heap
checks into case alternatives:
{{{
{offset
cFd: // stack check
if ((Sp + -16) < SpLim) goto cFr; else goto cFs;
cFr: // not enough place on the stack, call GC
R2 = R2;
R1 = A.f_closure;
call (stg_gc_fun)(R2, R1) args: 8, res: 0, upd: 8;
cFs: // scrutinize (x ># 0#)
_sEU::I64 = R2;
_sEV::I64 = %MO_S_Gt_W64(R2, 0);
if (_sEV::I64 != 1) goto cFg; else goto cFo;
cFg: // False branch
Hp = Hp + 16;
if (Hp > HpLim) goto cFy; else goto cFx;
cFy: // not enough heap, call GC
HpAlloc = 16;
I64[Sp - 16] = cFf;
R1 = _sEV::I64;
I64[Sp - 8] = _sEU::I64;
Sp = Sp - 16;
call stg_gc_unbx_r1(R1) returns to cFf, args: 8, res: 8, upd:
8;
cFf: // re-do the False branch
_sEU::I64 = I64[Sp + 8];
Sp = Sp + 16;
_sEV::I64 = R1;
goto cFg;
cFx: // RHS of False branch
I64[Hp - 8] = GHC.Types.I#_con_info;
I64[Hp] = -_sEU::I64;
R1 = Hp - 7;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
cFo: // True branch
Hp = Hp + 16;
if (Hp > HpLim) goto cFv; else goto cFu;
cFv: // not enough heap, call GC
HpAlloc = 16;
I64[Sp - 16] = cFn;
R1 = _sEV::I64;
I64[Sp - 8] = _sEU::I64;
Sp = Sp - 16;
call stg_gc_unbx_r1(R1) returns to cFn, args: 8, res: 8, upd:
8;
cFn: // re-do the True branch
_sEU::I64 = I64[Sp + 8];
Sp = Sp + 16;
_sEV::I64 = R1;
goto cFo;
cFu: // RHS of True branch
I64[Hp - 8] = GHC.Types.I#_con_info;
I64[Hp] = _sEU::I64;
R1 = Hp - 7;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
}
}}}
This results in average 2.5% increase in binary size. By contrast, if we
use `tagToEnum#` instead of `isTrue#` heap check will be placed before
`case` expression and the code will be significantly shorter (this is done
by a special case-on-bool optimization in the code generator - see #8317).
What we would like to do here is:
1. compile case alternatives without placing heap checks inside them
2. each compiled alternative should return amount of heap it needs to
allocate
3. code generator inspects amounts of heap needed by each alternative
and either adds heap checks in alternatives or puts a single check before
the case expression.
Getting this right might be a bit tricky.
1. if all branches allocate some heap then we can just put a common heap
check before the case. Note that we must allocate the higgest amount
required by any of the alternatives and then alternatives that use less
heap must retract the heap pointer accordingly.
2. if we have two alternatives, one of which allocates heap and the
other does not, we should place the heap check only in the alternative
that allocates the stack. This will solve #1498.
3. it is not clear to me what to do if we have combination of the above
(more than one branch that allocates heap and at least one branch that
does not). If we place heap check before the `case` expression we lose
optimization of recursive functions and face the problem described in
#1498. If we push heap checks into branches that allocate heap then we get
code duplication, i.e. the problem that we're addressing in this ticket. I
guess the only way to make correct decission here is to try different
aproaches and measure their performance.
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8326>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list