[GHC] #14372: CMM contains a bunch of tail-merging opportunities
GHC
ghc-devs at haskell.org
Thu Jan 25 11:35:35 UTC 2018
#14372: CMM contains a bunch of tail-merging opportunities
-------------------------------------+-------------------------------------
Reporter: heisenbug | Owner: (none)
Type: feature request | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.2.1
Resolution: | Keywords: CodeGen
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by simonpj):
* keywords: => CodeGen
Old description:
> Compile this code to CMM
> {{{#!hs
> data Small = S1 | S2 | S3 | S4 deriving (Show, Enum)
>
> data Big = B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 | B9 | B10 deriving
> (Show, Enum)
>
> {-# NOINLINE quux #-}
> quux B1 = 'a'
> quux B2 = 'b'
> quux B3 = 'c'
> quux B4 = 'd'
> quux B5 = 'e'
> quux B6 = 'f'
> quux B7 = 'g'
> quux B8 = 'h'
> quux B9 = 'i'
> quux B10 = 'j'
>
> {-# NOINLINE qaax #-}
> qaax B1 = 'a'
> qaax B2 = 'b'
> qaax B3 = 'c'
> qaax B4 = 'd'
> qaax B5 = 'e'
>
> qaax B7 = 'g'
> qaax B8 = 'h'
> qaax B9 = 'i'
> qaax B10 = 'j'
>
> {-# NOINLINE foo #-}
> foo B1 = S1
> foo B2 = S2
> foo B3 = S3
> foo B4 = S4
>
> {-# NOINLINE bar #-}
> bar S1 = B1
> bar S2 = B2
> bar S3 = B3
> bar S4 = B4
>
> main = do print $ take 100000 (repeat (foo <$> [B1 .. B4]))
> print $ take 100000 (repeat (bar <$> [S1 .. S4]))
> print $ take 100000 (repeat (quux <$> [B1 .. B10]))
> print $ qaax B1
> }}}
>
> When `Char` or ''enum-like'' ADT is returned, I see lots of case
> branches, which only differ in the first instruction.
>
> E.g.
> {{{
> c30l: // global
> R1 = stg_CHARLIKE_closure+1649;
> Sp = Sp + 8;
> call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
> c30m: // global
> R1 = stg_CHARLIKE_closure+1665;
> Sp = Sp + 8;
> call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
> u30Z: // global
> if (_c30p::I64 < 9) goto c30n; else goto c30o;
> c30n: // global
> R1 = stg_CHARLIKE_closure+1681;
> Sp = Sp + 8;
> call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
> c30o: // global
> R1 = stg_CHARLIKE_closure+1697;
> Sp = Sp + 8;
> call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
> }}}
>
> It would be nice to factor out the common tails, e.g. by branching to the
> first tail already emitted.
>
> Bonus points for rewriting switch tables to contain the above numbers and
> compile to a lookup + common code.
>
> This is what I am talking about:
>
> {{{
> c307: // global
> _s2ON::P64 = R1;
> _c30j::P64 = _s2ON::P64 & 7;
> switch [1 .. 7] _c30j::P64 {
> case 1 : goto c30d;
> case 2 : goto c30e;
> case 3 : goto c30f;
> case 4 : goto c30g;
> case 5 : goto c30h;
> ...
> }
>
> ...
> c30h: // global
> R1 = stg_CHARLIKE_closure+1617;
> Sp = Sp + 8;
> call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
> c30g: // global
> R1 = stg_CHARLIKE_closure+1601;
> Sp = Sp + 8;
> call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
> c30f: // global
> R1 = stg_CHARLIKE_closure+1585;
> Sp = Sp + 8;
> call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
> c30e: // global
> R1 = stg_CHARLIKE_closure+1569;
> Sp = Sp + 8;
> call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
> c30d: // global
> R1 = stg_CHARLIKE_closure+1553;
> Sp = Sp + 8;
> call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
> }}}
>
> There should be an array [1553, 1569, 1585, ...]
> and each case should be the same:
> {{{
> R1 = stg_CHARLIKE_closure;
> R1 = R1 + array[tag];
> Sp = Sp + 8;
> call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
> }}}
New description:
Compile this code to CMM
{{{#!hs
data Small = S1 | S2 | S3 | S4 deriving (Show, Enum)
data Big = B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 | B9 | B10 deriving
(Show, Enum)
{-# NOINLINE quux #-}
quux B1 = 'a'
quux B2 = 'b'
quux B3 = 'c'
quux B4 = 'd'
quux B5 = 'e'
quux B6 = 'f'
quux B7 = 'g'
quux B8 = 'h'
quux B9 = 'i'
quux B10 = 'j'
{-# NOINLINE qaax #-}
qaax B1 = 'a'
qaax B2 = 'b'
qaax B3 = 'c'
qaax B4 = 'd'
qaax B5 = 'e'
qaax B7 = 'g'
qaax B8 = 'h'
qaax B9 = 'i'
qaax B10 = 'j'
{-# NOINLINE foo #-}
foo B1 = S1
foo B2 = S2
foo B3 = S3
foo B4 = S4
{-# NOINLINE bar #-}
bar S1 = B1
bar S2 = B2
bar S3 = B3
bar S4 = B4
main = do print $ take 100000 (repeat (foo <$> [B1 .. B4]))
print $ take 100000 (repeat (bar <$> [S1 .. S4]))
print $ take 100000 (repeat (quux <$> [B1 .. B10]))
print $ qaax B1
}}}
When `Char` or ''enum-like'' ADT is returned, I see lots of case branches,
which only differ in the first instruction.
E.g.
{{{
c30l: // global
R1 = stg_CHARLIKE_closure+1649;
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
c30m: // global
R1 = stg_CHARLIKE_closure+1665;
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
u30Z: // global
if (_c30p::I64 < 9) goto c30n; else goto c30o;
c30n: // global
R1 = stg_CHARLIKE_closure+1681;
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
c30o: // global
R1 = stg_CHARLIKE_closure+1697;
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
}}}
It would be nice to factor out the common tails, e.g. by branching to the
first tail already emitted.
Bonus points for rewriting switch tables to contain the above numbers and
compile to a lookup + common code.
This is what I am talking about:
{{{
c307: // global
_s2ON::P64 = R1;
_c30j::P64 = _s2ON::P64 & 7;
switch [1 .. 7] _c30j::P64 {
case 1 : goto c30d;
case 2 : goto c30e;
case 3 : goto c30f;
case 4 : goto c30g;
case 5 : goto c30h;
...
}
...
c30h: // global
R1 = stg_CHARLIKE_closure+1617;
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
c30g: // global
R1 = stg_CHARLIKE_closure+1601;
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
c30f: // global
R1 = stg_CHARLIKE_closure+1585;
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
c30e: // global
R1 = stg_CHARLIKE_closure+1569;
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
c30d: // global
R1 = stg_CHARLIKE_closure+1553;
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
}}}
There should be an array [1553, 1569, 1585, ...]
and each case should be the same:
{{{
R1 = stg_CHARLIKE_closure;
R1 = R1 + array[tag];
Sp = Sp + 8;
call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
}}}
See also #14666
--
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/14372#comment:20>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets
mailing list