factorial: let's get ahead of jhc! :)
Simon Marlow
simonmarhaskell at gmail.com
Fri Feb 24 11:18:30 EST 2006
Bulat Ziganshin wrote:
> i propose to do for start rather small change that will allow to
> make things go several times faster and in particular outperform jhc
> for the small "leaf" loops (i.e. loops that use only GHC primitives
> and don't call other functions). that include factorial, "a[]+=b[i]"
> loop that was discussed here several weeks ago, some shootout
> examples, my own arrays serialization code and so on, so on
>
> the idea is that the following STG code
>
> f :: Int# -> Double# -> ... -> Int#
>
> (i.e. function use only unboxed values/results)
>
> f x y z = code1 in
> case expr1 of
> A -> codeA
> B -> codeB
> C -> codeC in f x_c y_c z_c
> D -> codeD in f x_d y_d z_d
> _ -> code0 in f x0 y0 z0
>
> can be compiled to the following C/C-- code:
>
> f() {
> x = sp[0];
> y = sp[4];
> z = sp[8];
> sp+=12;
>
> code1;
> while (expr1!=A) {
> if (expr1==B) then return codeB;
> else if (expr1==C) then {x=x_c; y=y_c; z=z_c;}
> else if (expr1==D) then {x=x_d; y=y_d; z=z_d;}
> else {x=x0; y=y0; z=z0;}
> code1;
> }
> return codeA;
> }
>
> this compilation don't require any changes in GHC memory model. all we
> need:
>
> 1) add for/if/while to the C-- statement types list (data CmmStmt)
Please don't extend the C-- data type - it's very small and simple
because that makes it easy to manipulate and reason about.
I don't see why you need to, either: you can already express for, if and
while in C-- using conditional branches. I don't think gcc cares
whether you write your code using labels and goto or while/for/if, it
generates the same code either way.
> 2) implement recognizer for such simple STG functions (leaf unboxed
> procedures recognizer)
>
> 3) implement the translation itself
By all means try this. What you want is to compile recursive functions
like this:
f() {
x = arg1;
y = arg2;
L:
... body of f, with args mapped to x & y,
and recursive calls jumping to L passing args in x & y.
}
It's quite hard to do this as a C-- to C-- optimisation, at least
without implementing a lot of other optimisations that we don't already
have. I was hoping that gcc would do it for us, if we compile code like
this:
f() {
L:
... body of f ...
goto L;
}
but sadly it doesn't do the whole job. (see cmmLoopifyForC in
cmm/CmmOpt.hs, which I added today).
So you might try the hacky way of doing this transformation at a higher
level, when generating the C-- in the first place. Putting the args in
temporaries is the easy bit; generating different code for the recursive
call will be more tricky.
Cheers,
Simon
More information about the Glasgow-haskell-users
mailing list