inside the GHC code generator
simonmarhaskell at gmail.com
Fri Feb 24 04:52:36 EST 2006
Wow! You make a lot of good suggestions. I think some of them are
unfortunately unworkable, some others have been tried before, but there
are definitely some to try. I'll suggest a few more in this email.
First of all I should say that we don't *want* to use gcc as a code
generator. Everything we've been doing in the back end over the last
few years has been aiming towards dropping the dependency on gcc and
removing the Evil Mangler. Now is not the time to be spending a lot of
effort in beefing up the C back end :-) Our preference is to work on
both the native and C-- back ends; the C-- back end has the potential to
produce the best code and be the most portable. We also plan to keep
the "unregisterised" C back end for the purposes of porting to a new
platform, it's only the registerised path we want to lose.
Having said all that, if we can get better code out of the C back end
without too much effort, then that's certainly worthwhile.
> translated to something like this
> _s1BD = *(Sp + 0);
> if (_s1BD != 1) goto c1C4; // n!=1 ?
> R1 = *(Sp + 4);
> Sp = Sp + 8;
> return (*(Sp + 0))(); // return from factorial
> _s1BI = _s1BD * (*(Sp + 4)); // n*r
> _s1BF = _s1BD - 1; // n-1
> *(Sp + 4) = _s1BI;
> *(Sp + 0) = _s1BF;
> goto factorial;
To be fair, this is only the translation on an architecture that has no
argument registers (x86, and currently x86_64 but hopefully not for long).
The simple optimisation of changing the final tail call to factorial
into a goto to a label at the beginning of the function (as suggested by
John Meacham I think) should improve the code generated by gcc for
functions like this, and is easy to do.
> 1) because it just not asked. you can enable gcc optimization by
> adding "-optc-O6" to the cmdline, but this leads to compilation errors
> on part of modules. it seems that gcc optimization is not compatible
> with "evil mangler" that is the ghc's own optimization trick.
-O3 works, I haven't tried -O6.
> * C++ calling stack should be used as much as possible
> * parameters are passed in C++ way: "factorial(int n, int r)"
This is unworkable, I'm afraid. Go and read Simon's paper on the STG:
there are two main reasons we don't use the C stack: garbage collection
and tail calls. What do you plan to do about GC?
> * recursive definitions translated into the for/while loops if possible
Certainly a good idea.
> * groups of mutually recursive definitions should be also "naturally"
> translated as much as possible
> * functions marked INLINE in Haskell code should be marked the same in
> C++ code
an inline function will have been inlined, so not sure what you mean here!
> * maximally speed up using of already evaluated values. instead of
> unconditional jumping to the closure-computation function just test
> this field and continue computation if it contains NULL (indicating
> that value is already in WHNF). This change will make time penalty of
> boxing data structures significantly, at least 10 times, smaller
This is called semi-tagging, it was tried a long time ago. Certainly
worth trying again, but I very much doubt the gains would be anything
like you claim, and it increases code size.
There are other schemes, including this one that we particularly like:
use a spare bit in a pointer to indicate "points to an evaluated value".
Since there are two spare bits, you can go further and use the 4
values to mean:
0: possibly unevaluated
1: evaluated, tag 0
2: evaluated, tag 1
3: evaluated, tag 2
I believe Robert Ennals tried this as part of his spec eval
implementation, but IIRC he didn't get separate measurements for it (or
it didn't improve things much). Note that this increases code size too.
> * in addition to ability to define strict fields, allow to define
> strict data structures (say, lists) and strict types/structures of
> functions parameters and results. that is a long-standing goal, but
> the "![!Int] -> !Int" function can be used inside higher-order code a
> lot faster than "[Int] -> Int" one. the ultimate goal is to allow
> Haskell to be as fast as ML dialects in every situation and this means
> that laziness should be optional in every place. this will also
> allow to make efficient emulation of C++ virtual methods
Strict types are interesting, and I know several people (including me)
who have put some thought into how they would work. It turns out to be
surprisingly difficult, though.
You might look into doing something simple: suppose you could declare a
type to be unlifted:
data !T = ..
The type T doesn't have a bottom element. Its kind is !, not *, and it
can't be used to instantiate a polymorphic type variable, just like
GHC's primitive types.
This is quite restrictive, but it's simple and pretty easy to implement
in GHC as it stands. It gives you a nice guarantee: expressions of type
T are never thunks. case expressions deconstructing T never need to
evaluate it first.
The next generalisation would be to allow ! as an arbitrary type
operator, but still with the kind restriction. Dropping the kind
restriction is where things start to get interesting...
More information about the Glasgow-haskell-users