inside the GHC code generator

Fri Feb 24 04:52:36 EST 2006

Hi Bulat,

Wow!  You make a lot of good suggestions.  I think some of them are 
unfortunately unworkable, some others have been tried before, but there 
are definitely some to try.  I'll suggest a few more in this email.

First of all I should say that we don't *want* to use gcc as a code 
generator.  Everything we've been doing in the back end over the last 
few years has been aiming towards dropping the dependency on gcc and 
removing the Evil Mangler.  Now is not the time to be spending a lot of 
effort in beefing up the C back end :-)  Our preference is to work on 
both the native and C-- back ends; the C-- back end has the potential to 
produce the best code and be the most portable.  We also plan to keep 
the "unregisterised" C back end for the purposes of porting to a new 
platform, it's only the registerised path we want to lose.

Having said all that, if we can get better code out of the C back end 
without too much effort, then that's certainly worthwhile.

> translated to something like this
> 
>     factorial:
>         _s1BD = *(Sp + 0);
>         if (_s1BD != 1) goto c1C4;     // n!=1 ?
>         R1 = *(Sp + 4);
>         Sp = Sp + 8;
>         return (*(Sp + 0))();  // return from factorial
>     c1C4:
>         _s1BI = _s1BD * (*(Sp + 4));   // n*r
>         _s1BF = _s1BD - 1;             // n-1
>         *(Sp + 4) = _s1BI;
>         *(Sp + 0) = _s1BF;
>         goto factorial;

To be fair, this is only the translation on an architecture that has no 
argument registers (x86, and currently x86_64 but hopefully not for long).

The simple optimisation of changing the final tail call to factorial 
into a goto to a label at the beginning of the function (as suggested by 
John Meacham I think) should improve the code generated by gcc for 
functions like this, and is easy to do.

> 1) because it just not asked. you can enable gcc optimization by
> adding "-optc-O6" to the cmdline, but this leads to compilation errors
> on part of modules. it seems that gcc optimization is not compatible
> with "evil mangler" that is the ghc's own optimization trick.

-O3 works, I haven't tried -O6.

> * C++ calling stack should be used as much as possible
> * parameters are passed in C++ way: "factorial(int n, int r)"

This is unworkable, I'm afraid.  Go and read Simon's paper on the STG:

   http://citeseer.ist.psu.edu/peytonjones92implementing.html

there are two main reasons we don't use the C stack: garbage collection 
and tail calls.  What do you plan to do about GC?

> * recursive definitions translated into the for/while loops if possible

Certainly a good idea.

> * groups of mutually recursive definitions should be also "naturally"
> translated as much as possible
> * functions marked INLINE in Haskell code should be marked the same in
> C++ code

an inline function will have been inlined, so not sure what you mean here!

> * maximally speed up using of already evaluated values. instead of
> unconditional jumping to the closure-computation function just test
> this field and continue computation if it contains NULL (indicating
> that value is already in WHNF). This change will make time penalty of
> boxing data structures significantly, at least 10 times, smaller

This is called semi-tagging, it was tried a long time ago.  Certainly 
worth trying again, but I very much doubt the gains would be anything 
like you claim, and it increases code size.

There are other schemes, including this one that we particularly like: 
use a spare bit in a pointer to indicate "points to an evaluated value". 
  Since there are two spare bits, you can go further and use the 4 
values to mean:

   0: possibly unevaluated
   1: evaluated, tag 0
   2: evaluated, tag 1
   3: evaluated, tag 2

I believe Robert Ennals tried this as part of his spec eval 
implementation, but IIRC he didn't get separate measurements for it (or 
it didn't improve things much).  Note that this increases code size too.

> * in addition to ability to define strict fields, allow to define
> strict data structures (say, lists) and strict types/structures of
> functions parameters and results. that is a long-standing goal, but
> the "![!Int] -> !Int" function can be used inside higher-order code a
> lot faster than "[Int] -> Int" one. the ultimate goal is to allow
> Haskell to be as fast as ML dialects in every situation and this means
> that laziness should be optional in every place. this will also
> allow to make efficient emulation of C++ virtual methods

Strict types are interesting, and I know several people (including me) 
who have put some thought into how they would work.  It turns out to be 
surprisingly difficult, though.

You might look into doing something simple: suppose you could declare a 
type to be unlifted:

   data !T = ..

The type T doesn't have a bottom element.  Its kind is !, not *, and it 
can't be used to instantiate a polymorphic type variable, just like 
GHC's primitive types.

This is quite restrictive, but it's simple and pretty easy to implement 
in GHC as it stands.  It gives you a nice guarantee: expressions of type 
T are never thunks.  case expressions deconstructing T never need to 
evaluate it first.

The next generalisation would be to allow ! as an arbitrary type 
operator, but still with the kind restriction.  Dropping the kind 
restriction is where things start to get interesting...

Cheers,
	Simon