inside the GHC code generator

Mon Feb 27 05:41:27 EST 2006

Hello Simon,

Friday, February 24, 2006, 12:30:22 PM, you wrote:

SPJ> | GHC has great high-level optimization.

SPJ> However, let me strongly urge you *not* to focus attention primarily on
SPJ> the gcc route.  Compiling via C has received a lot of attention over the
SPJ> years, and there are many papers describing cool hacks for doing so.
SPJ> GHC does not do as well as it could.

btw, it's well known that Clean sometimes make better code than GHC
and sometimes vice versa. now I'm almost sure that GHC outperforms
Clean in high-level optimizations and give away on the low-level code
generation, so the grand result depends on that is more important for
particular algorithm. on the hand-coded programs Clean should almost
always outperform GHC

SPJ> But there are serious obstacles.  That's not gcc's fault -- it wasn't
SPJ> designed for this.  Accurate GC is one of them,

we ca use two stacks - for boxed and for unboxed values

SPJ> tail calls is another,

nowadays gcc optimize tail calls

SPJ> and there are plenty more smaller things that bite you only after you've
SPJ> invested a lot of time.  This way lies madness.

but nevertheless ghc already compiles to gcc and it's the fastest
backend, with help of Evil Mangler. afaik, code generated by ghc will
work even w/o Evil Mangler, so it's just an optimization hack. can you
please say what an optimizations done via EM? i think that part of
them can be implemented via changing C generation, so may be even we can
omit EM at long last

SPJ> C-- was *designed* for this purpose.  GHC uses C-- as its intermediate
SPJ> language (just before emitting C).  So a good route is this:

SPJ> * Write C-- to C-- optimisations

SPJ> * Then, if you like, translate that code to C.  Already you will be
SPJ> doing better than GHC does today, because the C-- to C-- optimiser will
SPJ> let you generate better C

SPJ> * But you can also emit C--, or native code; both of these paths will
SPJ> directly benefit from your C-- optimisations.

SPJ> The point is that none of this relies on Quick C--; you can always use
SPJ> an alternative back end.

yes, C-- was designed to solve all our problems - provide world-class
optimization, code generation for different cpus and so on. but does
it fulfil it's promises? no. and now you propose to write these
optimization as part of Haskell project. great idea! i've started from
the same idea and even wrote a sequence of optimizations what will
transform initial C-- code for "fac" to the efficient one. but then i
realized that the core problem is what ghc just generates low-level
code in "portable asm" style. and optimizing of low-level code that
tries to describe how to IMPLEMENT this algorithm instead of just
describe the ALGORITHM itself is a hard task. noone writes optimizers
for the ASM language, but you propose to do exact this.

instead of coping with current "portable asm" code generated from STG
i propose to generate idiomatic C or C-- code and leave its
optimization to the appropriate compiler. the SEPARATE problem is what
gcc/icc can generate much better code that qc, so i propose to direct
efforts of generating idiomatic C[--] toward C compilers

SPJ> You can almost do this today. GHC uses C-- as an intermediate language.
SPJ> But alas, GHC's code generator does not take advantage of C--'s native
SPJ> calls or parameter passing.  Instead, it pushes things on an auxiliary
SPJ> stack etc.  (Those Sp memory operations you see.)  This isn't necessary.
SPJ> We are planning to split GHC's code generator into two parts
SPJ>         A) Generate C-- with native calls, with an implicit C-- stack
SPJ>         B) Perform CPS conversion, to eliminate all calls in favour of
SPJ> jumps
SPJ>                 using an explicit stack
SPJ> The current code gen is A followed by B.  But A is a much more suitable
SPJ> optimisation platform, and gives more flexibility.

i propose to implement only A and leave B to the C[--] compiler
itself. that will require to return to 2-stacks model, but in return
we will get 1st-class optimization instead of making itself it's
rather restricted subset. instead of permanently trying to catch up C
compiler's optimization we can just jump to the common bandwagon :)

SPJ> Chris Thompson, and undergrad at Cambridge, is doing (B) as his
SPJ> undergrad project, although it remains to be seen whether he'll have
SPJ> enough complete to be usable.

SPJ> Another shortcoming is that the native code generator in GHC isn't
SPJ> capable of dealing with backward jumps to labels (because GHC hasn't
SPJ> needed that so far).  But if you did C-- optimisation, you'd probably
SPJ> generate such jumps.  It'd be great to beef up the native code gen to
SPJ> handle that.

i propose to add if/for/while statements to the CmmStmt datatype to
allow generation of higher-level code. as John Meacham wrote several
months ago, gcc can't unroll loops written with gotos. so it's anyway
better to generate explicit loops. it should be not too hard to
generate asm code for these constructs?

although anyway generation of high-level C code is not compatible with
native asm path, and from some moment this paths will diverge

SPJ> Many of the optimisations you describe (perhaps all) are readily
SPJ> expressible in the C-- intermediate language, and by working at that
SPJ> level you will be independent of with the back end is gcc, a native code
SPJ> generator, or Quick C--, or some other C-- compiler.  Much better.

most of my proposals is of course compatible with C-- path. all
proposed changes can be divided to the following parts:

1) on the "Haskell->STG" path, including more aggressive
inlining/specialization of functional values, polymorphic code and
boxed values

2) in the computation model, for example make quick check that box
already contains WHNF

3) generation of high-level C[--] code instead of current low-level
one

4) orientation to the gcc optimizations instead of waiting and
especially implementing our own ones

difference between C-- and gcc ways, imho, is that on the C-- way you
should themselves implement optimizations and that range of
optimizations you can implement in foreseeable future is much less
than already implemented in gcc. on the C way you need only change the
method of generating C code - and this should be much easier than
implementing even small subset of gcc optimizations and will
immediately give us, the ghc users, all the benefits of 1st class
optimization techniques available in gcc

-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin at gmail.com