jhc vs ghc and the surprising result involving ghc generatedassembly.

Simon Marlow simonmar at microsoft.com
Thu Oct 27 05:38:20 EDT 2005


John, this is great stuff.  There's clearly a lot we can do to improve
GHC's code, at least for fac :-)  (I'd be really interested in any
numbers you have for larger programs too, eg. the nofib suite).

> and now the generated assembly.
> 
> Main_zdwfac_info:
> .text
> 	.align 8
> 	.text
> 	movq	(%rbp), %rdx
> 	cmpq	$1, %rdx
> 	jne	.L2
> 	movq	8(%rbp), %r13
> 	leaq	16(%rbp), %rbp
> 	movq	(%rbp), %rax
> .L4:
> 	jmp	*%rax

gcc started generating this rubbish around version 3.4, if I recall
correctly.  I've tried disabling various optimisations, but can't seem
to convince gcc not to generate the extra jump.  You don't get this from
the native code generator, BTW.

Actually, our NCG beats gcc in most cases on x86_64, I believe.

> .L2:
> 	movq	%rdx, %rax
> 	imulq	8(%rbp), %rax
> 	movq	%rax, 8(%rbp)
> 	leaq	-1(%rdx), %rax
> 	movq	%rax, (%rbp)
> 	movl	$Main_zdwfac_info, %eax
> 	jmp	.L4

[snip]

> A couple simple rules seem to help greatly.
> 
> * turn anything of the form JMP_((W_)&self) where self is oneself into
a goto
> that gotos a label at the beginning of the function.

Sounds like a good idea.  In fact, we should do all this as a C-- -> C--
optimisation, as Simon PJ pointed out (and I mentioned on IRC
yesterday).

one small caveat: our register allocator can't handle loops at the
moment, so any optimisations that generated actual loops would have to
be disabled for the NCG for now.

> * do simple pattern matthing on the basic blocks to recognize where C
control
> constructs can be placed.
> 
> for instance turn
> 
> if (x) { goto  y; }
> blah..
> baz..
> JMP_(foo)
> 
> into
> 
> if (x) { goto  y; } else {
> blah..
> baz..
> JMP_(foo)
> }

I don't actually understand why this helps, but it's easy enough to do.

> * getting stack dereferences out of your loops.

A bit more tricky, but could still be done as C-- to C--.

Note that GHC's back end is really aimed at producing good code when
there are registers available for passing arguments - this isn't true on
x86 or x86_64 at the moment, though.

> 1. fully convert it to use C control constructs, so gcc will do it for
us.
> (code motion and loop invarient being inhibited again by gotos)

Ok.  Jan's do-while idea seems like a cheap way to achieve this.
 
> These should be straightforward to implement in the C code generator.
it also
> suggests we might want to try to use the native C calling convention
on leaf
> nodes that deal with unboxed values (so we get register passing and
return
> values for free) or taking groups of mutually recursive functions and
turning
> them all into one function with explicit jumps between them.

Using the real C calling convention brings some problems - we'd have to
use setjmp/longjmp to pass control back to the scheduler or garbage
collector.  
 
Cheers,
	Simon


More information about the Glasgow-haskell-users mailing list