jhc vs ghc and the surprising result involving ghc generated assembly.

Wed Nov 2 05:42:50 EST 2005

On 01 November 2005 16:32, Florian Weimer wrote:

> * Simon Marlow:
> 
>> gcc started generating this rubbish around version 3.4, if I recall
>> correctly.  I've tried disabling various optimisations, but can't
>> seem to convince gcc not to generate the extra jump.  You don't get
>> this from the native code generator, BTW.
> 
> But the comparison is present in the C code.  What do you want GCC to
> do?

I didn't mean to sound overly critical of gcc, but here's what I was
complaining about.  The code generated by gcc (3.4.x) is as follows:

Main_zdwfac_info:
.text
	.align 8
	.text
	movq	(%rbp), %rdx
	cmpq	$1, %rdx
	jne	.L2
	movq	8(%rbp), %r13
	leaq	16(%rbp), %rbp
	movq	(%rbp), %rax
.L4:
	jmp	*%rax
.L2:
	movq	%rdx, %rax
	imulq	8(%rbp), %rax
	movq	%rax, 8(%rbp)
	leaq	-1(%rdx), %rax
	movq	%rax, (%rbp)
	movl	$Main_zdwfac_info, %eax
	jmp	.L4

There's an obvious simplification: the last two instructions should be
replaced by just

      jmp   Main_zdwfac_info

eliminating one branch and a mov.  This occurs all over the place in our
code.  Whenever a function has more than one computed goto, gcc insists
on commoning up the jmp instructions even when it's a really bad idea,
like above.
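The rewrite described above can be sketched as a small peephole pass over the assembly text. This is a hypothetical illustration, not GHC's or gcc's code: it assumes AT&T syntax with one instruction or label per line, treats a label whose next instruction is `jmp *%rax` as a pure trampoline, and assumes %rax is dead at the jump target (true for the listing above, where both paths write %rax before reading it). The function name `thread_through_trampolines` is made up for this example.

```python
import re

# Patterns are matched against whitespace-normalized lines.
MOV_SYM_EAX = re.compile(r"movl \$([\w.]+), %eax")
JMP_LABEL = re.compile(r"jmp ([\w.]+)")


def thread_through_trampolines(lines):
    """Hypothetical peephole: 'movl $SYM, %eax' + 'jmp L' -> 'jmp SYM',
    where label L is a trampoline whose next instruction is 'jmp *%rax'.

    Assumes AT&T syntax, one instruction or label per line, and that %rax
    is dead at SYM. The trampoline label itself is kept, since other code
    (here, the fall-through base case) may still reach it.
    """
    norm = [" ".join(line.split()) for line in lines]  # collapse tabs/spaces
    trampolines = {
        norm[i][:-1]
        for i in range(len(norm) - 1)
        if norm[i].endswith(":") and norm[i + 1] == "jmp *%rax"
    }
    out, i = [], 0
    while i < len(lines):
        mov = MOV_SYM_EAX.fullmatch(norm[i])
        jmp = JMP_LABEL.fullmatch(norm[i + 1]) if i + 1 < len(norm) else None
        if mov and jmp and jmp.group(1) in trampolines:
            out.append("\tjmp\t" + mov.group(1))  # one direct jump
            i += 2
        else:
            out.append(lines[i])
            i += 1
    return out


# The gcc 3.4 listing above, from .L4 onwards.
listing = [
    ".L4:",
    "\tjmp\t*%rax",
    ".L2:",
    "\tmovq\t%rdx, %rax",
    "\timulq\t8(%rbp), %rax",
    "\tmovq\t%rax, 8(%rbp)",
    "\tleaq\t-1(%rdx), %rax",
    "\tmovq\t%rax, (%rbp)",
    "\tmovl\t$Main_zdwfac_info, %eax",
    "\tjmp\t.L4",
]
print("\n".join(thread_through_trampolines(listing)))
```

Applied to the excerpt, only the last two instructions change: they collapse into `jmp Main_zdwfac_info`, while `.L4` survives because the base case still falls through into it.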

Actually, if I add -O2 I get better code, so perhaps this isn't a
real problem, although gcc still generates this:

Fac_zdwfac_info:
.text
	.align 8
	movq	(%rbp), %rdx
	testq	%rdx, %rdx
	jne	.L3
	movq	8(%rbp), %r13
	addq	$16, %rbp
	movq	(%rbp), %rax
	jmp	*%rax
	.p2align 4,,7
.L3:
	movq	8(%rbp), %rax
	imulq	%rdx, %rax
	decq	%rdx
	movq	%rdx, (%rbp)
	movq	%rax, 8(%rbp)
	movl	$Fac_zdwfac_info, %eax
	jmp	*%rax

and fails to combine the movs with the jmp instruction (we do this
simplification ourselves when post-processing the assembly code).  I'll
compile up gcc 4 and see what happens with that.
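On the recursive path of the -O2 listing, combining the `movl` with the jmp would give a single direct `jmp Fac_zdwfac_info`. The sketch below shows just that one rewrite in Python; it is hypothetical, not GHC's actual post-processor, and assumes AT&T syntax with one instruction per line and that %rax is not read at the target before being written. The name `fuse_indirect_jumps` is made up for illustration.

```python
import re

# Matches "movl $SYM, %eax" once whitespace has been normalized.
MOV_SYM_EAX = re.compile(r"movl \$([\w.]+), %eax")


def fuse_indirect_jumps(lines):
    """Hypothetical peephole: 'movl $SYM, %eax' + 'jmp *%rax' -> 'jmp SYM'.

    Assumes AT&T syntax, one instruction per line, and that %rax is not
    read at SYM before being written (so dropping the movl is safe).
    """
    norm = [" ".join(line.split()) for line in lines]  # collapse tabs/spaces
    out, i = [], 0
    while i < len(lines):
        m = MOV_SYM_EAX.fullmatch(norm[i])
        if m and i + 1 < len(norm) and norm[i + 1] == "jmp *%rax":
            out.append("\tjmp\t" + m.group(1))  # direct jump, no register needed
            i += 2
        else:
            out.append(lines[i])
            i += 1
    return out


# Tail of the -O2 listing above.
tail = [
    "\tmovq\t%rax, 8(%rbp)",
    "\tmovl\t$Fac_zdwfac_info, %eax",
    "\tjmp\t*%rax",
]
print("\n".join(fuse_indirect_jumps(tail)))
```

The rewrite preserves behaviour because a 32-bit `movl` into `%eax` zero-extends into `%rax`, so `jmp *%rax` lands on exactly the address `movl $SYM` loaded. Other mov/jmp pairs, such as the base case's `movq (%rbp), %rax` before `jmp *%rax`, are deliberately left alone by this sketch.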

Cheers,
	Simon