[Haskell-cafe] GHC optimisations

Philip Armstrong phil at kantaka.co.uk
Tue Aug 21 09:19:10 EDT 2007

On Tue, Aug 21, 2007 at 05:25:49AM -0700, Stefan O'Rear wrote:
>On Tue, Aug 21, 2007 at 01:14:20PM +0100, Rodrigo Queiro wrote:
>> On my system, the C version runs about 9x faster than the haskell
>> version (with -O3 and -O2 -fvia-c -optc-O3 respectively). However, GCC
>> seems to produce about 70 lines of assembly for the main loop,
>> compared to about 10 from GHC. I suspect the speed difference is the
>> result of some heavy optimisation by GCC, which would need to be
>> hand-tuned for GHC. (I would be interested to see what this would be.
>> Unfortunately I don't know x86 assembly well enough to understand the
>> GCC output.)

GCC is carrying out two major optimisations that ghc is missing here:
replacing the tail call with a jump directly into the function body
(having stuffed the correct arguments into the appropriate registers)
and unrolling the loop. That's pretty much it. Neither are what I'd
call 'heavy' optimisations.

>The fundamental problem is that GHC doesn't have enough registers to to
>a good job with Haskell.  Normal Haskell code  makes extensive use of
>the GHC stack for function calls, the C stack for signal handlers, the
>capability base pointer for thread state, and the heap for everything
>else.  Which doesn't really leave us in a good state for optimizing.  In
>particular, x86 ghc ALWAYS passes parameters on the stack, even for tail
>calls.  I didn't actually bother to check, but I'm pretty sure that's
>what the OP was noticing - if you look carefully it's not actually
>pushing or popping anything, just using stack memory.

Yes, absolutely.

>Situations are far better on x86_64 (16 registers) and ppc (32
>registers).  There is some work being done on the backend to improve
>this (in particular, a new and much better register allocator and a
>parameter-aware Cmm system).

<fires up ppc box>

Ouch. That's even worse:

$ ./sum 100000000

C version: 0.16s
Haskell  : 1.40s

Looking at the generated assembler, the ppc version has exactly the
same problem that the x86 version does. It carries out the
calculation, the stores the result in some memory locations and calls f
again so that the preamble to f can pull those same results out of the
memory locations in order to put them back into the same registers

(I'm using ghc 6.6.1 on Debian unstable btw for anyone following along.)

cheers, Phil

http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt

More information about the Haskell-Cafe mailing list