jhc vs ghc and the surprising result involving ghc generated assembly.

Thu Oct 27 12:03:06 EDT 2005

On 27 October 2005 12:12, John Meacham wrote:

>> Note that GHC's back end is really aimed at producing good code when
>> there are registers available for passing arguments - this isn't
>> true on x86 or x86_64 at the moment, though.
> 
> Hrm? Why are registers not available on x86_64? I thought it had a
> plethora (compared to the i386).

Mutter mutter... a bunch of the registers are reserved for argument
passing in the C calling convention, and when I tried to steal them I
ran into trouble around foreign calls. It should (or might) be possible to
work around this; I need to have another go. It works fine with the
NCG (native code generator), of course.

> I was thinking of something like the worker/wrapper split: GHC would
> recognize when a function takes only unboxed arguments and returns an
> unboxed result (these conditions can probably be relaxed; having no
> evals is the key thing).
> 
> So in the case of fac, it would create
> 
> int fac(int n, int r) {
>         if (n == 1) return r;   /* r is the accumulated product */
>         return fac(n - 1, n * r);
> }
> 
> and (something like)
> 
> void fac_wrapper(void) {
>         /* pop/push/jump stand for operations on the STG stack */
>         continuation = pop();   // I might be mixing up the order of these
>         n = pop();
>         r = pop();
> 
>         x = fac(n, r);
> 
>         push(x);
>         jump(continuation);
> }

Well, yes, but if the worker needs to return to the scheduler (i.e., if
it does a heap check or stack check), then the C stack is all messed up and
we need a setjmp/longjmp to get back to the scheduler. You can do it in
the case where there are no heap/stack checks, but I think that's very
rare.

> I am not sure how much sense this makes, though. I am no expert on the
> Spineless Tagless G-machine (which would make an excellent name for a
> band, BTW).

:-D

> Fortunately, modern CPUs anticipate this conundrum and provide
> 'write-combining' (non-temporal) forms of their store instructions;
> these write a value directly to RAM without touching the cache at all.
> This will always be a win when updating thunks, for the reasons
> mentioned above, and is potentially a big benefit. Selective
> write-combining is among the top three performance-enhancing techniques
> according to the CPU optimization manuals.
> 
> I think the easiest way to do this would be to have a macro defined to
> an appropriate bit of assembly, or to a simple C assignment if the
> write-combining movs aren't available.

Very good idea; I must try that. Any more progress on why our x86_64
code is slow?

Cheers,
	Simon