jhc vs ghc and the surprising result involving ghc generated assembly.
Simon Marlow
simonmar at microsoft.com
Thu Oct 27 12:03:06 EDT 2005
On 27 October 2005 12:12, John Meacham wrote:
>> Note that GHC's back end is really aimed at producing good code when
>> there are registers available for passing arguments - this isn't
>> true on x86 or x86_64 at the moment, though.
>
> Hrm? Why are registers not available on x86_64? I thought it had a
> plethora (compared to the i386).
Mutter mutter... a bunch of the registers are reserved for argument
passing in the C calling convention, and when I tried to steal them I
ran into trouble around foreign calls. It should (or might) be possible
to work around this; I need to have another go. It works fine with the
NCG (native code generator), of course.
> I was thinking of something like the worker/wrapper split: GHC would
> recognize when a function takes only unboxed arguments and returns an
> unboxed result (these conditions can probably be relaxed; having no
> evals is the key thing).
>
> So in the case of fac, it would create
>
> int fac(int n, int r) {
>     if (n == 1) return r;   // r holds the accumulated product
>     return fac(n - 1, n * r);
> }
>
> and (something like)
>
> void fac_wrapper(void) {
>     continuation = pop(); // I might be mixing up the order of these
>     n = pop();
>     r = pop();
>
>     x = fac(n, r);
>
>     push(x);
>     jump(continuation);
> }
Well yes, but if the worker needs to return to the scheduler (i.e., if it
does a heap check or stack check), then the C stack is all messed up and
we need a setjmp/longjmp to get back to the scheduler. You can do it in
the case where there are no heap/stack checks, but I think that's very
rare.
> I am not sure how much sense this makes, though. I am no expert on the
> Spineless Tagless G-machine (which would make an excellent name for a
> band, BTW).
:-D
> Fortunately, modern CPUs anticipate this conundrum and provide
> 'write-combining' forms of their memory access instructions, which
> write a value directly to RAM without touching the cache at all. This
> will always be a win when updating thunks, for the reasons mentioned
> above, and is potentially a big benefit. Selective write-combining is
> among the top three performance-enhancing techniques according to the
> CPU optimization manuals.
>
> I think the easiest way to do this would be to have a macro defined as
> an appropriate bit of assembly, or as a simple C assignment if the
> write-combining movs aren't available.
Very good idea; I must try that. Any more progress on why our x86_64
code is slow?
Cheers,
Simon
More information about the Glasgow-haskell-users mailing list