[Haskell-cafe] Re: speed: ghc vs gcc
Don Stewart
dons at galois.com
Fri Feb 20 23:59:42 EST 2009
bertram.felgenhauer:
> This is odd, but it doesn't hurt the inner loop, which only involves
> $wsum01_XPd, and is identical to $wfold_s15t above.
>
> > Checking the asm:
> > $ ghc -O2 -fasm
> >
> > sQ3_info:
> > .LcRt:
> > cmpq 8(%rbp),%rsi
> > jg .LcRw
> > leaq 1(%rsi),%rax
> > addq %rsi,%rbx
> > movq %rax,%rsi
> > jmp sQ3_info
>
> So for some reason ghc ends up doing the (n + 1) addition before the
> (acc + n) addition in this case - this accounts for the extra
> instruction, because both n+1 and n need to be kept around for the
> duration of the addq (which does the acc + n addition).
Yep, well spotted.
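
For reference, a minimal sketch of the kind of loop these listings correspond to. The original source is not shown in this message, so the names `sumTo` and `go` and the bound of 10^9 are assumptions, chosen only because they reproduce the output printed further down. The comments map each branch to the instructions discussed above.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical reconstruction: sum the integers 1..limit with a strict
-- accumulator. Not the program from the thread, just the same loop shape.
sumTo :: Int -> Int
sumTo limit = go 0 1
  where
    go :: Int -> Int -> Int
    go !acc !n
      | n > limit = acc                     -- the cmpq / jg pair: leave the loop
      | otherwise = go (acc + n) (n + 1)    -- addq does acc + n, leaq does n + 1

main :: IO ()
main = print (sumTo 1000000000)             -- assumed bound; needs a 64-bit Int
```

In the -fasm listing the n + 1 is computed into a scratch register first and then moved back, which is where the extra movq comes from.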
> > Checking via C:
> >
> > $ ghc -O2 -optc-O3 -fvia-C
> >
> > Better code, but still a bit slower:
> >
> > sQ3_info:
> > cmpq 8(%rbp), %rsi
> > jg .L8
> > addq %rsi, %rbx
> > leaq 1(%rsi), %rsi
> > jmp sQ3_info
>
> This code is identical (up to renaming registers and one offset that
> I can't fully explain, but is probably related to a slight difference
> in handling pointer tags between the two versions of the code) to the
> "nice assembly" above.
Indeed, which is gratifying.
> > Running:
> >
> > $ time ./B
> > 500000000500000000
> > ./B 1.01s user 0.01s system 97% cpu 1.035 total
>
> Hmm, about 5% slower, are you sure this isn't just noise?
>
> If not noise, it may be some alignment effect. Hard to say.
I couldn't get it under 1s in a dozen runs, so I'm assuming it is some
small alignment effect rather than noise.
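
One way to separate noise from a real difference is to time the loop several times in-process and compare the best run. This is only a sketch under assumptions: `sumTo` is the hypothetical loop shape rather than the program from the thread, `timeOnce` is an illustrative helper, and the bound is varied slightly per run so that GHC cannot share a single result across iterations.

```haskell
{-# LANGUAGE BangPatterns #-}

import Control.Exception (evaluate)
import Control.Monad (forM)
import System.CPUTime (getCPUTime)

-- Hypothetical loop under test: sum of 1..limit with a strict accumulator.
sumTo :: Int -> Int
sumTo limit = go 0 1
  where
    go :: Int -> Int -> Int
    go !acc !n
      | n > limit = acc
      | otherwise = go (acc + n) (n + 1)
{-# NOINLINE sumTo #-}

-- CPU time in seconds for one forced evaluation of sumTo.
timeOnce :: Int -> IO Double
timeOnce limit = do
  start <- getCPUTime                       -- picoseconds, as an Integer
  _ <- evaluate (sumTo limit)
  end <- getCPUTime
  pure (fromIntegral (end - start) / 1e12)

main :: IO ()
main = do
  -- A dozen runs; the bound differs by at most 11, which is negligible work.
  ts <- forM [0 .. 11] $ \i -> timeOnce (1000000000 + i)
  putStrLn ("min: " ++ show (minimum ts) ++ "s, max: " ++ show (maximum ts) ++ "s")
```

If the minimum for one backend stays above the minimum for the other across repeated invocations, the gap is unlikely to be scheduling noise, although this cannot by itself confirm that alignment is the cause.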
I'm not sure why we get the extra test in the outer loop, though. I think
that's new too -- at least I've not seen that pattern before.
-- Don