[Haskell-cafe] Re: speed: ghc vs gcc

Don Stewart dons at galois.com
Fri Feb 20 23:59:42 EST 2009


bertram.felgenhauer:
> This is odd, but it doesn't hurt the inner loop, which only involves
> $wsum01_XPd, and is identical to $wfold_s15t above.
> 
> > Checking the asm:
> >     $ ghc -O2 -fasm
> > 
> >     sQ3_info:
> >     .LcRt:
> >       cmpq 8(%rbp),%rsi
> >       jg .LcRw
> >       leaq 1(%rsi),%rax
> >       addq %rsi,%rbx
> >       movq %rax,%rsi
> >       jmp sQ3_info
> 
> So for some reason ghc ends up doing the (n + 1) addition before the
> (acc + n) addition in this case - this accounts for the extra
> instruction, because both n+1 and n need to be kept around for the
> duration of the addq (which does the acc + n addition).


Yep, well spotted.
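
For anyone following along without the source at hand, here is roughly the
shape of loop that the assembly corresponds to. This is only a sketch, not
the exact program from the thread: it assumes the benchmark sums 1..10^9
with a strict accumulator (which is consistent with the 500000000500000000
printed in the run) and a 64-bit Int. The names sumTo and go are made up
for illustration.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical reconstruction: sum the integers 1..limit with a strict
-- accumulator. 'sumTo' and 'go' are illustrative names, not from the thread.
sumTo :: Int -> Int
sumTo limit = go 0 1
  where
    go :: Int -> Int -> Int
    go !acc !n
      | n > limit = acc                   -- the cmpq / jg pair: leave the loop
      | otherwise = go (acc + n) (n + 1)  -- acc + n is the addq, n + 1 is the leaq

main :: IO ()
main = print (sumTo 1000000000)
```

Both additions read the old value of n, so if n + 1 is computed first it has
to go into a scratch register and be moved back afterwards, which is the
extra movq in the -fasm output.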
  
> > Checking via C:
> > 
> >    $ ghc -O2 -optc-O3 -fvia-C
> > 
> > Better code, but still a bit slower:   
> > 
> >         sQ3_info:
> >           cmpq        8(%rbp), %rsi
> >           jg  .L8
> >           addq        %rsi, %rbx
> >           leaq        1(%rsi), %rsi
> >           jmp sQ3_info
> 
> This code is identical (up to renaming registers and one offset that
> I can't fully explain, but is probably related to a slight difference
> in handling pointer tags between the two versions of the code) to the
> "nice assembly" above.


Indeed, which is gratifying.
  
> > Running:
> > 
> >         $ time   ./B
> >         500000000500000000
> >         ./B  1.01s user 0.01s system 97% cpu 1.035 total
> 
> Hmm, about 5% slower, are you sure this isn't just noise?
> 
> If not noise, it may be some alignment effect. Hard to say.


I couldn't get it under 1s in a dozen runs, so I'm assuming it's some
small alignment effect rather than noise.

Why we get the extra test in the outer loop, though, I'm not sure. That's
new too, I think. At least I've not seen that pattern before.

-- Don

