GHC vs. GCC on raw vector addition

Bulat Ziganshin bulatz at
Wed Jan 18 12:54:43 EST 2006

Hello Bulat,

Wednesday, January 18, 2006, 8:34:54 PM, you wrote:

BZ> the only reason this code is only 3 times slower is that the C version
BZ> is really limited by memory speed. when tested on 1000-element
BZ> arrays, it is 20 times slower. i haven't yet tried SSE optimization for
BZ> gcc ;)

sorry, with "gcc -O3 -ffast-math -fstrict-aliasing -funroll-loops"
the C version is 50 times faster than the best Haskell one... here is
the loop from the C version:

        fldl (%edx)
        faddl (%ecx)
        fstpl (%edx)
        fldl 8(%edx)
        faddl 8(%ecx)
        fstpl 8(%edx)
        fldl 16(%edx)
        faddl 16(%ecx)
        fstpl 16(%edx)
        fldl 24(%edx)
        faddl 24(%ecx)
        addl $4,%ebx
        addl $32,%ecx
        fstpl 24(%edx)
        addl $32,%edx
        cmpl -4(%ebp),%ebx
        jl L18
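
For readers who do not read x87 assembly: the listing loads a double
from (%edx), adds the double at the same offset from (%ecx), and stores
the result back to (%edx), four times per pass (offsets 0, 8, 16, 24),
then advances both pointers by 32 bytes and the counter by 4. A rough C
equivalent might look like the sketch below. The C source is not shown
in this message, so the function name, parameter names and the
remainder loop are assumptions made only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reconstruction, not the original benchmark source:
   a[i] += b[i] over doubles, unrolled by hand four elements per pass,
   which is roughly what -funroll-loops appears to have produced. */
void add_vectors_unrolled(double *a, const double *b, int n)
{
    assert(a != NULL && b != NULL && n >= 0);

    int i = 0;

    /* Main body: four 8-byte doubles (32 bytes) per iteration,
       matching the offsets 0, 8, 16, 24 in the listing. */
    for (; i + 3 < n; i += 4) {
        a[i]     += b[i];
        a[i + 1] += b[i + 1];
        a[i + 2] += b[i + 2];
        a[i + 3] += b[i + 3];
    }

    /* Remainder for lengths that are not a multiple of four
       (an assumption; the listing shows only the main loop). */
    for (; i < n; i++)
        a[i] += b[i];
}
```

Each pass does four loads, four adds and four stores with a single
compare and branch, so the loop overhead is shared across four elements.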

Best regards,
 Bulat                            mailto:bulatz at

More information about the Glasgow-haskell-users mailing list