Wed Feb 1 05:50:42 EST 2006

```Donald Bruce Stewart wrote:

> This entry in fact runs faster than the original (though not the new vectorised
> entry) optimised C entry (and faster than all other languages):
>     http://shootout.alioth.debian.org/gp4/benchmark.php?test=partialsums&lang=all
>
> So, by carefully tweaking things, we first squished a space leak, and then gained another
> 30%.
>
> In summary:
>   * Check the Core that is generated
>   * Watch out for optimisations that are missed
>   * Read the generated C for the tight loops.
>   * Make sure tight loops are unboxed
>   * Use -fexcess-precision and -optc-ffast-math for doubles
>
> This is roughly the process I used for the other shootout entries.
>
> Cheers,
>   Don
>

I just looked hard at the "new vectorised entry" and the original entry for C.
In both, the last two functions, which use the alt-ernating sign, are *not* done
in the required naive fashion:

>   sum = 0.0;
>   for (k = 1; k <= n-1; k += 2) sum += 1.0/kd;
>   for (k = 2; k <= n; k += 2) sum -= 1.0/kd;
>   printf("%.9f\tAlternating Harmonic\n", sum);
>
>   sum = 0.0;
>   for (k = 1; k <= 2*n-1; k += 4) sum += 1.0/kd;
>   for (k = 3; k <= 2*n; k += 4) sum -= 1.0/kd;
>   printf("%.9f\tGregory\n", sum);

As you can see, all the positive terms are added to sum, then all the negative
terms.  The double precision math comes to a different result, but this is
hidden by printing only 9 digits.

I just modified the g++ entry and the Haskell entry which do it right and the c
entry to print more digits (e.g. "show sum" in Haskell).

The Haskell entry and g++ entry agree, as expected.  The c entry does *not* agree:

0.69314698056009382831632592569803819060325622558593750000... g++
0.69314698056038037687898167860112152993679046630859375000... gcc