[Haskell-cafe] Re: Haskell version of ray tracer code is much slower than the original ML

Fri Jun 22 08:16:54 EDT 2007

Philip Armstrong wrote:
> On Thu, Jun 21, 2007 at 08:42:57PM +0100, Philip Armstrong wrote:
>> On Thu, Jun 21, 2007 at 03:29:17PM -0400, Mark T.B. Carroll wrote:
>
>> That's the old wiki. The new one gives the opposite advice! (As does
>> the ghc manual):
>>
>>  http://www.haskell.org/ghc/docs/latest/html/users_guide/faster.html
>>  http://www.haskell.org/haskellwiki/Performance/Floating_Point
> 
> Incidentally, the latter page implies that ghc is being overly
> pessimistic when compiling FP code without -fexcess-precision:
> 
> "On x86 (and other platforms with GHC prior to version 6.4.2), use
>  the -fexcess-precision flag to improve performance of floating-point
>  intensive code (up to 2x speedups have been seen). This will keep
>  more intermediates in registers instead of memory, at the expense of
>  occasional differences in results due to unpredictable rounding."
> 
> IIRC, it is possible to issue an instruction to the x86 FP unit which
> makes all operations work on 64-bit Doubles, even though there are
> 80-bits available internally. Which then means there's no requirement
> to spill intermediate results to memory in order to get the rounding
> correct.

For some background on why GHC doesn't do this, see the comment "MORE FLOATING 
POINT MUSINGS..." in

   http://darcs.haskell.org/ghc/compiler/nativeGen/MachInstrs.hs

The main problem is floats: even if you put the FPU into 64-bit mode, your Float 
(single-precision) operations will still be done at 64-bit precision, not 32-bit.  
There are other technical problems that we found with doing this; the comment 
above elaborates.
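
To make the Float issue concrete, here is a minimal sketch (not taken from the 
GHC sources; the names sumSingle and sumViaDouble are made up for illustration). 
It emulates the two behaviours in portable Haskell: rounding to single precision 
after every addition, versus keeping intermediates at double precision and 
rounding once at the end. It assumes a code generator that performs Float 
arithmetic in true IEEE single precision (e.g. SSE2), with excess precision off. 
Under round-to-nearest-even, 2^24 + 1 is not representable as a Float and rounds 
back to 2^24, so the first sum stays at 16777216 while the second reaches 
16777218.

```haskell
-- Sum rounded to single precision after every addition,
-- which is what Float arithmetic is supposed to do.
sumSingle :: [Float] -> Float
sumSingle = foldl (+) 0

-- Sum with intermediates held at double precision and rounded to
-- single precision only once, at the end. This emulates an FPU that
-- has been put into 64-bit mode and is then used for Float code.
sumViaDouble :: [Float] -> Float
sumViaDouble xs = realToFrac (foldl (+) 0 (map realToFrac xs) :: Double)

main :: IO ()
main = do
  let xs = [16777216, 1, 1] :: [Float]  -- 16777216 = 2^24
  print (sumSingle xs)
  print (sumViaDouble xs)
  print (sumSingle xs == sumViaDouble xs)
```

The two results differ by one unit in the last place of a Float, which is the 
kind of discrepancy that setting the FPU precision control cannot remove.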

GHC passes -ffloat-store to GCC, unless you give the flag -fexcess-precision. 
The idea is to try to get reproducible floating-point results.  The native code 
generator is unaffected by -fexcess-precision, but it produces rubbish 
floating-point code on x86 anyway.

> Ideally, -fexcess-precision should just affect whether the FP unit
> uses 80 or 64 bit Doubles. It shouldn't make any performance
> difference, although obviously the generated results may be different.
>
> As an aside, if you use the -optc-mfpmath=sse option, then you only
> get 64-bit Doubles anyway (on x86).

You probably want SSE2.  If I ever get around to finishing it, the GHC native 
code generator will be able to generate SSE2 code on x86 someday, like it 
currently does for x86-64.  For now, to get good FP performance on x86, you 
probably want

   -fvia-C -fexcess-precision -optc-msse2 -optc-mfpmath=sse

Cheers,
	Simon