[Haskell-cafe] Re: Haskell version of ray tracer code is much slower than the original ML
Simon Marlow
simonmarhaskell at gmail.com
Fri Jun 22 08:16:54 EDT 2007
Philip Armstrong wrote:
> On Thu, Jun 21, 2007 at 08:42:57PM +0100, Philip Armstrong wrote:
>> On Thu, Jun 21, 2007 at 03:29:17PM -0400, Mark T.B. Carroll wrote:
>
>> That's the old wiki. The new one gives the opposite advice! (As does
>> the ghc manual):
>>
>> http://www.haskell.org/ghc/docs/latest/html/users_guide/faster.html
>> http://www.haskell.org/haskellwiki/Performance/Floating_Point
>
> Incidentally, the latter page implies that ghc is being overly
> pessimistic when compiling FP code without -fexcess-precision:
>
> "On x86 (and other platforms with GHC prior to version 6.4.2), use
> the -fexcess-precision flag to improve performance of floating-point
> intensive code (up to 2x speedups have been seen). This will keep
> more intermediates in registers instead of memory, at the expense of
> occasional differences in results due to unpredictable rounding."
>
> IIRC, it is possible to issue an instruction to the x86 FP unit which
> makes all operations work on 64-bit Doubles, even though there are
> 80-bits available internally. Which then means there's no requirement
> to spill intermediate results to memory in order to get the rounding
> correct.
For some background on why GHC doesn't do this, see the comment "MORE FLOATING
POINT MUSINGS..." in
http://darcs.haskell.org/ghc/compiler/nativeGen/MachInstrs.hs
The main problem is floats: even if you put the FPU into 64-bit mode, your float
operations will be done at 64-bit (double) precision rather than 32-bit. There are
other technical problems that we found with doing this; the comment above elaborates.
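To make the point about floats concrete, here is a small C sketch (C rather than Haskell because, with -fvia-C, GHC's floating-point code ends up going through GCC). It evaluates a*a - b for a = 1 + 2^-12 and b = 1 + 2^-11, once with the product rounded to single precision and once with the product kept at double precision, which is roughly what happens to float intermediates when the x87 is running in 53-bit mode. The function names and inputs are illustrative choices, not anything taken from GHC.

```c
/* All-single-precision evaluation: the product is rounded to float
   (24-bit significand) before the subtraction. The volatile store forces
   that rounding even on an x87 FPU, and stops a compiler from fusing
   the multiply and subtract into one FMA. */
float square_minus_float(float a, float b)
{
    volatile float p = a * a;
    return p - b;
}

/* Same expression, but the product is kept as a double (53-bit
   significand), mimicking float code run with the FPU in 64-bit mode.
   The product of two floats is always exact in double, so the product
   itself is never rounded here. */
float square_minus_via_double(float a, float b)
{
    double p = (double)a * (double)a;
    return (float)(p - (double)b);
}
```

With those inputs the exact square is 1 + 2^-11 + 2^-24. Rounding it to float is an exact tie that rounds to even, giving 1 + 2^-11, so square_minus_float returns 0, whereas square_minus_via_double returns 2^-24 (about 5.96e-8).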
GHC passes -ffloat-store to GCC, unless you give the flag -fexcess-precision.
The idea is to try to get reproducible floating-point results. The native code
generator is unaffected by -fexcess-precision, but it produces rubbish
floating-point code on x86 anyway.
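The reproducibility issue behind -ffloat-store can be illustrated the same way with double and the x87's 80-bit format. The sketch below (again C; the function names and inputs are illustrative assumptions) computes a*a - b with a = 1 + 2^-30 and b = 1 + 2^-29, once with the product forced through a 64-bit double in memory, roughly what -ffloat-store does for floating-point variables, and once with it held in long double, which with GCC on x86 is the 80-bit x87 format that a register would hold.

```c
#include <float.h>  /* LDBL_MANT_DIG */

/* Nonzero if long double has at least the 64-bit significand of the
   x87 extended format (true for GCC on x86; false where long double
   is just double, e.g. MSVC). */
int long_double_is_extended(void)
{
    return LDBL_MANT_DIG >= 64;
}

/* Product rounded to double by a store to memory, roughly what
   -ffloat-store forces for floating-point variables. */
double square_minus_stored(double a, double b)
{
    volatile double p = a * a;
    return p - b;
}

/* Product kept in long double, like a value left in an 80-bit x87
   register instead of being spilled to a 64-bit memory slot. */
double square_minus_extended(double a, double b)
{
    long double p = (long double)a * (long double)a;
    return (double)(p - (long double)b);
}
```

The exact square is 1 + 2^-29 + 2^-60. Rounded to double, the 2^-60 term is lost and square_minus_stored returns 0; with a 64-bit significand the term survives and square_minus_extended returns 2^-60. Which answer a program compiled with excess precision gets can depend on whether the compiler happened to spill that intermediate to memory, which is the unpredictability -ffloat-store is meant to remove.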
> Ideally, -fexcess-precision should just affect whether the FP unit
> uses 80 or 64 bit Doubles. It shouldn't make any performance
> difference, although obviously the generated results may be different.
>
> As an aside, if you use the -optc-mfpmath=sse option, then you only
> get 64-bit Doubles anyway (on x86).
You probably want SSE2. If I ever get around to finishing it, the GHC native
code generator will be able to generate SSE2 code on x86, as it currently does
for x86-64. For now, to get good FP performance on x86, you probably want
-fvia-C -fexcess-precision -optc-msse2 -optc-mfpmath=sse
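One way to check which regime a given set of C compiler flags produces is the C99 FLT_EVAL_METHOD macro from <float.h>. The sketch below just reports it; the helper names are hypothetical. As an assumption about typical GCC behaviour, it should be 2 for 32-bit x86 code that uses the x87 (-mfpmath=387, the traditional i386 default) and 0 with -msse2 -mfpmath=sse, where each operation is evaluated in its own type.

```c
#include <float.h>  /* FLT_EVAL_METHOD (C99) */

/* The raw C99 value for this translation unit. */
int eval_method(void)
{
    return FLT_EVAL_METHOD;
}

/* Human-readable meaning of the standard values. */
const char *eval_method_description(void)
{
    switch (FLT_EVAL_METHOD) {
    case 0:  return "operations evaluated in their own type";
    case 1:  return "float and double evaluated as double";
    case 2:  return "all evaluated as long double (excess precision)";
    case -1: return "indeterminable";
    default: return "implementation-defined";
    }
}
```

Printing eval_method() from a file compiled with gcc -m32 -mfpmath=387 and again with gcc -m32 -msse2 -mfpmath=sse is a quick way to see the difference; on x86-64, SSE2 math is the default and the value is normally 0.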
Cheers,
Simon