[Haskell-cafe] Re: Haskell version of ray tracer code is much slower than the original ML

Fri Jun 22 09:13:13 EDT 2007

On Fri, Jun 22, 2007 at 01:16:54PM +0100, Simon Marlow wrote:
>Philip Armstrong wrote:
>>IIRC, it is possible to issue an instruction to the x86 FP unit which
>>makes all operations work on 64-bit Doubles, even though there are
>>80-bits available internally. Which then means there's no requirement
>>to spill intermediate results to memory in order to get the rounding
>>correct.
>
>For some background on why GHC doesn't do this, see the comment "MORE 
>FLOATING POINT MUSINGS..." in
>
>   http://darcs.haskell.org/ghc/compiler/nativeGen/MachInstrs.hs

Twisty. I guess 'slow, but correct, with switches to go faster at the
price of correctness' is about the best option.

>You probably want SSE2.  If I ever get around to finishing it, the GHC 
>native code generator will be able to generate SSE2 code on x86 someday, 
>like it currently does for x86-64.  For now, to get good FP performance on 
>x86, you probably want
>
>   -fvia-C -fexcess-precision -optc-mfpmath=sse2

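Reading the gcc manpage, I think you mean

   -optc-msse2 -optc-mfpmath=sse

-mfpmath=sse2 doesn't appear to be an option: -mfpmath only takes the
name of a unit (387 or sse), and -msse2 is what enables the SSE2
instructions.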

(I note in passing that the ghc darcs head produces binaries from
ray.hs which are about 15% slower than the ones from ghc 6.6.1, with
the same optimisation options used both times.)

cheers, Phil

-- 
http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt