[Haskell-beginners] Performance of Idiomatic lazy Haskell
Daniel Fischer
daniel.is.fischer at web.de
Sun Jan 31 13:10:59 EST 2010
Am Sonntag 31 Januar 2010 15:52:41 schrieb Stephen Tetley:
> Hi Daniel
>
> Thanks - the figures are very impressive for the stream fusion
> library. I knew the paper, but I hadn't looked at it the
> implementation.
>
Just for the record, the loops (my variant) coded in C take 2.7 seconds
(icc) resp. 3.0 seconds (gcc) with -O3.
Markus' loop takes 5.32s (icc) resp. 6.04s (gcc) with -O3.
Well, that's largely due to the fact that that takes two divisions per
iteration (and divisions are very slow on my box).
Rewriting it so that only one division per iteration is necessary:
================================================
module Main (main) where
main :: IO ()
main = do putStrLn "EPS: "
eps <- readLn :: IO Double
let pi14 = pisum 1 3 False (eps*0.25)
putStrLn $ "PI mit EPS "++(show eps)++" = "++ show(4*pi14)
pisum s i b eps
| d < eps = s
| b = let h = s+d in h `seq` pisum h (i+2) False eps
| otherwise = let h = s-d in h `seq` pisum h (i+2) True eps
where
d = 1/i
================================================
it takes 5.42 seconds (eps = 1e-8) when compiled natively, that loop in C
is practically indistinguishable from my loop variant.
So stream-fusion or hand-coded loops in Haskell are clearly slower than C
loops, but don't stink when compiled with the native code generator.
However, I get identical to gcc performance with
ghc -O2 -fexcess-precision -fvia-C -optc-O3
[-optc-ffast-math doesn't make any difference then] for the hand-coded
loops (all, my variant, Markus' original, Markus' rewritten - of course,
each identical to the corresponding C-loop).
Also for stream-fusion.
Some of the list creating variants also profit from compilation via C,
others not. However, none come near the NCG loop performance, let alone C
loop performance.
To sum up:
Although GHC's native code generator does a not too bad job with
arithmetic-heavy loops, gcc is better (even without optimisations).
But if you compile via C, you can get identical performance -
for these loops.
In general, things are more complicated, often the native code generator
produces better code than the via-C route. It's usually worth a test which
produces the better binary, native or via-C.
More information about the Beginners
mailing list