[Haskell-beginners] Performance of Idiomatic lazy Haskell
Daniel Fischer
daniel.is.fischer at web.de
Sun Jan 31 08:39:56 EST 2010
Am Sonntag 31 Januar 2010 13:23:33 schrieb Stephen Tetley:
> Hi Markus
>
> Whoops, I hadn't read your email properly and wasn't accounting for
> the epsilon. Here's a version that does, although it is perhaps a
> little slow...
>
Better use the previous and calculate how many terms you need:
============================================
module Main (main) where
import Data.List (unfoldr)
main :: IO ()
main = do
putStrLn "EPS: "
eps <- readLn :: IO Double
let mx = floor (4/eps)
!k = (mx+1) `quot` 2
putStrLn $ "PI mit EPS " ++ (show eps) ++ " = " ++ show (leibniz k)
leibniz n = (4 *) $ sum $ take n step
step :: [Double]
step = unfoldr phi (True,1) where
phi (sig,d) | sig = Just (1/d, (False,d+2))
| otherwise = Just (negate (1/d), (True,d+2))
giving
$ echo '0.00000001' | ./unfoldPi +RTS -sstderr -RTS
./unfoldPi +RTS -sstderr
EPS:
PI mit EPS 1.0e-8 = 3.141592648589476
27,305,969,616 bytes allocated in the heap
2,523,788 bytes copied during GC
61,660 bytes maximum residency (1 sample(s))
38,864 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 52083 collections, 0 parallel, 0.29s, 0.44s elapsed
Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed
INIT time 0.00s ( 0.00s elapsed)
MUT time 23.10s ( 23.14s elapsed)
GC time 0.29s ( 0.44s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 23.39s ( 23.57s elapsed)
%GC time 1.2% (1.9% elapsed)
Alloc rate 1,182,002,769 bytes per MUT second
Productivity 98.8% of total user, 98.0% of total elapsed
===============================================
a little better if we do the unfolding ourselves, not creating any
intermediate pairs:
leibniz n = (4*) . sum $ unf n True 1
unf :: Int -> Bool -> Double -> [Double]
unf 0 _ _ = []
unf k True n = 1/n : unf (k-1) False (n+2)
unf k False n = negate (1/n) : unf (k-1) True (n+2)
which gives
$ echo '0.00000001' | ./unfPi +RTS -sstderr -RTS
./unfPi +RTS -sstderr
EPS:
PI mit EPS 1.0e-8 = 3.141592648589476
15,250,801,168 bytes allocated in the heap
1,284,632 bytes copied during GC
61,592 bytes maximum residency (1 sample(s))
38,864 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 29089 collections, 0 parallel, 0.22s, 0.32s elapsed
Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed
INIT time 0.00s ( 0.00s elapsed)
MUT time 18.05s ( 18.05s elapsed)
GC time 0.22s ( 0.32s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 18.28s ( 18.37s elapsed)
%GC time 1.2% (1.7% elapsed)
Alloc rate 844,773,242 bytes per MUT second
Productivity 98.8% of total user, 98.2% of total elapsed
=============================================
Still not competitive with the loop version
$ echo '0.00000001' | ./loopPi +RTS -sstderr -RTS
./loopPi +RTS -sstderr
EPS:
PI mit EPS 1.0e-8 = 3.1415926526069526
136,248 bytes allocated in the heap
2,024 bytes copied during GC
51,720 bytes maximum residency (1 sample(s))
38,392 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 0 collections, 0 parallel, 0.00s, 0.00s elapsed
Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed
INIT time 0.00s ( 0.00s elapsed)
MUT time 6.64s ( 6.64s elapsed)
GC time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 6.64s ( 6.64s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 20,517 bytes per MUT second
Productivity 100.0% of total user, 100.0% of total elapsed
which can also be improved:
{-# LANGUAGE BangPatterns #-}
module Main (main) where
main :: IO ()
main = do
putStrLn "EPS: "
eps <- readLn :: IO Double
let mx = floor (4/eps)
k = (mx-1) `quot` 2
!pi14 = pisum (even k) (fromInteger (2*k+1))
putStrLn $ "PI mit EPS "++(show eps)++" = "++ show(4*pi14)
-- sum from small numbers to large, to reduce cancellation
-- although, in this particular case, for eps = 1e-8, the result is
-- farther off than summing the other way
pisum :: Bool -> Double -> Double
pisum bl start = go bl start 0
where
go _ n s | n < 1 = s
go True n !s = go False (n-2) (s+recip n)
go False n !s = go True (n-2) (s-recip n)
$ echo '0.00000001' | ./mloopPi +RTS -sstderr -RTS
./mloopPi +RTS -sstderr
EPS:
PI mit EPS 1.0e-8 = 3.141592648589793
136,416 bytes allocated in the heap
2,024 bytes copied during GC
51,720 bytes maximum residency (1 sample(s))
38,392 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 0 collections, 0 parallel, 0.00s, 0.00s elapsed
Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed
INIT time 0.00s ( 0.00s elapsed)
MUT time 4.55s ( 4.56s elapsed)
GC time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 4.55s ( 4.56s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 29,966 bytes per MUT second
Productivity 99.9% of total user, 99.8% of total elapsed
Most of the gain is due to the much simpler loop-break test, a small bit
may be due to bang-patterns vs. seq.
Bottom line:
GHC isn't very good at fusing away intermediate lists in strict algorithms
(it's better at it for lazy algorithms). Allocating so many cons-cells just
to be immediately garbage-collected costs a lot of time.
You can code the loop directly, which gives reasonable results, or, as
Felipe suggested, use the fusion framework created by experts:
module Main (main) where
import qualified Data.List.Stream as S
main :: IO ()
main = do
putStrLn "EPS: "
eps <- readLn :: IO Double
let mx = floor (4/eps)
!k = (mx+1) `quot` 2
putStrLn $ "PI mit EPS " ++ (show eps) ++ " = " ++ show (leibniz k)
leibniz n = (4 *) $ S.sum $ S.take n step
step :: [Double]
step = S.unfoldr phi (True,1) where
phi (sig,d) | sig = Just (1/d, (False,d+2))
| otherwise = Just (negate (1/d), (True,d+2))
which beats the hand-coded loop:
$ echo '0.00000001' | ./sunfPi +RTS -sstderr -RTS
./sunfPi +RTS -sstderr
EPS:
PI mit EPS 1.0e-8 = 3.1415926445727678
136,560 bytes allocated in the heap
2,024 bytes copied during GC
51,720 bytes maximum residency (1 sample(s))
38,392 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 0 collections, 0 parallel, 0.00s, 0.00s elapsed
Generation 1: 1 collections, 0 parallel, 0.00s, 0.00s elapsed
INIT time 0.00s ( 0.00s elapsed)
MUT time 4.27s ( 4.27s elapsed)
GC time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 4.27s ( 4.27s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 31,994 bytes per MUT second
Productivity 100.0% of total user, 100.0% of total elapsed
> > import Data.List (unfoldr)
> >
> >
> > leibniz eps = converge eps ser
> >
> >
> > ser :: [Double]
> > ser = unfoldr phi (True,1) where
> > phi (sig,d) | sig == True = Just (1/d, (False,d+2))
> >
> > | otherwise = Just (negate (1/d), (True,d+2))
> >
> > converge :: Double -> [Double] -> Double
> > converge eps xs = step 0 0 xs where
> > step a b (x:xs) = let a' = a + (4*x) in
> > if abs (a'-b) < eps then a' else step a' a xs
> >
> > demo1 = leibniz 0.00000000025
>
> Best wishes
>
> Stephen
More information about the Beginners
mailing list