[Haskell-beginners] PI calculation - Newbie question

Fri Jan 29 07:13:45 EST 2010

Am Freitag 29 Januar 2010 11:59:30 schrieb Gabi:
> Hi Group,
>
> I am just trying to learn the lang and Implemented this PI calculator.
> It is really slow and very memory consuming (much much slower than its
> equivalent in Clojure for instance)
>
> I think the problem is in  "rs <- sequence (replicate n isRandIn)"  -
> But I don't know how to get around it (how do I get a lazy sequence of
> rs? Is it the problem anyway?)

It's one part of the problem.

sequence (replicate n isRandIn)

requires all n random number computations to be in memory at once. When you 
want far more than 10000 random points, that'll take a lot of memory.

The more evil part is that the things aren't actually calculated when you 
call sequence, so what you get is a long list of thunks "calculate when 
needed" which take a really large amount of space (
./MCPi 100000 +RTS -sstderr                                                  
3.14048                                                                      
     252,873,148 bytes allocated in the heap                                 
      48,792,700 bytes copied during GC                                      
       8,861,192 bytes maximum residency (5 sample(s))                       
         128,768 bytes maximum slop                                          
              23 MB total memory in use (0 MB lost due to fragmentation) 

./MCPi 1000000 +RTS -sstderr                                                  
3.1401                                                                        
   2,523,687,224 bytes allocated in the heap                                  
     527,553,588 bytes copied during GC                                       
     107,345,604 bytes maximum residency (8 sample(s))                        
       7,619,396 bytes maximum slop                                           
             219 MB total memory in use (2 MB lost due to fragmentation)
).

You can reduce the space requirements by forcing the calculations earlier, 
e.g.

isRandIn = do
    p <- randPoint
    return $! inCirc p

The ($!) requires the evaluation of (inCirc p) [to the outermost 
constructor, since it's an Int, that means complete evaluation], the list 
that sequence produces is a list of evaluated Ints, needs far less space.
Unfortunately, that is much slower. I'm not sure why.

However, it's not good to do that much in IO (and getStdRandom isn't 
exactly fast, it has to retrieve the random generator and store the 
modified generator).

The bulk of the algorithm is a pure calculation, given the *lazy* list of 
pseudo-random coordinates.

So
----------------------------------------------------------------------

-- curried version of inCirc, no need to construct pairs
inCirc :: Double -> Double -> Int
inCirc x y
    | dx*dx + dy*dy < 0.25  = 1
    | otherwise             = 0
      where
        dx = x - 0.5
        dy = y - 0.5

-- transform a list of coordinates into a list of indicators
-- whether the point is inside the circle (sorry for the
-- stupid name)
inCircles :: [Double] -> [Int]
inCircles (x:y:zs) = inCirc x y : inCircles zs
inCircles _ = []

-- given a count of experiments and an infinite list of coordinates,
-- calculate an approximation to pi
calcPi :: Int -> [Double] -> Double
calcPi n ds = fromIntegral ct / fromIntegral n * 4
      where
        ct = sum . take n $ inCircles ds

-- now the IO part is only
-- * getting the number of experiments and
-- * getting the StdGen
main :: IO ()
main = do
    args <- getArgs
    sg <- getStdGen
    let n = case args of
                (a:_) -> read a
                _ -> 10000
    print $ calcPi n (randomRs (0,1) sg)

----------------------------------------------------------------------

Now the list of coordinates is consumed as it is produced, things can 
immediately be garbage-collected (if compiled with optimisations, without 
optimisations, you would have to replace "sum" with "foldl' (+) 0").

It runs in constant space and faster. It's still slow, though, because 
System.Random's StdGen is slow.
For faster generation of pseudo-random numbers, take a look at e.g. 
mersenne-random (there are other fast PRNG packages on hackage, too, IIRC).

>
> -- p.hs simple PI calculator, using the Monte Carlo Method
>
> import System( getArgs )
> import System.Random
> inCirc :: (Double, Double) -> Int
> inCirc (x,y) = if ((dx * dx) + (dy * dy)) < 0.25
>                         then 1
>                        else 0
>                where dx = x - 0.5
>                           dy = y - 0.5
>
>
> randPoint :: IO (Double, Double)
> randPoint = do
>            x <-getStdRandom (randomR (0, 1 :: Double))
>            y <-getStdRandom (randomR (0, 1 :: Double))
>            return (x, y)
>
>
> isRandIn = do
>           p <- randPoint
>           return (inCirc p)
>
>
> main  = do
>       args <- getArgs
>       let n = if null args
>               then 10000
>               else read $ (head args)::Int
>
>       rs <- sequence (replicate n isRandIn)
>       let pi = (fromIntegral(sum rs) / fromIntegral n) * 4
>       print pi