[Haskell-cafe] Monte Carlo Pi calculation (newbie learnings)

Tue Nov 6 09:00:12 EST 2007

Hi Alex,

You wrote:
> I'm new to Haskell, and don't quite understand how IO and lazy
> evaluation work together yet.  In order to improve my understanding, I
> thought I'd try to knock together a translation of a fairly simple
> problem I can code up in C...
> Now, I have two questions.  The easy question is, how can I make this
> more idiomatic?
> ...The second question is performance-related.  The Haskell code above
> overflows the stack when run with an argument of 1000000

These are great questions, and a really fun example!

Let me focus back on your original two questions.

When you are just playing around with an algorithm,
it is not so idiomatic in Haskell to write a program that
takes a command-line argument and prints to stdout
like in C. Instead, you would more likely write a small module
containing some functions. Then you load them into
GHCi or Hugs and play with them, and reload as you
make changes.

So I might write something simple like this into RandomPi.hs:

module RandomPi where

import System.Random

randMax, unitRadius :: Int
randMax = floor $ sqrt $ fromIntegral (maxBound `div` 2 :: Int)
unitRadius = randMax * randMax

calcPi :: RandomGen g => g -> Int -> (Int, Double)
calcPi g total = (count, fromIntegral count / fromIntegral total * 4)
  where
    count = length $ filter isInCircle randPairs
    isInCircle (x, y) = x*x + y*y < unitRadius
    randPairs = take total $ pairs $ randomRs (0, randMax) g

-- Group a list into pairs
pairs :: [a] -> [(a,a)]
pairs (x:x':xs) = (x, x') : pairs xs
pairs _         = []

I think this pretty faithfully translates your ideas from
C into Haskell.

(I was a little more careful than you were regarding
MAX_RAND, but that has nothing to do with Haskell.
In fact, I'm not sure your C code would work using
GNU libc, because you'd get unit_radius = 1 I think
due to int overflow.)

If you really, really, want a command-line/stdout
thing, you would then add in stuff like this
to do the required plumbing work:

import System.Environment
import System.Exit

randomPiMain :: IO ()
randomPiMain = do
  args <- getArgs
  when (length args /= 2) $ usage args
  let total = read $ args !! 1 :: Int
  g <- newStdGen
  let (count, myPi) = calcPi g total
  putStrLn $ "Count: " ++ show count
  putStrLn $ "Total: " ++ show total
  putStrLn $ show myPi

usage :: [String] -> IO ()
usage (progname:_) = do
  putStrLn $ concat ["usage: ", progname, " iteration_count"]
  exitWith $ ExitFailure 1

Then you would create a separate file containing
only two lines, and compile it:

import RandomPi
main = RandomPiMain

So the answer to your first question is that you isolate
out the IO stuff - usually not very much - and write
the rest of your program entirely pure. Then the IO
stuff you either do by hand at the prompt, or you
write separate functions for it.

In this case, the only IO is:
o Reading 'total'
o Printing the answer
o Giving the usage message
o Getting an initial random generator

You second question, about blowing the stack,
is settled in the above the same way that
the previous posters suggested. Represent
your big iteration as a list. Try to stick
to fairly simple operations on the list. Then the
compiler will figure out how to keep things
lazy and also release memory as it goes along.

Another point (where I disagree a bit with some
previous posters): I think that it is not very
idiomatic to use split on the random generator here.
For several reasons, I think split should usually
be avoided, except for things like sharing a generator
across multiple threads. Instead, I just pull pairs of
random numbers off of a single stream.

There is still a problem here - my code (and also
all of the previous posters, I think) clobbers the random
generator, rendering it unusable for future calculations.
In this case that is probably not a problem, but it
is a bad habit to get into, and not very polite.

But now we become entangled with your second
question. Because it is tricky to generate a huge
list of random numbers lazily, while still keeping
track of the last value of the generator, without
blowing the stack.

I will leave others to post a traditional solution to
that. I avoid that problem entirely by using yet
another Haskell idiom - I *always* (OK, almost always)
do random calculations inside a lazy State monad.

You don't need to understand the theory of monads
to understand the following code. You can see that
it just generates a big list of random pairs and uses
it. The type says to quietly keep the current value of
the random generator as state in the background.
The idiom "State $ randomR range" is a trick that
makes this possible.

module RandomPi where

import System.Random
import Control.Monad.State

randMax, unitRadius :: Int
randMax = floor $ sqrt $ fromIntegral (maxBound `div` 2 :: Int)
unitRadius = randMax * randMax

calcPiM :: RandomGen g => Int -> State g (Int, Double)
calcPiM total = do
    randPairs <- replicateM total $ randPairM (0, randMax)
    let count = length $ filter isInCircle randPairs
    return (count, fromIntegral count / fromIntegral total * 4)
  where
    isInCircle (x, y) = x*x + y*y < unitRadius

-- Generate a pair of random numbers
randPairM :: (RandomGen g, Random a) => (a, a) -> State g (a, a)
randPairM range = do
      x <- State $ randomR range
      y <- State $ randomR range
      return (x, y)

To use this at the GHCi or Hugs command prompt, or in
randomPiMain, you write:

g <- newStdGen
evalState (calcPiM 100000) g

Hope this helps,
Yitz