[Haskell-cafe] How do I get this done in constant mem?

Sat Oct 10 12:33:52 EDT 2009

> Yes, you should not do this in IO.  That requires the entire
> computation to finish before the result can be used.

Not really the entire computation though... whnf, no?

main = do
  let thunks :: IO [Int]
  thunks = (sequence . replicate (10^6) $ (randomRIO (0,10^9)))
  putStrLn . show . head =<< thunks -- prints
  putStrLn . show . last =<< thunks -- overflows

In the case of [[num]] from the top post, I belive that would be the
first complete list.

2009/10/9 Luke Palmer <lrpalmer at gmail.com>:
> On Fri, Oct 9, 2009 at 2:05 PM,  <mf-hcafe-15c311f0c at etc-network.de> wrote:
>> Hi all,
>>
>> I think there is something about my use of the IO monad that bites me,
>> but I am bored of staring at the code, so here you g.  The code goes
>> through a list of records and collects the maximum in each record
>> position.
>>
>>
>> -- test.hs
>> import Random
>> import System.Environment (getArgs)
>> import System.IO (putStr)
>>
>> samples :: Int -> Int -> IO [[Double]]
>> samples i j = sequence . replicate i . sequence . replicate j $ randomRIO (0, 1000 ** 3)
>
> Yes, you should not do this in IO.  That requires the entire
> computation to finish before the result can be used.  This computation
> should be pure and lazy.
>
> It is possible, using split (and I believe not without it, unless you
> use mkStdGen), to make a 2D list of randoms where the random
> generation matches exactly the structure of the list.
>
> splits :: (RandomGen g) => Int -> g -> [g]
> splits 0 _ = []
> splits n g = let (g1,g2) = split g in g1 : splits (n-1) g2
>
> samples :: (RandomGen g) => Int -> Int -> g -> [[Double]]
> samples i j gen = map row (splits i gen)
>    where
>    row g = take j (randomRs (0, 10^9) g)
>
> In fact, we could omit all these counts and make an infinite 2D list,
> which you can cull in the client code.
>
> splits :: (RandomGen g) => g -> [g]
> splits g = let (g1,g2) = split g in g1 : splits g2
>
> samples :: (RandomGen g) => g -> [[Double]]
> samples = map row . splits
>    where
>    row = randomRs (0, 10^9)
>
> I find the latter to be more straightforward and obvious.  Maintaining
> the laziness here is a fairly subtle thing, so study, perturb, try to
> write it yourself in different ways, etc.
>
>> maxima :: [[Double]] -> [Double]
>> maxima samples@(_:_) = foldr (\ x y -> map (uncurry max) $ zip x y) (head samples) (tail samples)
>
> FWIW, This function has a beautiful alternate definition:
>
> maxima :: [[Double]] -> [Double]
> maxima = map maximum . transpose
>
>> main = do
>>  args <- getArgs
>>  x <- samples (read (head args)) 5
>>  putStr . (++ "\n") . show $ maxima x
>>
>>
>> I would expect this to take constant memory (foldr as well as foldl),
>> but this is what happens:
>>
>>
>> $ ghc -prof --make -O9 -o test test.hs
>> [1 of 1] Compiling Main             ( test.hs, test.o )
>> Linking test ...
>> $ ./test 100 +RTS -p
>> [9.881155955344708e8,9.910336352165401e8,9.71000686630374e8,9.968532576451201e8,9.996200333115692e8]
>> $ grep 'total alloc' test.prof
>>        total alloc =     744,180 bytes  (excludes profiling overheads)
>> $ ./test 10000 +RTS -p
>> [9.996199711457872e8,9.998928358545277e8,9.99960283632381e8,9.999707142123885e8,9.998952151508758e8]
>> $ grep 'total alloc' test.prof
>>        total alloc =  64,777,692 bytes  (excludes profiling overheads)
>> $ ./test 1000000 +RTS -p
>> Stack space overflow: current size 8388608 bytes.
>> Use `+RTS -Ksize' to increase it.
>> $
>>
>>
>> so...
>>
>> does sequence somehow force the entire list of monads into evaluation
>> before the head of the result list can be used?
>
> Yep.  IO is completely strict; in some sense the same as "call by
> value" (don't take the analogy too far).  Rule of thumb: keep your
> distance from it ;-)
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>