[Haskell-cafe] How do I get this done in constant mem?

Sat Oct 10 13:07:45 EDT 2009

also, looking at the following, it does seem to me that it is sequence
that is too strict, and not IO that is to blame, as the Maybe monad
has the same behavior:

t5IO, t6IO :: IO Int
t5Maybe, t6Maybe :: Maybe Int
t5 = return . head =<< sequence [return 1, undefined]
t6 = return . head =<< return [1,undefined]
t5IO = t5
t5Maybe = t5
t6IO = t6
t6Maybe = t6

*Main> t5IO
*** Exception: Prelude.undefined
*Main> t5Maybe
*** Exception: Prelude.undefined
*Main> t6IO
1
*Main> t6Maybe
Just 1

2009/10/10 Thomas Hartman <tphyahoo at gmail.com>:
>> Yes, you should not do this in IO.  That requires the entire
>> computation to finish before the result can be used.
>
> Not really the entire computation though... whnf, no?
>
> main = do
>  let thunks :: IO [Int]
>  thunks = (sequence . replicate (10^6) $ (randomRIO (0,10^9)))
>  putStrLn . show . head =<< thunks -- prints
>  putStrLn . show . last =<< thunks -- overflows
>
> In the case of [[num]] from the top post, I belive that would be the
> first complete list.
>
>
> 2009/10/9 Luke Palmer <lrpalmer at gmail.com>:
>> On Fri, Oct 9, 2009 at 2:05 PM,  <mf-hcafe-15c311f0c at etc-network.de> wrote:
>>> Hi all,
>>>
>>> I think there is something about my use of the IO monad that bites me,
>>> but I am bored of staring at the code, so here you g.  The code goes
>>> through a list of records and collects the maximum in each record
>>> position.
>>>
>>>
>>> -- test.hs
>>> import Random
>>> import System.Environment (getArgs)
>>> import System.IO (putStr)
>>>
>>> samples :: Int -> Int -> IO [[Double]]
>>> samples i j = sequence . replicate i . sequence . replicate j $ randomRIO (0, 1000 ** 3)
>>
>> Yes, you should not do this in IO.  That requires the entire
>> computation to finish before the result can be used.  This computation
>> should be pure and lazy.
>>
>> It is possible, using split (and I believe not without it, unless you
>> use mkStdGen), to make a 2D list of randoms where the random
>> generation matches exactly the structure of the list.
>>
>> splits :: (RandomGen g) => Int -> g -> [g]
>> splits 0 _ = []
>> splits n g = let (g1,g2) = split g in g1 : splits (n-1) g2
>>
>> samples :: (RandomGen g) => Int -> Int -> g -> [[Double]]
>> samples i j gen = map row (splits i gen)
>>    where
>>    row g = take j (randomRs (0, 10^9) g)
>>
>> In fact, we could omit all these counts and make an infinite 2D list,
>> which you can cull in the client code.
>>
>> splits :: (RandomGen g) => g -> [g]
>> splits g = let (g1,g2) = split g in g1 : splits g2
>>
>> samples :: (RandomGen g) => g -> [[Double]]
>> samples = map row . splits
>>    where
>>    row = randomRs (0, 10^9)
>>
>> I find the latter to be more straightforward and obvious.  Maintaining
>> the laziness here is a fairly subtle thing, so study, perturb, try to
>> write it yourself in different ways, etc.
>>
>>> maxima :: [[Double]] -> [Double]
>>> maxima samples@(_:_) = foldr (\ x y -> map (uncurry max) $ zip x y) (head samples) (tail samples)
>>
>> FWIW, This function has a beautiful alternate definition:
>>
>> maxima :: [[Double]] -> [Double]
>> maxima = map maximum . transpose
>>
>>> main = do
>>>  args <- getArgs
>>>  x <- samples (read (head args)) 5
>>>  putStr . (++ "\n") . show $ maxima x
>>>
>>>
>>> I would expect this to take constant memory (foldr as well as foldl),
>>> but this is what happens:
>>>
>>>
>>> $ ghc -prof --make -O9 -o test test.hs
>>> [1 of 1] Compiling Main             ( test.hs, test.o )
>>> Linking test ...
>>> $ ./test 100 +RTS -p
>>> [9.881155955344708e8,9.910336352165401e8,9.71000686630374e8,9.968532576451201e8,9.996200333115692e8]
>>> $ grep 'total alloc' test.prof
>>>        total alloc =     744,180 bytes  (excludes profiling overheads)
>>> $ ./test 10000 +RTS -p
>>> [9.996199711457872e8,9.998928358545277e8,9.99960283632381e8,9.999707142123885e8,9.998952151508758e8]
>>> $ grep 'total alloc' test.prof
>>>        total alloc =  64,777,692 bytes  (excludes profiling overheads)
>>> $ ./test 1000000 +RTS -p
>>> Stack space overflow: current size 8388608 bytes.
>>> Use `+RTS -Ksize' to increase it.
>>> $
>>>
>>>
>>> so...
>>>
>>> does sequence somehow force the entire list of monads into evaluation
>>> before the head of the result list can be used?
>>
>> Yep.  IO is completely strict; in some sense the same as "call by
>> value" (don't take the analogy too far).  Rule of thumb: keep your
>> distance from it ;-)
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>
>