[Haskell-cafe] memory, garbage collection and other newbie's issues

Sun Oct 22 08:08:49 EDT 2006

Hello Andrea,

Sunday, October 22, 2006, 1:37:55 PM, you wrote:

>> as Udo said, it should be better to evaluate thunks just when they are
>> created, by using proper 'seq' calls.

> While I understand why you and Udo are right, still it is difficult
> for me to related this discussion to my code. So I wrote a small
> example that reproduces my problem, with the hope that this will help
> me understand your point.

sorry, i will not dig into details of its working. i will just explain
common rule: if you calc something, make sure that returned value at
any attempt to use it will be completely evaluated. it's something
rather close to the problems of doing IO in lazy environment,
thoroughly described in my IO inside manual

for example, this definition:

f a b = (a*b,a+b)

will return you unevaluated thunk. even when it is used, it may be
computed partially:

main = do let x = f 1 2
          print (fst x)

here print will force evaluation of first value in pair, but not the
second one. but you can improve it by telling 'if you need value of f,
you should evaluate its components first':

f a b = let x = a*b
            y = a+b
        in x `seq` y `seq` (x,y)

this f definition will not evaluate x and y automatically. BUT its
returned value is not (x,y). its returned value is x `seq` y `seq` (x,y)
and when further computation try to use it in any way, it can't put
hands on the pair before it will evaluate x and y values. are you
understand?

this technique therefore may be used to construct values that will be
fully evaluated on any attempt to use any_ part of such value. say,

f a = let x = ..
          y = ..
          z = ..
      in x `seq` y `seq` z `seq` T x y z

g = let x = f 1
        y = f 2
        z = f 3
    in x `seq` y `seq` z `seq` S x y z

here value returned by g on any attempt to use it will force
evaluation of two levels of data structure. this can be repeated again
and again

but writing to file or comparing does the same. that is difference.
when you write value to file, you does all evaluations at once but
before this moment you have large datastructure full of chunks built
in memory. seq-ing technique can help when you build you structure
step-by-step in some sequential environment (read monad)

so, the following:

main = do let x = g
          return $! (field1 x)

will evaluate the whole value returned by g. but it will be equivalent
to

main = do let x = g''
          return $! x==x

where g'' - the same function without seqs

to get real advantage, you need to build your value sequentially in
monad and force evaluation of each step results:

main = do let x = f 1
          return $! x
          let y = f 2
          return $! y
          let z = f 3
          return $! z
          let a = T x y z
          ..

so, the seq-ing technique will be better that you current one only in
the case when values build are used to build other parts of these
datastructure - in this case additional seq's may force evaluation of
more part of datastructure than actually requested

about your example - this monad is strict, i.e. its operations are
sequenced like in IO monad itself (this monad just carries additional
state information between IO operations). what you need to do - is to
force that any updates/requests will force evaluation. then strictness
of monad will guarantee you that evaluation will be perfomed at each
step. this will work both for IO and your monad:

do let x = a*b
       return $! x
       -- here x is evaluated

the trick here is that where execution reaches the 'return $!' point,
it needs to strictly evaluate value of 'return $! x' expression. this
expression, that is equivalent to (x `seq` return x) need to evaluate
x before it can return anything! as a result, 'return $! x' is
executable statement (like putStr, for example) that force evaluation
of x. i.e. it guarantees that x will be executed just at this moment,
before executing next statement in 'do' block

so, by inserting 'return $!' statements, you may force evaluation of
thunks at given lines of your program, and of course you want to
evaluate every thunk just at the moment it's created. now we need to
look into your program and just find places where thunks are created:

> data Mystate = Mystate {mystate :: [String]}

first rule of strict evaluation - use newtype instead of data for
one-element constructors.

newtype Mystate = Mystate {mystate :: [String]}

is equivalent to

data Mystate = Mystate {mystate :: ![String]}

note '!' for making field strict. if you can't use newtype - make all
fields strict

next moment - [String] is a [[Char]], i.e. it's a lazy list of lazy
lists. you should ensure that both levels of lists are strictly
evaluated if you need to avoid thunks

> type SL = StateT Mystate IO

> getState :: SL [String]
> getState =
>     do s <- get
>        return $ mystate s

first place where we not strict enough. this expression will return
_unevaluated_ thunk (mystate s). but you can force its evaluation
before return:

>        return $! mystate s

so, your code return something like (mystate (Mystate ["a"])) while my
code will evaluate this expression, throwing out function call, and
return result of function evaluation: ["a"].

I should emphasize that it will not evaluate any thunks inside list of
strings. but as long as your state is fully evaluated before storing,
this operator will be able to return strictly evaluated value. it is
how all this works

> setState ns =
>     modify (\s -> s {mystate = ns})

here you modify state, but don't ensure that string list is evaluated
on both levels. well, it will be ok if you ensure evaluation at _each_
call to this function. alternatively, you can force evaluation before
assignment by:

 setState ns = do
     return $! map length ns
     modify (\s -> s {mystate = ns})

> getFile :: String -> SL ()
> getFile p =
>     do f <- liftIO $ readFile p
>        let lns = lines f
>        -- forces evaluation of lns
>        liftIO $ putStrLn $ "Number of lines: " ++ show (length lns)
>        setState lns
>        promptLoop

here you evaluate only higher level of [String] before assigning it
to state. but for this operation ('lines' on file contents) it's enough,
we don't need to evaluate individual chars here

> showLine :: Int -> SL ()
> showLine nr =
>     do s <- getState
>        liftIO $ putStrLn $ s !! nr
>        promptLoop

> promptStr = "lFilename [load the file Filename] - sNr [show the line Nr of Filename] - q to quit"

> promptLoop :: SL ()
> promptLoop = 
>     do liftIO $ putStrLn promptStr
>        str <- liftIO getLine
>        case str of
>          ('l':ss) -> getFile ss
>          ('s':nr) -> showLine (read nr)
>          ('q':[]) -> liftIO $ return ()

you can omit liftIO here, just return () will be ok :)

>          _ -> promptLoop

> main =
>     evalStateT promptLoop $ Mystate []

if StateT is strict monad transformer, this code don't have space
leaks. you create thunks in two places, in one place you already
evaluate it, and i wrote what to do in second place.

for another code, you should watch assignments to state and ensure
that assigned values are fully evaluated

-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin at gmail.com