[Haskell-cafe] memory, garbage collection and other newbie's issues

Sun Oct 22 14:26:53 EDT 2006

Hello Andrea,

Sunday, October 22, 2006, 6:06:24 PM, you wrote:

>> f a b = let x = a*b
>>             y = a+b
>>         in x `seq` y `seq` (x,y)
>> 
>> this f definition will not evaluate x and y automatically. BUT its
>> returned value is not (x,y). its returned value is x `seq` y `seq` (x,y)
>> and when further computation try to use it in any way, it can't put
>> hands on the pair before it will evaluate x and y values. are you
>> understand?

> Yes, I do understand. But, as far as I know, "seq" will just evaluate x
> and y enough to see if they are not bottom. So, if x and y are a deep
> data structure, they won't be evaluated entirely, right?

x (and y) should be not considered here as data value! it's an
expression which should be evaluated to bottom or non-bottom, i.e.
some data constructor. but what if expression of x (i.e. thunk) by
itself contains 'seq' call? say

let x = 1/0 `seq` 1
    y = x `seq` 2
print y

try it ;)  it will require to evaluate y value that requires to
evaluate x that requires to eval 1/0! so when we construct some value
we may ensure that all its parts will be evaluated once some part will
be requested:

x `seq` y `seq` T x y

when you construct data values with seq recursively, this will lead to
that whole deep datastructure will be evaluated once any part of it
will be requested

i don't propose you to do exactly this, just explain how this works
and why works

> That is to say:

>> to get real advantage, you need to build your value sequentially in
>> monad and force evaluation of each step results:
>> 
>> main = do let x = f 1
>>           return $! x
>>           let y = f 2
>>           return $! y
>>           let z = f 3
>>           return $! z
>>           let a = T x y z
>>           ..

> ...

>> > setState ns =
>> >     modify (\s -> s {mystate = ns})
>> 
>> here you modify state, but don't ensure that string list is evaluated
>> on both levels. well, it will be ok if you ensure evaluation at _each_
>> call to this function. alternatively, you can force evaluation before
>> assignment by:
>> 
>>  setState ns = do
>>      return $! map length ns
>>      modify (\s -> s {mystate = ns})
>> 

> this is the crucial point. You are forcing evaluation with $! map
> length, I'm doing it with writeFile. I do not see very much
> difference. That's an ad hoc solution. Since I need to write the
> state, instead of (needlessly) looking for the length of it's members,
> I write it...;-)

yes, it's the same. _right_ way is to use speciall deepSeq function
that is like 'seq' but does full evaluation. unfortunately,
haskell/ghc don't give us ready-to-use implementation. equality
testing and other tricks are just the ways to make full evaluation of
this datastructure and they are unnecessary if you really consume it
all

> By the way, the problem is not ns, but s, the old state. The new state
> has been evaluated by the code below (when we display the number of
> lines). So you need to change your code with:

> do s <- getState
>    modify (\s -> s {mystate = ns})    

> otherwise you are going to have the same space leak as my original
> code had. In other word, it is possible to have a user who keeps on
> loading files without looking at any line (in this case getState is
> never called and so there is no "return $! mystate s"!).

this '$!' only suppress one small thunk that contains mystate call on
MyState: mystate (MyState s) => s. it don't evaluates anything else,
so adding s<-getState will not change anything. if you want to do full
evaluatin, you should add the same 'return $! map length state' to the
getState, but instead you can add this just to setState :)))

so, you don't need to use 'map length' in getState. you don't need to
call getState in setState. and you may omit 'map length' call in
setState if you will ensure that all calls to it fully evaluate this
list. for beginning, in you non-toy code, you may add this call and
see that will happen

next, this line

>    modify (\s -> s {mystate = ns})

really produce unevaluated thunk. monad state will point to expression
's {mystate = ns}'. it's better to represent in functional style:
setMyState s ns. as you can see, this thunk points to the old value of s.
while you (and me) know that this value is really unused, RTS (GHC
run-time system) don't know it! as a result, all old file contents are
hold in memory until you evaluate this thunk :)))

instead of 'modify' you should use the following sequence:

  s <- get
  put $! s {mystate = ns}

it's my fault - i don't seen this memory leak until we digged into
each step of evaluation :)

you can run performGC at this moment to ensure that memory will never
contain more than two file's contents. if you want to hold no more
that one contents, use the following:

  setState []
  performGC
  s <- getContents
  setState (lines s)

although, i also suggest to add performGC after each substantial step
(reading, conversion) and then try to remove one after one and see
results

> This produces the same as:
> setState ns =
>     do s <- getState -- that does: return $ mystate s
>        liftIO $ writeFile "/dev/null" s
>        modify (\s -> s {mystate = ns})

as i just understood, the old 's' value is held in memory even after
modify, so evaluating it really allow to reduce memory usage :)

of course, using put$! should be even better

>> if StateT is strict monad transformer, this code don't have space
>> leaks. you create thunks in two places, in one place you already
>> evaluate it, and i wrote what to do in second place.

> Yes, indeed. But just because we forced evaluation in a way that `seq`
> cannot do. Am I right?

yes, the strict evaluation order of this monad gives us possibility to
evaluate thunks just at some steps of execution instead of linking
evaluation to requests of data inside of some structure. strict monad
gives us strict context on which we can rely

> Thanks a lot for you patience. You did help me a lot. And this is not
> the first time. I appreciate. Really.

i like to teach :)

-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin at gmail.com