garbage collection and other newbie's issues
bulat.ziganshin at gmail.com
Fri Oct 20 14:21:51 EDT 2006
Friday, October 20, 2006, 2:23:16 PM, you wrote:
> well, you gave me a wonderfully clear introduction to Haskell GC, and
> now I have a better understanding of the output of the various
> profiling I'm doing. Thank you very much!
I had the same problems too :)
> Still, I cannot understand my specific problem, that is to say, why
> the function that reads a file retains so much memory.
> I did some test and the results are puzzling:
> - I tried reading the feed and directly converting it into the opml
> chunk to be inserted into the opml component of my StateT monad. The
> problem becomes far worse. Here the output of a heap profile:
> as you can see, after opening one feed (397868 bytes), closing it, opening another
> one (410052 bytes), closing it and reopening the first one brings
> memory consumption to 152 Mega.
First, GC doesn't occur automatically when you close a file. You can help
GHC by calling performGC from System.Mem; I do it in my own program.
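A minimal sketch of this idea (the file name and contents are made up for
illustration): read a file, force it to be consumed, then ask the RTS for a
collection once the string is garbage.

```haskell
import System.Mem (performGC)

main :: IO ()
main = do
  writeFile "feed1.xml" "dummy feed contents"  -- stand-in for a real feed
  contents <- readFile "feed1.xml"
  print (length contents)   -- forces the whole file to be read
  performGC                 -- reclaim the string once it becomes garbage
```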
Second, each Char in GHC occupies 12 bytes (!), so each of your files
occupies about 5 MB of memory (roughly 400,000 characters x 12 bytes).
If you add the previous problem, 2 or 3 files can be held in memory at the
same time (just because they have not yet been GCd), so memory usage may
become, say, 10 MB. Multiplying this by the factor of 2.5 or even 3 that I
described in my previous letter means, say, 30 MB used.
And then the only explanation I've found for why 150 MB are used is that
multiple copies of the same data are held in different forms -
i.e. the first copy is the original file contents as one large string, the
second is the contents split into lines, the third is the internal feed
format, and so on.
> Using the intermediate datatype (that is to say, reading the feed,
> transforming it into my datatype and then to the opml tree), reduces
> the problem:
> only 92 Mega of memory consumption for the very same operations.
> Making the intermediate datatype strict gives almost the same results:
> 98 Mega.
Using the +RTS -c option, plus performGC after building the opml tree and
after closing the feed, should help you.
> Now, I come to believe the file reading is indeed strict, and that
> my problem could be related to StateT laziness.
> Does this makes sense?
I'm not sure, but my guess is that all monad transformers are strict. I hope
that someone else will clear this point.
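For what it's worth, the standard libraries ship both Control.Monad.State.Lazy
and Control.Monad.State.Strict, and even the strict variant does not force the
state *value* itself - you have to do that yourself, e.g. with modify'. A small
sketch (assuming the mtl package; the sumLengths example is made up) of forcing
the state on each update so no chain of thunks is retained:

```haskell
import Control.Monad.State.Strict

-- Sum the lengths of many strings, forcing the accumulator at each step
-- with modify' so no chain of (+) thunks builds up in the state.
sumLengths :: [String] -> Int
sumLengths xs = execState (mapM_ step xs) 0
  where
    step s = modify' (+ length s)   -- modify' forces the new state

main :: IO ()
main = print (sumLengths (replicate 100000 "feed"))
```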
> I'm now going to try to implement my opml state as a IORef and use a
> ReaderT monad to see if something new happens.
>> ps: if your program uses a lot of strings, FPS will be a great help. It
>> doesn't change the GC behavior, just makes everything 10 times smaller
> yes, but I'm using HXT and this is using normal strings to store xml
> text nodes. So I could have some improvements with IO but not that much
> in memory consumption, unless I totally change my implementation.
Ask the library authors :)
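For reference, FPS lives on today as Data.ByteString (in the bytestring
package - an assumption about the reader's setup). A minimal sketch of reading
a file with it, using made-up file contents: each character costs one byte
instead of ~12 bytes per list cell.

```haskell
import qualified Data.ByteString.Char8 as B

main :: IO ()
main = do
  B.writeFile "feed.xml" (B.pack "dummy feed contents")  -- stand-in data
  bytes <- B.readFile "feed.xml"   -- strict read: whole file at once
  print (B.length bytes)
```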
> Anyway, even if I could reduce from 152 to 15 mega the memory
> consumption for reading 2 feeds, I'd be running out of memory, on my
> laptop, in one day instead of in 5 minutes. Anyway, I should face the
> fact that it is not the string implementation in Haskell that is
> causing the problem. The problem is probably me!
That's the world we live in :) Yesterday I had to close Windows
Task Manager because it had been running for 2 weeks and its memory usage
had grown to 100 MB! :)
But things are not so bad. With the +RTS -c switch your program will reach
some maximum memory usage and will not grow further. Alternatively, you can
use the +RTS -F2 switch - it will be faster than -c, will require
more memory, and will not suffer from one GHC bug.
Plus, running performGC at the right points (right when you have a lot of
garbage) should substantially decrease this maximum.
Try it and please write me about the results.
Bulat mailto:Bulat.Ziganshin at gmail.com
More information about the Haskell-Cafe mailing list