[C2hs] Reducing c2hs memory requirements

Sun Jan 30 11:34:11 EST 2005

Hi all,

For people building gtk2hs, we've found the large amount of heap
space required by c2hs to be a problem. It means people with older
machines with less than about 400MB of RAM cannot build gtk2hs.

For recent versions of Gtk, parsing and name analysis require about
350MB of heap space (i.e. running c2hs +RTS -M340m -RTS will run out of
heap, but c2hs +RTS -M350m -RTS will be ok).

So we've been pondering how to reduce the heap requirements. The key
point is that we do not want to have to keep the whole of the AST +
symbol maps in memory at once.

For the parsing phase, this should not be a problem: the parser works
declaration by declaration, and the only thing that is accumulated is
the set of typedef names. There are two options here. We could write out
each declaration one at a time to another file using the binary
serialisation framework. Alternatively, if the list of declarations
could be returned lazily by the parser, then that should work ok.
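
To make the lazy option concrete, here is a minimal sketch. It is not
the c2hs parser: Token, Decl, parseDecl and parseLazily are all
hypothetical stand-ins, and the "grammar" is a toy where declarations
are token runs ending in ";". The only state threaded from one
declaration to the next is the set of typedef names, so a consumer that
walks the result list once never needs the whole list in memory.

```haskell
import Data.List (unfoldr)
import qualified Data.Set as Set

-- Hypothetical stand-ins for the real c2hs token and AST types.
type Token = String

data Decl
  = Typedef String           -- typedef <type> <name>;
  | VarOfType String String  -- <typedef-name> <var>;
  | Other [Token]            -- anything else, kept as raw tokens
  deriving (Eq, Show)

-- Parse exactly one declaration, given the typedef names seen so far.
-- Returns the declaration, the updated typedef set and the unconsumed
-- tokens, or Nothing at end of input.
parseDecl :: Set.Set String -> [Token]
          -> Maybe (Decl, Set.Set String, [Token])
parseDecl _ [] = Nothing
parseDecl tds ts =
  let (body, rest) = break (== ";") ts
      rest'        = drop 1 rest   -- skip the terminating ";"
  in Just $ case body of
       ["typedef", _, name]      -> (Typedef name, Set.insert name tds, rest')
       [t, v] | Set.member t tds -> (VarOfType t v, tds, rest')
       _                         -> (Other body, tds, rest')

-- Produce the declaration list lazily: each element is only parsed
-- when the consumer demands it.
parseLazily :: [Token] -> [Decl]
parseLazily ts0 = unfoldr step (Set.empty, ts0)
  where
    step (tds, ts) = do
      (d, tds', ts') <- parseDecl tds ts
      return (d, (tds', ts'))
```

Whether the real parser can be restructured this way depends on how it
is generated, which the sketch does not address.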

The harder bit is the name analysis. It reads the declaration list in a
linear pattern (so it should work well with a lazy parser or a list of
declarations deserialised one by one out of a file). The CTagNS
namespace and CDefTable seem to be write-only, which is good as they
could be written out to file immediately. The CShadowNS is not generated
during the name analysis phase. The CObjNS namespace is trickier since
it is both written and used for lookups. We could live with keeping this
one in memory or alternatively it should be possible to both write the
map bit by bit and do random reads for the lookups. The lookups
themselves do not retain any heap since they immediately write the value
out into another map.
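
The access pattern described above can be sketched as a single pass
over the declaration list. Everything here is a simplified assumption
rather than c2hs code: Decl, Record, ObjNS, step and analyse are
hypothetical names, and the real namespaces hold much richer data. The
point is only that tag entries are emitted and forgotten, while the
object namespace is the one piece of state carried forward for lookups.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map.Strict as Map

-- Hypothetical, much-simplified declaration stream.
data Decl
  = DeclTag String   -- struct/union/enum tag: only ever written
  | DeclObj String   -- object definition: written and later looked up
  | UseObj String    -- reference to a previously defined object
  deriving (Eq, Show)

-- Records that could be written out as soon as they are produced.
data Record
  = TagRecord String
  | ObjRecord String Int
  | UseRecord String (Maybe Int)  -- Nothing if the name was never defined
  deriving (Eq, Show)

-- Stand-in for the object namespace: name -> numeric id.
type ObjNS = Map.Map String Int

-- Process one declaration. The state is a fresh-id counter plus the
-- object namespace; nothing about tags is retained.
step :: (Int, ObjNS) -> Decl -> ((Int, ObjNS), Record)
step st         (DeclTag t) = (st, TagRecord t)
step (n, ns)    (DeclObj o) = ((n + 1, Map.insert o n ns), ObjRecord o n)
step st@(_, ns) (UseObj o)  = (st, UseRecord o (Map.lookup o ns))

-- Linear pass over the declarations, yielding one record per declaration.
analyse :: [Decl] -> [Record]
analyse = snd . mapAccumL step (0, Map.empty)
```

In this shape the heap cost is dominated by the ObjNS map alone, which
is the part we would either keep in memory or move to a file.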

The name analysis phase actually doesn't use many map lookup/insert
operations. If each of these could be redefined locally to work in the
NA monad, and the NA monad extended to know which files we are reading
from and writing to, then in runNA we could switch between doing
lookups/inserts on in-heap FiniteMaps and doing them to/from files.

runNA :: NA a -> Either AttrC Files -> a -> CST s (Either AttrC Files)

My point is that we wouldn't need to change any existing code paths. The
use of intermediate files could even be controlled by a
--conserve-memory flag or something (since it would probably slow down
the cases where currently everything fits into memory).
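
As a rough illustration of the switch, here is a sketch that works in
plain IO rather than the CST/NA monads, with Data.Map standing in for
FiniteMap. Store, heapStore, fileStore, selectStore and
defineAndResolve are hypothetical names, and the file backend is
deliberately naive (an append-only text log that is scanned linearly on
lookup, not the random-access reads suggested above). It only shows
that code written against insert/lookup need not know which backend it
has been given.

```haskell
import Data.IORef (newIORef, modifyIORef', readIORef)
import qualified Data.Map as Map

-- The two operations name analysis needs, abstracted over the backend.
data Store k v = Store
  { insertS :: k -> v -> IO ()
  , lookupS :: k -> IO (Maybe v)
  }

-- In-heap backend: a map held in a mutable reference.
heapStore :: Ord k => IO (Store k v)
heapStore = do
  ref <- newIORef Map.empty
  return Store
    { insertS = \k v -> modifyIORef' ref (Map.insert k v)
    , lookupS = \k -> Map.lookup k <$> readIORef ref
    }

-- File backend: one "show (key, value)" line per insert.
fileStore :: (Eq k, Show k, Read k, Show v, Read v)
          => FilePath -> IO (Store k v)
fileStore path = do
  writeFile path ""   -- start with an empty log
  return Store
    { insertS = \k v -> appendFile path (show (k, v) ++ "\n")
    , lookupS = \k -> do
        s <- readFile path
        let entries = map read (lines s)
        -- Force the whole file so the lazy handle is closed before the
        -- next append; search newest entries first.
        length s `seq` return (lookup k (reverse entries))
    }

-- Hypothetical equivalent of a --conserve-memory switch.
selectStore :: Bool -> FilePath -> IO (Store String Int)
selectStore conserveMemory path
  | conserveMemory = fileStore path
  | otherwise      = heapStore

-- Client code is identical for both backends.
defineAndResolve :: Store String Int -> IO (Maybe Int)
defineAndResolve st = do
  insertS st "gtk_widget_show" 1
  insertS st "gtk_widget_hide" 2
  lookupS st "gtk_widget_hide"
```

A real file backend would need an index to avoid rescanning the log on
every lookup, which is where most of the slowdown would come from.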

Just looking for feedback, particularly from Manuel, as to whether he
thinks this is a plan worth pursuing.

Duncan