[C2hs] Reducing c2hs memory requirements
Manuel M T Chakravarty
chak at cse.unsw.edu.au
Sun Mar 13 19:43:57 EST 2005
To be honest, I am not too keen on using files to buffer the AST and
symbol tables, as it adds another level of complexity. Moreover, an
operation that is performed whenever C declarations are analysed is
declaration chasing, where name analysis (using functions from CTrav.hs)
follows declarations involving typedefs. Depending on where the types
are defined, access can be non-local and may slow everything down quite
a bit.
Do you have any idea which data structures take most of the space? Or
is it just that after expanding the header files of GTK widgets, the
resulting C pre-processor output is so large that it just takes a lot of
space to hold it?
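To make the concern about declaration chasing concrete, here is a minimal sketch. It assumes a toy type representation (`CType`, `Typedefs` and `chase` are illustrative names, not the definitions in CTrav.hs). Each step of the chase is a lookup that can land anywhere in the table, which is what would make a file-backed table expensive.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical, heavily simplified stand-in for a C type (not the c2hs AST).
data CType
  = Prim String        -- a built-in type such as "int"
  | TypedefRef String  -- a reference to a typedef'd name
  | Ptr CType          -- a pointer to another type
  deriving (Eq, Show)

-- Hypothetical table mapping typedef names to the types they abbreviate.
type Typedefs = Map.Map String CType

-- Follow top-level typedef references until a non-typedef type is reached.
-- Returns Nothing if a name is undefined or the typedefs form a cycle.
-- Every iteration is one table lookup at an arbitrary key.
chase :: Typedefs -> CType -> Maybe CType
chase env = go Set.empty
  where
    go seen (TypedefRef n)
      | n `Set.member` seen = Nothing
      | otherwise           = Map.lookup n env >>= go (Set.insert n seen)
    go _ t = Just t
```

Loaded into GHCi, `chase` applied to a table where `MyInt` abbreviates `gint` and `gint` abbreviates `int` needs two lookups to resolve `TypedefRef "MyInt"`.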
On Sun, 2005-01-30 at 16:34 +0000, Duncan Coutts wrote:
> Hi all,
> For people building gtk2hs, we've found the large amount of heap
> space required by c2hs to be a problem. It means people with older
> machines with less than about 400MB of RAM cannot build gtk2hs.
> For recent versions of Gtk, parsing and name analysis require 350MB of
> heap space (i.e., running c2hs +RTS -M340m -RTS will run out of heap, but
> c2hs +RTS -M350m -RTS will be ok).
> So we've been pondering how to reduce the heap requirements. The key
> point is that we do not want to have to keep the whole of the AST +
> symbol maps in memory at once.
> For the parsing phase, this should not be a problem: the parser works
> declaration by declaration, and the only thing that is accumulated is
> the set of typedef names. There are two options here. We could write out
> each declaration one at a time to another file using the binary
> serialisation framework. Alternatively, if the list of declarations
> could be returned lazily by the parser, then that should work ok.
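The lazy option could look roughly like the sketch below. It is a toy under stated assumptions: input is one declaration per line, and `Decl`, `parseOne` and `parseDecls` are hypothetical names rather than the c2hs parser's. The point it illustrates is that only the set of typedef names is threaded from one declaration to the next, so a consumer can process and drop each declaration as it arrives.

```haskell
import qualified Data.Set as Set

-- Hypothetical toy declarations (not the c2hs AST).
data Decl
  = TypedefDecl String   -- introduces a new type name
  | ObjDecl String Bool  -- object name, and whether its type is a known typedef
  | Unparsed String      -- anything the toy parser does not understand
  deriving (Eq, Show)

-- Parse a single declaration, given the typedef names seen so far.
parseOne :: Set.Set String -> String -> Decl
parseOne tds line = case words line of
  ["typedef", _, name] -> TypedefDecl name
  [ty, name]           -> ObjDecl name (ty `Set.member` tds)
  _                    -> Unparsed line

-- Produce declarations lazily: the tail of the result is only computed on
-- demand, and the typedef set is the only state carried forward.
parseDecls :: [String] -> [Decl]
parseDecls = go Set.empty
  where
    go _ [] = []
    go tds (l : ls) =
      let d = parseOne tds l
          tds' = case d of
            TypedefDecl n -> Set.insert n tds
            _             -> tds
      in tds' `seq` (d : go tds' ls)  -- force the set to avoid a chain of thunks
```

Because the result is produced incrementally, `take 2 (parseDecls (cycle ["typedef int gint", "gint x"]))` terminates even though the input is infinite.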
> The harder bit is the name analysis. It reads the declaration list in a
> linear pattern (so it should work well with a lazy parser or a list of
> declarations deserialised one by one out of a file). The CTagNS
> namespace and CDefTable seem to be write-only, which is good as they
> could be written out to file immediately. The CShadowNS is not generated
> during the name analysis phase. The CObjNS namespace is trickier since
> it is both written and used for lookups. We could live with keeping this
> one in memory or alternatively it should be possible to both write the
> map bit by bit and do random reads for the lookups. The lookups
> themselves do not retain any heap since they immediately write the value
> out into another map.
> The name analysis phase actually doesn't use many map lookup/insert
> operations. If each of these could be redefined locally to work in the
> NA monad, and the NA monad extended to know the files we are
> reading from/to, then in runNA we could switch between doing
> lookups/inserts on in-heap FiniteMaps or to/from files.
> runNA :: NA a -> Either AttrC Files -> a -> CST s (Either AttrC Files)
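One way to picture the switch is a small record of insert/lookup operations with two interchangeable backends. This is only a sketch under assumptions: it runs in plain IO rather than the NA/CST monads, uses Data.Map instead of FiniteMap, and `Store`, `newHeapStore`, `newFileStore` and `recordObjects` are hypothetical names. The file format (one tab-separated pair per line, scanned linearly on lookup) is also invented for illustration.

```haskell
import Data.IORef (newIORef, modifyIORef', readIORef)
import qualified Data.Map.Strict as Map

-- Hypothetical interface: the only two map operations the client code uses.
data Store m k v = Store
  { storeInsert :: k -> v -> m ()
  , storeLookup :: k -> m (Maybe v)
  }

-- Backend 1: everything stays in the heap.
newHeapStore :: Ord k => IO (Store IO k v)
newHeapStore = do
  ref <- newIORef Map.empty
  pure Store
    { storeInsert = \k v -> modifyIORef' ref (Map.insert k v)
    , storeLookup = \k -> Map.lookup k <$> readIORef ref
    }

-- Backend 2: entries are appended to a file and found again by scanning it.
-- Keys and values must not contain tabs or newlines in this toy format.
newFileStore :: FilePath -> IO (Store IO String String)
newFileStore path = do
  writeFile path ""  -- start from an empty file
  pure Store
    { storeInsert = \k v -> appendFile path (k ++ "\t" ++ v ++ "\n")
    , storeLookup = \k -> do
        s <- readFile path
        -- Force the whole read so the handle is closed before the next append.
        length s `seq` pure (lookupLast k (lines s))
    }
  where
    -- The most recent entry for a key wins, mirroring map insertion.
    lookupLast k ls =
      case [drop 1 v | l <- ls, let (k', v) = break (== '\t') l, k' == k] of
        [] -> Nothing
        vs -> Just (last vs)

-- Client code is written once against the interface, unaware of the backend.
recordObjects :: Monad m => Store m String String -> [(String, String)] -> m ()
recordObjects st = mapM_ (uncurry (storeInsert st))
```

The linear scan makes every file-backed lookup cost time proportional to the file size, which is one reason such a backend would probably need to be opt-in.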
> My point is that we wouldn't need to change any existing code paths. The
> use of intermediate files could even be controlled by a
> --conserve-memory flag or something (since it would probably slow down
> the cases where currently everything fits into memory).
> Just looking for feedback; particularly from Manuel as to whether he
> thinks this is a plan worth pursuing.