[C2hs] Reducing c2hs memory requirements

Manuel M T Chakravarty chak at cse.unsw.edu.au
Sun Mar 13 19:43:57 EST 2005


Duncan,

To be honest, I am not too keen on using files to buffer the AST and
symbol tables, as it adds another level of complexity.  Moreover, an
operation that is performed whenever C declarations are analysed is
declaration chasing, where name analysis (using functions from CTrav.hs)
follows declarations involving typedefs.  Depending on where the types
are defined, access can be non-local and may slow everything down quite
a bit.

Do you have any idea which data structures take most of the space?  Or
is it just that after expanding the header files of GTK widgets, the
resulting C pre-processor output is so large that it just takes a lot of
space to hold it?

Manuel

On Sun, 2005-01-30 at 16:34 +0000, Duncan Coutts wrote:
> Hi all,
> 
> For people building gtk2hs, we've found the large amount of heap
> space required by c2hs to be a problem. It means that people with
> older machines, with less than about 400MB of RAM, cannot build
> gtk2hs.
> 
> For recent versions of Gtk, parsing and name analysis require 350MB
> of heap space (i.e. running c2hs +RTS -M340m -RTS will run out of
> heap, but c2hs +RTS -M350m -RTS will be OK).
> 
> So we've been pondering how to reduce the heap requirements. The key
> point is that we do not want to have to keep the whole of the AST and
> symbol maps in memory at once.
> 
> For the parsing phase this should not be a problem: the parser works
> declaration by declaration, and the only thing that accumulates is
> the set of typedef names. There are two options here: we could write
> out each declaration one at a time to another file using the binary
> serialisation framework, or, if the list of declarations could be
> returned lazily by the parser, that should work too.
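A minimal sketch of the lazy-list option (the `Decl` type and `parseDecls` below are stand-ins, not c2hs's real parser API): if the parser yields declarations as a lazy list, a strict consumer can process them one at a time, and each declaration becomes garbage as soon as it has been consumed.

```haskell
import Data.List (foldl')

-- Stand-in for c2hs's C declaration type (hypothetical).
data Decl = Decl String

-- Stand-in lazy "parser": declarations are produced on demand.
parseDecls :: [String] -> [Decl]
parseDecls = map Decl

-- A strict left fold consumes the stream declaration by declaration,
-- so the whole list is never resident in the heap at once.
countDecls :: [Decl] -> Int
countDecls = foldl' (\n _ -> n + 1) 0
```

The same shape works whether the stream comes from a lazy parser or from declarations deserialised one by one out of a file.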
> 
> The harder bit is the name analysis. It reads the declaration list in
> a linear pattern (so it should work well with a lazy parser, or with
> a list of declarations deserialised one by one from a file). The
> CTagNS namespace and the CDefTable seem to be write-only, which is
> good, as their entries could be written out to file immediately. The
> CShadowNS is not generated during the name analysis phase. The CObjNS
> namespace is trickier, since it is both written and used for lookups.
> We could live with keeping this one in memory; alternatively, it
> should be possible both to write the map bit by bit and to do random
> reads for the lookups. The lookups themselves do not retain any heap,
> since they immediately write the value out into another map.
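To illustrate the write-only case (the names here are hypothetical, not the actual CTagNS/CDefTable code): each entry is appended to a file the moment it is produced, so nothing accumulates in the heap.

```haskell
import System.IO

type Ident = String

-- Append one namespace entry to the file; the entry is not retained.
writeEntry :: Handle -> Ident -> String -> IO ()
writeEntry h ide info = hPutStrLn h (ide ++ "\t" ++ info)

-- Drain a (possibly lazily produced) stream of entries straight to
-- disk; at any moment only the current entry is live.
dumpNamespace :: FilePath -> [(Ident, String)] -> IO ()
dumpNamespace path entries =
  withFile path WriteMode $ \h ->
    mapM_ (uncurry (writeEntry h)) entries
```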
> 
> The name analysis phase actually doesn't use many map lookup/insert
> operations. If each of these could be re-defined locally to work in
> the NA monad, and the NA monad extended to know about the files we
> are reading from and writing to, then in runNA we could switch
> between doing lookups/inserts against in-heap FiniteMaps or against
> files.
> 
> runNA :: NA a -> Either AttrC Files -> a -> CST s (Either AttrC Files)
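One way the switch inside runNA could look, as a sketch only (`Store` and `Files` below are stand-ins, and plain IO stands in for the real NA monad and AttrC): the state carries either an in-heap map or file handles, and each insert dispatches on which one is present.

```haskell
import qualified Data.Map as Map

-- Stand-ins for the real types (hypothetical): AttrC becomes a plain
-- map, Files a pair of file paths.
data Files = Files FilePath FilePath
type Store = Either (Map.Map String String) Files

-- Insert dispatches on the store: update the in-heap map, or append
-- the entry to disk and retain nothing.
naInsert :: String -> String -> Store -> IO Store
naInsert k v (Left m) = return (Left (Map.insert k v m))
naInsert k v (Right fs@(Files _ out)) = do
  appendFile out (k ++ "\t" ++ v ++ "\n")
  return (Right fs)
```

With lookups defined the same way, existing call sites stay unchanged; a flag only decides which kind of `Store` is threaded through runNA.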
> 
> My point is that we wouldn't need to change any existing code paths.
> The use of intermediate files could even be controlled by a
> --conserve-memory flag or similar (since it would probably slow down
> the cases where everything currently fits into memory).
> 
> Just looking for feedback; particularly from Manuel as to whether he
> thinks this is a plan worth pursuing.
> 
> Duncan
> 
> _______________________________________________
> C2hs mailing list
> C2hs at haskell.org
> http://www.haskell.org/mailman/listinfo/c2hs


