[C2hs] Reducing c2hs memory requirements
Manuel M T Chakravarty
chak at cse.unsw.edu.au
Sun Mar 13 19:43:57 EST 2005
Duncan,
To be honest, I am not too keen on using files to buffer the AST and
symbol tables, as it adds another level of complexity. Moreover, an
operation that is performed whenever C declarations are analysed is
declaration chasing, where name analysis (using functions from CTrav.hs)
follows declarations involving typedefs. Depending on where the types
are defined, access can be non-local and may slow everything down quite
a bit.
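
To illustrate why declaration chasing needs cheap random access, here is a
minimal sketch of the idea. It is not the code in CTrav.hs: the types CType
and TypedefTable and the function chase are hypothetical, simplified
stand-ins, and Data.Map is used in place of c2hs's own map type.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical, simplified stand-ins; c2hs's real AST and CTrav.hs differ.
type Ident = String

data CType
  = Prim String        -- a built-in type such as "int"
  | TypedefRef Ident   -- a reference to a typedef'd name
  deriving (Eq, Show)

-- Maps each typedef name to the type it was declared as.
type TypedefTable = Map.Map Ident CType

-- Follow typedef references until a non-typedef type is reached.
-- Returns Nothing for an undefined name or a cyclic chain.
chase :: TypedefTable -> CType -> Maybe CType
chase table = go Set.empty
  where
    go _ t@(Prim _) = Just t
    go seen (TypedefRef n)
      | n `Set.member` seen = Nothing
      | otherwise           = Map.lookup n table >>= go (Set.insert n seen)
```

Every step of the chain is one lookup, and the typedefs involved may have
been declared far apart. With the table in a file, each of those lookups
could become a seek.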
Do you have any idea which data structures take most of the space? Or
is it just that after expanding the header files of GTK widgets, the
resulting C pre-processor output is so large that it just takes a lot of
space to hold it?
Manuel
On Sun, 2005-01-30 at 16:34 +0000, Duncan Coutts wrote:
> Hi all,
>
> For people building gtk2hs, we've found the large amount of heap
> space required by c2hs to be a problem. It means people with older
> machines with less than about 400MB of RAM cannot build gtk2hs.
>
> For recent versions of Gtk, parsing and name analysis require 350MB of
> heap space (i.e., running c2hs +RTS -M340m -RTS will run out of heap, but
> c2hs +RTS -M350m -RTS will be ok).
>
> So we've been pondering how to reduce the heap requirements. The key
> point is that we do not want to have to keep the whole of the AST +
> symbol maps in memory at once.
>
> For the parsing phase, this should not be a problem: the parser works
> declaration by declaration, and the only thing that is accumulated is
> the set of typedef names. There are two options here. We could write out
> each declaration one at a time to another file using the binary
> serialisation framework. Alternatively, if the list of declarations
> could be returned lazily by the parser, then that should work ok.
>
> The harder bit is the name analysis. It reads the declaration list in a
> linear pattern (so it should work well with a lazy parser or a list of
> declarations deserialised one by one out of a file). The CTagNS
> namespace and CDefTable seem to be write-only, which is good as they
> could be written out to file immediately. The CShadowNS is not generated
> during the name analysis phase. The CObjNS namespace is trickier since
> it is both written and used for lookups. We could live with keeping this
> one in memory or alternatively it should be possible to both write the
> map bit by bit and do random reads for the lookups. The lookups
> themselves do not retain any heap since they immediately write the value
> out into another map.
>
> The name analysis phase actually doesn't use many map lookup/insert
> operations. If each of these could be re-defined locally to work in the
> NA monad, and the NA monad extended to know the files we are reading
> from and writing to, then in runNA we could switch between doing
> lookups/inserts on in-heap FiniteMaps or to/from files.
>
> runNA :: NA a -> Either AttrC Files -> a -> CST s (Either AttrC Files)
>
> My point is that we wouldn't need to change any existing code paths. The
> use of intermediate files could even be controlled by a
> --conserve-memory flag or something (since it would probably slow down
> the cases where currently everything fits into memory).
>
> Just looking for feedback; particularly from Manuel as to whether he
> thinks this is a plan worth pursuing.
>
> Duncan
>
> _______________________________________________
> C2hs mailing list
> C2hs at haskell.org
> http://www.haskell.org/mailman/listinfo/c2hs
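
The quoted proposal, to let lookups and inserts switch between in-heap maps
and files without changing the code that calls them, could look roughly like
the sketch below. Everything in it is an assumption made for illustration:
Store, openStore, heapStore, fileStore and recordAll are hypothetical names,
the operations run in plain IO rather than in the NA monad, Data.Map stands
in for FiniteMap, and the file format (one tab-separated key/value pair per
line, scanned linearly) is far too naive for real use.

```haskell
import Control.Exception (evaluate)
import Data.IORef (modifyIORef', newIORef, readIORef)
import qualified Data.Map as Map

-- Hypothetical interface: the only map operations client code may use.
data Store = Store
  { storeInsert :: String -> String -> IO ()
  , storeLookup :: String -> IO (Maybe String)
  }

-- Everything stays on the heap, as c2hs does today.
heapStore :: IO Store
heapStore = do
  ref <- newIORef Map.empty
  pure Store
    { storeInsert = \k v -> modifyIORef' ref (Map.insert k v)
    , storeLookup = \k -> Map.lookup k <$> readIORef ref
    }

-- Entries are appended to a file; a lookup rescans the file.
-- Assumes keys and values contain no tabs or newlines.
fileStore :: FilePath -> IO Store
fileStore path = do
  writeFile path ""  -- start from an empty file
  pure Store
    { storeInsert = \k v -> appendFile path (k ++ "\t" ++ v ++ "\n")
    , storeLookup = \k -> do
        contents <- readFile path
        -- Force the whole file so the handle is closed before the next write.
        _ <- evaluate (length contents)
        let entries = [ (key, drop 1 rest)
                      | l <- lines contents
                      , let (key, rest) = break (== '\t') l ]
        -- Later entries shadow earlier ones, matching Map.insert.
        pure (lookup k (reverse entries))
    }

-- Stand-in for a --conserve-memory style switch: Nothing keeps the map
-- on the heap, Just path buffers it in the given file.
openStore :: Maybe FilePath -> IO Store
openStore = maybe heapStore fileStore

-- Client code is written once against Store and never sees the backend.
recordAll :: Store -> [(String, String)] -> IO ()
recordAll store = mapM_ (uncurry (storeInsert store))
```

The sketch also makes the cost visible: the file-backed lookup is linear in
the size of the file, so a usable version would need an index or some
caching for the map that is read back during name analysis.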