[C2hs] Re: support for 6.4

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Mon May 23 08:49:56 EDT 2005


On Mon, 2005-05-23 at 00:42 +0100, Duncan Coutts wrote:
> On Sun, 2005-05-22 at 17:23 +1000, Manuel M T Chakravarty wrote:

> > This just needs a lot of space.
> 
> This is true, it does just have to keep track of a great deal of
> information.
> 
> Still, I wonder if there is something going on that we don't quite
> understand. The serialised dataset for c2hs when processing the Gtk 2.6
> headers is 9.7Mb (this figure does include string sharing but this
> should be mostly happening when in the heap too and even if it isn't,
> it's only a 2x space blowup). I know that when represented in the ghc
> heap it will take more space than this because of all the pointers (and
> finite maps rather than simple lists) but that factor wouldn't account
> for the actual minimum heap requirements which is about 30 times bigger
> than the serialised format.
> 
> Actually, that could be verified experimentally by unserialising the
> dataset and making sure it is all in memory by using deepSeq (this would
> be necessary since we lazily deserialise the dataset).

From my brief experiment, the 9.7Mb file takes just over 50Mb of heap
space once deserialised into the heap, and top reported 47Mb RSS.
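
For anyone wanting to repeat this kind of measurement, here is a minimal
sketch of the forcing step. It is not the c2hs code: the Decl type and
loadDataset are hypothetical stand-ins for the real dataset and
deserialiser, and the DeepSeq class is written out by hand so the
example needs nothing beyond the standard libraries.

```haskell
module Main where

-- Hypothetical stand-in for one entry of the deserialised dataset;
-- the real c2hs types are not shown in this thread.
data Decl = Decl String Int

-- A hand-rolled deep-forcing class, in the spirit of deepSeq.
class DeepSeq a where
  deepSeq :: a -> b -> b

instance DeepSeq Int where
  deepSeq = seq

instance DeepSeq Char where
  deepSeq = seq

-- Force every element and the whole spine of a list.
instance DeepSeq a => DeepSeq [a] where
  deepSeq []     b = b
  deepSeq (x:xs) b = x `deepSeq` (xs `deepSeq` b)

instance DeepSeq Decl where
  deepSeq (Decl name size) b = name `deepSeq` (size `deepSeq` b)

-- Hypothetical: a lazily built list standing in for the lazily
-- deserialised file.
loadDataset :: Int -> [Decl]
loadDataset n = [ Decl ("ident" ++ show i) (i * 4) | i <- [1 .. n] ]

main :: IO ()
main = do
  let ds = loadDataset 100000
  -- Force the whole structure first, so that no unevaluated thunks
  -- hide its real size when the heap is inspected.
  ds `deepSeq` putStrLn ("forced " ++ show (length ds) ++ " declarations")
```

Running the compiled program with GHC's runtime statistics enabled (for
example +RTS -s) then reports the residency of the fully evaluated
structure rather than of a partly evaluated one.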

I tried another experiment and found that the parsing phase by itself
required over 250Mb of heap space. By the time it got to the name
analysis it required over 350Mb.

So from that it looks to me that the parser could be improved. The
lexer/parser could be swapped out for another implementation without
affecting any other module.

Perhaps we should look at one based on Alex & Happy. Happy can do
monadic parsers which would allow it to maintain the set of identifiers
needed when parsing C. Alex & Happy can produce pure Haskell98 code (or
ghc-specific code for better performance) so the portability of c2hs
would not be affected, unlike our binary serialisation patches which
use various ghc'isms.
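
To illustrate the idea, here is a rough sketch of the state such a
monadic parser could carry. It assumes the identifiers in question are
the typedef names, which a C lexer has to tell apart from ordinary
identifiers. The monad P, the Token type, addTypedef and classify are
all made up for illustration; this is plain Haskell, not a Happy
grammar, and not the existing c2hs parser.

```haskell
module Main where

-- Minimal state monad over the list of known typedef names, written
-- out by hand so the sketch needs only the standard Prelude.
newtype P a = P { runP :: [String] -> (a, [String]) }

instance Functor P where
  fmap f (P g) = P $ \s -> let (a, s') = g s in (f a, s')

instance Applicative P where
  pure a = P $ \s -> (a, s)
  P f <*> P g = P $ \s ->
    let (h, s')  = f s
        (a, s'') = g s'
    in (h a, s'')

instance Monad P where
  P g >>= k = P $ \s -> let (a, s') = g s in runP (k a) s'

-- Hypothetical token type: the lexer must distinguish these two cases.
data Token = TypeName String | Ident String
  deriving (Eq, Show)

-- Record a new typedef name, as a grammar action could do after
-- reducing a typedef declaration.
addTypedef :: String -> P ()
addTypedef n = P $ \s -> ((), n : s)

-- Classify an identifier by consulting the names seen so far.
classify :: String -> P Token
classify n = P $ \s -> (if n `elem` s then TypeName n else Ident n, s)

main :: IO ()
main = do
  let action = do
        before <- classify "GtkWidget"
        addTypedef "GtkWidget"
        after <- classify "GtkWidget"
        return (before, after)
  -- The same name is an ordinary identifier before the typedef has
  -- been seen, and a type name afterwards.
  print (fst (runP action []))
```

In a Happy grammar a monad like this would be named with the %monad
directive, with the grammar actions calling something like addTypedef
and the lexer calling something like classify.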

Duncan


