[C2hs] Reducing c2hs memory requirements

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Thu Mar 24 15:01:10 EST 2005


On Mon, 2005-03-14 at 11:43 +1100, Manuel M T Chakravarty wrote:
> Duncan,
> 
> To be honest, I am not too keen on using files to buffer the AST and
> symbol tables, as it adds another level of complexity.

Sadly this is quite true. I was commenting to some other Haskell hackers
that there's probably a paper or so in designing a framework / techniques
for writing external algorithms in Haskell (ones where the dataset is
not expected to fit in main memory).

>   Moreover, an
> operation that is performed whenever C declarations are analysed is
> declaration chasing, where name analysis (using functions from CTrav.hs)
> follows declarations involving typedefs.  Depending on where the types
> are defined, access can be non-local and may slow everything down quite
> a bit.

Right, as I recall one of the maps was read-write but the others were
write only.

> Do you have any idea which data structures take most of the space?  Or
> is it just that after expanding the header files of GTK widgets, the
> resulting C pre-processor output is so large that it just takes a lot of
> space to hold it?

I know the preprocessed header is large (765K for gtk 2.4) but I think
c2hs's memory use is more than one would expect for that. I tried running
c2hs on that 765K gtk.i file just now without a +RTS -M650m -RTS heap
limit. I had to kill it after it had allocated 1.3GB on my machine, which
has 1GB of RAM, and brought everything to a crawl. With a memory limit in
place it can complete using 'only' 650MB. So that's a 10x memory use
compared to the original file.

I have not been able to figure out exactly which bit of the data
structure is taking so much space. I've found GHC's space profiling
tools just don't tell me that (or I don't understand the profiling
output enough).

I suspect that there is a great deal of the AST that is kept but is
never used. But I cannot pinpoint anything. I don't think it is the
strings themselves. Using a sharing symbol table (like ghc uses) and
packed strings made an insignificant difference in my tests.
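
For readers unfamiliar with the idea, a sharing symbol table can be
sketched roughly as below. This is only a minimal illustration and not
the code from my tests or from GHC: the names InternTable, intern and
internAll are hypothetical, and it uses plain String keys in a Data.Map
rather than packed strings.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (mapAccumL)

-- Hypothetical interning table: maps each distinct identifier to the
-- copy of it that was stored first, so later occurrences can share it.
type InternTable = Map.Map String String

-- Return the shared copy of a name, adding the name to the table if
-- it has not been seen before.
intern :: InternTable -> String -> (InternTable, String)
intern table name =
  case Map.lookup name table of
    Just shared -> (table, shared)
    Nothing     -> (Map.insert name name table, name)

-- Intern a list of identifiers, threading the table through the list.
internAll :: [String] -> (InternTable, [String])
internAll = mapAccumL intern Map.empty
```

Interning the identifiers of a token stream this way keeps one copy of
each distinct name alive instead of one copy per occurrence. It only
saves space if the duplicated strings are a significant part of the
heap, which does not seem to be the case here.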

The space profiling does show that the finite maps take a very large
proportion of the space compared to the AST (but maybe I'm misreading
the profile graphs, since the maps are also retainers for bits of the
AST).

Sorry this isn't terribly helpful.

Duncan


