[C2hs] Re: [Gtk2hs-users] Profiling c2hs

Wed May 5 10:17:54 EDT 2004

On 5 May 2004, at 06:08, Duncan Coutts wrote:

> All,
>
> I was writing a demo app to add to gtk2hs's small collection; it's a
> viewer for ghc's time profile log files, and I needed a large profile
> file to test it on so I thought I'd try profiling c2hs. Here's what I
> found:
>
> Looking at allocations (which corresponds quite closely with time):
> Note that these are % of the total cost, not % of their parent cost
>
> 11.7% is taken by the C & chs lexers which is not unreasonable.
> 77.4% is taken by the CParser
>
> Within the CParser:
> 72.0% is taken by parseCExtDeclList, within which
> 65.5% is taken by morphTypeNames (and 58.2% of the runtime)
> 31.4% alloc & 26.5% time was taken just by calls to Prelude.elem within
> morphTypeNames.
>
> In this particular run of c2hs, in which it was processing some large
> gtk+ headers, it called morphTypeNames over 55 million times.
>
> I narrowed this down a bit more and found that it is the call to
> morphTypeNames from parseCExtDeclList that takes the time, the call 
> from
> parseCHeader is insignificant. Furthermore, it is the recursive calls 
> to
> morphTypeNames that is expensive; parseCExtDeclList only called
> morphTypeNames about 1000 times (compared to 55 million recursive
> calls).
>
> So what's going and and what can we do?
>
> If I understand the code correctly, the reason we are doing this
> morphTypeNames is because we can't distinguish typedef'ed identifiers 
> in
> the parser, we need to remember which identifiers were typedefs and 
> then
> treat them differently in the parser later if they were. At the moment
> this is done by after parsing each declaration (which might contain
> typedefs) modifying the remaining stream of tokens changing all
> identifiers that have been discovered to be typedefs from ordinary
> identifier tokens to typedef-identifier tokens. This is expensive.
>
> Perhaps it would be possible to add this set of typedef identifiers to
> the parser state monad and thread it through to the parsers for cid and
> typedefName which could consult this parser state and match
> conditionally on the identifier token being in the typedef-identifier
> set or not. (cidOrTN would not need to to a lookup) This would avoid
> traversing the token stream after parsing each declaration.
>
> Sound reasonable?
>
It might be worth a shot. Eliminating this function call would 
certainly speed up things, however, c2hs seems to be slow due to its 
huge memory footprint at runtime which probably doesn't change 
(although it might).

Nice find,

Axel.