[C2hs] Re: support for 6.4

Manuel M T Chakravarty chak at cse.unsw.edu.au
Thu May 26 02:46:57 EDT 2005


Duncan Coutts:
> On Wed, 2005-05-25 at 11:44 +1000, Manuel M T Chakravarty wrote:
> 
> > An Alex/Happy parser would be an option if it improves matters
> > significantly.  If you or anybody else has a go at it, please follow the
> > C language definition closely, as does the existing lexer and parser
> > (with comments stating to which sections in K&R the individual
> > productions relate).
> 
> Intrim status update:
> 
> I've started with the lexer, using alex. I've converted clause for
> clause the existing lexer, keeping the K&R language definition comments.
> 
> The output is exactly the same as for the existing lexer on my test file
> of gtk.i (cpp output of gtk/gtk.h) which is 1014K.
> 
> It's a tad faster and runs in minimal space. I timed how long it takes
> to find the length of the list of tokens (so there's no IO overhead). I
> used ghc -O for the lexer modules and all relevant dependent modules. On
> my old slow 500Mhz sparc, the existing lexer takes 7.8 seconds and needs
> 14Mb heap minimum while the alex lexer takes 1.8 seconds and runs in
> less than 1Mb heap.

Cool.

> One issue I noticed with the existing lexer is that it seems to be
> strict, ie the whole token stream is built before it is returned. The
> alex lexer is lazy which would explain the difference in heap usage.
> This suggests the existing lexer could be made to perform better if it
> were made lazy.

When I originally wrote the lexer, I benchmarked a number of variants.
The result was that the strict one was more efficient, but that was long
ago with an ancient version of ghc.

> > Moreover, the module c2hs/base/syntax/ParserMonad.hs already provides
> > a monad suitable for Happy.
> 
> I'll take a look.
> 
> As for the lexer/parser interaction required for C, I guess the way to
> do this is to make the lexer monad keep the set of 'typedefed'
> identifiers and return the appropriate token type to the parser
> depending on membership of that set. The lexer monad should also expose
> an operation to the parser to add identifiers to the 'typedefed' set
> which the parser would use after parseing the appropriate kind of
> declaration.

Sounds right to me.

Cheers,
Manuel




More information about the C2hs mailing list