[C2hs] Re: support for 6.4

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Wed May 25 13:06:27 EDT 2005


On Wed, 2005-05-25 at 11:44 +1000, Manuel M T Chakravarty wrote:

> An Alex/Happy parser would be an option if it improves matters
> significantly.  If you or anybody else has a go at it, please follow the
> C language definition closely, as does the existing lexer and parser
> (with comments stating to which sections in K&R the individual
> productions relate).

Interim status update:

I've started with the lexer, using Alex. I've converted the existing
lexer clause by clause, keeping the K&R language definition comments.
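
To give an idea of the shape of the spec, here is a minimal sketch (not
the real thing: the module name, token constructors and the "basic"
wrapper are illustrative assumptions, and the actual lexer covers the
full K&R token set):

    {
    module CLexer (Token(..), alexScanTokens) where
    }

    %wrapper "basic"

    $digit = 0-9
    $alpha = [a-zA-Z]

    tokens :-

      $white+                     ;
      "typedef"                   { \_ -> TokTypedef }
      $alpha [$alpha $digit \_]*  { \s -> TokIdent s }
      $digit+                     { \s -> TokILit (read s) }

    {
    data Token
      = TokTypedef
      | TokIdent String
      | TokILit  Integer
      deriving (Eq, Show)
    }

With the "basic" wrapper, Alex generates
alexScanTokens :: String -> [Token].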

The output is identical to that of the existing lexer on my test file,
gtk.i (the cpp output of gtk/gtk.h), which is 1014K.

It's considerably faster and runs in minimal space. I timed how long it
takes to find the length of the list of tokens (so there's no IO
overhead). I used ghc -O for the lexer modules and all relevant
dependent modules. On my old, slow 500MHz SPARC, the existing lexer
takes 7.8 seconds and needs at least 14MB of heap, while the Alex lexer
takes 1.8 seconds and runs in less than 1MB of heap.
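
For reference, the measurement was of roughly this shape (a sketch
only: the input file name is a stand-in, and alexScanTokens stands for
whichever lexing function is being measured):

    import Control.Exception (evaluate)
    import System.CPUTime    (getCPUTime)
    import CLexer            (alexScanTokens)   -- hypothetical module

    main :: IO ()
    main = do
      s <- readFile "gtk.i"
      _ <- evaluate (length s)   -- force the input first, so reading
                                 -- the file isn't part of the timing
      start <- getCPUTime
      n     <- evaluate (length (alexScanTokens s))
      end   <- getCPUTime
      let secs = fromIntegral (end - start) / 1e12 :: Double
      putStrLn (show n ++ " tokens in " ++ show secs ++ "s")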

One issue I noticed with the existing lexer is that it seems to be
strict, i.e. the whole token stream is built before it is returned. The
Alex lexer is lazy, which would explain the difference in heap usage.
This suggests the existing lexer could be made to perform better if it
were made lazy.
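
To illustrate what I mean by lazy: each token should be returned before
the rest of the input has been scanned, along these lines (scanOne here
is a made-up stand-in for a single-token scanner):

    -- the consumer gets each token as soon as it is recognised, so the
    -- whole stream never has to live in the heap at once
    lexLazily :: (String -> Maybe (tok, String)) -> String -> [tok]
    lexLazily scanOne = go
      where
        go cs = case scanOne cs of
          Nothing          -> []
          Just (tok, rest) -> tok : go rest

    -- a trivial scanner, just to show the stream really is lazy:
    -- take 3 (lexLazily scanWord (cycle "foo ")) == ["foo","foo","foo"]
    scanWord :: String -> Maybe (String, String)
    scanWord cs = case break (== ' ') (dropWhile (== ' ') cs) of
      ("", _)   -> Nothing
      (w, rest) -> Just (w, rest)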

> Moreover, the module c2hs/base/syntax/ParserMonad.hs already provides
> a monad suitable for Happy.

I'll take a look.

As for the lexer/parser interaction required for C, I guess the way to
do this is to make the lexer monad keep the set of 'typedefed'
identifiers and return the appropriate token type to the parser
depending on membership of that set. The lexer monad should also expose
an operation that lets the parser add identifiers to the 'typedefed'
set, which the parser would use after parsing the appropriate kind of
declaration.
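
Roughly, I have something like this in mind (only a sketch with made-up
names, using a plain state monad rather than the monad in
c2hs/base/syntax/ParserMonad.hs):

    import           Control.Monad.State
    import           Data.Set (Set)
    import qualified Data.Set as Set

    -- hypothetical token type: type names get their own constructor
    -- so the grammar can tell them apart from ordinary identifiers
    data Token = TokIdent   String
               | TokTyIdent String
               deriving (Eq, Show)

    -- the lexer monad carries the set of typedef'd identifiers
    type Lex = State (Set String)

    -- the lexer uses this to classify every identifier it recognises
    classifyIdent :: String -> Lex Token
    classifyIdent name = do
      tds <- get
      return (if name `Set.member` tds
                then TokTyIdent name
                else TokIdent   name)

    -- the parser calls this after parsing a typedef declaration, so
    -- that later uses of the name come back as TokTyIdent
    addTypedef :: String -> Lex ()
    addTypedef name = modify (Set.insert name)

The important point is just that the distinction has to be made at the
token level, since the C grammar can't otherwise tell a type name from
an ordinary identifier.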

Duncan


