[C2hs] yet another C parser

Jelmer Vernooij jelmer at samba.org
Sun Oct 8 21:47:18 EDT 2006


Hi Duncan,

On Mon, 2006-10-09 at 00:26 +0100, Duncan Coutts wrote:
> over the weekend I had another stab at fixing issues in the c2hs C
> parser. I've been annoyed for some time that the basic typedef problem
> is not solved. I know some of the other problems are the various GNU
> extensions.
> 
> Recall that there is this annoying context-dependency in the C grammar:
> parsing depends on whether an identifier has been declared as a
> typedef'ed identifier within the enclosing scope.
> 
> There are two things the current grammar gets wrong. One is that it
> doesn't accept identifiers and typedef names correctly in all the right
> situations. This was because my attempts to add them in the right places
> led to huge numbers of shift/reduce conflicts in the grammar. The other
> is that it doesn't handle nested scopes properly; it assumes a single
> global scope. So currently, typedef'ed names do not go out of scope when
> they should.
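
The usual way to handle this context dependency is the so-called lexer hack: the parser records typedef names in a symbol table that follows C's block scopes, and the lexer consults that table to decide whether an identifier is returned as a typedef-name token or as an ordinary identifier token. A minimal sketch in C, assuming a fixed-size table; the names `scope_push`, `scope_pop`, `declare` and `classify` are hypothetical, and this is not how c2hs (which is written in Haskell) actually does it:

```c
#include <string.h>

/* Hypothetical token kinds the lexer could hand to the parser. */
enum token_kind { TOK_IDENTIFIER, TOK_TYPEDEF_NAME };

#define MAX_NAMES 256
#define MAX_NAME_LEN 64

/* One entry per declared name; `depth` records the scope it belongs to. */
struct binding {
    char name[MAX_NAME_LEN];
    int is_typedef;
    int depth;
};

struct name_env {
    struct binding names[MAX_NAMES];
    int count; /* number of live bindings */
    int depth; /* current scope depth; 0 is file scope */
};

void env_init(struct name_env *env) {
    env->count = 0;
    env->depth = 0;
}

/* Entering a block (after '{') opens a new scope. */
void scope_push(struct name_env *env) {
    env->depth++;
}

/* Leaving a block drops every binding made inside it. Inner-scope
   bindings always sit at the end of the array, so we pop from the back. */
void scope_pop(struct name_env *env) {
    if (env->depth == 0)
        return; /* never pop file scope */
    while (env->count > 0 && env->names[env->count - 1].depth == env->depth)
        env->count--;
    env->depth--;
}

/* Record a declaration in the current scope. An ordinary identifier
   (is_typedef == 0) in an inner scope shadows an outer typedef name. */
int declare(struct name_env *env, const char *name, int is_typedef) {
    if (env->count == MAX_NAMES || strlen(name) >= MAX_NAME_LEN)
        return -1;
    struct binding *b = &env->names[env->count++];
    strcpy(b->name, name);
    b->is_typedef = is_typedef;
    b->depth = env->depth;
    return 0;
}

/* The lexer's question: search from the innermost binding outwards, so
   the nearest visible declaration decides the token kind. */
enum token_kind classify(const struct name_env *env, const char *name) {
    for (int i = env->count - 1; i >= 0; i--)
        if (strcmp(env->names[i].name, name) == 0)
            return env->names[i].is_typedef ? TOK_TYPEDEF_NAME : TOK_IDENTIFIER;
    return TOK_IDENTIFIER;
}
```

With this table, `typedef int T;` at file scope makes `classify(&env, "T")` return `TOK_TYPEDEF_NAME`; a block that declares `int T;` shadows it, so inside that block `T` is an ordinary identifier, and after the closing brace it is a typedef name again. The hard part is calling `declare` and the scope functions at exactly the right points in the parse.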
> 
> So I started with James A. Roskind's C grammar for YACC. Google for it;
> there's lots of info on it. Anyway, I ported that to Happy and indeed it
> builds with just the one expected shift/reduce conflict (the dangling
> else in if/then/else).
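
The expected conflict is the dangling else: after `if (a) if (b) s1;` the parser sees `else` and can either shift it or reduce the inner `if`. Yacc and Happy both resolve shift/reduce conflicts in favour of shifting by default, which attaches the `else` to the nearest `if`, as the C standard requires. A small illustration (the function name `nested_if` is made up):

```c
/* Dangling else: the else below belongs to the inner `if (b)`,
   because C associates an else with the nearest preceding if. */
int nested_if(int a, int b) {
    int r = 0;
    if (a)
        if (b)
            r = 1;
        else
            r = 2; /* runs when a is true and b is false */
    return r;      /* 0 whenever a is false: the outer if has no else */
}
```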
> 
> So the plan, I suppose, would be to integrate this with the current
> lexer. This involves adding the semantic actions at just the right
> points to have typedef'ed names added and removed from the typedef'ed
> name set at just the right times and then testing the parser on a bunch
> of nasty torture cases. There are some suggested by Roskind here:
> 
> http://compilers.iecc.com/comparch/article/92-01-056
> 
> e.g., try this one:
> typedef int A, B(A);
> 
> which should be equivalent to:
> 
> typedef int A;
> typedef int B(int);
> /* B's type is a function taking an int and returning an int */
> 
> Terrifying stuff :-)
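
A C compiler will confirm the equivalence: the scope of `A` begins as soon as its declarator is complete, so `A` is already a typedef name inside `B(A)`, and a typedef of function type can be used to declare (but not define) a function. In this sketch the function name `twice` is just an illustration:

```c
/* Roskind's torture case: A is int, and B is "function taking A (int)
   and returning int". A is in scope by the time B's declarator is read. */
typedef int A, B(A);

/* A function typedef can declare a function but cannot define it. */
B twice; /* same as: int twice(int); */

/* The definition must be compatible with the declaration above. */
int twice(A x) {
    return 2 * x;
}
```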
> 
> 
> I was also looking at a cheap automatic test for the parser. The idea is
> just to try and parse every .h file in /usr/include. We can filter out
> just the ones that compile on their own with gcc (since some need extra
> -I dirs or -D defines).
> 
> At the moment on my machine with the current c2hs parser I get:
> 
> 225 headers could be parsed ok
> 38 headers failed with parse errors
> 
> Of the failures, most are related to __attribute__ of some sort. Some
> are C99 features like restrict or _Bool. There are a few otherwise
> uncategorised parse errors.
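
For reference, a small fragment with each kind of construct mentioned: a GNU `__attribute__` on a declaration, C99 `restrict`-qualified pointers, and C99 `_Bool`. It should compile with gcc in C99 mode or later (the attribute being a GNU extension), and is the kind of input a parser regression test might include; the function names are made up:

```c
#include <stddef.h>

/* GNU extension: an attribute attached to a declaration. `const` tells
   gcc the result depends only on the argument values. */
int square(int x) __attribute__((const));
int square(int x) { return x * x; }

/* C99: restrict promises that dst and src do not overlap. */
void copy_ints(int *restrict dst, const int *restrict src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* C99: the built-in boolean type. */
_Bool is_even(int x) {
    return x % 2 == 0;
}
```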

> We could probably extend this style of automatic testing to standard C
> packages by providing a c2hs wrapper that pretends to be gcc. E.g., we'd
> use something like:
> export CC="c2hs-ccwrapper"
> export LD="c2hs-ldwrapper"
> ./configure
> make
> 
> To make the build system work, we'd need to produce dummy .o files,
> etc. We might be able to test vast amounts of C code this way.

Yeah, that'd certainly make sense. I have to trim the includes manually
on my system when writing .chs files, because c2hs can't parse some
system headers that use unusual attributes.
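
To keep make happy, a wrapper like the c2hs-ccwrapper proposed above would need to create whatever object file `gcc -c` would have produced. A minimal sketch of that bookkeeping, assuming a single `.c` input and only the separate `-o FILE` form of the option (`dummy_object_name` and `write_dummy_object` are hypothetical helpers, not part of c2hs):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: work out which object file a `gcc -c ...`
   invocation would create. An explicit "-o FILE" wins; otherwise gcc
   writes the source's basename with ".c" replaced by ".o" into the
   current directory. Returns 0 on success, -1 on failure. */
int dummy_object_name(int argc, char **argv, char *out, size_t outlen) {
    const char *src = NULL;
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-o") == 0 && i + 1 < argc) {
            int n = snprintf(out, outlen, "%s", argv[i + 1]);
            return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
        }
        size_t len = strlen(argv[i]);
        if (argv[i][0] != '-' && len > 2 && strcmp(argv[i] + len - 2, ".c") == 0)
            src = argv[i]; /* remember the (last) C source argument */
    }
    if (src == NULL)
        return -1;
    const char *base = strrchr(src, '/');
    base = base ? base + 1 : src; /* strip any directory part */
    size_t blen = strlen(base);
    if (blen >= outlen)
        return -1;
    memcpy(out, base, blen + 1);
    out[blen - 1] = 'o'; /* "foo.c" -> "foo.o" */
    return 0;
}

/* Create an empty placeholder object so that make sees its target. */
int write_dummy_object(const char *path) {
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    return fclose(f) == 0 ? 0 : -1;
}
```

A wrapper built this way would run the parse check first and then call `write_dummy_object` on the computed name; the link step would need an analogous stub in the proposed c2hs-ldwrapper.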

Cheers,

Jelmer
-- 
Jelmer Vernooij <jelmer at samba.org> - http://samba.org/~jelmer/