[Haskell-cafe] More Language.C work for Google's Summer of Code

Serguey Zefirov sergueyz at gmail.com
Tue Mar 30 13:55:21 EDT 2010


I tried to write a C preprocessor, but then I realized that I
could write something like this:
---------------------------
#define A(arg) A_start (arg) A_end

#define A_start "this is A_start definition."
#define A_end "this is A_end definition."

A (
#undef A_start
#define A_start A_end
)
---------------------------

gcc preprocesses it into the following:
---------------------------
# 1 "a.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "a.c"





"this is A_end definition." () "this is A_end definition."
---------------------------
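
As far as I can tell, the C standard leaves directives inside a macro's
argument list undefined, so the output above is gcc's choice, not a
guarantee. The well-defined part of the effect is that a macro body is only
expanded where the macro is used, so it picks up whatever definitions are
current at that point. A minimal sketch of just that part (the names
A_START, WRAP, late_bound and uses_new_definition are made up for this
illustration and are not from the example above):

```c
#include <string.h>

#define A_START "old "
/* WRAP's body mentions A_START, but nothing is expanded at this point. */
#define WRAP(arg) A_START arg

/* Redefine A_START after WRAP was defined but before WRAP is used. */
#undef A_START
#define A_START "new "

/* WRAP("text") expands here, so it sees the current A_START.
 * The result is "new " "text"; adjacent literals are then concatenated. */
static const char *late_bound(void)
{
    return WRAP("text");
}

/* Returns 1 when the redefinition, not the original, was used. */
static int uses_new_definition(void)
{
    return strcmp(late_bound(), "new text") == 0;
}
```

A preprocessor therefore has to store macro bodies unexpanded and look up
nested macro names at each use.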

Another woe is filenames in angle brackets for #include. They
require a special case in the tokenizer.
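
A rough sketch of that special case, assuming a hand-written tokenizer that
tracks whether it has just read `#include` on the current line. The token
kinds and the function lex_token are hypothetical, invented for this sketch,
and everything that is not a header name or `<` is lumped together for
brevity:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch only. */
enum tok_kind { TOK_HEADER_NAME, TOK_PUNCT_LT, TOK_OTHER };

/*
 * Lex one token starting at s and store its length in *len.
 * in_include is nonzero when the tokens seen so far on this line are
 * exactly `#` and `include`. Only then is <...> or "..." one token.
 */
static enum tok_kind lex_token(const char *s, int in_include, size_t *len)
{
    if (in_include && (*s == '<' || *s == '"')) {
        char close = (*s == '<') ? '>' : '"';
        const char *end = strchr(s + 1, close);
        /* The name must be non-empty and must close on the same line. */
        if (end != NULL && end > s + 1 &&
            memchr(s + 1, '\n', (size_t)(end - s - 1)) == NULL) {
            *len = (size_t)(end - s) + 1;
            return TOK_HEADER_NAME;
        }
    }
    *len = 1;
    /* Anywhere else, '<' is an ordinary punctuator. */
    return (*s == '<') ? TOK_PUNCT_LT : TOK_OTHER;
}
```

The same characters `<stdio.h>` come out as one token after `#include` and
as a lone `<` otherwise, which is why the tokenizer cannot be context-free
here.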

So I gave up on it (a fully compliant C preprocessor). ;)

Other than that, the C preprocessor looks simple.

I hardly qualify as a student, though.

2010/3/30 Aaron Tomb <atomb at galois.com>:
> The first is to integrate preprocessing into the library. Currently, the
> library calls out to GCC to preprocess source files before parsing them.
> This has some unfortunate consequences, however, because comments and macro
> information are lost. A number of program analyses could benefit from
> metadata encoded in comments, because C doesn't have any sort of formal
> annotation mechanism, but in the current state we have to resort to ugly
> hacks (at best) to get at the contents of comments. Also, effective
> diagnostic messages need to be closely tied to original source code. In the
> presence of pre-processed macros, column number information is unreliable,
> so it can be difficult to describe to a user exactly what portion of a
> program a particular analysis refers to. An integrated preprocessor could
> retain comments and remember information about macros, eliminating both of
> these problems.
>
> The second possible project is to create a nicer interface for traversals
> over Language.C ASTs. Currently, the symbol table is built to include only
> information about global declarations and those other declarations currently
> in scope. Therefore, when performing multiple traversals over an AST, each
> traversal must re-analyze all global declarations and the entire AST of the
> function of interest. A better solution might be to build a traversal that
> creates a single symbol table describing all declarations in a translation
> unit (including function- and block-scoped variables), for easy reference
> during further traversals. It may also be valuable to have this traversal
> produce a slightly-simplified AST in the process. I'm not thinking of
> anything as radical as the simplifications performed by something like CIL,
> however. It might simply be enough to transform variable references into a
> form suitable for easy lookup in a complete symbol table like I've just
> described. Other simple transformations such as making all implicit casts
> explicit, or normalizing compound initializers, could also be good.
>
> A third possibility, which would probably depend on the integrated
> preprocessor, would be to create an exact pretty-printer. That is, a
> pretty-printing function such that pretty . parse is the identity.
> Currently, parse . pretty should be the identity, but it's not true the
> other way around. An exact pretty-printer would be very useful in creating
> rich presentations of C source code --- think LXR on steroids.
>
> If you're interested in any combination of these, or anything similar, let
> me know. The deadline is approaching quickly, but I'd be happy to work
> together with a student to flesh any of these out into a full proposal.
>
> Thanks,
> Aaron
>
> --
> Aaron Tomb
> Galois, Inc. (http://www.galois.com)
> atomb at galois.com
> Phone: (503) 808-7206
> Fax: (503) 350-0833
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>

