[Haskell-cafe] More Language.C work for Google's Summer of Code
atomb at galois.com
Tue Mar 30 17:14:59 EDT 2010
That's very good to hear!
When it comes to preprocessing and exact printing, I think that there
are various stages of completeness that we could support.
1) Add support for parsing comments to the Language.C parser. Keep
using an external pre-processor but tell it to leave comments in the
source code. The cpphs pre-processor can do this. The trickiest bit
here would have to do with where to record the comments in the AST.
What AST node is a given comment associated with? We could probably
come up with some general rules, and perhaps certain comments, in
weird locations, would still be ignored.
2) Support correct column numbers for source locations. This falls
short of complete macro support, but covers one of the key problems
that macros introduce. The mcpp preprocessor has a special
diagnostic mode where it adds special comments describing the origin
of code that resulted from macro expansion. If the parser retained
comments, we could use this information to help with exact
pretty-printing.
3) Modify the pretty-printer to take position information into
account when pretty-printing (at least optionally). As long as macro
definitions themselves (as well as #ifdef, etc.) are not in the AST,
the output will still not be exactly the same as the input, but it'll
be closer.
4) Add full support for parsing and expanding macros internally, so
that both macro definitions and expansions appear in the Language.C
AST. This is probably a huge project, partly because macros do not
have to obey the tree structure of the C language in any way. This is
perhaps beyond the scope of a summer project, but the other steps
could help prepare for it in the future, and still fully address some
of the problems caused by the preprocessor along the way.
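To make the "general rules" in step 1 a bit more concrete, here is a
minimal sketch of one possible rule: attach each comment to the first
AST node that starts at or after the comment, and report comments with
no following node separately. The types Pos, Comment, and Node and the
function attachComments are hypothetical stand-ins invented for this
illustration; they are not Language.C's actual types or API.

```haskell
import Data.List (sortOn)
import qualified Data.Map.Strict as Map

-- Hypothetical, simplified stand-ins; these are NOT Language.C types.
data Pos = Pos { posLine :: Int, posCol :: Int }
  deriving (Eq, Ord, Show)

data Comment = Comment { commentPos :: Pos, commentText :: String }
  deriving (Eq, Show)

data Node = Node { nodeName :: String, nodePos :: Pos }
  deriving (Eq, Show)

-- Attach each comment to the first node that starts at or after it.
-- Comments with no following node come back separately as orphans.
-- Source order of comments is preserved in both results.
attachComments :: [Node] -> [Comment] -> (Map.Map String [Comment], [Comment])
attachComments nodes = foldr step (Map.empty, [])
  where
    sorted = sortOn nodePos nodes
    step c (attached, orphans) =
      case dropWhile (\n -> nodePos n < commentPos c) sorted of
        (n : _) -> (Map.insertWith (++) (nodeName n) [c] attached, orphans)
        []      -> (attached, c : orphans)
```

A rule like this handles the common case of a comment sitting directly
above a declaration. Trailing comments on the same line, or comments in
the middle of an expression, would need additional rules or would fall
into the ignored set mentioned above.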
Do you think you'd be interested in some subset or variation of 1, 2,
and 3? Are there other ideas you have? Things I've missed? Things
you'd do differently?
On Mar 30, 2010, at 1:46 PM, Edward Amsden wrote:
> I'd be very much interested in working on this library for GSoC. I'm
> currently working on an idea for another project, but I'm not certain
> how widely beneficial it would be. The preprocessor and
> pretty-printing projects sound especially intriguing.
> On Tue, Mar 30, 2010 at 1:30 PM, Aaron Tomb <atomb at galois.com> wrote:
>> I'm wondering whether there's anyone on the list with an interest
>> in doing
>> additional work on the Language.C library for the Summer of Code.
>> There are
>> a few enhancements that I'd be very interested in seeing, and I'd love
>> to be a
>> mentor for such a project if there's a student interested in
>> working on them.
>> The first is to integrate preprocessing into the library.
>> Currently, the
>> library calls out to GCC to preprocess source files before parsing
>> them.
>> This has some unfortunate consequences, however, because comments
>> and macro
>> information are lost. A number of program analyses could benefit from
>> metadata encoded in comments, because C doesn't have any sort of
>> annotation mechanism, but in the current state we have to resort to
>> hacks (at best) to get at the contents of comments. Also, effective
>> diagnostic messages need to be closely tied to original source
>> code. In the
>> presence of pre-processed macros, column number information is lost,
>> so it can be difficult to describe to a user exactly what portion
>> of a
>> program a particular analysis refers to. An integrated preprocessor
>> could
>> retain comments and remember information about macros, eliminating
>> both of
>> these problems.
>> The second possible project is to create a nicer interface for
>> traversals
>> over Language.C ASTs. Currently, the symbol table is built to
>> include only
>> information about global declarations and those other declarations
>> currently
>> in scope. Therefore, when performing multiple traversals over an
>> AST, each
>> traversal must re-analyze all global declarations and the entire
>> AST of the
>> function of interest. A better solution might be to build a
>> traversal that
>> creates a single symbol table describing all declarations in a
>> translation
>> unit (including function- and block-scoped variables), for easy
>> lookup
>> during further traversals. It may also be valuable to have this
>> traversal
>> produce a slightly-simplified AST in the process. I'm not thinking of
>> anything as radical as the simplifications performed by something
>> like CIL,
>> however. It might simply be enough to transform variable references
>> into a
>> form suitable for easy lookup in a complete symbol table like I've
>> described. Other simple transformations such as making all implicit
>> casts
>> explicit, or normalizing compound initializers, could also be good.
>> A third possibility, which would probably depend on the integrated
>> preprocessor, would be to create an exact pretty-printer. That is, a
>> pretty-printing function such that pretty . parse is the identity.
>> Currently, parse . pretty should be the identity, but it's not true
>> the
>> other way around. An exact pretty-printer would be very useful in
>> creating
>> rich presentations of C source code --- think LXR on steroids.
>> If you're interested in any combination of these, or anything
>> similar, let
>> me know. The deadline is approaching quickly, but I'd be happy to
>> work
>> together with a student to flesh any of these out into a full
>> proposal.
>> Aaron Tomb
>> Galois, Inc. (http://www.galois.com)
>> atomb at galois.com
>> Phone: (503) 808-7206
>> Fax: (503) 350-0833
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org