[Haskell-cafe] More Language.C work for Google's Summer of Code

Tue Mar 30 20:11:49 EDT 2010

On Tue, Mar 30, 2010 at 5:14 PM, Aaron Tomb <atomb at galois.com> wrote:
> That's very good to hear!
>
> When it comes to preprocessing and exact printing, I think that there are
> various stages of completeness that we could support.
>
>  1) Add support for parsing comments to the Language.C parser. Keep using an
> external pre-processor but tell it to leave comments in the source code. The
> cpphs pre-processor can do this. The trickiest bit here would have to do
> with where to record the comments in the AST. What AST node is a given
> comment associated with? We could probably come up with some general rules,
> and perhaps certain comments, in weird locations, would still be ignored.
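
To make the "which node owns this comment" question concrete, here is a minimal sketch of one possible general rule: a comment belongs to the first node that starts on or after the comment's line, and anything left over is simply unattached. The types Node and Comment and the function ownerOf are hypothetical stand-ins invented for illustration; none of them are part of Language.C, and a real rule would also need columns and nesting.

```haskell
import Data.List (find, sortOn)

-- Hypothetical stand-ins for AST nodes and comments (not Language.C types).
data Node = Node { nodeName :: String, nodeLine :: Int }
  deriving (Eq, Show)

data Comment = Comment { commentLine :: Int, commentText :: String }
  deriving (Eq, Show)

-- Assumed rule: a comment is owned by the first node that starts on or
-- after the comment's line. Comments after the last node get Nothing.
ownerOf :: [Node] -> Comment -> Maybe Node
ownerOf nodes c =
  find (\n -> nodeLine n >= commentLine c) (sortOn nodeLine nodes)

main :: IO ()
main = do
  let nodes    = [Node "decl x" 2, Node "function f" 5]
      comments = [ Comment 1 "/* counter */"
                 , Comment 4 "/* entry point */"
                 , Comment 9 "/* trailing */"
                 ]
      describe c =
        commentText c ++ " -> " ++ maybe "(unattached)" nodeName (ownerOf nodes c)
  mapM_ (putStrLn . describe) comments
```

A rule this simple would leave a trailing comment without an owner, which matches the point that comments in weird locations might still be ignored.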

>
>  2) Support correct column numbers for source locations. This falls short of
> complete macro support, but covers one of the key problems that macros
> introduce. The mcpp preprocessor [1] has a special diagnostic mode where it
> adds special comments describing the origin of code that resulted from macro
> expansion. If the parser retained comments, we could use this information to
> help with exact pretty-printing.
>
>  3) Modify the pretty-printer to take position information into account when
> pretty-printing (at least optionally). As long as macro definitions
> themselves (as well as #ifdef, etc.) are not in the AST, the output will
> still not be exactly the same as the input, but it'll come closer.
>
>  4) Add full support for parsing and expanding macros internally, so that
> both macro definitions and expansions appear in the Language.C AST. This is
> probably a huge project, partly because macros do not have to obey the tree
> structure of the C language in any way. This is perhaps beyond the scope of
> a summer project, but the other steps could help prepare for it in the
> future, and still fully address some of the problems caused by the
> preprocessor along the way.

I haven't looked at the C spec on macros, but I'm pretty motivated and
would like to shoot for a big project.

>
> Do you think you'd be interested in some subset or variation of 1, 2, and 3?
> Are there other ideas you have? Things I've missed? Things you'd do
> differently?

I'm very interested in all 3 of them, and actually somewhat in #4,
though I'll have to do some reading to understand why you're saying
it's such a big undertaking.
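
As a way of checking my understanding of the "pretty . parse is the identity" goal from the original message, here is a toy sketch. It does not use Language.C at all: the "language" is just whitespace-separated words, and every name in it (Ast, Exact, parseExact, prettyExact) is made up for illustration. It only shows why an ordinary pretty-printer loses layout and how keeping the whitespace alongside each token could restore it.

```haskell
import Data.Char (isSpace)

-- Toy AST: a list of tokens. Layout is thrown away by the parser.
type Ast = [String]

parse :: String -> Ast
parse = words

pretty :: Ast -> String
pretty = unwords

-- Toy "exact" form: each token remembers the whitespace that preceded
-- it, and the whitespace after the last token is kept separately.
data Exact = Exact [(String, String)] String
  deriving (Eq, Show)

parseExact :: String -> Exact
parseExact s =
  case span isSpace s of
    (ws, "")   -> Exact [] ws
    (ws, rest) ->
      let (tok, rest')        = break isSpace rest
          Exact toks trailing = parseExact rest'
      in  Exact ((ws, tok) : toks) trailing

prettyExact :: Exact -> String
prettyExact (Exact toks trailing) =
  concatMap (\(ws, tok) -> ws ++ tok) toks ++ trailing

main :: IO ()
main = do
  let src = "int   x ;\n"
  -- parse . pretty is the identity on ASTs produced by parse.
  print (parse (pretty (parse src)) == parse src)
  -- pretty . parse is not the identity on source text: layout is lost.
  print (pretty (parse src) == src)
  -- The exact variant reproduces the input text.
  print (prettyExact (parseExact src) == src)
```

Real C is of course far harder than this, since comments and macro expansions also have to be carried along, but the shape of the property is the same.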

>
> Thanks,
> Aaron
>
>
> [1] http://mcpp.sourceforge.net/
>
>
> On Mar 30, 2010, at 1:46 PM, Edward Amsden wrote:
>
>> I'd be very much interested in working on this library for GSoC. I'm
>> currently working on an idea for another project, but I'm not certain
>> how widely beneficial it would be. The preprocessor and
>> pretty-printing projects sound especially intriguing.
>>
>> On Tue, Mar 30, 2010 at 1:30 PM, Aaron Tomb <atomb at galois.com> wrote:
>>>
>>> Hello,
>>>
>>> I'm wondering whether there's anyone on the list with an interest in doing
>>> additional work on the Language.C library for the Summer of Code. There are
>>> a few enhancements that I'd be very interested in seeing, and I'd love to be
>>> a mentor for such a project if there's a student interested in working on
>>> them.
>>>
>>> The first is to integrate preprocessing into the library. Currently, the
>>> library calls out to GCC to preprocess source files before parsing them.
>>> This has some unfortunate consequences, however, because comments and macro
>>> information are lost. A number of program analyses could benefit from
>>> metadata encoded in comments, because C doesn't have any sort of formal
>>> annotation mechanism, but in the current state we have to resort to ugly
>>> hacks (at best) to get at the contents of comments. Also, effective
>>> diagnostic messages need to be closely tied to original source code. In the
>>> presence of pre-processed macros, column number information is unreliable,
>>> so it can be difficult to describe to a user exactly what portion of a
>>> program a particular analysis refers to. An integrated preprocessor could
>>> retain comments and remember information about macros, eliminating both of
>>> these problems.
>>>
>>> The second possible project is to create a nicer interface for traversals
>>> over Language.C ASTs. Currently, the symbol table is built to include only
>>> information about global declarations and those other declarations currently
>>> in scope. Therefore, when performing multiple traversals over an AST, each
>>> traversal must re-analyze all global declarations and the entire AST of the
>>> function of interest. A better solution might be to build a traversal that
>>> creates a single symbol table describing all declarations in a translation
>>> unit (including function- and block-scoped variables), for easy reference
>>> during further traversals. It may also be valuable to have this traversal
>>> produce a slightly-simplified AST in the process. I'm not thinking of
>>> anything as radical as the simplifications performed by something like CIL,
>>> however. It might simply be enough to transform variable references into a
>>> form suitable for easy lookup in a complete symbol table like I've just
>>> described. Other simple transformations such as making all implicit casts
>>> explicit, or normalizing compound initializers, could also be good.
>>>
>>> A third possibility, which would probably depend on the integrated
>>> preprocessor, would be to create an exact pretty-printer. That is, a
>>> pretty-printing function such that pretty . parse is the identity.
>>> Currently, parse . pretty should be the identity, but it's not true the
>>> other way around. An exact pretty-printer would be very useful in creating
>>> rich presentations of C source code --- think LXR on steroids.
>>>
>>> If you're interested in any combination of these, or anything similar, let
>>> me know. The deadline is approaching quickly, but I'd be happy to work
>>> together with a student to flesh any of these out into a full proposal.
>>>
>>> Thanks,
>>> Aaron
>>>
>>> --
>>> Aaron Tomb
>>> Galois, Inc. (http://www.galois.com)
>>> atomb at galois.com
>>> Phone: (503) 808-7206
>>> Fax: (503) 350-0833
>>>
>>> _______________________________________________
>>> Haskell-Cafe mailing list
>>> Haskell-Cafe at haskell.org
>>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>>
>
>