[Haskell-cafe] More Language.C work for Google's Summer of Code

Tue Mar 30 16:56:08 EDT 2010

Yes, that would definitely be one productive way forward. One concern  
is that Language.C is BSD-licensed (and it would be nice to keep it  
that way), and cpphs is LGPL. However, if cpphs remained a separate  
program, producing C + extra stuff as output, and the Language.C  
parser understood the extra stuff, this could accomplish what I'm  
interested in. It would be interesting, even, to just extend the  
Language.C parser to support comments, and to tell cpphs to leave them  
in.
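GCC's preprocessor output already carries one kind of "extra stuff": linemarkers of the form `# linenum "filename" flags`, which a parser can use to recover original file and line numbers. Below is a minimal sketch of a parser consuming such annotations. It assumes only that simple linemarker shape: no escaped quotes in file names, and the flags are ignored. The names `parseLinemarker` and `annotate` are hypothetical and are not part of Language.C or cpphs. A richer output mode carrying comments or macro origins could be consumed the same way.

```haskell
import Data.Char (isDigit)

-- | Origin of a line of preprocessed output: (original file, original line).
type Origin = (FilePath, Int)

-- | Recognise a GCC-style linemarker such as @# 42 "foo.c" 1@ and return
-- the line number and file name it announces. Sketch only: assumes the
-- file name contains no escaped quotes and ignores any trailing flags.
parseLinemarker :: String -> Maybe (Int, FilePath)
parseLinemarker ('#' : ' ' : rest) =
  case span isDigit rest of
    (digits@(_ : _), ' ' : '"' : afterQuote) ->
      let (file, _) = break (== '"') afterQuote
      in Just (read digits, file)
    _ -> Nothing
parseLinemarker _ = Nothing

-- | Pair each ordinary line of preprocessed output with its origin.
-- A linemarker states that the *next* line comes from the given file/line;
-- the marker lines themselves are dropped from the result.
annotate :: FilePath -> String -> [(Origin, String)]
annotate initialFile = go (initialFile, 1) . lines
  where
    go _ [] = []
    go (file, n) (l : ls) =
      case parseLinemarker l of
        Just (n', file') -> go (file', n') ls
        Nothing          -> ((file, n), l) : go (file, n + 1) ls
```

For example, `annotate "a.c"` applied to output containing `# 10 "b.h"` attributes the following lines to `b.h` starting at line 10.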

There's also another pre-processor, mcpp [1], that is quite featureful  
and robust, and which supports an output mode with special syntax  
describing the origin of the code resulting from macro expansion.
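mcpp's actual annotation syntax is not reproduced here. The sketch below only illustrates the underlying idea and makes several assumptions: object-like macros only (no parameters, `#` or `##`), input that is already tokenised, and hypothetical names (`Tok`, `Origin`, `expand`). Every output token records either its own source position or the macro and invocation position that produced it. That is the information a diagnostic needs to point back at the original line and column.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- | (line, column) in the original source file.
type Pos = (Int, Int)

-- | Where an output token came from.
data Origin
  = Source Pos         -- ^ written directly in the source at this position
  | Macro String Pos   -- ^ produced by expanding this macro, invoked here
  deriving (Eq, Show)

data Tok = Tok { tokText :: String, tokOrigin :: Origin }
  deriving (Eq, Show)

-- | Hypothetical object-like macro table: name -> replacement token texts.
type Defs = Map.Map String [String]

-- | Expand macros, tagging every token with its origin. The set of macros
-- currently being expanded stops a self-referential macro from looping,
-- in the spirit of C's rule that a macro is not re-expanded inside itself.
expand :: Defs -> [(String, Pos)] -> [Tok]
expand defs = concatMap top
  where
    top (txt, pos) = go Set.empty (Source pos) txt

    go active origin txt =
      case Map.lookup txt defs of
        Just body | not (txt `Set.member` active) ->
          concatMap (go (Set.insert txt active) (outer txt origin)) body
        _ -> [Tok txt origin]

    -- The first macro expanded at a source position names the origin;
    -- nested expansions keep that outermost macro and its position.
    outer name (Source p) = Macro name p
    outer _    m          = m
```

With `M` defined as `N + 1` and `N` as `10`, all three tokens produced from a use of `M` point back to the position where `M` was written. Recording only the outermost macro is a simplification; a real implementation would probably keep the whole expansion chain.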

Aaron

[1] http://mcpp.sourceforge.net/

On Mar 30, 2010, at 12:14 PM, austin seipp wrote:

> (sorry for the dupe aaron! forgot to add haskell-cafe to senders  
> list!)
>
> Perhaps the best course of action would be to try and extend cpphs to
> do things like this? From the looks of the interface, it can already
> do some of these things e.g. do not strip comments from a file:
>
> http://hackage.haskell.org/packages/archive/cpphs/1.11/doc/html/Language-Preprocessor-Cpphs.html#t%3ABoolOptions
>
> Malcolm would have to attest to how complete it is w.r.t. say, gcc's
> preprocessor, but if this were to be a SOC project, extending cpphs to
> include needed functionality would probably be much more realistic
> than writing a new one.
>
> On Tue, Mar 30, 2010 at 12:30 PM, Aaron Tomb <atomb at galois.com> wrote:
>> Hello,
>>
>> I'm wondering whether there's anyone on the list with an interest in
>> doing additional work on the Language.C library for the Summer of
>> Code. There are a few enhancements that I'd be very interested in
>> seeing, and I'd love to be a mentor for such a project if there's a
>> student interested in working on them.
>>
>> The first is to integrate preprocessing into the library. Currently,
>> the library calls out to GCC to preprocess source files before
>> parsing them. This has some unfortunate consequences, however,
>> because comments and macro information are lost. A number of program
>> analyses could benefit from metadata encoded in comments, because C
>> doesn't have any sort of formal annotation mechanism, but in the
>> current state we have to resort to ugly hacks (at best) to get at the
>> contents of comments. Also, effective diagnostic messages need to be
>> closely tied to original source code. In the presence of
>> pre-processed macros, column number information is unreliable, so it
>> can be difficult to describe to a user exactly what portion of a
>> program a particular analysis refers to. An integrated preprocessor
>> could retain comments and remember information about macros,
>> eliminating both of these problems.
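To illustrate what retaining comments could buy, here is a minimal standalone sketch. It is not Language.C's API, and `Comment`, `comments` and `advance` are hypothetical names. It collects C comments together with their original line and column and skips string and character literals, so that `"/*"` inside a string is not mistaken for a comment. An integrated preprocessor could keep a table like this and attach each comment to a nearby AST node instead of losing it during preprocessing.

```haskell
-- | A comment together with the 1-based (line, column) where it starts.
data Comment = Comment { commentPos :: (Int, Int), commentText :: String }
  deriving (Eq, Show)

-- | Collect all comments in C source text, with their positions.
-- Sketch only: string/char literals are skipped, but trigraphs and
-- backslash-newline line splicing are not handled.
comments :: String -> [Comment]
comments = go (1, 1)
  where
    go _ [] = []
    go pos ('/' : '*' : rest) =
      let (body, rest', pos') = blockComment (advance pos "/*") rest
      in Comment pos ("/*" ++ body) : go pos' rest'
    go pos ('/' : '/' : rest) =
      let (body, rest') = break (== '\n') rest
      in Comment pos ("//" ++ body) : go (advance pos ("//" ++ body)) rest'
    go pos (q : rest)
      | q == '"' || q == '\'' =
          let (lit, rest') = literal q rest
          in go (advance pos (q : lit)) rest'
    go pos (c : rest) = go (advance pos [c]) rest

    -- Read up to and including the closing "*/", tracking the position.
    blockComment pos ('*' : '/' : rest) = ("*/", rest, advance pos "*/")
    blockComment pos (c : rest) =
      let (body, rest', pos') = blockComment (advance pos [c]) rest
      in (c : body, rest', pos')
    blockComment pos [] = ("", [], pos)   -- unterminated comment

    -- Read the rest of a string/char literal, honouring backslash escapes.
    literal q ('\\' : c : rest) = let (s, r) = literal q rest in ('\\' : c : s, r)
    literal q (c : rest)
      | c == q    = ([c], rest)
      | otherwise = let (s, r) = literal q rest in (c : s, r)
    literal _ [] = ([], [])

-- | Advance a (line, column) position over some text.
advance :: (Int, Int) -> String -> (Int, Int)
advance = foldl step
  where
    step (l, _) '\n' = (l + 1, 1)
    step (l, c) _    = (l, c + 1)
```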
>>
>> The second possible project is to create a nicer interface for
>> traversals over Language.C ASTs. Currently, the symbol table is built
>> to include only information about global declarations and those other
>> declarations currently in scope. Therefore, when performing multiple
>> traversals over an AST, each traversal must re-analyze all global
>> declarations and the entire AST of the function of interest. A better
>> solution might be to build a traversal that creates a single symbol
>> table describing all declarations in a translation unit (including
>> function- and block-scoped variables), for easy reference during
>> further traversals. It may also be valuable to have this traversal
>> produce a slightly simplified AST in the process. I'm not thinking of
>> anything as radical as the simplifications performed by something
>> like CIL, however. It might simply be enough to transform variable
>> references into a form suitable for easy lookup in a complete symbol
>> table like I've just described. Other simple transformations, such as
>> making all implicit casts explicit or normalizing compound
>> initializers, could also be good.
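Here is a minimal sketch of that idea on a toy AST, written in plain Haskell rather than against Language.C's real types. `Stmt`, `RStmt`, `resolve` and the table layout are all hypothetical. A single pass gives every declaration, including block-scoped ones, a unique id, records it in one table for the whole translation unit, and rewrites each variable reference into that id. Later traversals could then look references up directly instead of re-running scope analysis.

```haskell
import qualified Data.Map as Map

-- A toy fragment of C: just enough to show scoping.
data Stmt
  = Decl String String   -- ^ declaration: type, name
  | Use String           -- ^ reference to a variable by name
  | Block [Stmt]         -- ^ compound statement opening a new scope
  deriving (Eq, Show)

-- The same statements after resolution: names replaced by unique ids.
data RStmt
  = RDecl Int
  | RUse (Maybe Int)     -- ^ Nothing if the name was not declared
  | RBlock [RStmt]
  deriving (Eq, Show)

-- One entry per declaration in the translation unit, block-scoped ones
-- included: id -> (name, type, nesting depth).
type SymbolTable = Map.Map Int (String, String, Int)

-- Names currently in scope, mapped to their ids.
type Scope = Map.Map String Int

-- Next fresh id, and the table built so far.
type St = (Int, SymbolTable)

-- | One traversal that builds the complete table and resolves references.
resolve :: [Stmt] -> ([RStmt], SymbolTable)
resolve stmts =
  let ((_, table), out) = goList 0 Map.empty (0, Map.empty) stmts
  in (out, table)

goList :: Int -> Scope -> St -> [Stmt] -> (St, [RStmt])
goList _ _ st [] = (st, [])
goList depth scope st (s : ss) =
  let (scope', st', r) = goStmt depth scope st s
      (st'', rs)       = goList depth scope' st' ss
  in (st'', r : rs)

goStmt :: Int -> Scope -> St -> Stmt -> (Scope, St, RStmt)
goStmt depth scope (next, table) (Decl ty name) =
  ( Map.insert name next scope
  , (next + 1, Map.insert next (name, ty, depth) table)
  , RDecl next )
goStmt _ scope st (Use name) = (scope, st, RUse (Map.lookup name scope))
goStmt depth scope st (Block body) =
  let (st', rs) = goList (depth + 1) scope st body
  in (scope, st', RBlock rs)   -- inner declarations go out of scope here
```

The scope map is restored when a block ends, but the table only ever grows. As a result, a shadowing declaration in an inner block gets its own id and entry, and references on either side of the block resolve to the right one.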
>>
>> A third possibility, which would probably depend on the integrated
>> preprocessor, would be to create an exact pretty-printer. That is, a
>> pretty-printing function such that pretty . parse is the identity.
>> Currently, parse . pretty should be the identity, but it's not true
>> the other way around. An exact pretty-printer would be very useful in
>> creating rich presentations of C source code --- think LXR on
>> steroids.
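The asymmetry is already visible at the token level. The sketch below assumes a deliberately crude C tokenizer; it is not Language.C's lexer, and `Token`, `lexC`, `render` and `normalise` are hypothetical names. Each token keeps the whitespace and comments ("trivia") that preceded it, so printing the tokens back reproduces the input exactly. A conventional printer that discards the trivia cannot do that. An exact pretty-printer for full ASTs would presumably need the same kind of trivia, plus macro information, attached to AST nodes.

```haskell
import Data.Char (isAlphaNum, isSpace)

-- | A token plus the whitespace/comments that preceded it.
data Token = Token { leading :: String, tokText :: String }
  deriving (Eq, Show)

isWordChar :: Char -> Bool
isWordChar x = isAlphaNum x || x == '_'

-- | Split source into tokens, attaching preceding trivia to each token and
-- returning any trailing trivia separately. Deliberately crude: identifiers
-- and numbers are runs of word characters, every other character is a
-- one-character token, and only /* */ comments count as trivia.
lexC :: String -> ([Token], String)
lexC src =
  case trivia src of
    (tr, [])   -> ([], tr)
    (tr, rest) ->
      let (tok, rest')  = oneToken rest
          (toks, end)   = lexC rest'
      in (Token tr tok : toks, end)

trivia :: String -> (String, String)
trivia ('/' : '*' : rest) =
  let (body, rest')  = closeComment rest
      (more, rest'') = trivia rest'
  in ("/*" ++ body ++ more, rest'')
trivia (c : rest)
  | isSpace c = let (more, rest') = trivia rest in (c : more, rest')
trivia s = ("", s)

closeComment :: String -> (String, String)
closeComment ('*' : '/' : rest) = ("*/", rest)
closeComment (c : rest) = let (b, r) = closeComment rest in (c : b, r)
closeComment [] = ("", "")

oneToken :: String -> (String, String)
oneToken s@(c : rest)
  | isWordChar c = span isWordChar s
  | otherwise    = ([c], rest)
oneToken [] = ([], [])

-- | Exact printer: trivia included, so render (lexC s) == s.
render :: ([Token], String) -> String
render (toks, end) = concatMap (\t -> leading t ++ tokText t) toks ++ end

-- | Conventional printer: drops trivia and re-spaces tokens.
normalise :: ([Token], String) -> String
normalise (toks, _) = unwords (map tokText toks)
```

Here `render (lexC s) == s` holds for any input `s`, because every character ends up either in a token or in the trivia attached to one. By contrast, `normalise` turns `int  x = /* answer */ 42;` into `int x = 42 ;`.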
>>
>> If you're interested in any combination of these, or anything
>> similar, let me know. The deadline is approaching quickly, but I'd be
>> happy to work together with a student to flesh any of these out into
>> a full proposal.
>>
>> Thanks,
>> Aaron
>>
>> --
>> Aaron Tomb
>> Galois, Inc. (http://www.galois.com)
>> atomb at galois.com
>> Phone: (503) 808-7206
>> Fax: (503) 350-0833
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>
>
>
>
> -- 
> - Austin