[Haskell-cafe] More Language.C work for Google's Summer of Code
Aaron Tomb
atomb at galois.com
Tue Mar 30 13:30:22 EDT 2010
Hello,
I'm wondering whether there's anyone on the list with an interest in
doing additional work on the Language.C library for the Summer of
Code. There are a few enhancements that I'd be very interested in seeing,
and I'd love to be a mentor for such a project if there's a student
interested in working on them.
The first is to integrate preprocessing into the library. Currently,
the library calls out to GCC to preprocess source files before parsing
them. This has some unfortunate consequences, however, because
comments and macro information are lost. A number of program analyses
could benefit from metadata encoded in comments, because C doesn't
have any sort of formal annotation mechanism, but in the current state
we have to resort to ugly hacks (at best) to get at the contents of
comments. Also, effective diagnostic messages need to be closely tied
to original source code. Once macros have been expanded,
column number information is unreliable, so it can be difficult to
describe to a user exactly what portion of a program a particular
analysis refers to. An integrated preprocessor could retain comments
and remember information about macros, eliminating both of these
problems.
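
To make the comment-retention idea concrete, here is a minimal sketch
of a scanner that keeps block comments as first-class tokens together
with the line they start on. The names Tok, scan and breakOn are
hypothetical and are not part of Language.C. The sketch assumes only
/* ... */ comments and deliberately ignores string literals, //
comments and macro expansion, all of which a real integrated
preprocessor would have to handle.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical token type for illustration; not Language.C's.
data Tok
  = Comment Int String   -- starting line and body of a /* ... */ comment
  | Code Int String      -- starting line and a run of non-comment text
  deriving (Eq, Show)

-- Split source text into comment and code chunks, tracking line numbers.
-- Assumption: "/*" never appears inside a string literal.
scan :: String -> [Tok]
scan = go 1
  where
    go _ [] = []
    go ln s
      | "/*" `isPrefixOf` s =
          let (body, rest) = breakOn "*/" (drop 2 s)
          in Comment ln body : go (ln + newlines body) (drop 2 rest)
      | otherwise =
          let (code, rest) = breakOn "/*" s
          in Code ln code : go (ln + newlines code) rest
    newlines = length . filter (== '\n')

-- Break a string just before the first occurrence of a delimiter.
breakOn :: String -> String -> (String, String)
breakOn _ [] = ([], [])
breakOn d s@(c : cs)
  | d `isPrefixOf` s = ([], s)
  | otherwise = let (a, b) = breakOn d cs in (c : a, b)
```

An analysis could then look for annotations in the Comment tokens (for
example, scan "int x; /* @nonnull */ int *p;\n" yields a Comment token
on line 1 between two Code tokens) instead of losing them to an
external preprocessor.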
The second possible project is to create a nicer interface for
traversals over Language.C ASTs. Currently, the symbol table is built
to include only information about global declarations and those other
declarations currently in scope. Therefore, when performing multiple
traversals over an AST, each traversal must re-analyze all global
declarations and the entire AST of the function of interest. A better
solution might be to build a traversal that creates a single symbol
table describing all declarations in a translation unit (including
function- and block-scoped variables), for easy reference during
further traversals. It may also be valuable to have this traversal
produce a slightly-simplified AST in the process. I'm not thinking of
anything as radical as the simplifications performed by something like
CIL, however. It might simply be enough to transform variable
references into a form suitable for easy lookup in a complete symbol
table like I've just described. Other simple transformations such as
making all implicit casts explicit, or normalizing compound
initializers, could also be good.
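
As a rough illustration of the single-symbol-table idea, the sketch
below runs one traversal over a toy AST, assigns every declaration a
unique key, and rewrites each variable reference to that key. The
types Stmt, RStmt, SymTab and Scope and the function resolve are
hypothetical and far simpler than Language.C's real AST; this is only
meant to show the shape of the transformation, not a proposed API.

```haskell
import qualified Data.Map as Map

-- Hypothetical toy AST; Language.C's real AST is much richer.
data Stmt
  = Decl String String      -- type name, variable name
  | Use String              -- reference to a variable by name
  | Block [Stmt]            -- nested block scope
  deriving (Eq, Show)

-- Resolved form: declarations and references carry a unique key.
data RStmt
  = RDecl Int
  | RUse (Maybe Int)        -- Nothing if the name is not in scope
  | RBlock [RStmt]
  deriving (Eq, Show)

type SymTab = Map.Map Int (String, String)   -- key -> (type, name)
type Scope  = Map.Map String Int             -- name -> key visible here

-- One pass: build the complete table and the simplified AST together.
resolve :: [Stmt] -> (SymTab, [RStmt])
resolve = goList Map.empty Map.empty

goList :: SymTab -> Scope -> [Stmt] -> (SymTab, [RStmt])
goList tab _ [] = (tab, [])
goList tab sc (s : rest) =
  let (tab1, sc1, r) = go tab sc s
      (tab2, rs)     = goList tab1 sc1 rest
  in (tab2, r : rs)

go :: SymTab -> Scope -> Stmt -> (SymTab, Scope, RStmt)
go tab sc (Decl ty name) =
  let key = Map.size tab    -- next unused key
  in (Map.insert key (ty, name) tab, Map.insert name key sc, RDecl key)
go tab sc (Use name) = (tab, sc, RUse (Map.lookup name sc))
go tab sc (Block body) =
  let (tab1, rs) = goList tab sc body
  in (tab1, sc, RBlock rs)  -- inner names leave scope but stay in the table
```

Later traversals can then look up any RUse key directly in the table,
including block-scoped variables that shadow outer ones, without
re-analyzing declarations.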
A third possibility, which would probably depend on the integrated
preprocessor, would be to create an exact pretty-printer. That is, a
pretty-printing function such that pretty . parse is the identity.
Currently, parse . pretty should be the identity, but the reverse
composition is not. An exact pretty-printer would be very useful in
creating rich presentations of C source code --- think LXR on steroids.
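
The difference between the two round-trip properties can be shown on a
toy "language" of whitespace-separated words. Everything here (Piece,
parseExact, prettyExact, prettyLossy) is hypothetical and stands in
for a real parser and printer; the point is only that an exact printer
needs the parse result to retain layout.

```haskell
import Data.Char (isSpace)
import Data.Function (on)
import Data.List (groupBy)

-- Hypothetical parse result that keeps whitespace runs verbatim.
data Piece = Space String | Word String
  deriving (Eq, Show)

-- Split input into alternating whitespace and non-whitespace runs.
parseExact :: String -> [Piece]
parseExact = map toPiece . groupBy ((==) `on` isSpace)
  where
    toPiece s
      | all isSpace s = Space s
      | otherwise     = Word s

-- Exact printer: prettyExact . parseExact is the identity on strings.
prettyExact :: [Piece] -> String
prettyExact = concatMap text
  where
    text (Space s) = s
    text (Word s)  = s

-- Conventional printer: discards layout and re-spaces the words, so
-- printing a parsed string does not generally give the input back.
prettyLossy :: [Piece] -> String
prettyLossy ps = unwords [w | Word w <- ps]
```

With these definitions, prettyExact (parseExact s) == s holds for any
input s, while prettyLossy only preserves the sequence of words.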
If you're interested in any combination of these, or anything similar,
let me know. The deadline is approaching quickly, but I'd be happy to
work together with a student to flesh any of these out into a full
proposal.
Thanks,
Aaron
--
Aaron Tomb
Galois, Inc. (http://www.galois.com)
atomb at galois.com
Phone: (503) 808-7206
Fax: (503) 350-0833