RFC: "Native -XCPP" Proposal

Austin Seipp austin at well-typed.com
Wed May 6 12:25:54 UTC 2015

On Wed, May 6, 2015 at 6:08 AM, Herbert Valerio Riedel
<hvriedel at gmail.com> wrote:
> Hello *,
> As you may be aware, GHC's `{-# LANGUAGE CPP #-}` language extension
> currently relies on the system's C-compiler bundled `cpp` program to
> provide a "traditional mode" c-preprocessor.
> Parsing Haskell code with a preprocessor mode designed for use with C's
> tokenizer has already caused quite a few problems[1] in the past. I'd
> like to see GHC 7.12 adopt an implementation of `-XCPP` that does not
> rely on the shaky system-`cpp` foundation. To this end I've created a
> wiki page
>   https://ghc.haskell.org/trac/ghc/wiki/Proposal/NativeCpp
> to describe the actual problems in more detail, and a couple of possible
> ways forward. Ideally, we'd simply integrate `cpphs` into GHC
> (i.e. "plan 2"). However, due to `cpphs`'s non-BSD3 license this should be
> discussed and debated since it affects the overall license of the GHC
> code-base, which may or may not be a problem to GHC's user-base (and
> that's what I hope this discussion will help to find out).
> So please go ahead and read the Wiki page... and then speak your mind!

Thanks for writing this up, btw! It's nice to put the mumblings we've
had for a while down 'on paper'.

> Thanks,
>   HVR
> [1]: ...does anybody remember the issues Haskell packages (& GHC)
>      encountered when Apple switched to the Clang tool-chain, thereby
>      causing code using `-XCPP` to suddenly break due to subtly
>      different `cpp`-semantics?

There are two (major) differences I can list, although I can only
provide a few specific examples off the top of my head:

  1) Clang is more strict wrt language specifications. For example,
GCC is lenient and allows a space between a macro identifier and the
parenthesis denoting a parameter list; so saying 'FOO (x, y)' is valid
with GCC (where FOO is a macro), but not with Clang. Sometimes this
trips up existing code, but I've mostly seen it in GHC itself.

  2) The lexing rules for C and Haskell simply are not the same in
general. For example, what should "FOO(a' + b')" parse to? In
Haskell, a prime is a valid component of an identifier, so the
argument should parse as "a prime + b prime". In C, however, the '
character begins a character literal, and a strict preprocessor like
Clang's will reject that.
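
To make the mismatch in (2) concrete, here is a rough sketch of two toy
tokenizers applied to the macro argument from the example. The names
`lexHaskell` and `lexC` are made up for illustration. Neither is the real
GHC lexer or a real C preprocessor; they only model the single rule that
differs, namely how the ' character is treated.

```haskell
module Main where

import Data.Char (isAlpha, isAlphaNum, isSpace)

-- | Toy Haskell-style lexer (hypothetical, illustration only):
-- a prime may continue an identifier, so "a'" is one token.
lexHaskell :: String -> [String]
lexHaskell [] = []
lexHaskell s@(c:cs)
  | isSpace c = lexHaskell cs
  | isAlpha c || c == '_' =
      let (ident, rest) = span (\x -> isAlphaNum x || x == '_' || x == '\'') s
      in ident : lexHaskell rest
  | otherwise = [c] : lexHaskell cs

-- | Toy C-style lexer (hypothetical, illustration only):
-- a quote always opens a character literal that runs to the next
-- quote, or to the end of the input if it is never closed.
lexC :: String -> [String]
lexC [] = []
lexC s@(c:cs)
  | isSpace c = lexC cs
  | isAlpha c || c == '_' =
      let (ident, rest) = span (\x -> isAlphaNum x || x == '_') s
      in ident : lexC rest
  | c == '\'' =
      let (body, rest) = break (== '\'') cs
      in case rest of
           (_:rest') -> ('\'' : body ++ "'") : lexC rest'
           []        -> ['\'' : body]
  | otherwise = [c] : lexC cs

main :: IO ()
main = do
  print (lexHaskell "a' + b'")  -- primes stay attached to the identifiers
  print (lexC "a' + b'")        -- the first quote swallows " + b" as a literal
```

Under the Haskell-style rule the argument is three tokens; under the
C-style rule everything between the two quotes becomes one (bogus)
character literal, which is the kind of thing a strict preprocessor
complains about.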

In practice, I think people have mostly just avoided arcane lexer
behaviors that don't work, and the only reason this was never a
problem is that GCC or some variant was always the 'standard' C
compiler GHC could rely on.



Austin Seipp, Haskell Consultant
Well-Typed LLP, http://www.well-typed.com/
