[C2hs] Re: making c2hs understand line pragmas

Wed May 24 12:02:47 EDT 2006

Duncan,

I think this is still an outstanding issue.

> I've got a patch to c2hs to make it do something with C style line
> pragmas in .chs files, eg:
> 
> # 1 "gtk/Graphics/UI/Gtk/TreeList/TreeStore.chs.pp"
> 
> These are produced by the C preprocessor. Currently c2hs chokes on these
> and so people have to use the -P option to cpp to suppress them. They
> are actually rather useful if the code in fact does need preprocessing
> (as most of gtk2hs's .chs files do) because they point to the original
> file name and source locations. For example it means that ghc's errors
> will report accurate locations in the .chs.pp file rather than reporting
> locations in the .chs file.
> 
> c2hs already produces accurate Haskell line pragmas {-# LINE ... #-} in
> the .hs files it produces for exactly that reason. I want to extend that
> to the case that the .chs file itself has had a preprocessor used on it.
> 
> One other reason to preserve the original file name is that haddock can
> now include links to the source files, and it uses the line pragmas to
> find the original source file. It doesn't do much good however if
> haddock links to a non-existent .chs file when the real original file
> was .chs.pp.

Ok, I see how this would be useful.  Please push your patch.
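
For the record, the mapping from a cpp line marker to a Haskell LINE
pragma can be sketched as follows. This is only an illustration, not the
code from your patch: the names parseLineMarker and toHaskellPragma are
made up for this example, and it handles only the `# N "file"` form
(trailing cpp flags are ignored, and `#line` and escaped quotes in file
names are not handled).

```haskell
import Data.Char (isDigit, isSpace)

-- Hypothetical helper, not part of c2hs: parse a cpp line marker such as
--   # 1 "foo.chs.pp"
-- into its line number and file name. Anything after the closing quote
-- (for example cpp's numeric flags) is ignored.
parseLineMarker :: String -> Maybe (Int, FilePath)
parseLineMarker ('#':rest) =
  case span isDigit (dropWhile isSpace rest) of
    (digits@(_:_), rest') ->
      case dropWhile isSpace rest' of
        '"':quoted ->
          case break (== '"') quoted of
            (file, '"':_) -> Just (read digits, file)
            _             -> Nothing   -- no closing quote
        _ -> Nothing                   -- no file name
    _ -> Nothing                       -- no line number, e.g. "#define"
parseLineMarker _ = Nothing

-- Hypothetical helper: render the corresponding Haskell LINE pragma.
toHaskellPragma :: (Int, FilePath) -> String
toHaskellPragma (n, file) =
  "{-# LINE " ++ show n ++ " " ++ show file ++ " #-}"
```

With this, `fmap toHaskellPragma (parseLineMarker "# 1 \"a.chs.pp\"")`
gives `Just "{-# LINE 1 \"a.chs.pp\" #-}"`.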

> The only thing that's wrong is that c2hs doesn't recognise cpp
> directives as the first line in a .chs file. You can see why this is so
> from the code below:
> 
> cpp :: CHSLexer
> cpp = directive
>       where
>         directive = 
> 	  string "\n#" +> alt ('\t':inlineSet)`star` epsilon
> 	  `lexmeta` 
> 	     \(_:_:dir) pos s ->	-- strip off the "\n#"
> 	       case dir of
> 
> 		... etc
> 
> It requires a cpp directive to start with a newline followed by a '#'
> character.
> 
> I'm not sufficiently familiar with the style of c2hs's chs lexer to
> figure out how to fix this. Perhaps it can be done by checking if we're
> at the beginning of a line in a different way. Perhaps it can be done by
> checking the current column rather than looking for a '\n' character.

Yes, that's an awkward bit in the code that has bothered me before.  The
lexer combinators have no neat way to check for characters appearing in
a particular column.  I see only two ways to proceed:

      * We could match on # alone and then check in the action which
        column we are in and act differently depending on that.
        I don't like this, as it messes up the longest match rule and
        might be fragile.
      * We can prepend a '\n' character to the source file before
        starting the lexing process by changing the triple passed to
        execLexer in the function lexCHS (and we must then also adjust
        the initial value of `pos' to still get accurate line numbers).
        This is a bit of a kludge, but it seems to be the more robust
        solution to me.
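
To make the second option concrete, here is a toy model of it. It does
not use the real c2hs types: Pos, advance, prepare and directiveLines are
all invented for this sketch, and the scanner merely stands in for the
"\n#" rule in the cpp lexer. The point is only that prepending '\n' and
starting the position one line early lets a directive on the first line
match while all reported line numbers stay correct.

```haskell
-- Hypothetical stand-in for the lexer's position type (not c2hs's own).
data Pos = Pos { posFile :: FilePath, posLine :: Int, posCol :: Int }
  deriving (Eq, Show)

-- Advance a position over a single character.
advance :: Pos -> Char -> Pos
advance (Pos f l _) '\n' = Pos f (l + 1) 1
advance (Pos f l c) _    = Pos f l (c + 1)

-- Prepend a newline to the source and start at line 0, so that the
-- first real line of the file is still reported as line 1.
prepare :: FilePath -> String -> (String, Pos)
prepare file src = ('\n' : src, Pos file 0 1)

-- Toy scanner: return the line numbers of all directives, where a
-- directive is recognised only by the pattern "\n#", as in the cpp rule.
directiveLines :: String -> Pos -> [Int]
directiveLines ('\n':'#':rest) p =
  let p' = advance p '\n'               -- position of the '#'
  in  posLine p' : directiveLines ('#':rest) p'
directiveLines (c:rest) p = directiveLines rest (advance p c)
directiveLines []       _ = []
```

For the input "# 1 \"x\"\nfoo\n#define A\n", scanning the raw text from
`Pos file 1 1` finds only the directive on line 3, whereas
`uncurry directiveLines (prepare file src)` finds lines 1 and 3.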

Manuel