[C2hs] new release?

Mon May 14 11:58:30 EDT 2007

Hi Manuel,

We could probably do a new release soonish. The main thing is testing.
Two of the c2hs/tests/ are not working.

In test/Calls.chs there's a slightly weird issue with C/Haskell name
aliases. 

In the .chs file we've got:
{#call fun MyString as myString#}

but it generates:
foreign import ccall safe "calls.h MyString"
  MyString :: (Ptr CChar)

when it obviously should be:
foreign import ccall safe "calls.h MyString"
  myString :: (Ptr CChar)

It seems to me the wrong thing is happening when we parse the .chs file,
we call (normAP apath oalias) which returns Nothing in this case:

normAP :: CHSAPath -> Maybe Ident -> Maybe Ident
normAP ide Nothing                                = Nothing
normAP ide (Just ide') | apathToIdent ide == ide' = Nothing
		       | otherwise                = Just ide'

and:

apathToIdent (CHSRoot ide) =
    let lowerFirst (c:cs) = toLower c : cs
    in onlyPosIdent (posOf ide) (lowerFirst $ identToLexeme ide)

For reasons I don't quite understand, it's comparing case insensitively
on the first char and then because they appear to be equal it drops the
alias and then end up using the original C name which was not a valid
Haskell identifier. I've not looked at this code carefully enough to
understand what's supposed to be happening.

The structs test also fails, though I've not investigated why yet:

c2hs structs.h Structs.chs
c2hs: Errors during expansion of binding hooks:

Structs.chs:32: (column 60) [ERROR] 
  >>> Expected a pointer object!
  Attempt to dereference a non-pointer object or to use it in a
`pointer' hook.

The failing line is:

 val4  <- liftM cIntConv $ {#get weird->nested.pnt->y#} weird

and the C types were:

typedef struct _point *point;

struct _point {
  int x, y;
};

typedef struct {
  bool b;
  int  x;
  struct {
    int   y, z;
    point pnt;
  } nested;
} *weird;

Presumably the problem is with the nested.pnt part.

Other things I'd like to do, not related to a release:

      * eliminate the C2HS.hs source module and the corresponding
        --copy-library flag.
      * use the MTL rather than the current StateTrans, StateBase, State
      * use bytestrings in the C lexer
      * add pre-compiled header support

It turns out very few things in C2HS.hs are actually used by generated
code, and the ones that are used are either just aliases of standard
library functions or simple compositions of library functions. The
remaining functions are presumably there to be used by the programmer
in .chs modules. I'm not sure that we need to provide this, especially
since most of them are duplicates of things in the standard lib.

The main difficulty with this change is that currently the marshalers
are just a single Ident identifier where as for simple compositions we
need code fragments like "fmap fromIntegral . peek". Perhaps we should
just replace it with a String.

For pre-compiled headers, I think this can be done and be compatible
with the #cppery we currently allow in .chs files. C translation units
have the nice suffix extension property. Declarations added later cannot
change the meaning of earlier declarations. So it'd be possible to
precompile a common header and so long as that appears before any
per-file custom stuff that all works out fine, we just extend the info
from the pre-compiled header.

For example:

Foo.chs:
#include <gtk/gtk.h>
#c
blah
#endc

Bar.chs:
#include <gtk/gtk.h>
#include "something-else.h"

So these generate different .chs.h header files, but they share a common
prefix. This will be very common when using cabal which passes on the
command line all the .h files in the 'includes:', and these headers are
considered to be #included before any other directives in the .chs
files. So a simple scheme would be to just allow pre-compilation of
those headers passed on the command line.

To actually do this we have to keep track of some info so we can pick up
extending the header where we left off. At a pure parsing level all we
need is the current set of typedef identifiers. We can reasonably assume
that the header defines a number of complete external declarations so we
don't need to preserve the list of nested scopes, that list should
always be empty. However we would want to have the pre-compiled header
file contain not the C AST but the maps that c2hs generates. This would
involve adjusting name analysis phase a bit so that it can work
incrementally, one external declaration at a time.

So the info kept in the pre-compiled header file would be the 4 maps and
the set of typeidents along with various bits of meta data for sanity
checking.

To have cpp only produce the suffix of a translation unit it's possible
to use:

        -imacros file
                Exactly like -include, except that any output produced
                by scanning file is thrown away.  Macros it defines
                remain defined.  This allows you to acquire all the
                macros from a header without also processing its
                declarations.

So in the above example we'd use that flag with a .chs.h file containing
just the #includes of the headers passed on the command line.

As for the UI, this should be a bit easier these days since Cabal can
automate things. We needn't try and make it totally transparent at the
c2hs level, since we can get Cabal to make it transparent at the user
level. The simplest thing I can think of is to have Cabal pass a precomp
flag that tells c2hs to generate or use the given precomp file. Cabal
would put this file in the dist/build dir so it'd be cleaned
automatically. If the file didn't exist yet, c2hs would generate it
based just on the headers passed on the command line, and without
examining the .chs file. If the precomp file did exist, c2hs would use
it and check that the list of headers (and other -C flags) passed were
exactly the same as for the previous invocation when the file was
generated.

I'm not sure if it should be c2hs or Cabal's responsibility to
re-generate the precomp file if a header file in the local source tree
changed.

All this assumes of course that actually using pre-compiled headers is
really significantly faster. That has yet to be properly demonstrated
but my expectation is that it will be much faster. I'll probably be
doing this demo as part of the Data.Binary optimisation work.

Duncan