state of the cabal (preprocessors)

Henrik Nilsson nhn at Cs.Nott.AC.UK
Tue Oct 19 10:46:29 EDT 2004


Hi there,

Just a few thoughts on preprocessing issues.

First I should say that I have not really kept up with the Cabal
development, so apologies if I say something totally obvious or
something that just wouldn't work with Cabal.

Simon Marlow wrote:
 > Malcolm Wallace wrote:
 > > Isaac Jones wrote:
 > > > Since Cabal is pretty new, this won't break any existing Cabal
 > > > packages, and when converting non-Cabal packages to Cabal, there
 > > > is some work to do anyway, so why not just adopt this as one extra
 > > > rule to follow?
 > > > This is just a suggestion - I'm in two minds whether it is a good
 > > > idea myself, but it is at least worth considering the possibility.
 > >
 > >
 > > And I suppose the literate version would be .lcpphs? (unlit first,
 > > then cpp, then Haskell).  It would be more consistent and arguably
 > > correct, but I'm not sure that we should do it.

While arguably correct, such suffixes look pretty awkward to me.

In the Yampa build system we simply adopted the suffix ".cpp" to
indicate that C preprocessing was necessary. That would give
".hs.cpp" and ".lhs.cpp" respectively. One could argue about the
correctness of that, but it is at least simple, compositional,
and plays well with other suffixes. The literate convention is also
specified in the Haskell 98 specification, whereas CPP and other
preprocessing is not, as far as I can remember. From that perspective,
the ".cpp" convention is not totally unreasonable either.

Either way, personally I mostly see advantages in adopting suffixes to
indicate the need for preprocessing. For example:

   * It is clear to anyone looking at the sources which files need
     to be preprocessed. This is particularly important for CPP
     processing since CPP does not really understand Haskell, so
     there are traps for the unwary.

   * Suffixes make it very easy to preprocess only selected files, which
     again is a particularly good idea when CPP is involved.
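
As a rough illustration of the second point, here is a minimal Haskell
sketch of selecting files by suffix. It assumes the trailing ".cpp"
convention mentioned above; the names needsCpp, selectForCpp and
cppTarget are made up for this example and are not part of Cabal or of
the Yampa build system.

```haskell
import Data.List (isSuffixOf, partition)

-- Assumption: a trailing ".cpp" marks a Haskell source that needs CPP.
needsCpp :: FilePath -> Bool
needsCpp = (".cpp" `isSuffixOf`)

-- Split a source list into (files to run through CPP, files to leave alone).
selectForCpp :: [FilePath] -> ([FilePath], [FilePath])
selectForCpp = partition needsCpp

-- Name of the preprocessed output: drop the trailing ".cpp".
-- Returns Nothing for files that do not need CPP.
cppTarget :: FilePath -> Maybe FilePath
cppTarget path
  | needsCpp path = Just (take (length path - length ".cpp") path)
  | otherwise     = Nothing
```

With this, a file list such as A.hs, B.hs.cpp, C.lhs.cpp would have only
the last two sent through CPP, producing B.hs and C.lhs.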

Of course, there are other ways of doing preprocessing selectively.
Maybe Cabal has such mechanisms, making this (mostly) a non-issue?
(Indeed, the Yampa build system did provide an alternative way as well).
For example, it is sometimes necessary to pass specific flags to the
compiler for specific source files only, and if Cabal already supports
that, then I guess passing "-E" selectively would just be a special
case.

 > Another solution is to adopt a new extension for plain Haskell, say
 > .phs.  The conversion from .hs to .phs is either via CPP or just
 > 'cat', depending on some setting somewhere.  Also, I recommend that we
 > use the compiler itself for preprocessing:
 >
 >  ghc -E foo.hs -o foo.phs
 >
 > because only the compiler knows what the values for the preprocessor
 > symbols __HASKELL__, __GLASGOW_HASKELL__, i386_TARGET_ARCH etc. should
 > be.  Otherwise we'll have to run the compiler during ./setup configure
 > to find out the values of these symbols (isn't that what hmake does?
 > What about when a new compiler comes along?).

Yes, that's probably true.

Malcolm Wallace wrote:

 > You are right that the compiler is best placed to define pp symbols,
 > so this is all very well, but neither nhc98 nor Hugs currently have
 > the -E option to stop immediately after pp.  And come to think of it,
 > the only real reason to have cpp done separately at all is because
 > Hugs does not have a preprocessor call builtin, like ghc and nhc98 do.
 > So maybe the best solution is to ship Hugs with -F"cpphs.hugs"
 > enabled by default?  Then no separate extension would be required,
 > and Cabal could just defer all cpp-ing to the compiler.

In the Yampa build system we took the approach that installing a
library for use by Hugs meant running all the preprocessing at
installation time and thus installing preprocessed sources.

I think that was the right approach. It simplifies things for the end
user, in particular when several preprocessors are involved: they
don't need to pass the right flags to Hugs, and they don't need to worry
about having the preprocessors in their paths. (The installation
of a library could be system-wide, so the person doing the
installation might not be the same as the one actually using it later.)
Additionally, there is a performance benefit, which could potentially
be significant depending on which preprocessors are involved.

Similar arguments would apply if one for some reason wanted to install
libraries for GHCi in source form.

 > Another thought occurs to me.  Does anyone use cpp markings in
 > conjunction with any other preprocessors?  For instance, cpp + Happy,
 > cpp + DRiFT?  What ordering applies there?  I'm inclined to think
 > that it would nearly always be cpp first, other preprocessors second,
 > but perhaps not?  After all, the cpp markings would probably still
 > be conditioned on the end compiler, not on the intermediate pp?

If one adopts a convention that indicates the preprocessing to be done
by a simple suffix, then I think that would allow the programmer to
control the ordering if necessary, and avoid building speculative
assumptions into Cabal.
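
To make the ordering idea concrete, here is a small Haskell sketch under
stated assumptions: suffixes are peeled off from the right, and each
recognised suffix names one preprocessing step. The Step type, the
suffix table in stepFor (including ".arr"), and the function names are
all hypothetical; nothing here reflects what Cabal actually does.

```haskell
-- Hypothetical preprocessors keyed by suffix; the table is an assumption.
data Step = Cpp | ArrowPP
  deriving (Eq, Show)

stepFor :: String -> Maybe Step
stepFor "cpp" = Just Cpp
stepFor "arr" = Just ArrowPP
stepFor _     = Nothing

-- Split off the last suffix: "Foo.hs.cpp" gives Just ("Foo.hs", "cpp").
-- Dots inside directory names are not treated as suffix separators.
splitSuffix :: FilePath -> Maybe (FilePath, String)
splitSuffix path =
  case break (== '.') (reverse path) of
    (revExt, '.' : revBase)
      | not (null revExt), '/' `notElem` revExt, not (null revBase)
      -> Just (reverse revBase, reverse revExt)
    _ -> Nothing

-- Peel suffixes from the right; the outermost suffix runs first.
-- Returns the steps in execution order and the final file name.
plan :: FilePath -> ([Step], FilePath)
plan path =
  case splitSuffix path of
    Just (base, ext) | Just step <- stepFor ext ->
      let (rest, final) = plan base
      in (step : rest, final)
    _ -> ([], path)
```

Under this scheme "Foo.hs.arr.cpp" would mean CPP first and the arrow
preprocessor second, while "Foo.hs.cpp.arr" would mean the opposite
order, so the choice stays with the programmer.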

Speaking of suffixes and preprocessing, I've encountered another problem
in the context of Yampa that might be worth raising.

Originally (well, still, actually), we used Ross Paterson's arrow
preprocessor for the arrow syntactic sugar. We then adopted
the convention that the suffix ".as" was for "arrowized Haskell source",
and ".las" for "literate arrowized Haskell source". I don't think this
choice of suffixes was particularly brilliant, but this does not
really matter.

However, we now have the situation that GHC supports the arrow syntax
directly. This raises the question of how to arrange things if one wants
to distribute arrowized code that should also work for other
compilers/interpreters, since preprocessing would still be necessary
for those other systems. In particular, which suffix should one use for
the arrowized files in question?

While I guess one could stick to ".hs" and then resort to various
build-system trickery to get the preprocessing done when necessary, it
seems to me that a more straightforward solution might be to agree
on a suffix that indicates that the Arrow syntax is used (say ".arr").
Systems that do support the arrow syntax could then accept e.g.
".hs.arr" as a synonym for ".hs", or, if necessary, could use the
suffix to enable the syntactic extension.

This solution is not without its problems, though, and I'm not sure
what the best approach would be. But the issue is similar to some
systems having built-in CPP support and others not, and it might make
sense to adopt a similar solution.
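
A minimal Haskell sketch of how a build tool might act on such a suffix,
purely as an illustration: the ".arr" suffix is only the suggestion made
above, and the Action type and arrowAction function are invented for
this example.

```haskell
import Data.List (isSuffixOf)

-- What a build tool would do with one source file (illustrative only).
data Action
  = CompileAsIs FilePath                -- ordinary source, nothing special
  | CompileWithArrowSyntax FilePath     -- compiler handles arrows natively
  | PreprocessArrows FilePath FilePath  -- input file, generated output file
  deriving (Eq, Show)

-- First argument: does the target system support arrow syntax itself?
arrowAction :: Bool -> FilePath -> Action
arrowAction nativeArrows path
  | not (".arr" `isSuffixOf` path) = CompileAsIs path
  | nativeArrows                   = CompileWithArrowSyntax path
  | otherwise                      = PreprocessArrows path (dropArr path)
  where
    -- Drop the trailing ".arr" to name the preprocessed output.
    dropArr p = take (length p - length ".arr") p
```

So "SF.hs.arr" would be compiled directly by a system with arrow
support, and turned into "SF.hs" by a preprocessor everywhere else.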

Of course, if arrow support is in the works for the other compilers,
this last problem might not be so much of an issue.

Best regards,

/Henrik

-- 
Henrik Nilsson
School of Computer Science and Information Technology
The University of Nottingham
nhn at cs.nott.ac.uk
