Advance notice that I'd like to make Cabal depend on parsec

Mon Mar 18 17:55:04 CET 2013

On Mon, 2013-03-18 at 09:32 -0700, Iavor Diatchki wrote:
> Hello,
> 
> To me it seems that the dependency here is incorrect---as far as I
> understand, GHC does not need to parse Cabal files, so it should not depend
> on the code and the library to do so.

Yes, GHC does not parse .cabal files.

> Furthermore, what is the overall architecture of the whole thing? My
> understanding has been that each implementation should have its own notion
> of a "package", and cabal simply has support for working with the package
> formats for each implementation.  Thus, it seems that the package types and
> code for (de)serializing them should be in the implementation (i.e., GHC),
> not Cabal.   I can see that it might make sense to have a common
> representation about package meta-data (e.g., names, versions, license,
> etc.), so perhaps these should all go in a separate package.  This looks a
> bit like the `cabal-types` that Duncan suggested, but I'd imagine that
> Cabal would need more types than just package meta-data so this is not an
> ideal name.

The Cabal spec defines a few things that all Haskell implementations are
supposed to support/provide. This covers the notion of an installed
package (not .cabal source packages). It defines what it is, the
meta-data that is stored and the format in which the implementation
should accept it (i.e. the input format for ghc-pkg).

So it's a compiler-independent notion and the natural place to put the
code to support it was the Cabal lib, so that's what happened. We can
split that off into another package but it still makes sense for that
package to provide a parser and pretty printer (because implementations
have to accept installed package info in the external format). So the
natural way to partition things doesn't help us avoid GHC depending on a
parser lib.

> Finally, I agree that a "real" parser is good, but do you really want to
> write it using Parsec?   A sensible alternative would be to write a Happy
> grammar.  Having an actual grammar would both benefit users of the system,
> and it would avoid the dependency on all those packages.   My experience of
> having to maintain some largish Parsec (and in general, combinator based)
> parsers, is that over the years the parsers get more and more complex, and
> are quite hard to maintain.

I'm somewhat repeating other parts of the thread at this point (see the
discussion on "why not happy"), but I'd like to point out again that it
is not simply the outline parser for cabal-style files that we're
talking about. We also need parsers/pretty printers for all the various
little types that make up the info about packages, like versions,
package names, package ids, version constraints, module names, licenses
and so on.

In the Cabal lib we have a type class with a parser and a pretty printer,
and all these various little types are instances. That stuff pretty much
has to use a combinator lib because it needs to be compositional. You
need to be able to reuse the version number parser in other parsers, and
programs like cabal-install or the hackage-server need to reuse these
parsers as part of other parsers for their command line UI, config
files, URLs etc.
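
To illustrate the compositionality point, here is a minimal sketch using
only ReadP from base. The class name Textual, its methods display and
parser, the three small types and the helper simpleParse are all
hypothetical names chosen for this example; they are not the actual
definitions in the Cabal lib. The point is only that the PackageId
parser is built by reusing the PackageName and Version parsers.

```haskell
import Data.Char (isAlphaNum, isDigit)
import Data.List (intercalate)
import Text.ParserCombinators.ReadP

-- Hypothetical class: one parser and one pretty printer per type.
class Textual a where
  display :: a -> String
  parser  :: ReadP a

newtype Version     = Version [Int]      deriving (Eq, Show)
newtype PackageName = PackageName String deriving (Eq, Show)
data PackageId      = PackageId PackageName Version deriving (Eq, Show)

instance Textual Version where
  display (Version ns) = intercalate "." (map show ns)
  -- One or more decimal numbers separated by dots.
  parser = Version <$> sepBy1 number (char '.')
    where number = read <$> munch1 isDigit

instance Textual PackageName where
  display (PackageName n) = n
  -- Dash-separated components; a component made only of digits is
  -- rejected so that it cannot be confused with a version number.
  parser = PackageName . intercalate "-" <$> sepBy1 component (char '-')
    where
      component = do
        s <- munch1 isAlphaNum
        if all isDigit s then pfail else return s

instance Textual PackageId where
  display (PackageId n v) = display n ++ "-" ++ display v
  -- Built purely by composing the two parsers above.
  parser = PackageId <$> parser <* char '-' <*> parser

-- Run a parser over the whole input, keeping the first complete parse.
simpleParse :: Textual a => String -> Maybe a
simpleParse s =
  case [x | (x, "") <- readP_to_S (parser <* eof) s] of
    (x:_) -> Just x
    []    -> Nothing
```

In GHCi, `simpleParse "foo-bar-1.2.3" :: Maybe PackageId` gives back the
name foo-bar with version 1.2.3, and a tool could embed the same `parser`
inside a larger parser for a command line argument or a URL path. A
grammar file compiled by a parser generator does not offer that kind of
reuse across packages nearly as easily.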

So we have to use a combinator lib anyway. We currently also use this
for parsing the fields of .cabal files and the InstalledPackageInfo.
Currently we use ReadP from the base library, which produces no error
messages and has pretty bad performance in some cases (exploding memory
use). It's that part particularly that should be using something like
parsec. Happy simply isn't an option there. I could possibly use Happy
for the outline parser but it wouldn't buy us much.
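
As a small illustration of the "no error messages" point, the sketch
below runs a ReadP parser on good and bad input. The names versionP and
parseVersion are made up for this example. ReadP reports its results as
a list of successful parses, so a failure is just the empty list, with no
source position and no indication of what was expected.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- Hypothetical example parser: dot-separated numbers, whole input.
versionP :: ReadP [Int]
versionP = sepBy1 (read <$> munch1 isDigit) (char '.') <* eof

-- All complete parses of the input; the empty list means failure.
parseVersion :: String -> [[Int]]
parseVersion = map fst . readP_to_S versionP
```

Here `parseVersion "1.2.3"` yields `[[1,2,3]]`, while both
`parseVersion "1.2.x"` and `parseVersion "1..2"` yield `[]`, and nothing
in that result says where or why the parse failed. ReadP's choice
operator is also symmetric, so it keeps all alternatives alive while
reading the input, which is one plausible source of the memory behaviour
mentioned above.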

Duncan