Suggestion for resolving the Cabal/GHC dependency problems

Duncan Coutts duncan.coutts at googlemail.com
Wed Sep 11 16:28:18 UTC 2013


All,

I was discussing this with Yuri earlier and I had an idea that I think
may resolve our problems.

Firstly, what are the problems?

     1. ghc devs and users grumble because the ghc library depends on
        Cabal, making it hard to use the ghc lib with a later Cabal.
     2. ghc devs grumble generally that Cabal seems quite big but they
        only need small parts of it.
     3. Cabal devs complain that they cannot add useful dependencies
        (like a parser with error messages) because ghc depends on
        Cabal.

Secondly, let us recall why it is that ghc does use Cabal, and where:

     1. it's used by ghc-pkg to read/write the external representation
        of installed package files (external rep is defined by Cabal
        spec, and implemented in the Cabal lib)
     2. it's used by ghc to read the ghc package database files/dirs.
        These databases use the same external representation, and ghc &
        ghc-pkg use the InstalledPackageInfo type internally
        (InstalledPackageInfo is defined in the Cabal lib).
     3. it's used by the ghc build system to help with building all the
        libraries that ship with ghc. I believe that this part uses more
        of the build system part of Cabal, not just the types and
        external formats.
     4. ghc comes with Cabal pre-installed so that users can run
        Setup.hs scripts to install other packages. This was part of the
        original Cabal design: that all compilers would use the
        installed package info format defined by Cabal, and all
        compilers would ship Cabal to users so the Setup.hs mechanism
        will work.

Now, as far as I know, nobody is suggesting that ghc stop shipping
Cabal, nor that it stop using it as part of the build system.

The problems all centre around use number 2, where the ghc library
package depends on Cabal. Number 1 isn't really a problem because
ghc-pkg is an executable.

So my suggestion is quite simple: eliminate the dependency in case 2
above, but keep it in the other three cases. Specifically:

      * ghc will use a new internal type to represent info coming from
        the ghc-pkg databases, i.e. not InstalledPackageInfo. This can
        be smaller as ghc doesn't care about the metadata.
      * The InstalledPackageInfo type, and ghc's current need to read
        its external representation, is the main reason the ghc lib
        depends on Cabal. Other dependencies should be minor and easy to
        remove.
      * ghc and ghc-pkg will agree on a new on-disk representation of
        the installed package info.
      * ghc-pkg will continue to depend on Cabal, and will continue to
        use the types and parsers defined by Cabal to read/write the
        InstalledPackageInfo. It will translate from
        InstalledPackageInfo into the on-disk representation that ghc &
        ghc-pkg share.
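
To make the shape of this concrete, here is a rough sketch of the split
described in the list above. Everything in it is an assumption made for
illustration: FullPackageInfo is only a stand-in for Cabal's
InstalledPackageInfo (with a handful of invented field names), the
GhcPackageInfo record is a hypothetical slimmed-down type, and the
Show/Read encoding merely stands in for whatever on-disk format ghc and
ghc-pkg would actually agree on.

```haskell
import Text.Read (readMaybe)

-- Hypothetical stand-in for Cabal's InstalledPackageInfo. Only a few
-- illustrative fields are shown; the field names are assumptions.
data FullPackageInfo = FullPackageInfo
  { fullName           :: String
  , fullVersion        :: [Int]
  , fullLicense        :: String     -- metadata the compiler does not need
  , fullMaintainer     :: String     -- metadata the compiler does not need
  , fullDescription    :: String     -- metadata the compiler does not need
  , fullExposedModules :: [String]
  , fullImportDirs     :: [FilePath]
  , fullLibraryDirs    :: [FilePath]
  , fullDepends        :: [String]
  } deriving (Show, Eq)

-- Hypothetical compiler-side record: only what is needed to find and
-- link a package, with the descriptive metadata dropped.
data GhcPackageInfo = GhcPackageInfo
  { pkgName           :: String
  , pkgVersion        :: [Int]
  , pkgExposedModules :: [String]
  , pkgImportDirs     :: [FilePath]
  , pkgLibraryDirs    :: [FilePath]
  , pkgDepends        :: [String]
  } deriving (Show, Read, Eq)

-- The translation step that would live on the ghc-pkg side, which is
-- the only side that knows about the full type.
toGhcPackageInfo :: FullPackageInfo -> GhcPackageInfo
toGhcPackageInfo p = GhcPackageInfo
  { pkgName           = fullName p
  , pkgVersion        = fullVersion p
  , pkgExposedModules = fullExposedModules p
  , pkgImportDirs     = fullImportDirs p
  , pkgLibraryDirs    = fullLibraryDirs p
  , pkgDepends        = fullDepends p
  }

-- Stand-in for the shared on-disk representation. Derived Show/Read is
-- used only to keep the sketch free of extra dependencies.
encodeDb :: [GhcPackageInfo] -> String
encodeDb = show

-- Decoding needs nothing beyond the slim type, so the reader of the
-- database has no reason to depend on the full one.
decodeDb :: String -> Maybe [GhcPackageInfo]
decodeDb = readMaybe
```

The point of the sketch is the direction of the dependencies: only
toGhcPackageInfo mentions the full type, so only the tool that performs
the translation needs the library that defines it.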

So what might the on-disk representation for the ghc-pkg databases look
like? Currently they use the external format of InstalledPackageInfo
because that is convenient when using Cabal.

One simple option is just to store both formats for all packages: the
existing InstalledPackageInfo files and the new representation.
Another option would be that ghc never reads package dbs where the cache
is out of date. Then it only ever reads the cache and never has to look
at the other files. In principle the cache should never be out of date:
there are two ways of updating the db, either calling ghc-pkg, or
putting the file into the db directly and then calling ghc-pkg recache
(distros often use the latter as it is simpler for them). In either case
the db cache will be up to date. (In fact calling it a cache is not
really correct.)
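
The "never read a stale cache" policy can be sketched as follows. This
is only a model under stated assumptions: modification times are plain
integers rather than real file timestamps, and PackageDb, CacheStatus
and requireFreshCache are hypothetical names, not anything ghc or
ghc-pkg defines.

```haskell
-- Hypothetical model of a package db directory. Modification times are
-- plain integers here; a real check would query the file system.
data PackageDb = PackageDb
  { dbCacheTime  :: Maybe Integer  -- Nothing if there is no cache file
  , dbEntryTimes :: [Integer]      -- times of the per-package files
  } deriving (Show, Eq)

data CacheStatus = CacheFresh | CacheStale | CacheMissing
  deriving (Show, Eq)

-- The cache counts as fresh if no package file is newer than it.
cacheStatus :: PackageDb -> CacheStatus
cacheStatus db = case dbCacheTime db of
  Nothing -> CacheMissing
  Just t
    | all (<= t) (dbEntryTimes db) -> CacheFresh
    | otherwise                    -> CacheStale

-- The policy from the text: refuse to fall back to the per-package
-- files, and ask for the cache to be regenerated instead.
requireFreshCache :: PackageDb -> Either String PackageDb
requireFreshCache db = case cacheStatus db of
  CacheFresh   -> Right db
  CacheStale   -> Left "package db cache is out of date; run ghc-pkg recache"
  CacheMissing -> Left "package db has no cache; run ghc-pkg recache"
```

With this policy the reader of the database never parses the
per-package files at all; a stale or missing cache is reported as an
error rather than worked around.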

So this is a better solution than the one previously proposed to split
out some small part of Cabal, because in this proposal, ghc doesn't
depend on Cabal at all, not even some smaller common lib.

It's also better from the point of view of the Cabal folks because it
does not involve splitting Cabal in unnatural ways. The Cabal folks do
want to split the Cabal lib, but not in a way that is especially helpful
to ghc. This suggestion is orthogonal to any Cabal lib splits.

Further, if only ghc-pkg and the ghc build system depend on Cabal, then
it is easier for Cabal to add more dependencies, since they do not have
to be installed with ghc (due to the ghc lib depending on them). In
particular the Cabal folks would like to use a proper parser and have
suggested adding dependencies on parsec, mtl and transformers. If only
ghc-pkg depends on Cabal, then these dependencies only need to be used
at build time, and do not have to be installed (which also means they
don't have to be kept quite so up to date).


Note that this would not address SPJ's complaint that the start of
building ghc involves building 60+ modules of Cabal. The ghc-cabal tool
still uses Cabal and I am not suggesting changing that now. It's
plausible that when the Cabal lib is split, the ghc-cabal tool could
depend on just the smaller of the two parts (someone would need to look
at how much functionality from the "Simple" build system it uses). I
don't see that this is a big priority however.

Duncan



