Cabal Configurations Status Report

Thu Jul 26 15:23:06 EDT 2007

Hello cabal-devel,

     the following is a summary of the changes to Cabal to enable Cabal
     Configurations.

As we are about to integrate Cabal configurations into the main
branch in order to get more testing feedback, I think it is a good
time to give an overview about my changes.

** Parsing

In order to allow the new syntactic categories of sections and if-else
blocks, I added those categories to the low-level parsing primitives.
Possible section names are "library", "executable", or "flag" of
which the latter two require the name of the executable/flag name as a
parameter.  Conditional blocks may have an optional else-branch.

The current implementation is rather restrictive in its permitted
style; namely, the opening "{" has to be on the same line as the
section name or "if" keyword and the closing "}" has to stand alone on
its line unless it is followed by an "else {" branch.

At a higher level we then parse this lower-level structure into a
'GenericPackageDescription' structure.  This is defined as

   data GenericPackageDescription =
     GenericPackageDescription {
         packageDescription :: PackageDescription,
         packageFlags       :: [Flag],
         condLibrary        :: Maybe (CondTree ConfVar [Dependency]  
Library),
         condExecutables    :: [(String, CondTree ConfVar  
[Dependency] Executable)]
       }

The 'packageDescription' field contains a conventional package
description with only part of the fields filled in, namely all global
settings like version number or package name.  However, it does *not*
contain a valid "build-depends" field.  This and all other
library-related parts now go into the library section.  The old Cabal
format can easily be translated into this format so backwards
compatibility is no problem.

The 'CondTree' structure is an internal representation of a block of
field descriptions and possible conditional blocks.

   data CondTree v c a = CondNode
     { condTreeData        :: a
     , condTreeConstraints :: c
     , condTreeComponents  :: [( Condition v
                               , CondTree v c a
                               , Maybe (CondTree v c a))]
     }

A 'CondTree' node contains both data and constraints/dependencies.
Conditions are parameterized over the variable type, which in our case
will presumably be only one type.

   data ConfVar = OS String | Arch String | Flag String

** Resolving Conditions/Dependencies

'CondTree's can be resolved to a flat 'a' (the data part) given a
function to determine which constraints are satisfied and which
aren't.

   resolveWithFlags :: Monoid a =>
        [(String,[Bool])]
         -- ^ Domain for each flag name, will be tested in order.
     -> String  -- ^ OS name, as returned by System.Info.os
     -> String  -- ^ arch name, as returned by System.Info.arch
     -> [CondTree ConfVar [d] a]  -- ^ All CondTrees must satisfy the  
deps
     -> ([d] -> DepTestRslt [d])  -- ^ Dependency test function.
     -> (Either [d] -- missing dependencies
          ( [a]  -- result data
          , [d]  -- overall chosen dependencies
          , [(String, Bool)]))  -- chosen flag assignments

This function is the heart of the flag resolution process and is
implemented rather naively by trying all possible flag assignments and
choosing the first that satisfies the dependencies.  As long as we
don't use too many flags (or set some explicitly) this shouldn't be a
problem.

Note that the data has to be a Monoid.  The problem is, that due to
lack of first-class record fields[1] we have to store a partially
filled-in 'a' (in our use case a 'Library' or 'Executable') and then
rely on the Monoid instance to do the right thing.  This strategy also
required changing the way defaults are set; before, this was done by
setting it in the empty value and just (blindly) overwriting it
whenever it was set explicitly in the .cabal file.  Now, however, we
have to set it after the package has been resolved, and find out which
fields haven't been set explicitly.  This is a rather hackish
("hacky"?)  approach but seems to work fine so far.

   [1] Implementing approximations would likely require extensions to
   Haskell98 thus wouldn't be a good idea for Cabal which tries to be
   as portable as possible.

The function that does the actual flag-assignment resolution on
packages is

   finalizePackageDescription ::
        [(String,Bool)]  -- ^ Explicitly specified flag assignments
     -> Maybe [PackageIdentifier] -- ^ Available dependencies
     -> String -- ^ OS-name
     -> String -- ^ Arch-name
     -> GenericPackageDescription
     -> Either [Dependency]  -- ^ Missing dependencies
         ( PackageDescription -- ^ Resolved package description
         , [(String,Bool)])   -- ^ Flag assignments chosen

Using this function the 'configure' command determines a resolved
package description (or fails) and stores it in the 'LocalBuildInfo'
structure.  From there it is available to all the other setup
commands.  Unfortunately, this is not sufficient for all build
commands.

** Breaking the Interface to Setup.(l)hs

Due to the new staged package resolution process, the 'confHook'
arguments had to be changed, and most hooks could be changed to drop
the 'PackageDescription' argument.  For better backwards compatibility
it is still passed in, although it is exactly the same as what is stored
in the local build info.  There are two notable exceptions to this:
the 'sdist' and the 'clean' command.  So far, both took only an
optional LocalBuildInfo, but required a PackageDescription argument.
'sdist' needs this to determine where all the source and data files
reside, 'clean' only needs this because it currently tries to be a bit
too smart.

The fix for 'sdist' is to provide it with a package argument that
represents all possible assignments to the fields, i.e., a package
description, where all the "if ... {" and "} else" lines are ignored.
This is necessary, so that sdist can make sure that all possible
source files are included in the tarball.  The problem with this
"unconditional flattening" is that it may lead to inconsistent
assignments to a field value, e.g., a package that might be buildable
under one an not buildable under another condition will be *not*
buildable in the resulting package description.  However, we don't
read those fields in the current implementation, so this is no problem
right now (but of course makes things not very maintainable, and
custom hook implementations have to be careful.)

A possible alternative is to simply disallow certain fields from being
specified conditionally.  I'm not so sure about this, since for
example we'd disallow different source (sub-)trees for different
architectures if we'd do so.

The best way to fix the 'clean' hook, I think, would be to dumben
'clean' and make it simply be a "rm -rf dist/" equivalent.  At the
moment it has to clean up the generated files of many preprocessors
that were put into the source directory to allow them being included
in the 'sdist' generated tarball.  I think a better solution is to
explicitly copy (or generate) those files on demand (i.e., when
running 'sdist').

** Compatibility

In any case we still have a huge problem with backwards compatibility.
Whenever there are non-trivial changes to the Cabal API many packages
might break, but--even worse--can be fixed to work with only one
version of Cabal.  I think we need a better way to do this.  I will
elaborate some ideas in a separate mail.

** Summary

I hope this gives a good enough overview of the basic changes to Cabal
(without dropping the whole source code at you) that hopefully
stimulate some ideas, or questions (or complaints ;) ).

HTH,

/ Thomas