Changing the -package dependency resolution algorithm

Simon Marlow marlowsd at gmail.com
Thu Jul 31 07:53:40 UTC 2014


Sorry for not replying to this earlier.  I think after our meeting the 
other day we agreed to keep the existing algorithm but document it 
better.  Here's a stab at describing the spec:

1. compute the set of valid packages, where a package is valid if
   - all its dependencies are valid
   - it is not named in an -ignore-package flag,
   - it is not shadowed by a package with the same PackageId
     in another Package DB, unless it is in the transitive
     closure of a package named in a -package-id flag

2. for each flag:
    -package P:  expose P, and hide all other versions of P
    -package-id P: expose P, and hide all other versions of P
    -hide-package P: hide P

3. if there are multiple exposed packages with the same name, hide
    all but the most recent version.

We need to rethink the shadowing behaviour.  It is designed to handle 
the case where we have the same PackageId (name + version) in two 
different DBs (e.g. global and local).  Shadowing takes the topmost one 
of these (e.g. local, or rightmost -package-db flag).  We can relax this 
requirement so long as the InstalledPackageIds are different, but what 
if the InstalledPackageIds are the same?  Right now that's OK, because 
identical InstalledPackageIds implies identical ABIs, but if we change 
that so that InstalledPackageId is derived from the source and not the 
ABI, we would not be able to assume that two identical 
InstalledPackageIds are compatible.

Cheers,
Simon

On 24/07/2014 15:57, Edward Z. Yang wrote:
> Right now, GHC has a very complex and hard to explain algorithm for
> picking packages from the package database when you give it a pile of
> -package/-package-id/-{hide,ignore,trust,distrust}-package flags.
> Roughly, it currently does something like this.
>
> 1. Concatenate all of the package databases into a giant list, with
> system packages first and then user packages later, removing duplicate
> entries with the same installed package ID (preferring later packages).
> Call this PACKAGES.
>
> 2. Take all the -package-id flags and compute their transitive closure
> (call this set P).
>
> 3. Calculate the set of shadowed packages--the set of installed packages
> for which there exists another package with the same package ID
> (foo-0.1) which is in P or is later in the package stack.
>
> 4. Calculate the set of ignored packages---the set of packages which
> match all -ignore-package flags
>
> 5. Filter out shadowed and ignored packages from the list of packages,
> calling the result ALL_PACKAGES.
>
> 6. Calculate the set of broken packages---the set of packages not
> contained in the least fixed point of the relation that takes
> a set of packages, and adds all packages whose dependencies are
> satisfied, from ALL_PACKAGES.
>
> 7. Process the package flags in order, operating on PACKAGES (not
> ALL_PACKAGES).  For -package/-hide-package, take the package with the
> *latest* version that matches the flag and are not broken (since this
> includes shadowed packages, the result is unique) and toggle it to be
> exposed/unexposed, and hide all the other packages.  For
> -trust-package/-distrust-package, toggle the trusted bit for all
> instances in the database.
>
> 8. If we have exposed multiple versions of the same package name,
> hide all the older versions
>
> What a mouthful!  I have no idea, given a set of flags, how this works.
> So here is an alternate proposal for an alternate way of handling these
> flags *when starting from an empty database* (e.g. -hide-all-packages)
>
> Suppose we maintain a set of selected packages, which starts off empty.
> Process each package flag in turn.
>
> For -package and -package-id, get the set of installed packages which
> match the flag. (If multiple package names apply, process each in turn
> as if it were a separate flag.)  Compute the transitive closure of
> dependencies for all of them, and filter out all choices which have
> dependencies which are inconsistent with the current set of selected
> packages.  Consistency without multi-instances is a mapping of a package
> name to an installed package.  If there is still more than one choice,
> tiebreak by version, which database and time of install. (The latter
> tiebreak should not be necessary until we allow multiple instances of a
> package with the same package ID.)
>
> For -hide-package, get the set of packages which match and hide them
> all; for -ignore-package, hide the transitive closure of dependencies of it.
> For -trust,distrust-package, toggle for all matching packages as before.
>
> Here are some differences in behavior between this and the previous
> scheme:
>
> - It's no longer valid to indirectly depend on two different versions
>    of the same package.  Most of the time, users didn't want this anyway.
>    Note that the current scheme prevents directly depending on two
>    different versions by shadowing the old ones.
>
> - We can easily extend it to handle multi-instances by relaxing the
>    consistency check.
>
> - It assumes *-hide-all-packages* at the beginning.  This scheme
>    probably works less well without that: now we need some consistent
>    view of the database to start with.
>
> What do people think?
>
> Cheers,
> Edward
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://www.haskell.org/mailman/listinfo/ghc-devs
>


More information about the ghc-devs mailing list