Changing the -package dependency resolution algorithm

Edward Z. Yang ezyang at mit.edu
Thu Jul 24 14:57:05 UTC 2014


Right now, GHC has a very complex and hard to explain algorithm for
picking packages from the package database when you give it a pile of
-package/-package-id/-{hide,ignore,trust,distrust}-package flags.
Roughly, it currently does something like this.

1. Concatenate all of the package databases into a giant list, with
system packages first and then user packages later, removing duplicate
entries with the same installed package ID (preferring later packages).
Call this PACKAGES.

2. Take all the -package-id flags and compute their transitive closure
(call this set P).

3. Calculate the set of shadowed packages--the set of installed packages
for which there exists another package with the same package ID
(foo-0.1) which is in P or is later in the package stack.

4. Calculate the set of ignored packages---the set of packages which
match all -ignore-package flags

5. Filter out shadowed and ignored packages from the list of packages,
calling the result ALL_PACKAGES.

6. Calculate the set of broken packages---the set of packages not
contained in the least fixed point of the relation that takes
a set of packages, and adds all packages whose dependencies are
satisfied, from ALL_PACKAGES.

7. Process the package flags in order, operating on PACKAGES (not
ALL_PACKAGES).  For -package/-hide-package, take the package with the
*latest* version that matches the flag and are not broken (since this
includes shadowed packages, the result is unique) and toggle it to be
exposed/unexposed, and hide all the other packages.  For
-trust-package/-distrust-package, toggle the trusted bit for all
instances in the database.

8. If we have exposed multiple versions of the same package name,
hide all the older versions

What a mouthful!  I have no idea, given a set of flags, how this works.
So here is an alternate proposal for an alternate way of handling these
flags *when starting from an empty database* (e.g. -hide-all-packages)

Suppose we maintain a set of selected packages, which starts off empty.
Process each package flag in turn.

For -package and -package-id, get the set of installed packages which
match the flag. (If multiple package names apply, process each in turn
as if it were a separate flag.)  Compute the transitive closure of
dependencies for all of them, and filter out all choices which have
dependencies which are inconsistent with the current set of selected
packages.  Consistency without multi-instances is a mapping of a package
name to an installed package.  If there is still more than one choice,
tiebreak by version, which database and time of install. (The latter
tiebreak should not be necessary until we allow multiple instances of a
package with the same package ID.)

For -hide-package, get the set of packages which match and hide them
all; for -ignore-package, hide the transitive closure of dependencies of it.
For -trust,distrust-package, toggle for all matching packages as before.

Here are some differences in behavior between this and the previous
scheme:

- It's no longer valid to indirectly depend on two different versions
  of the same package.  Most of the time, users didn't want this anyway.
  Note that the current scheme prevents directly depending on two
  different versions by shadowing the old ones.

- We can easily extend it to handle multi-instances by relaxing the
  consistency check.

- It assumes *-hide-all-packages* at the beginning.  This scheme
  probably works less well without that: now we need some consistent
  view of the database to start with.

What do people think?

Cheers,
Edward


More information about the ghc-devs mailing list