Fixing "breaking packages"

Administrator admin at rodlogic.net
Sat Mar 2 00:27:43 CET 2013


Thanks for the GSoCMultipleInstances link: it is very informative!

It seems that there is already a consensus on what needs to be done
here: GHC and Cabal must support multiple package instances with the
same name and version (package curation and development sandboxing
have their value above and beyond this too). There also seems to be
a general design for how this should be done.

Assuming that a package instance is identified by
{PackageName}-{Version}-{InstanceId} here are some specific comments:

** What are the precise inputs to generating {InstanceId}? This is the
key question, and the rest of the design will flow from it.
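To make the question concrete, here is a minimal Python sketch of what generating {InstanceId} could look like if its inputs were the compiler version, the configure flags, the exact instances of the dependencies and, optionally, a hash of the sources. The choice of inputs, the helper names `instance_id` and `full_id`, and the truncation to 16 hex characters are all assumptions for illustration, not what GHC or Cabal actually do.

```python
import hashlib
import json


def instance_id(package_name, version, compiler, flags, dep_instance_ids,
                source_hash=None):
    """Hypothetical: derive an {InstanceId} from a canonical encoding of the
    build inputs. Sorting makes the result independent of input order."""
    inputs = {
        "package": package_name,
        "version": version,
        "compiler": compiler,               # e.g. "ghc-7.6.2"
        "flags": sorted(flags),             # configure flags, order-independent
        "deps": sorted(dep_instance_ids),   # exact dependency instances
        "source": source_hash,              # optional (clean vs. dirty installs)
    }
    encoded = json.dumps(inputs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:16]


def full_id(package_name, version, iid):
    """The {PackageName}-{Version}-{InstanceId} triple as a single string."""
    return f"{package_name}-{version}-{iid}"
```

Because the dependency instances are inputs, building mypkg-1.0 against a different instance of yourpkg-1.1 yields a different {InstanceId}, which is what would let both builds coexist.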


** When developing one or more packages there is no point in
keeping track of multiple instances (i.e. they are not installed):
Cabal sandboxing or a local package db where {InstanceId} is a
constant is enough. Cabal will, however, need to find the other
package instances they depend on in the user db or system db.
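The lookup this implies can be sketched as a search through a stack of package databases, sandbox first, then user, then global. This is a minimal Python illustration: `PackageDb`, `resolve` and the constant `LOCAL_INSTANCE` are hypothetical names, and simply taking the first instance found stands in for whatever selection policy a real tool would need.

```python
from dataclasses import dataclass, field

# Hypothetical constant {InstanceId} for packages under development.
LOCAL_INSTANCE = "inplace"


@dataclass
class PackageDb:
    name: str
    # Maps "name-version" to the list of installed instance ids.
    instances: dict = field(default_factory=dict)

    def lookup(self, name_version):
        return self.instances.get(name_version, [])


def resolve(name_version, stack):
    """Search the databases in order and return (db name, instance id) for
    the first match, or None. Takes the first instance found in that db."""
    for db in stack:
        found = db.lookup(name_version)
        if found:
            return db.name, found[0]
    return None
```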


> [GSoCMultipleInstance] There are three identifiers:
> [GSoCMultipleInstance] XXXX: the identifier appended to the installation directory so that installed packages do not clash with each other
> [GSoCMultipleInstance] YYYY: the InstalledPackageId, which is an identifier used to uniquely identify a package in the package database.
> [GSoCMultipleInstance] ZZZZ: the ABI hash derived by GHC after compiling the package
** It would be nice to reduce the complexity here and strive for a
single {InstanceId} that, together with {PackageName} and {Version},
is used throughout (libs, package.conf.d, etc.).
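What a single identifier buys can be sketched by deriving every name from one key. In this minimal Python example the directory layout (`lib/<key>`, `package.conf.d/<key>.conf`) and the helper names are hypothetical and do not reflect GHC's actual on-disk layout; the point is only that XXXX and YYYY collapse into the same string.

```python
from pathlib import PurePosixPath


def package_key(name, version, iid):
    """Hypothetical single identifier: {PackageName}-{Version}-{InstanceId}."""
    return f"{name}-{version}-{iid}"


def layout(prefix, name, version, iid):
    """Derive the install directory, the package db entry and the installed
    package id from the same key (hypothetical layout)."""
    key = package_key(name, version, iid)
    root = PurePosixPath(prefix)
    return {
        "libdir": str(root / "lib" / key),
        "db_entry": str(root / "package.conf.d" / f"{key}.conf"),
        "installed_package_id": key,
    }
```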


> [GSoCMultipleInstance] "we need to distinguish between two packages that have identical ABIs but different behaviour (e.g. a bug was fixed)"
** This is why the package version {Version} exists. If a bug was
fixed, a proper release process must bump the package version, and
the unique hash/id should not try to capture this.


> [GSoCMultipleInstance] "We define a new Cabal Hash that hashes the compilation inputs (the LocalBuildInfo and the contents of the source files)"
** I am not sure why hashing the sources is important here: an added
space character would produce a different hash even though the object
files could be exactly the same.
** There is a paragraph later in the document that describes what may
be the motivation here: installing unreleased packages (a clean
install vs. a dirty install).
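The trade-off shows up in a minimal sketch of a source-tree hash. This Python example uses a hypothetical `source_hash` helper and an in-memory dict in place of files on disk: a single trailing space changes the hash even though the object files could be identical, and that same sensitivity is what would distinguish a locally modified, unreleased tree from the released one.

```python
import hashlib


def source_hash(files):
    """Hash a source tree given as {relative path: contents as bytes}.
    Paths are sorted so the result does not depend on traversal order;
    each entry is length-prefixed so boundaries cannot be confused."""
    h = hashlib.sha256()
    for path in sorted(files):
        data = files[path]
        h.update(path.encode("utf-8"))
        h.update(b"\0")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()
```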


> [GSoCMultipleInstance] "ZZZZ is recorded in the package database as a new field abi-hash. When two packages have identical ZZZZs then they are interface-compatible, and the user might in the future want to change a particular dependency to use a different package but the same ZZZZ. We do not want to make this change automatically, because even when two packages have identical ZZZZs, they may have different behaviour (e.g. bugfixes)."
** It is not clear to me in what cases this will be useful. If my
.cabal file declares a dependency on version 1.2.3 (or a range), this
assumes these dependencies are interface-compatible, and the
InstallPlan should be able to pick the one that makes the most sense
(the same goes for bug fixes). I don't deny that this may be an
interesting requirement, but it sounds secondary to me.
** I am a bit confused about who will be responsible for generating this
{InstanceId}: Cabal or GHC? My initial thought was that GHC should be
responsible for defining the required inputs and generating the
appropriate {InstanceId}, especially since it needs to be able to
traverse package dependencies for linking/GHCi. However, maybe this is
not an issue, since the package DB will simply be a DAG with specific
{InstanceId} pointers between nodes/dependencies?
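For comparison, the abi-hash check quoted above amounts to a filter over the package database: given the instance currently depended on, list the other installed instances whose abi-hash matches. This is a minimal Python illustration with a hypothetical `InstalledPackage` record, not the real package database format. Versions may differ among the results, which is the bug-fix case the wiki says must not be switched automatically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstalledPackage:
    name: str
    version: str
    instance_id: str
    abi_hash: str  # the proposed abi-hash field (ZZZZ)


def abi_compatible_alternatives(current, installed):
    """Other installed instances of the same package with the same abi-hash.
    They are only reported as candidates; nothing is swapped automatically."""
    return [p for p in installed
            if p.name == current.name
            and p.abi_hash == current.abi_hash
            and p.instance_id != current.instance_id]
```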


> [GSoCMultipleInstance] The best tool for determining suitable package instances to use as build inputs is cabal-install. However, in practice there will be many situations where users will probably not have the full cabal-install functionality available:
> [GSoCMultipleInstance] invoking GHCi from the command line,
> [GSoCMultipleInstance] invoking GHC directly from the command line,
> [GSoCMultipleInstance] invoking the configure phase of Cabal (without using cabal-install).
** If the package DB stores a graph of
{PackageName}-{Version}-{InstanceId} packages connected to other
specific package instances (e.g. the mypkg-1.0-1234 package instance
depends on the yourpkg-1.1-9876 package instance), navigating this DAG
is straightforward and I don't see why cabal-install would be needed
here. Maybe the issue is selecting the first package instance based on
a given {PackageName}-{Version} or just {PackageName}? Maybe the
design here should make sure that there are some minimal attributes
that GHC/GHCi can query to decide what initial package instance to
pick.
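To illustrate the point about navigating the DAG, here is a minimal Python sketch assuming a hypothetical in-memory package DB keyed by {PackageName}-{Version}-{InstanceId}. `closure` follows the recorded dependency pointers with no solver involved; `pick_initial` shows one possible rule for choosing the starting instance, using a hypothetical `preferred` attribute and then the highest version. Neither the attribute nor the rule comes from GHC or the wiki.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    version: tuple           # e.g. (1, 1) for version 1.1
    instance_id: str
    depends: list = field(default_factory=list)  # full ids of dependencies
    preferred: bool = False  # hypothetical attribute GHC/GHCi could query

    @property
    def full_id(self):
        return f"{self.name}-{'.'.join(map(str, self.version))}-{self.instance_id}"


def closure(db, root_id):
    """All instances reachable from root_id, each visited once (DFS order)."""
    seen, order, stack = set(), [], [root_id]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        stack.extend(db[node].depends)
    return order


def pick_initial(db, name, version=None):
    """Pick a starting instance for a bare name or a name-version:
    prefer instances marked preferred, then the highest version;
    remaining ties go to the first candidate in db order."""
    candidates = [i for i in db.values()
                  if i.name == name and (version is None or i.version == version)]
    if not candidates:
        return None
    return max(candidates, key=lambda i: (i.preferred, i.version)).full_id
```

Only `pick_initial` needs information beyond the dependency pointers, which matches the suggestion above that a few minimal attributes would be enough.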


