planning for ghc-6.10.1 and hackage [or: combining packages to
yield new type correct programs]
Simon Marlow
marlowsd at gmail.com
Thu Oct 2 07:41:38 EDT 2008
Don Stewart wrote:
> Here's a summary of why this is non-trivial,
>
> * We're trying to compose packages on the users machine to yield new
> type correct programs.
>
> * We're using cabal dependencies to decide when it is safe to do this.
> Hopefully we don't rule in any type incorrect combinations, nor rule
> out to many type correct combinations.
>
> * This is scary - in fact, we think the package system admits type
> incorrect programs (based on Typeable, or initialised global state),
> as it is similar to the runtime linking problem for modules.
I think what you're referring to is the problem that occurs if the program
links in more than one copy of Data.Typeable, which would then invalidate
the assumptions that make Data.Typeable's use of unsafeCoerce safe. I
wouldn't call this "type-incorrect" - it's a violation of assumptions made
by Data.Typeable, not type-incorrectness in the Haskell sense.
But you'll be glad to know this doesn't happen anyway, because
Data.Typeable's state is held by the RTS these days, for exactly this reason.
However, there are libraries which do have private state (e.g.
System.Random). We'd prefer not to have more than one copy of the state,
but it's not usually fatal: in the case of System.Random, different clients
might get streams of random numbers initialised from different seeds, but
that's indistinguishable from sharing a single stream of random numbers.
Often this global-state stuff is for caching, which works just fine when
multiple clients use different versions of the library - it's just a bit
less efficient.
> * We use constraint solving determine when composition is safe, by looking at
> "package >= 3 && < 4"
> style constraints. That is, we try to guess when the composition
> would yield a type correct program.
The way to make this completely safe is to ensure that the resulting
program only has one of each module, rather than one of each package
version - that's an approximation, because the package name might have
changed too.
Now, we want to relax this in various ways. One way is the base-3/base-4
situation, where base-3 has a lot of the same modules as base-4, but all
they do is re-export stuff from other packages. How do we know this is
safe? Well, we don't - the only way is to check whether the resulting
program typechecks.
Another way we want to relax is it when a dependency is "private" to a
package; that is, the package API is completely independent of the
dependency, and hence changing the dependency cannot cause compilation
failure elsewhere. We've talked in the past about how it would be nice to
distinguish private from non-private dependencies.
Let's be clear: there are only two ways that something could "go wrong"
when composing packages:
1. the composition is not type-correct; you get a compile-time error
2. some top-level state is duplicated; if the programmer has been
careful in their use of unsafePerformIO, then typically this won't
lead to a run-time error.
So it's highly unlikely you end up with a program that goes wrong at
runtime, and in those cases arguably the library developer has made
incorrect assumptions about unsafePerformIO.
> * Again, we're using constraint solving on this language to
> determine when composition of Haskell module sets (aka packages) would
> yield type correct Haskell programs
>
> All without attempting to do type checking of the interfaces between
> packages -- the very thing that says whether this is sound!
True - but we already know that package/version pairs are a proxy for
interfaces, and subject to user failure. If the package says that it
compiles against a given package/version pair, there's no guarantee that it
actually does, that's up to the package author to ensure. Now obviously
we'd like something more robust here, but that's a separate problem - not
an unimportant one, but separate from the issue of how to make
cabal-install work with GHC 6.10.1.
cabal-install has to start from the assumption that all the dependencies
are correct. Then it can safely construct a complete program by combining
all the constraints, and additionally ensuring that the combination has no
more than one of each module (and possibly relaxing this restriction when
we know it is safe to do so).
> * So, the solver for cabal-install has to be updated to allow the same
> package to have multiple, conflicting versions, as long as version X
> depends on version Y, and then not reject programs that produce this
> constraint.
Right.
> * This is non trivial, but think this refactoring is possible, but it is
> hard. ultimately we're still making optimistic assumptions about
> when module sets can be combined to produce type correct programs,
> and conservative assumptions, at the same time.
>
> What we need is a semantics for packages, that in turn uses a
> semantics for modules, that explains interfaces in terms of types.
The semantics is quite straightforward: a module is identified by the pair
(package-id, module name), and then you just use the Haskell module system
semantics. That is, replace all module names in the program with
(package-id, module name) pairs according to which packages are in scope in
the context of each module, and then proceed to interpret the program as in
Haskell 98.
The main problem with looking at things this way is that you need to see
the whole program - which is what I've been arguing against in the context
of instances. So I agree that looking for a semantics for packages that
lets you treat them as an abstract entity would be useful. Still, the
above interpretation of packages is a good starting point, because it tells
you whether a higher-level semantics is really equivalent.
> * The end result is that cabal-install should be able to find automated
> install plans for packages that ask for base-3, even when base-4 is on
> the system as well, and it uses pieces of base-3 libraries and base-4
> libraries. Some more programs will work than if we didn't ship base-3.
So I'm not sure exactly how cabal-install works now, but I imagine you
could search for a solution with a backtracking algorithm, and prune
solutions that involve multiple versions of the same package, unless those
two versions are allowed to co-exist (e.g. base-3/base-4). If backtracking
turns out to be too expensive, then maybe more heavyweight
constraint-solving would be needed, but I'd try the simple way first.
What happens with automatic flag assignments? Presumably we can decide
what the flag assignment for each package is up-front?
Cheers,
Simon
More information about the Glasgow-haskell-users
mailing list