[GHC] #12485: -package-db flags now need to be sorted by dependency order
GHC
ghc-devs at haskell.org
Tue Aug 30 10:56:02 UTC 2016
#12485: -package-db flags now need to be sorted by dependency order
-------------------------------------+-------------------------------------
Reporter: niteria | Owner:
Type: bug | Status: new
Priority: normal | Milestone: 8.0.2
Component: Package system | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): phab:D2450
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by ezyang):
OK. I'll structure this by first describing some workarounds for the current
behavior; then, assuming those workarounds don't work or are undesirable,
I'll comment on how we can make this work.
> The way the packages are built is with Setup.hs and a separate
> package.db for each package.
One thing you can do in this situation is use `ghc-pkg recache` to create a
merged package database and pass that to GHC instead. You'd copy all of the
per-package `.conf` files into a single directory, build the database cache
for it, and you'd be off to the races.
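A minimal sketch of that workaround in Haskell. It assumes each per-package database is a directory of `.conf` files, as created by `ghc-pkg init` and `ghc-pkg register`, and that `ghc-pkg` is on the `PATH`. `mergePlan` and `mergeDatabases` are hypothetical helper names, not part of any library.

```haskell
import Control.Monad (forM_)
import Data.List (isSuffixOf, sort)
import System.Directory (copyFile, createDirectoryIfMissing, listDirectory)
import System.FilePath ((</>))
import System.Process (callProcess)

-- | Pure part: which files to copy where. Only the per-package .conf
-- files are taken; an old package.cache is skipped because
-- `ghc-pkg recache` regenerates it. Databases are given in the order
-- you would have passed them with -package-db, so if two databases
-- contain a .conf file with the same name, the later copy wins.
mergePlan :: FilePath -> [(FilePath, [FilePath])] -> [(FilePath, FilePath)]
mergePlan dest dbs =
  [ (db </> f, dest </> f)
  | (db, files) <- dbs
  , f <- sort files
  , ".conf" `isSuffixOf` f
  ]

-- | IO part: copy every .conf file into one directory, then ask
-- ghc-pkg to build the binary cache for the merged database.
mergeDatabases :: FilePath -> [FilePath] -> IO ()
mergeDatabases dest dbDirs = do
  createDirectoryIfMissing True dest
  listings <- mapM listDirectory dbDirs
  forM_ (mergePlan dest (zip dbDirs listings)) $ \(src, dst) ->
    copyFile src dst
  callProcess "ghc-pkg" ["recache", "--package-db=" ++ dest]
```

For example, run `mergeDatabases "merged.db" ["foo.db", "bar.db"]` and then pass GHC a single `-package-db merged.db`. Since every package now lives in one database, the order of the original databases stops mattering.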
> This is an undocumented change in behaviour in the very least. The
> manual didn't state that they can be in any order, but also didn't put any
> constraints on order.
To be fair, the manual does state that package databases are ordered, and
that packages closer to the top of the stack shadow those below them
(https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/packages.html#package-databases).
It doesn't explicitly state that every substack should be well-formed, but
this is a constraint that `ghc-pkg` checks (unless you force registration
with `--force`). (FWIW, Harendrar posted a Diff,
https://phabricator.haskell.org/D2464, which should improve the docs here
further.)
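To make the "every substack should be well-formed" constraint concrete, here is a sketch of the check under a simplified model where a package is just its unit id plus the unit ids it depends on. `Pkg` and `firstBadDep` are hypothetical; the real checks in `ghc-pkg` cover more than this.

```haskell
import qualified Data.Set as Set

-- | Hypothetical simplified package record (not GHC's real type).
data Pkg = Pkg { unitId :: String, depends :: [String] }

-- | Find the first (database index, package, missing dependency) that
-- makes some prefix of the stack ill-formed. Databases are listed in
-- command-line order: the first -package-db is at the bottom. A package
-- may only depend on packages in its own database or in databases below.
firstBadDep :: [[Pkg]] -> Maybe (Int, String, String)
firstBadDep = go 0 Set.empty
  where
    go _ _ [] = Nothing
    go i seen (db : rest) =
      let visible = foldr (Set.insert . unitId) seen db
          bad = [ (i, unitId p, d)
                | p <- db, d <- depends p, not (d `Set.member` visible) ]
      in case bad of
           (b : _) -> Just b
           []      -> go (i + 1) visible rest
```

With `base`, `text` and an `app` that uses both, the dependency order `[[base], [text], [app]]` yields `Nothing`, while the reverse order reports `app`'s dependency on `text` as unresolved in the bottom database.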
> This sounds like an implementation detail informing the specification.
> Is there a fundamental reason why the flags have to be ordered?
OK, let me explain the shadowing situation in more detail, and also how
package database handling has changed over the last few releases.
There is a very important correctness constraint that GHC enforces on the
package databases it reads in: there must not be two distinct packages with
the same "key" (what constitutes a key has changed over time). This matters
because GHC assumes that two packages with equal keys (unit ids) are the
same package, and hence that their types are equal. If two distinct
packages, which could define totally different types and functions, share a
key, GHC will mix them up and generate code that will almost certainly
segfault.
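The invariant can be phrased as a check over everything GHC has loaded. In this sketch, a hypothetical `Pkg` record carries the key plus an ABI hash, which stands in (as an assumption) for "the actual contents of the package": two records with the same key and the same hash count as the same package, while a key shared by two different hashes is a violation.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- | Hypothetical simplified record: the key GHC uses for identity and
-- an ABI hash standing in for what the package actually contains.
data Pkg = Pkg { pkgKey :: String, abiHash :: String }

-- | Keys claimed by more than one distinct package. Any result here is
-- the situation that must never reach code generation, since the two
-- packages' types and symbols would be treated as identical.
keyCollisions :: [Pkg] -> [String]
keyCollisions pkgs =
  [ k | (k, abis) <- Map.toList byKey, Set.size abis > 1 ]
  where
    byKey = Map.fromListWith Set.union
              [ (pkgKey p, Set.singleton (abiHash p)) | p <- pkgs ]
```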
In GHC 7.8 and earlier, the key was just the package name plus version, so
it was not that uncommon to have two package databases that defined the
same package name and version. To keep things safe, GHC shadowed packages,
throwing out any package whose source package ID conflicted with one higher
in the stack. Every package was also associated with an installed package
ID (IPID), which was derived directly from the ABI hash of the package. When
two IPIDs coincided, it was always safe to pick one (the later one): a
coinciding IPID meant that the ABIs matched, so there was no problem
reusing it.
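A rough model of the 7.8-era behavior described above, assuming a package record with only a source package ID, an IPID, and dependencies recorded as IPIDs. `Pkg` and `mergeStack78` are hypothetical, the real implementation handled more cases, and the pruning step is an added assumption of this sketch: anything built against a shadowed IPID is treated as unusable, since that dependency is gone.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- | Hypothetical simplified record (not GHC's real type).
data Pkg = Pkg
  { sourceId :: String    -- package name plus version, e.g. "text-1.1"
  , ipid     :: String    -- installed package id, derived from the ABI hash
  , depends  :: [String]  -- dependencies, recorded as IPIDs
  }

-- | Merge a stack given bottom database first. Map.fromList keeps the
-- last value per key, so a package higher up shadows any lower package
-- with the same source package ID. Equal IPIDs mean equal ABIs, so
-- keeping the later copy is harmless in that case.
mergeStack78 :: [[Pkg]] -> [Pkg]
mergeStack78 dbs = prune (Map.elems visible)
  where
    visible = Map.fromList [ (sourceId p, p) | db <- dbs, p <- db ]
    -- Repeatedly drop packages with a dependency that is no longer present.
    prune ps =
      let ids = Set.fromList (map ipid ps)
          ok  = filter (all (`Set.member` ids) . depends) ps
      in if length ok == length ps then ps else prune ok
```

For instance, if an upper database holds a rebuilt `text-1.1` with a new IPID, it shadows the lower copy, and a lower `parsec` compiled against the old `text` IPID drops out of the result.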
OK, so this restriction that you can't have two packages with the same
source package ID was a source of Cabal hell (cabal-install was not clever
enough to put every package in a separate database), so in 7.10 we
introduced "package keys", which were a bit more fine-grained than source
package IDs and were what we used for type equality, linker symbols, etc.
IPIDs continued to be derived from ABI hashes. Package keys didn't really
necessitate any changes in how shadowing worked, since there was still a
separate notion of IPIDs.
At some point in the GHC 8.0 release cycle, SPJ wondered why we needed both
package keys and IPIDs. At the same time, work on cabal new-build was
afoot, which eschewed ABI hashes for IPIDs: they can't be computed until
the package has actually been built, while new-build needs to compute the
IDs ahead of time so that it can tell whether the particular build it needs
already exists. So in GHC 8.0 we unified IPIDs and package keys.
OK, and now we get to the set of commits which broke your package
databases. Once IPIDs no longer track ABI hashes, it's harder to say which
ABI a package depends on: after all, we record dependencies as IPIDs, not
ABIs (maybe we should have recorded the ABIs of the deps!). So I needed to
find a new algorithm which:
1. Maintained the safety invariant: we never tried to load two distinct
packages with the same IPID (previously package key, previously source
package ID), and never used a package together with a different copy of a
dependency than the one it was compiled against.
2. Preserved the old shadowing behavior when two package databases are
merged together and an IPID conflicts: we need to prefer the later one. (I
didn't want to implement this, but bootstrapping stopped working without
it. I believe the issue is that the boot libraries distributed with GHC
don't have hashes in their IDs, so when we rebuild those libraries to boot,
we end up picking the same name. Better use the new one!)
3. Preserved the old behavior where a package from a later database could
override one from an earlier database without breaking a pile of packages,
as long as the ABIs matched.
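The three requirements can be sketched as a fold that adds one database at a time to the stack. This is a simplified model, not GHC's code: `Pkg`, `mergeStack` and `addDb` are hypothetical, ABI hashes are compared directly, and packages in the database being added are assumed to be built against the copies that database provides. The final pruning step encodes the assumption that every partial stack is already well-formed, which is why databases passed out of dependency order lose packages.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- | Hypothetical simplified record, not GHC's real data type.
data Pkg = Pkg
  { unitId  :: String    -- the IPID (since 8.0 also the package key)
  , abi     :: String    -- ABI hash of the compiled package
  , depends :: [String]  -- dependencies, recorded as IPIDs only
  }

type Db = Map.Map String Pkg

-- | Merge a stack of databases, bottom first, one database at a time.
mergeStack :: [[Pkg]] -> Db
mergeStack = foldl' addDb Map.empty

addDb :: Db -> [Pkg] -> Db
addDb acc db = prune kept
  where
    new = Map.fromList [ (unitId p, p) | p <- db ]
    -- Requirement 2: on an IPID clash the later database wins.
    -- Requirement 3: if the ABIs match, the clash changes nothing else.
    changed = Set.fromList
      [ u | (u, p) <- Map.toList new
          , Just old <- [Map.lookup u acc], abi old /= abi p ]
    -- Requirement 1: lower packages built against a replaced,
    -- ABI-incompatible copy must go (prune removes their dependents).
    kept = Map.union new
             (Map.filter (not . any (`Set.member` changed) . depends) acc)
    -- The assumption: every dependency must already be visible in the
    -- stack so far. Anything else is dropped, repeatedly, to a fixpoint.
    prune m =
      let m' = Map.filter (all (`Map.member` m) . depends) m
      in if Map.size m' == Map.size m then m else prune m'
```

With `base`, `text` and `app`, `mergeStack [[base], [text], [app]]` keeps all three, whereas `mergeStack [[app], [text], [base]]` keeps only `base`, because `app` and `text` are pruned before the databases providing their dependencies are added. Overriding `text` with an equal-ABI copy keeps `app`; a copy with a different ABI drops it.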
So... I sinned: I assumed that as we added databases to our stack, the
stack would remain well-formed at every step. Which has ruined your day!
Having written this out, I don't think my suggested fix would keep GHC
bootstrapping working as it does today. It's somewhat unavoidable: if the
build system picks a deterministic ID for a boot library, and you then
immediately bootstrap with that GHC, it will pick the *same* ID. So you
*need* some form of shadowing.
Maybe what you really want is another mode that makes GHC process package
databases differently? (This is why #12518 seems relevant.)
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/12485#comment:9>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
More information about the ghc-tickets mailing list