how to checkout proper submodules
Geoffrey Mainland
mainland at apeiron.net
Wed Jun 5 14:11:26 CEST 2013
I very much support moving to all-submodules. In fact, I argued for
all-submodules when we made the half-submodules transition last
year. Being able to easily check out a consistent and complete source
code tree in a repeatable way is extremely important.
Checking out by date "works" if you have dated history in your git
reflog. For example, see:
http://stackoverflow.com/questions/6990484/git-checkout-by-date
In general, git commits are *not* time ordered, so asking for the
version at a particular time is not well-defined across different
working repositories.
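
To make the ambiguity concrete, here is a small Python sketch. The history
and both helper names are invented for the example, and it does not
reproduce git's own date handling; it only shows that two reasonable
readings of "the version on May 7" can disagree once timestamps are out of
order along a branch.

```python
from datetime import date

# Invented history of one branch, oldest commit first: (sha, stored timestamp).
# Git does not force a child's timestamp to be later than its parent's, so
# "ccc333" is stamped earlier than the commit it builds on.
HISTORY = [
    ("aaa111", date(2013, 5, 5)),
    ("bbb222", date(2013, 5, 8)),
    ("ccc333", date(2013, 5, 6)),
]


def newest_stamped_on_or_before(history, cutoff):
    """Walk back from the tip; return the first commit stamped <= cutoff."""
    for sha, stamp in reversed(history):
        if stamp <= cutoff:
            return sha
    return None


def last_before_first_late_commit(history, cutoff):
    """Walk forward from the root; stop at the first commit stamped > cutoff."""
    picked = None
    for sha, stamp in history:
        if stamp > cutoff:
            break
        picked = sha
    return picked
```

With a cutoff of May 7, the first reading picks ccc333, whose tree already
contains the May 8 commit bbb222, while the second picks aaa111.
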
The GHC HQ buildbots dump fingerprints in a form that is usable directly
with fingerprint.py. You can get these fingerprints from the ghc-builds@
archive. Unfortunately there was a large gap after MSR moved buildings
where our builds did not run, but things are more or less working now. I
believe Ben's buildbot package dumps fingerprints in a form that needs
to be massaged before fingerprint.py can deal with it.
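
For anyone who wants to see the idea rather than the script, here is a
minimal Python sketch of restoring a tree from a fingerprint. It is not
fingerprint.py itself: the one-pair-per-line `path|sha` format and the
function names are assumptions made for illustration, so check the real
file format before relying on it.

```python
import subprocess


def parse_fingerprint(text):
    """Parse assumed 'path|sha' lines into a {path: sha} mapping."""
    fingerprint = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        path, sha = line.split("|", 1)
        fingerprint[path.strip()] = sha.strip()
    return fingerprint


def checkout_commands(fingerprint):
    """Build one 'git checkout' command per repository, in path order."""
    return [
        ["git", "-C", path, "checkout", sha]
        for path, sha in sorted(fingerprint.items())
    ]


def restore(fingerprint):
    """Run the checkouts; stops at the first repository that fails."""
    for argv in checkout_commands(fingerprint):
        subprocess.run(argv, check=True)
```

Keeping the command construction separate from `restore` makes it easy to
print the commands and inspect them before touching any working tree.
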
Geoff
On 06/05/2013 11:32 AM, Niklas Larsson wrote:
> When I was fiddling with having to rollback everything to a known good
> state I patched sync-all to checkout all the repos to the state they
> were in on a certain date, it's pretty naive, but it should be usable
> for doing manual bisecting at least. I can't find the old mailing list
> archives, so I attach the patch here.
>
> Niklas
>
>
> 2013/6/5 Austin Seipp <aseipp at pobox.com>
>
> (Warning: incoming answer, followed by a rant.)
>
> Base is not a submodule, meaning that there is essentially no way to
> automatically check it back out to the "exact same state" it was in,
> given some specified GHC commit - the commit IDs are not tracked.
>
> At this point, you are basically on your own. You'll have to manually
> checkout libraries/base to a specific commit that occurred 'around'
> the same time as the GHC commit. In this case, that means looking
> through whatever commits hit HEAD on May 7th:
>
> $ cd libraries/base
> $ git log --until="May 7th"
>
> The resulting list will show you what happened up to May 7th. Take the
> latest commit in that list, and check out base to that revision. Any
> commits afterward happened on May 8th or later:
>
> $ git checkout -b temporary-io-fix <sha1 of latest May 7th commit>
>
> You're going to need to do this for every module that is not tracked
> as a submodule. Most of the repositories are very low-activity. base &
> testsuite are going to be the annoying ones.
>
> You'll have to continue this 'manual bisection' by hand, with a very
> hefty dose of frustrating trial-and-error, in my experience.
>
> There is a secondary alternative. GHC has a script called
> 'fingerprint.py' (in utils/fingerprint/) which is somewhat designed to
> work around this deficiency (very poorly.) This script basically dumps
> out a text file, containing a key/value pair mapping every repository
> to its current HEAD commit. It can then take that text file and
> automatically do 'git checkout' for you in every repo. The idea is you
> can take fingerprints of the tree, save the results, and cleanly check
> out to some state later.
>
> The GHC build bots run by Ben L.'s "Buildbox" library automatically
> run the 'fingerprint.py' script during nightly builds, from what I
> remember. It may be possible to just look in the ghc-builds archives,
> and steal some fingerprints from the last month off one of the
> buildbots. I don't know who maintains the individual bots; perhaps you
> can ask the list. However, this will at best give you a 1-day level of
> granularity, rather than commit level granularity, which is still
> rather unsatisfying.
>
> ------------- Answer over, rant begins. ---------------------
>
> I know we had this discussion sometime recently I think, but can
> someone *please* explain why we are in this situation of half
> submodules, half random-floating-git-repository-checkouts? It's
> terrible. I'm frankly surprised we've even been doing it this long,
> over a year or more? It is literally the worst of submodules, and
> free-standing-repositories put together, with none of the advantages
> of either.
>
> Free-standing repos are attractive because they are just there, and
> you don't have to 'maintain' them (sort of.) Submodules are attractive
> because they identify the critical points in which your repositories
> depend on each other. We have neither benefit right now, clearly.
>
> In particular, this makes it impossible to use tools like 'git bisect'
> which is *incredibly* useful for just these exact cases. Hell, you can
> even make 'git bisect' work almost 100% automatically with a tiny bit
> of shell scripting.
>
>
> http://mainisusuallyafunction.blogspot.com/2012/09/tracking-down-unused-variables-with.html
>
> You could just instead have a script that built the compiler, and ran
> the built compiler on your testcase, after every bisection. Wouldn't
> it be *great* to have something like that Just Work? A tool like this
> could potentially boil down Kazu's bug almost automatically for
> example, with little-to-no frustrating intervention.
>
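
A rough Python sketch of such a per-step script, for use with
'git bisect run'. The exit codes are the ones 'git bisect run' documents
(0 for good, 125 for "cannot test, skip", other codes from 1 to 127 for
bad); the function names are invented here, and the build and test
commands are left as parameters because the right ones depend on your
tree. It only becomes useful once every repository is pinned by the GHC
commit being tested.

```python
import subprocess

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by 'git bisect run'


def verdict(build_ok, test_ok):
    """Map the outcome of one bisection step to a bisect exit code."""
    if not build_ok:
        return SKIP  # an unbuildable commit says nothing about the bug
    return GOOD if test_ok else BAD


def succeeds(argv):
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(argv).returncode == 0


def bisect_step(build_cmd, test_cmd):
    """Build, then run the test case only if the build succeeded."""
    build_ok = succeeds(build_cmd)
    test_ok = build_ok and succeeds(test_cmd)
    return verdict(build_ok, test_ok)
```

A wrapper script would end with something like
sys.exit(bisect_step(<build command>, <test command>)) and be handed to
'git bisect run'.
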
> And even now, looking at the repository listing of what is in
> libraries/, that are not submodules, I really see no reason why more -
> or even all - of them cannot be submodules. Is it a workflow issue of
> some sort? That's what I'm thinking at this point, but I also don't
> think it could be any worse than it is now.
>
> Realistically, very few libraries GHC needs for bootstrapping seem to
> change that much. unix, integer-simple, haskeline and filepath for
> example change *extremely* infrequently, but all are free-standing.
> Why? In the event they were submodules, would anything actually be
> lost?
>
> The maintainer - that is, not GHC HQ - would still 'own' the official
> repository. They can make changes to it. But if there is a necessity
> to pull that in for GHC (feature request, bug fix, random thing) it
> can be done by updating the submodule pointer to the new commit. But
> this must happen explicitly by a GHC committer. In the event they
> update the submodule pointer, they should also obviously make sure the
> build still works.
>
> That means we have to update the submodule pointers ourselves if
> things change. That sucks I guess, but really, aside from base and
> testsuite, the two most frequently changing repositories, is that
> *actually* going to cost us a lot of work?
>
> And even if it does cost us work, I'll speak for myself: I will gladly
> pay for that work and do it all myself if it means I can actually
> bisect and actually roll back my tree to some point to fix things -
> without needing to prepare for it months in advance using hacks. Like
> creating thousands of fingerprints, using fingerprint.py every day
> when people make commits (no, I haven't done this, but it could be
> done, and I really don't want to do it.)
>
> Long-term reproducible builds are, IMO, a must for any project.
> *Especially* a project of our size. *Especially* a compiler of all
> things. But as it stands, when you build GHC, you can probably
> reproduce *today's* results and *today's* bugs. Last month's results?
> Last year's? Finding the difference between those months ago and today?
> Good luck - you will need it.
>
> On Tue, Jun 4, 2013 at 8:07 PM, Kazu Yamamoto <kazu at iij.ad.jp> wrote:
> > Hi,
> >
> > Andreas and I found that the new IO manager is not working properly in
> > the current GHC head. I'm sure that it worked well at least on May 7.
> >
> > We need to narrow the range of commits, so I did:
> >
> > % git checkout bb2795db36b36966697c228315ae20767c4a8753
> > % git submodule update
> >
> > But this does not checkout proper submodules. For instance,
> > libraries/base has newer commits. And of course, building fails.
> >
> > Please tell us how to checkout proper submodules against a specific
> > GHC tree.
> >
> > --Kazu
>
> --
> Regards,
> Austin - PGP: 4096R/0x91384671