how to checkout proper submodules

Wed Jun 5 14:11:26 CEST 2013

I very much support moving to all-submodules. In fact, I argued for
all-submodules when we made the half-submodules transition last
year. Being able to easily check out a consistent and complete source
code tree in a repeatable way is extremely important.

Checking out by date "works" if you have dated history in your git
reflog. For example, see:

http://stackoverflow.com/questions/6990484/git-checkout-by-date

In general, git commits are *not* time ordered, so asking for the
version at a particular time is not well-defined across different
working repositories.
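
As a rough sketch of a date lookup that does not depend on the reflog,
the Python helper below asks git for the newest commit reachable from a
ref whose committer date falls before a given date. The function names
(rev_list_command, commit_before) are made up for illustration, and
since commit dates are not ordered, the result is only an approximation
of "the tree as of that date".

```python
import subprocess


def rev_list_command(date, ref="HEAD"):
    """Build the git command that names the newest commit reachable
    from `ref` with a committer date before `date`."""
    return ["git", "rev-list", "-n", "1", "--before=" + date, ref]


def commit_before(repo_dir, date, ref="HEAD"):
    """Run the command inside `repo_dir` and return the commit id as a
    string (empty if no commit is old enough)."""
    out = subprocess.check_output(rev_list_command(date, ref), cwd=repo_dir)
    return out.decode().strip()
```

For example, commit_before("libraries/base", "2013-05-08") would name a
candidate commit to check out in libraries/base; it still has to be
checked by hand against the GHC commit being tested.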

The GHC HQ buildbots dump fingerprints in a form that is usable directly
with fingerprint.py. You can get these fingerprints from the ghc-builds@
archive. Unfortunately there was a large gap after MSR moved buildings
where our builds did not run, but things are more or less working now. I
believe Ben's buildbot package dumps fingerprints in a form that needs
to be massaged before fingerprint.py can deal with it.
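
To illustrate the idea behind a fingerprint (a mapping from each
repository to a commit), here is a minimal Python sketch. The
one-line-per-repo "path sha" text format and all function names are
assumptions for illustration; this is not the format that fingerprint.py
or the buildbots actually use.

```python
def format_fingerprint(heads):
    """Render a {repo path: commit id} mapping as text, one repo per line."""
    return "".join("%s %s\n" % (path, sha) for path, sha in sorted(heads.items()))


def parse_fingerprint(text):
    """Parse text produced by format_fingerprint back into a mapping."""
    heads = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        path, sha = line.rsplit(" ", 1)
        heads[path] = sha
    return heads


def checkout_commands(heads):
    """List the git commands that would restore every repo to its
    recorded commit (the commands are only built, not executed)."""
    return [["git", "-C", path, "checkout", sha]
            for path, sha in sorted(heads.items())]
```

Massaging another tool's output into a usable form then amounts to
writing a different parse function that yields the same mapping.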

Geoff

On 06/05/2013 11:32 AM, Niklas Larsson wrote:
> When I was fiddling with having to roll back everything to a known good
> state, I patched sync-all to check out all the repos to the state they
> were in on a certain date. It's pretty naive, but it should be usable
> for doing manual bisecting at least. I can't find the old mailing list
> archives, so I attach the patch here.
>
> Niklas
>
>
> 2013/6/5 Austin Seipp <aseipp at pobox.com>
>
>     (Warning: incoming answer, followed by a rant.)
>
>     Base is not a submodule, meaning that there is essentially no way to
>     automatically check it back out to the "exact same state" it was in,
>     given some specified GHC commit - the commit IDs are not tracked.
>
>     At this point, you are basically on your own. You'll have to manually
>     check out libraries/base to a specific commit that occurred 'around'
>     the same time as the GHC commit. In this case, that means looking
>     through whatever commits hit HEAD on May 7th:
>
>     $ cd libraries/base
>     $ git log --until="May 7th"
>
>     The resulting list will show you what happened up to May 7th. Take the
>     latest commit in that list, and check out base to that revision. Any
>     commits afterward happened on May 8th or later:
>
>     $ git checkout -b temporary-io-fix <sha1 of latest May 7th commit>
>
>     You're going to need to do this for every module that is not tracked
>     as a submodule. Most of the repositories are very low-activity. base &
>     testsuite are going to be the annoying ones.
>
>     You'll have to continue this 'manual bisection' by hand, with a very
>     hefty dose of frustrating trial-and-error, in my experience.
>
>     There is a secondary alternative. GHC has a script called
>     'fingerprint.py' (in utils/fingerprint/) which is somewhat designed to
>     work around this deficiency (very poorly). This script basically dumps
>     out a text file, containing a key/value pair mapping every repository
>     to its current HEAD commit. It can then take that text file and
>     automatically do 'git checkout' for you in every repo. The idea is you
>     can take fingerprints of the tree, save the results, and cleanly check
>     out to some state later.
>
>     The GHC build bots run by Ben L.'s "Buildbox" library automatically
>     run the 'fingerprint.py' script during nightly builds, from what I
>     remember. It may be possible to just look in the ghc-builds archives,
>     and steal some fingerprints from the last month off one of the
>     buildbots. I don't know who maintains the individual bots; perhaps you
>     can ask the list. However, this will at best give you a 1-day level of
>     granularity, rather than commit level granularity, which is still
>     rather unsatisfying.
>
>     ------------- Answer over, rant begins. ---------------------
>
>     I know we had this discussion sometime recently I think, but can
>     someone *please* explain why we are in this situation of half
>     submodules, half random-floating-git-repository-checkouts? It's
>     terrible. I'm frankly surprised we've even been doing it this long,
>     over a year or more? It is literally the worst of submodules, and
>     free-standing-repositories put together, with none of the advantages
>     of either.
>
>     Free-standing repos are attractive because they are just there, and
>     you don't have to 'maintain' them (sort of.) Submodules are attractive
>     because they identify the critical points in which your repositories
>     depend on each other. We have neither benefit right now, clearly.
>
>     In particular, this makes it impossible to use tools like 'git bisect'
>     which is *incredibly* useful for just these exact cases. Hell, you can
>     even make 'git bisect' work almost 100% automatically with a tiny bit
>     of shell scripting.
>
>     http://mainisusuallyafunction.blogspot.com/2012/09/tracking-down-unused-variables-with.html
>
>     You could just instead have a script that built the compiler, and ran
>     the built compiler on your testcase, after every bisection. Wouldn't
>     it be *great* to have something like that Just Work? A tool like this
>     could potentially boil down Kazu's bug almost automatically for
>     example, with little-to-no frustrating intervention.
>
>     And even now, looking at the repository listing of what is in
>     libraries/, that are not submodules, I really see no reason why more -
>     or even all - of them cannot be submodules. Is it a workflow issue of
>     some sort? That's what I'm thinking at this point, but I also don't
>     think it could be any worse than it is now.
>
>     Realistically, very few libraries GHC needs for bootstrapping seem to
>     change that much. unix, integer-simple, haskeline and filepath for
>     example change *extremely* infrequently, but all are free-standing.
>     Why? In the event they were submodules, would anything actually be
>     lost?
>
>     The maintainer - that is, not GHC HQ - would still 'own' the official
>     repository. They can make changes to it. But if there is a necessity
>     to pull that in for GHC (feature request, bug fix, random thing) it
>     can be done by updating the submodule pointer to the new commit. But
>     this must happen explicitly by a GHC committer. In the event they
>     update the submodule pointer, they should also obviously make sure the
>     build still works.
>
>     That means we have to update the submodule pointers ourselves if
>     things change. That sucks I guess, but really, aside from base and
>     testsuite, the two most frequently changing repositories, is that
>     *actually* going to cost us a lot of work?
>
>     And even if it does cost us work, I'll speak for myself: I will gladly
>     pay for that work and do it all myself if it means I can actually
>     bisect and actually roll back my tree to some point to fix things -
>     without needing to prepare for it months in advance using hacks. Like
>     creating thousands of fingerprints, using fingerprint.py every day
>     when people make commits (no, I haven't done this, but it could be
>     done, and I really don't want to do it.)
>
>     Long-term reproducible builds are, IMO, a must for any project.
>     *Especially* a project of our size. *Especially* a compiler of all
>     things. But as it stands, when you build GHC, you can probably
>     reproduce *today's* results and *today's* bugs. Last month's results?
>     Last year's? Finding the difference between those months ago and today?
>     Good luck - you will need it.
>
>     On Tue, Jun 4, 2013 at 8:07 PM, Kazu Yamamoto <kazu at iij.ad.jp> wrote:
>     > Hi,
>     >
>     > Andreas and I found that the new IO manager is not working properly in
>     > the current GHC head. I'm sure that it worked well at least on May 7.
>     >
>     > We need to narrow the range of commits, so I did:
>     >
>     >   % git checkout bb2795db36b36966697c228315ae20767c4a8753
>     >   % git submodule update
>     >
>     > But this does not check out the proper submodules. For instance,
>     > libraries/base has newer commits. And of course, building fails.
>     >
>     > Please tell us how to checkout proper submodules against a specific
>     > GHC tree.
>     >
>     > --Kazu
>
>     --
>     Regards,
>     Austin - PGP: 4096R/0x91384671