Continuous Integration and Cross Compilation

Thu Jun 19 00:23:07 UTC 2014

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Great and detailed response, Austin.  Thank you.

William, I'm happy to help in any way I can.

I run SmartOS x86 and x86_64 builds of GHC HEAD on my own equipment
using the GHC Builder Ian Lynagh developed:
https://ghc.haskell.org/trac/ghc/wiki/Builder
https://github.com/haskell/ghc-builder

I'm also currently working on small tweaks to the ghc-builder and
getting the GHC testsuite to pass on Illumos (and indirectly Solaris).

I follow Gábor's lead on the GHC Builder priorities, and Carter
Schonwald acts as a Pull Request gatekeeper for changes.

Best,
Alain

On 06/18/2014 11:53 PM, Austin Seipp wrote:
> Hi William,
> 
> Thanks for the email. Here're some things to consider.
> 
> For one, cross compilation is a hot topic, but it is going to be a 
> rather large amount of work to fix and it won't be easy. The
> primary problem is that we need to make Template Haskell
> cross-compile, but in general this is nontrivial: TemplateHaskell
> must load and run object code on the *host* platform, but the
> compiler must generate code for the *target* platform. There are
> ways around some of these problems; for one, we could compile every
> module twice, once for the host, and once for the target. Upon
> requesting TH, the Host GHC would load Host Object Code, but the
> final executable would link with the Target Object Code.
> 
> There are many, many subtle points to consider if we go down this 
> route - what happens for example if I cross compile from a 64bit 
> machine to a 32bit one, but TemplateHaskell wants some knowledge
> like what "sizeOf (undefined :: CLong)" is? The host code sees a
> 64-bit quantity while the target actually will deal with a 32bit
> one. This could later explode horribly. And this isn't limited to
> different endianness either - it applies to the ABI in general.
> 64bit Linux -> 64bit Windows would be just as problematic with this
> exact case, as one uses LP64, while the other uses LLP64 data
> models.
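
A minimal sketch of the mismatch described above, assuming a simplified
three-way split of C data models. ILP32 is added here to cover the 32-bit
case and is not named in the thread; the names DataModel, longSize and
sizeMismatch are hypothetical and exist only for illustration.

```haskell
import Foreign.C.Types (CLong)
import Foreign.Storable (sizeOf)

-- | C data models: LP64 and LLP64 come from the thread, ILP32 is an
-- assumption added to cover a 32-bit target.
data DataModel = ILP32 | LP64 | LLP64
  deriving (Show, Eq)

-- | Size in bytes of C's @long@ under each data model.
longSize :: DataModel -> Int
longSize ILP32 = 4
longSize LP64  = 8
longSize LLP64 = 4

-- | What the compiler running this code reports for CLong. Code evaluated
-- on the host at compile time would see this value, whatever the target is.
hostLongSize :: Int
hostLongSize = sizeOf (undefined :: CLong)

-- | True when a size computed on the host would be wrong for the target.
sizeMismatch :: DataModel -> DataModel -> Bool
sizeMismatch host target = longSize host /= longSize target

main :: IO ()
main = do
  putStrLn ("host sizeOf CLong = " ++ show hostLongSize)
  -- 64-bit Linux (LP64) to 64-bit Windows (LLP64)
  print (sizeMismatch LP64 LLP64)
  -- 64-bit host (LP64) to a 32-bit target (ILP32)
  print (sizeMismatch LP64 ILP32)
```

Both comparisons report a mismatch, even though the first pair has the
same pointer width, which is the point about the ABI in general.
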
> 
> So #1 by itself is a very, very non-trivial amount of work, and IMO
> it isn't necessary for better builds. There are other
> routes possible for cross compilation perhaps, but I'd speculate
> they are all just as non-trivial as this one.
> 
> Finally, the remainder of the scheme, including shipping builds to
> remote machines and having them tested there, sounds a bit more
> complicated, and I'm wondering what the advantages are. In
> particular it seems like this merely exposes more opportunities for
> failure points in the CI system, because now all CI depends on
> cross compilation working properly, being able to ship reports back
> and forth, and more. Depending on cross compilation in particular
> sounds like a huge burden: it makes it hard to distinguish a failure
> caused by a cross-compilation bug from one caused by a committer's
> changeset, which widens the scope of what we need to consider. A CI system
> should be absolutely as predictable as possible, and this adds a
> *lot* of variables to the mix. Cross compilation is really
> something that's not just one big task - there will be many *small*
> bugs lying in wait after that, the pain of a thousand cuts.
> 
> Really, we need to distinguish between two needs:
> 
> 1) Continuous integration.
> 
> 2) Nightly builds.
> 
> These two systems have very different needs in practice:
> 
> 1) A CI system needs to be *fast*, and it needs to have dedicated 
> resources to respond to changes quickly. This means we need to 
> *minimize* the amount of time for developer turn around to see 
> results. That includes minimizing the needed configurations.
> Shipping builds to remote machines just for CI would greatly
> complicate this and likely make it far longer on its own, not to
> mention it increases with every system we add.
> 
> 2) A nightly build system is under nowhere near the same time 
> constraints, although it also needs to be dedicated. If an
> ARM/Linux machine takes 6 hours to build (perhaps it's shared or
> something, or just really wimpy), that's totally acceptable. These
> can then report nightly about the results and we can reasonably
> blame people/changesets based on that.
> 
> Finally, both of these become more complicated by the fact GHC is
> a large project that has a highly variable number of configurations
> we have to keep under control: static, dynamic, static+dynamic, 
> profiling, LLVM builds, builds where GHC itself is profiled, as
> well as the matrix of those combinations: LLVM+GHC Profiled, etc
> etc etc. Each of these configurations exposes bugs in its own
> right. Unfortunately doing #1 with all these configurations would
> be ludicrous: it would explode the build times for any given
> system, and it also drastically multiplies the hardware resources
> we'd need for CI if we wanted them to respond quickly to any given
> changeset, because you not only have to *build* them, you must run
> them. And now you have to run a lot of them. A nightly build system
> is more reasonable for these problems, because taking hours and
> hours is expected. These problems would still be true even with
> cross compilation, because it multiplies the amount of work every
> CI run must do no matter what.
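
To put rough numbers on that multiplication, here is a sketch that
enumerates a simplified configuration matrix. The four axes are a
reduction of the list above, not GHC's actual build flavours, and the
names Config, allConfigs and totalMinutes, as well as the per-build time
and platform count, are hypothetical.

```haskell
-- | How the libraries are linked (three of the options listed above).
data Linkage = Static | Dynamic | StaticAndDynamic
  deriving (Show, Eq, Enum, Bounded)

-- | Code generator: the default backend or LLVM.
data Backend = DefaultBackend | LLVM
  deriving (Show, Eq, Enum, Bounded)

-- | One point in the simplified matrix.
data Config = Config
  { linkage          :: Linkage
  , backend          :: Backend
  , profiledLibs     :: Bool  -- libraries built for profiling
  , profiledCompiler :: Bool  -- GHC itself built profiled
  } deriving (Show, Eq)

-- | Every combination of the four axes.
allConfigs :: [Config]
allConfigs =
  [ Config l b p c
  | l <- [minBound .. maxBound]
  , b <- [minBound .. maxBound]
  , p <- [False, True]
  , c <- [False, True]
  ]

-- | Total machine time if every configuration is built and tested on
-- every platform, given an assumed time per configuration.
totalMinutes :: Int -> Int -> Int
totalMinutes minutesPerConfig platforms =
  minutesPerConfig * platforms * length allConfigs

main :: IO ()
main = do
  print (length allConfigs)    -- 3 * 2 * 2 * 2 = 24
  print (totalMinutes 60 3)    -- 60 * 3 * 24 = 4320
```

With an assumed 60 minutes per configuration and 3 platforms, a single
changeset would cost 72 machine-hours, which suits a nightly run far
better than per-commit CI.
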
> 
> We already have both of these, too: Joachim
> Breitner for example has set us up a Travis-CI[1] setup, while
> Gabor Pali has set us up nightly builds[2]. Travis-CI does the job
> of fast CI, but it's not good for a few reasons:
> 
> 1) We have literally zero visibility into it for reports.
> Essentially we only know when it explodes because Joachim yells at
> us (normally at me :) This is because GitHub is not our
> center-of-the-universe, despite how much people yearn for it to be
> so.
> 
> 2) The time limit is unacceptable. Travis-CI for example actually 
> cannot do dynamic builds of GHC because it takes too long.
> Considering GHC is shipping dynamically on major platforms now,
> that's quite a huge loss for a CI system to miss (and no, a
> separate build matrix configuration doesn't work here - GHC builds
> statically and dynamically at the same time, and ships both -
> there's no way to have "only static" and "only dynamic" entries.)
> 
> 3) It has limited platform support - only recently did it have OS
> X, and Windows is not yet in sight. Ditto for FreeBSD. These are
> crucial for CI as well, as they encompass all our Tier-1 platforms.
> This could be fixed with cross compilation, but again, that's a
> big, big project.
> 
> And finally, on the GitHub note, as I said in the prior thread
> about Phabricator, I don't actually think it offers us anything
> useful at this point in time - literally almost nothing other than
> "other projects use GitHub", which is not an advantage, it's an
> appeal to popularity IMO. Webhooks still cannot do things like ban
> tabs, trailing whitespace, or enforce submodule integrity. We have
> to have our own setup for all of that. I'm never going to hit the
> 'Merge Button' for PRs - validation is 100% mandatory on behalf of
> the merger, and again, Travis-CI cannot provide coherent coverage
> even if we could use it for that. And because of that there's no
> difference between GitHub and any other code site - I have to pull the
> branch manually and test myself, which I could do with any random
> git repository in the world.
> 
> The code review tools are worse than Phabricator. Finally, if we
> are going to accept patches from people, we need to have a
> coherent, singular way to do it - mixing GitHub PRs, Phabricator,
> and uploading patches to Trac is just a nightmare for pain, and not
> just for me, even though I do most of the patch work - it incurs
> the burden on *every* person who wants to review code to now do so
> in many separate places. And we need to make code review *easier*,
> not harder! If anything, we should be consolidating on a single
> place (obviously, I'd vote for Phabricator), not adding more places
> to make changes that we all have to keep up with, when we don't
> even use the service itself! That's why I proposed Phabricator:
> because it is coherent and a singular place to go to, and very good
> at what it does, and does not attempt to 'take over' GHC itself.
> GitHub is a fairly all-or-nothing proposition if you want any
> benefits it delivers, if you ask me (I say this as someone who
> likes GitHub for smaller projects). I just don't think their tools
> are suitable for us.
> 
> So, back to the topic. I think the nightly builds are actually in
> an OK state at the moment, since we do get reports from them, and 
> builders do check in regularly. The nightly builders also cover a
> more diverse set of platforms than our CI will. But the CI and
> turnaround could be *greatly* improved, I think, because
> ghc-complete is essentially ignored or unknown by many people.
> 
> So I'll also make a suggestion: let's just set up something
> that will pull GHC's repo every 10 minutes or so, do a build, and
> then email ghc-devs *only* if failures pop up. In fact, we could
> just re-use the existing nightly build infrastructure for this, and
> just make it check very regularly, and just run standard
> amd64/Linux and Windows builds upon changes. I could provide
> hardware for this. This would increase the visibility of reports,
> not require *any* new code, and already works.
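
The polling idea above can be sketched as a single step function. This is
an illustration only, not how the existing builder infrastructure works:
pollStep, report, and the fetch and build actions passed in are
hypothetical stand-ins for "pull the repo" and "run the build".

```haskell
type Commit = String

data BuildResult = Passed | Failed String
  deriving (Show, Eq)

-- | Mail body to send, if any. Successful builds stay silent.
report :: Commit -> BuildResult -> Maybe String
report _ Passed       = Nothing
report c (Failed why) = Just ("Build failed at " ++ c ++ ": " ++ why)

-- | One polling step: fetch the current head, build only if it changed
-- since the last build, and return the new last-built commit together
-- with an optional failure mail.
pollStep
  :: Monad m
  => m Commit                   -- ^ fetch the current head (assumed)
  -> (Commit -> m BuildResult)  -- ^ build and test a commit (assumed)
  -> Maybe Commit               -- ^ last commit that was built
  -> m (Maybe Commit, Maybe String)
pollStep fetchHead build lastBuilt = do
  headCommit <- fetchHead
  if Just headCommit == lastBuilt
    then pure (lastBuilt, Nothing)
    else do
      result <- build headCommit
      pure (Just headCommit, report headCommit result)

main :: IO ()
main = do
  -- A new commit whose build fails produces a mail.
  r1 <- pollStep (pure "abc123") (\_ -> pure (Failed "testsuite")) Nothing
  print r1
  -- An unchanged head is skipped without building.
  r2 <- pollStep (pure "abc123") (\_ -> pure Passed) (Just "abc123")
  print r2
```

A real loop would call pollStep every 10 minutes or so, threading the
returned commit into the next call and mailing ghc-devs whenever the
second component is not Nothing.
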
> 
> Overall, I will absolutely help you in every possible way, because 
> this really is a problem for newcomers, and existing developers,
> when we catch dumb failures later than we should. But I think the
> proposed solution here is extraordinarily complex in comparison to
> what we actually need right now.
> 
> ... I will say that if you *did* fix cross compilation however to
> work with TH you would be a hero to many people - myself included
> - continuous integration aside! :)
> 
> [1] https://github.com/nomeata/ghc-complete
> [2] http://haskell.inf.elte.hu/builders/
> 
> On Wed, Jun 18, 2014 at 3:10 PM, William Knop 
> <william.knop.nospam at gmail.com> wrote:
>> Hello all,
>> 
>> I’ve seen quite a few comments on the list and elsewhere
>> lamenting the time it takes to compile and validate ghc. It’s
>> troublesome not only because it’s inconvenient, but, more
>> seriously, people are holding off on sending patches, which
>> stifles development. I would like to propose a solution:
>> 
>> 1. Implement proper cross-compilation, such that build and host
>> may be different— e.g. a linux x86_64 machine can build ghc that
>> runs on Windows x86. What sort of work would this entail?
>> 
>> 2. Batch cross-compiled builds for all OSs/archs on a continuous
>> integration service (e.g. Travis CI) or cloud service, then
>> package up the binaries with the test suite.
>> 
>> 3. Send the package to our buildbots, and run the test suite.
>> 
>> 4. (optional) If using a CI service, have the buildbots send
>> results back to the CI. This could be useful if we'd use GitHub
>> for pulls in the future *.
>> 
>> Cheers, Will
>> 
>> 
>> * I realize vanilla GitHub currently has certain annoying
>> limitations, though some of them are pretty easy to solve via the
>> github-services and/or webhooks. I don’t think this conflicts
>> with the desire to use Phabricator, either, so I’ll send details
>> and motivations to that thread.
>> 
>> 
>> 
> 
> 
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJToi1rAAoJEP0rIXJNjNSA7EkIAL2FFR8aBRsxHBTXIcCx6QsM
HE9EHpO9zVF0hZYoTTw9+SwyI08NCMUvRg65YD2Wwrgq+yvGurX/+Oat7UI+6ZJY
jWRY6LJpTDX9OcIFs3wCv7FmSbMDDLgdNR+2t1/atw/buVBityoYKi+1rqeU4I0y
l5mCxL1hXIKwpOVU0IQ1NlZ/Q0G9er5qFSkbQFlRwS2rYNArvmp8UlTxsClZBw07
uSt5Mq2sKuUAth3ZCAt+8Hqp+kWDmV8UPDfDbP/tKSx83XOmH0SDwYCtVj7WwT+V
psHkQwKPOg9QBto2DkxNVXLvwedV3awDhS88emtxQeulCZqly9FP5SWuHjRFHsU=
=Ldqt
-----END PGP SIGNATURE-----