From gnezdo at google.com Tue Oct 10 04:01:47 2017 From: gnezdo at google.com (Greg Steuck (Sh-toy-k)) Date: Tue, 10 Oct 2017 04:01:47 +0000 Subject: [GHC DevOps Group] Phabricator -> GitHub? In-Reply-To: <0FD726CA-ED50-4742-BD39-8D901A684C06@tweag.io> References: <03CFCB54-3433-44D6-AB57-8FF1A7A96FC6@tweag.io> <0FD726CA-ED50-4742-BD39-8D901A684C06@tweag.io> Message-ID: I second the motion to prefer paying for generic services/machine power as a substitute for expending expert effort. X could contribute toward such a goal monetarily. Thanks Greg On Mon, Oct 9, 2017 at 4:07 PM Manuel M T Chakravarty < manuel.chakravarty at tweag.io> wrote: > On 09.10.2017 at 23:08, Simon Marlow wrote: > > On 9 October 2017 at 13:04, Simon Marlow wrote: > > On 9 October 2017 at 12:10, Manuel M T Chakravarty < >> manuel.chakravarty at tweag.io> wrote: >> >>> >>> Thirdly, it still is much better than Phabricator on the new random tool >>> front because it requires no custom infrastructure and the PRs still go >>> through GitHub as usual. >>> >> >> I do buy the custom infrastructure argument in general - setting up our >> own CI has definitely taken a lot of Ben's time. I actually really liked >> having Travis for my GHC fork on GitHub. That was when it used to work, >> before our build exceeded what Travis would give us. So I guess that >> illustrates two things: custom infrastructure is nice when it works, but >> we're at the mercy of the suppliers. >> > > (sorry, I meant to say "outsourced infrastructure", not "custom > infrastructure" above) > > > Yes, you are right. As Mathieu wrote, the limits of Travis are why we are > using CircleCI for the linear types fork of GHC. In other words, what we > are proposing is something that we have tried with success. > > Moreover, Mathieu has indicated that Tweag would be happy to contribute to > a paid CI option if that should become necessary. (Ben’s time is worth > much more than CI costs.)
> > Cheers, > Manuel > > > > _______________________________________________ > Ghc-devops-group mailing list > Ghc-devops-group at haskell.org > https://haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group > From manuel.chakravarty at tweag.io Tue Oct 10 05:57:23 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 10 Oct 2017 16:57:23 +1100 Subject: [GHC DevOps Group] Mailing list issue Message-ID: <86654EF3-7B29-4EE9-BB95-35D276A20DC6@tweag.io> It seems there is an issue with the mailing list, where it does not forward all messages. I am talking to the haskell.org mailman maintainers to fix this. Sorry, Manuel From m at tweag.io Tue Oct 10 06:05:35 2017 From: m at tweag.io (Boespflug, Mathieu) Date: Tue, 10 Oct 2017 08:05:35 +0200 Subject: [GHC DevOps Group] FreeBSD in Tier 1 In-Reply-To: <87r2ucnz23.fsf@ben-laptop.smart-cactus.org> References: <87r2ucnz23.fsf@ben-laptop.smart-cactus.org> Message-ID: Hi Ben, #14300 contains useful info. But I did have a general question. That ticket mentions specific instructions for FreeBSD in some README. But the README in GHC HEAD does not mention any BSD in any way. Is there some other platform-specific README I should be aware of? If so, this should be included in the GHC repo proper. As it is, I did have gcc6 from ports installed before building GHC, but clearly the standard `./boot && ./configure && gmake` instructions were insufficient to make use of that. I later found out through browsing GHC Trac tickets that there are these instructions: https://ghc.haskell.org/trac/ghc/wiki/Building/Preparation/FreeBSD But they are marked in bold as being for "developers and early adopters", not for any user that checks out the GHC source code.
> In my experience GHC builds without any trouble on FreeBSD 11, which has a new, less broken toolchain. Not any recent checkout of GHC HEAD, surely? The commit that introduced the first bug in Manuel's list is timestamped August 1st. Here's the output of a current FreeBSD build of GHC on CircleCI: https://circleci.com/gh/tweag/ghc/41 Which I guess should be no surprise, since Gábor points out down-thread that the old FreeBSD build bot is no longer in service. So, just like the unreliably available OS X build bots, let's do away with them! (by using hosted infrastructure instead) As soon as this is fixed in HEAD, ./validate for FreeBSD will be able to make progress. Meanwhile, it sounds reasonable to me to call this "Tier 2". On 9 October 2017 at 17:03, Ben Gamari wrote: > Manuel M T Chakravarty writes: > >> According to >> >> https://ghc.haskell.org/trac/ghc/wiki/TeamGHC >> >> >> it is Páli Gábor János aka pgj whose last commit was 20 Jul 2016: >> >> https://github.com/ghc/ghc/commit/0df3f4cdd1dfff42461e3f5c3962f1ecd7c90652 >> >> > While Páli does not contribute many patches, I can confirm that he is > indeed active. In fact, he recently provided an explanation of the > failure that Mathieu observed in #14300. In short, the issue is that the > platform toolchain is buggy due to a tiresome licensing issue. In my > experience GHC builds without any trouble on FreeBSD 11, which has a > new, less broken toolchain.
> > Cheers, > > - Ben > From m at tweag.io Tue Oct 10 06:27:35 2017 From: m at tweag.io (Boespflug, Mathieu) Date: Tue, 10 Oct 2017 08:27:35 +0200 Subject: [GHC DevOps Group] FreeBSD in Tier 1 In-Reply-To: <87y3ojn5n5.fsf@ben-laptop.smart-cactus.org> References: <87r2ucnz23.fsf@ben-laptop.smart-cactus.org> <871smboqtd.fsf@ben-laptop.smart-cactus.org> <87y3ojn5n5.fsf@ben-laptop.smart-cactus.org> Message-ID: Hi Ben, great! The below message contains a good partial spec for what behaviour we want from the CI system. To the point of my earlier email, Manuel, Ben and Jonas - could you guys flesh out the requirements we're shooting for in a wiki page? Also, IMHO there are two stages to this: even just getting ./validate to run on all Tier 1 (sans FreeBSD, candidate for demotion to Tier 2) reliably and without suffering outages would be a useful addition to today's arsenal. Then after that we can automate pushing release artifacts to a durable location. Best, -- Mathieu Boespflug Founder at http://tweag.io. On 10 October 2017 at 03:38, Ben Gamari wrote: > Manuel M T Chakravarty writes: > >> Yes, and we have got options for that, such as cross-compiling and >> using QEMU. We will cross that bridge when we get to it. >> >> For now, please let us focus on our immediate goal of automatically >> producing build artefacts to streamline the release process and for >> providing reliable CI for the core platforms. >> >> Given the CircleCI config Mathieu already contributed, what is missing >> to create the release artefacts on CircleCI?
>> > Building release artifacts entails the following: > > * Produce and archive a source distribution > > * For each platform: > > * Perform a build in a release configuration from this source > distribution (not a checkout, to ensure that our packaging logic > is correct) > > * Produce a binary distribution from this build > > * Run test_bindist on the build > > * Ideally also run the testsuite on this build > > * Archive all of these artifacts > > * Produce and archive a documentation tree > > You will find that these are the steps that my own release tooling > performs [1]. > > Cheers, > > - Ben > > > [1] https://github.com/bgamari/ghc-utils/blob/master/rel-eng/bin-release.sh > From manuel.chakravarty at tweag.io Tue Oct 10 06:42:25 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 10 Oct 2017 17:42:25 +1100 Subject: [GHC DevOps Group] Mailing list issue In-Reply-To: <86654EF3-7B29-4EE9-BB95-35D276A20DC6@tweag.io> References: <86654EF3-7B29-4EE9-BB95-35D276A20DC6@tweag.io> Message-ID: <52FC2278-B174-4596-A149-CC4215582A58@tweag.io> Apparently, there was a permission issue in the setup of this list, which prevented the population of the list archive at https://mail.haskell.org/pipermail/ghc-devops-group/ I think it is important that our discussions are recorded and publicly available. Hence, I will re-send some of the messages. I’ll try to make sure that everything that has been said is at least in the quoted portion of a recorded message — as I don’t know of a better way to achieve this with mailman. I am sorry for the noise. (I will mark all resent messages appropriately.)
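[As a reference point for the CI discussion: the release pipeline Ben lists above can be written down as a dry-run plan builder that only enumerates the steps per platform and executes nothing. The step wording follows Ben's list; the function and platform names are illustrative, not part of GHC's actual tooling.]

```python
# Dry-run sketch of the release pipeline from Ben's email. It plans the
# steps per platform without executing anything; nothing here is tied to
# GHC's real build-system targets.
def release_plan(platforms):
    plan = ["produce and archive a source distribution"]
    for p in platforms:
        plan += [
            f"{p}: build in a release configuration from the source distribution",
            f"{p}: produce a binary distribution from this build",
            f"{p}: run test_bindist on the build",
            f"{p}: run the testsuite on this build",
            f"{p}: archive all of these artifacts",
        ]
    plan.append("produce and archive a documentation tree")
    return plan

for step in release_plan(["linux-amd64", "darwin-amd64"]):
    print(step)
```

Keeping the plan declarative like this makes it straightforward to mirror in either a Jenkins pipeline or a CircleCI workflow.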
Manuel PS: Feel free to put this down as another example of how custom infrastructure fails us and creates extra work… > Manuel M T Chakravarty : > > It seems there is an issue with the mailing list, where it does not forward all messages. I am talking to the haskell.org mailman maintainers to fix this. > > Sorry, > Manuel From manuel.chakravarty at tweag.io Tue Oct 10 06:46:10 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 10 Oct 2017 17:46:10 +1100 Subject: [GHC DevOps Group] Welcome to GHC DevOps Message-ID: <040A609A-2708-424E-90C2-CF0385E8FB19@tweag.io> [RESENT MESSAGE — see https://mail.haskell.org/pipermail/ghc-devops-group/2017-October/000004.html] Hello everybody! Now that the dust of ICFP has settled, I’d like to move the joint effort that we kickstarted in Oxford forward. First of all, thank you all for being willing to contribute and be part of this group. The current members are listed at https://ghc.haskell.org/trac/ghc/wiki/DevOpsGroupCharter but we may be able to add further organisations. Administrative issues ~~~~~~~~~~~~~~~~~~~~~ * Simon PJ suggested that I chair this group for the time being. I am happy to serve in this role unless there are any objections. * Announcement: we announced this group at Simon PJ’s HIW "Progress on GHC" talk, but I’d also like to send a message to the ghc-dev list. * We haven’t formalised any rules concerning membership etc., and I’d like to punt on this for now until we have got some experience with the group. Technical items ~~~~~~~~~~~~~~~ (1) We already started a discussion on CI and automating the building of release artefacts. This is really important. Please have a look if you haven’t yet.
(2) Ben will outline the release schedule that we discussed for GHC 8.4. (3) We need to talk about supporting GitHub for serious contributions to GHC. (I will summarise the points in a separate message.) (4) In the course of (1), the status of FreeBSD as a Tier 1 platform came up and, I think, this warrants some further discussion. (I’ll start a separate thread.) Cheers, Manuel From manuel.chakravarty at tweag.io Tue Oct 10 06:55:22 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 10 Oct 2017 17:55:22 +1100 Subject: [GHC DevOps Group] Fwd: DevOps: Next steps References: Message-ID: <8085CD5D-97AF-42C2-823E-848D21FF033A@tweag.io> [RESENT MESSAGE — see https://mail.haskell.org/pipermail/ghc-devops-group/2017-October/000004.html] [Includes messages from Ben and me to which it responds.] > From: "Boespflug, Mathieu" > Subject: Re: [GHC DevOps Group] DevOps: Next steps > Date: 5 October 2017, 09:29:10 GMT+11 > To: Ben Gamari > Cc: Jonas Pfenniger Chevalier , ghc-devops-group at haskell.org > > Hi Ben, > > many thanks for your detailed and thoughtful reply. I won't myself > address your points one by one (I expect Manuel will jump in), but I > do want to ground the discussion with the following remarks: > > * What are the requirements that the current Jenkins effort is building > towards? I seem to remember some page on the GHC wiki stating these > and then comparing various alternatives, but I can't find it now, so > maybe I dreamed it. The blog post [1] mentions alternatives but > doesn't evaluate them, nor does it state the requirements.
> * A key requirement I think is not just that this kind of > infrastructure should not take time to set up given scarce development > resources, but more importantly that none of the maintenance be > bottlenecked on a single person managing a custom fleet of machines > whose state cannot be reproduced. > * Better yet if anyone that forks GHC (with a single click on GitHub) > gets a local copy of the CI by the same token, which can then be > modified at will. > * If we can get very quick wins today for at least 3 of the 4 "Tier 1" > platforms, that's already a step forward and we can work on the rest > later, just like Rust has (see below). > > I'll copy here an experience report [2] from the Rust infra authors > from before they switched to a Travis CI backed solution: > >> * Our buildbot-based CI / release infrastructure cannot be maintained >> by community members, is generally bottlenecked on Alex and myself. > > Sounds like this applies equally to the current Harbourmaster setup. > Perhaps to the Jenkins based one also? > >> * Our buildbot configuration has reliability issues, particularly around >> managing dynamic EC2 instances. > > Sounds familiar. Is any OS X automated testing happening at this > point? I heard some time before ICFP that one or both of the OS X build > bots had fallen off the edge of the Internet. > >> * Our nightly builds sometimes fail for reasons not caught during CI and >> are down for multiple days. > > This matches my experience when adding CircleCI support: the tip of > the master branch at the time had failing tests. > >> * Packaging Rust for distribution is overly complex, involving >> many systems and source repositories. > > Yup. But admittedly this is an orthogonal issue. > >> * The beta and stable branches do not run the test suite today. >> With the volume of beta backports each release receives this is >> a frightening situation. > > I assume this is not the case for us.
But it's unclear where I'd look > to find a declarative description of what's going on for each branch? > Can each branch define its own way to perform CI? > >> * As certain core Rust tools mature we want to deliver them as part of >> the Rust distribution, and this is difficult to do within the >> current infrastructure / build system design. Distributing >> additional tools with Rust is particularly crucial for those >> intimately tied to compiler internals, like the RLS and clippy. > > Also a familiar situation, though again an orthogonal issue. > > So it sounds like at this crossroads we've been seeing a lot of the > same things the Rust team has experienced. The jurisprudence they've > established here is pretty strong. If we want to address the very same > problems then we need: > > 1. Reproducible cloud instances that are created/destroyed on-demand, > and whose state doesn't drift over time. That way, no problems with > build bots that eventually disappear. > 2. A declarative description of the *entire infrastructure and test > environment*, for each target platform, so that it can be replicated > by anyone who wants to do so, in a single command. That way we're not > blocked on any single person to make changes to it. > > I believe the answer is reusing existing managed CI solutions. But let's discuss. > Just know that we'd be happy to contribute towards any paid > subscription necessary. So that shouldn't be a barrier. > > Best, > > Mathieu > > [1] https://ghc.haskell.org/trac/ghc/blog/jenkins-ci > [2] https://internals.rust-lang.org/t/rust-ci-release-infrastructure-changes/4489 > -- > Mathieu Boespflug > Founder at http://tweag.io. > > > On 4 October 2017 at 19:30, Ben Gamari wrote: >> Manuel M T Chakravarty writes: >> >>> Hi Ben, >>> >> Hi Manuel, >> >> Thanks again for your help here!
>> >>> Since we talked last week, I have talked with Mathieu and Jonas (our >>> resident DevOps guru) about the whole CI situation and our discussion >>> about automating the production of build artefacts for GHC to make the >>> release process less labour-intensive. I am adding both to CC, so that >>> they can correct me if I am getting anything wrong. >>> >>> When we talked on the phone, you mentioned that we need to be able to >>> support all the Tier 1 platforms, and we both concluded that this >>> implies the need for using Jenkins and we can’t, e.g., use CircleCI as >>> they only support macOS and Linux. Mathieu and Jonas explained to me >>> that this is actually not the case. Apparently, Rust solves this issue >>> by building Linux and macOS artefacts on CircleCI, Windows on >>> Appveyor, and everything else using QEMU on CircleCI (e.g., FreeBSD >>> could be done that way and eventually ARM builds). >>> >> Indeed when starting this I looked a bit at what rustc does. By my >> recollection, they don't actually perform builds on anything but >> Linux/amd64. Instead they build cross-compilers on x86-64, use these to >> build their testsuite artifacts, and then run these under qemu (and in >> some cases, e.g. FreeBSD, they don't even do this). >> >> While in general I would love to be able to do everything with >> cross-compiled binaries from Linux/amd64, our cross-compilation story >> may be a bit lacking to pull this off at the moment. Moritz Angermann has >> been making great strides in this area recently but it's going to be a >> while until we can really make this work. In particular, our Template >> Haskell story will need quite some work before we can reliably do a full >> cross-compiled testsuite run. >> >> In general I'm a bit skeptical of moving to a solution that relegates >> non-Linux/amd64 builds to a VM. Non-Linux/amd64 platforms have >> commercial users and do deserve first-class CI support.
Furthermore, >> without KVM or hypervisor support (which, as far as I can tell, CircleCI >> does not provide [1]) I'm not sure that virtualisation will allow us to >> get where we want to be in terms of test coverage and build response >> time due to the cost of virtualisation. Without hardware support qemu >> can be rather expensive. >> >>> They convinced me that this is a worthwhile direction to consider for >>> the following reasons: >>> >>> * Jenkins is a fickle beast: apparently scaling Jenkins to work >>> reliably when running tests against multiple PRs on distributed >>> infrastructure is hard — we ran into significant problems in a client >>> project recently. >>> >> >> I agree that Jenkins is a rather fickle beast; indeed it can be >> positively infuriating to work with. However, I've not yet noticed the >> scaling issues you describe. What in particular did you observe? >> >>> * All the custom set up and maintaining of build nodes etc required by >>> Jenkins disappears. (Mathieu built the CircleCI setup that he >>> contributed recently quite quickly, so there really is little overhead >>> in setting this up.) >>> >> I'm not sure that the difference here is actually so great. Yes, in the >> case of Jenkins you do have physical machines to administer. However, >> this typically isn't the hard part. If you look at Rust's configuration, >> they have roughly a dozen Docker environments which they had to set up >> and maintain; this effort will likely far outweigh the setup cost of the >> machines themselves. This has certainly been the case for Jenkins and I >> suspect it would be true of CircleCI as well; this is simply the cost of >> entry for cross-platform testing. >> >> Moreover, we can't write off the cost of integrating with CircleCI. Of >> course, if we do decide to move to GitHub then perhaps this cost shrinks >> dramatically. However, until this decision is made it seems like we need >> to assume that Phabricator integration will be necessary.
>> >>> * The problems we discussed with possibly not having enough Rackspace >>> capacity for the transition disappear. >>> >> In some sense this is true; however, it seems like we are trading one >> commodity of finite supply for another. We currently have Rackspace >> credit and consequently these instances can be considered to be >> essentially free. >> >> While CircleCI does offer four free containers for open source >> projects (and perhaps a bit more in our case if we ask), I'm skeptical >> that this will be enough; currently our four build bots give us >> multi-day wait times which makes development remarkably painful. The >> appeal of Jenkins is that we can shorten this timescale as well as grow >> our test coverage with the resources that we already have. >> >> Let's have a brief look at what resources we may need. >> >> A quick back-of-the-envelope calculation suggests that to simply keep up >> with our current average commit rate (around 200 commits/month) for the >> four environments that we currently build we need a bare minimum of: >> >> 200 commit/month >> * 4 build/commit (Linux/i386, Linux/amd64, >> OS X, Windows/amd64) >> * 2.5 CPU-hour/build (approx. average across platforms >> for a validate) >> / (2 CPU-hour/machine-hour) (CircleCI appears to use 2 vCPU instances) >> / (30*24 machine-hour/month) >> ~ 2 machines >> >> note that this doesn't guarantee reasonable wait times but rather merely >> ensures that we can keep up on the mean. On top of this, we see around >> 300 differential revisions per month. This requires another 3 machines >> to keep up.
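[For the record, the back-of-the-envelope arithmetic above can be reproduced mechanically. Every figure here (200 commits and 300 differentials per month, 4 builds per job, 2.5 CPU-hours per validate, 2 vCPUs per machine) is an estimate quoted from this email, not a measurement.]

```python
import math

# Reproduces the capacity estimate above; all figures are the email's
# assumptions, not measurements.
def machines_needed(jobs_per_month, builds_per_job=4, cpu_hours_per_build=2.5,
                    vcpus_per_machine=2, hours_per_month=30 * 24):
    cpu_hours = jobs_per_month * builds_per_job * cpu_hours_per_build
    machine_hours = cpu_hours / vcpus_per_machine
    return math.ceil(machine_hours / hours_per_month)

print(machines_needed(200))  # commits to master: 2 machines
print(machines_needed(300))  # differential revisions: 3 machines
```

As the email stresses, this is only a floor for keeping up on the mean; it says nothing about wait times.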
>> >> So, we need at least five machines but, again, this is a minimum; >> modelling response times is hard but I expect we would likely need to >> add at least two more machines to keep response times in the >> contributor-friendly range, especially considering that under CircleCI >> we will lose the ability to prioritize jobs (by contrast, with Jenkins >> we can prioritize pull requests as this is the response time that we >> really care about). Now consider that we would like to add at least >> three more platforms (FreeBSD, OpenBSD, Linux/aarch64, all of which may >> be relatively slow to build due to virtualisation overhead) as well as a >> few more build configurations on amd64 (LLVM, unregisterised, at least >> one cross-compilation target) and a periodic slow validation and we may >> be at over a dozen machines. >> >> All of this appears to put us well outside CircleCI's offering to >> open-source projects. Of course, it may be worth asking whether they are >> willing to extend GHC a more generous offer. However, I don't think we >> can count on this and I'm not certain that Haskell.org is currently in a >> position to be able to shoulder such a financial burden. >> >>> * We also don’t need to worry about a macOS box either. >>> >> Quite true. >> >>> Also, Jonas could help us get things running and, I think, his >>> wealth of experience would be very useful. (At least, I would be very >>> grateful for his advice.) >>> >>> I think this route has the potential to get us to where we want to be >>> quite quickly and in a manner that is very little effort to maintain >>> once set up. What do you think? >>> >> Indeed I can see that there are many advantages to the CircleCI option. >> The ease of bringing up a Linux/amd64 build environment which easily >> scales and requires no administration is quite enticing. However, I am >> skeptical that it will be as easy to get the full suite of builds that >> we are aiming to produce.
I would be quite curious to see what Jonas has >> to say on the matter of non-Linux platforms. Seeing a simple >> configuration which compiles and tests even a FreeBSD/amd64 build in a >> reasonable amount of time may well be enough to convince me. >> >> Thanks again for your help on this! >> >> Cheers, >> >> - Ben >> >> >> [1] https://circleci.com/docs/1.0/android/ From manuel.chakravarty at tweag.io Tue Oct 10 06:57:40 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 10 Oct 2017 17:57:40 +1100 Subject: [GHC DevOps Group] Fwd: DevOps: Next steps In-Reply-To: References: <87fubmzypd.fsf@ben-laptop.smart-cactus.org> <87fubiy9ew.fsf@ben-laptop.smart-cactus.org> <87d16iwyrk.fsf@ben-laptop.smart-cactus.org> <0C224F82-EB38-49A3-A121-C66B6D6F8D0E@tweag.io> <1D196DC9-B9A1-462B-B688-7C3469D24EC5@tweag.io> <098E3F6A-5556-4E78-AF27-979CBF973060@tweag.io> <87mv5hv9ji.fsf@ben-laptop.smart-cactus.org> <87zi9ht63w.fsf@ben-laptop.smart-cactus.org> <3AD4D37A-65C3-46D1-A2DD-7D049EEEE0F2@tweag.io> <87wp4lt5t4.fsf@ben-laptop.smart-cactus.org> <658252EF-04F2-4DA3-BAF2-6270180BE5FA@tweag.io> <87efqirfb0.fsf@ben-laptop.smart-cactus.org> Message-ID: <65EC2CF1-B487-4006-A3A5-07EEDCB58FE4@tweag.io> [RESENT MESSAGE — see https://mail.haskell.org/pipermail/ghc-devops-group/2017-October/000004.html] Ben, thanks for pointing out important issues in our requirements. And, Mathieu, thanks for moving this to the list. > 05.10.2017, 08:31, Boespflug, Mathieu : > > Ben's response. Copying it to the list now that this list exists.
> > ---------- Forwarded message ---------- > From: Ben Gamari > Date: 4 October 2017 at 19:30 > Subject: Re: DevOps: Next steps > To: Manuel M T Chakravarty > Cc: Mathieu Boespflug , Jonas Pfenniger Chevalier > > > Manuel M T Chakravarty writes: > >> When we talked on the phone, you mentioned that we need to be able to >> support all the Tier 1 platforms, and we both concluded that this >> implies the need for using Jenkins and we can’t, e.g., use CircleCI as >> they only support macOS and Linux. Mathieu and Jonas explained to me >> that this is actually not the case. Apparently, Rust solves this issue >> by building Linux and macOS artefacts on CircleCI, Windows on >> Appveyor, and everything else using QEMU on CircleCI (e.g., FreeBSD >> could be done that way and eventually ARM builds). >> > Indeed when starting this I looked a bit at what rustc does. By my > recollection, they don't actually perform builds on anything but > Linux/amd64. Instead they build cross-compilers on x86-64, use these to > build their testsuite artifacts, and then run these under qemu (and in > some cases, e.g. FreeBSD, they don't even do this). > > While in general I would love to be able to do everything with > cross-compiled binaries from Linux/amd64, our cross-compilation story > may be a bit lacking to pull this off at the moment. Moritz Angermann has > been making great strides in this area recently but it's going to be a > while until we can really make this work. In particular, our Template > Haskell story will need quite some work before we can reliably do a full > cross-compiled testsuite run. > > In general I'm a bit skeptical of moving to a solution that relegates > non-Linux/amd64 builds to a VM. Non-Linux/amd64 platforms have > commercial users and do deserve first-class CI support.
Furthermore, > without KVM or hypervisor support (which, as far as I can tell, CircleCI > does not provide [1]) I'm not sure that virtualisation will allow us to > get where we want to be in terms of test coverage and build response > time due to the cost of virtualisation. Without hardware support qemu > can be rather expensive. Sorry for not expressing myself clearly here. I didn’t want to propose to exactly copy Rust’s approach. In particular, as you write, relying on cross-compilation is not an option for us. (Although, from my reading of the Rust repo, they do not build everything via QEMU.) In concrete terms, the proposal for GHC would be the following: * Linux & macOS builds: CircleCI * Windows builds: Appveyor * Everything else: QEMU (and maybe it is not necessary to run all the tests on these either) >> They convinced me that this is a worthwhile direction to consider for >> the following reasons: >> >> * Jenkins is a fickle beast: apparently scaling Jenkins to work >> reliably when running tests against multiple PRs on distributed >> infrastructure is hard — we ran into significant problems in a client >> project recently. >> > > I agree that Jenkins is a rather fickle beast; indeed it can be > positively infuriating to work with. However, I've not yet noticed the > scaling issues you describe. What in particular did you observe? Jonas, could you maybe explain it? >> * All the custom set up and maintaining of build nodes etc required by >> Jenkins disappears. (Mathieu built the CircleCI setup that he >> contributed recently quite quickly, so there really is little overhead >> in setting this up.) >> > I'm not sure that the difference here is actually so great. Yes, in the > case of Jenkins you do have physical machines to administer. However, > this typically isn't the hard part.
If you look at Rust's configuration, > they have roughly a dozen Docker environments which they had to set up > and maintain; this effort will likely far outweigh the setup cost of the > machines themselves. This has certainly been the case for Jenkins and I > suspect it would be true of CircleCI as well; this is simply the cost of > entry for cross-platform testing. I misspoke earlier: Rust seems to use Travis CI together with Appveyor. Looking at https://github.com/rust-lang/rust/blob/master/.travis.yml and https://github.com/rust-lang/rust/blob/master/appveyor.yml They only seem to do the Docker thing for their cross-compilation targets (and I believe those are always going to be harder to set up). One nice thing about this, as Mathieu pointed out, is that somebody who forks the repo can just run the same CI on their own Travis/Circle/Appveyor accounts with little effort — just as we are currently doing with Tweag’s linear types fork of GHC: https://github.com/tweag/ghc/blob/linear-types/.circleci/config.yml This is a powerful way of scaling. > Moreover, we can't write off the cost of integrating with CircleCI. Of > course, if we do decide to move to GitHub then perhaps this cost shrinks > dramatically. However, until this decision is made it seems like we need > to assume that Phabricator integration will be necessary. By the "re-use existing infrastructure instead of writing your own" mantra, this is just another reason to go for GitHub. >> * The problems we discussed with possibly not having enough Rackspace >> capacity for the transition disappear. >> > In some sense this is true; however, it seems like we are trading one > commodity of finite supply for another. We currently have Rackspace > credit and consequently these instances can be considered to be > essentially free.
> > While CircleCI does offer four free containers for open source > projects (and perhaps a bit more in our case if we ask), I'm skeptical > that this will be enough; currently our four build bots give us > multi-day wait times which makes development remarkably painful. The > appeal of Jenkins is that we can shorten this timescale as well as grow > our test coverage with the resources that we already have. > > Let's have a brief look at what resources we may need. > > A quick back-of-the-envelope calculation suggests that to simply keep up > with our current average commit rate (around 200 commits/month) for the > four environments that we currently build we need a bare minimum of: > > 200 commit/month > * 4 build/commit (Linux/i386, Linux/amd64, > OS X, Windows/amd64) > * 2.5 CPU-hour/build (approx. average across platforms > for a validate) > / (2 CPU-hour/machine-hour) (CircleCI appears to use 2 vCPU instances) > / (30*24 machine-hour/month) > ~ 2 machines > > note that this doesn't guarantee reasonable wait times but rather merely > ensures that we can keep up on the mean. On top of this, we see around > 300 differential revisions per month. This requires another 3 machines > to keep up. > > So, we need at least five machines but, again, this is a minimum; > modelling response times is hard but I expect we would likely need to > add at least two more machines to keep response times in the > contributor-friendly range, especially considering that under CircleCI > we will lose the ability to prioritize jobs (by contrast, with Jenkins > we can prioritize pull requests as this is the response time that we > really care about).
Now consider that we would like to add at least > three more platforms (FreeBSD, OpenBSD, Linux/aarch64, all of which may > be relatively slow to build due to virtualisation overhead) as well as a > few more build configurations on amd64 (LLVM, unregisterised, at least > one cross-compilation target) and a periodic slow validation and we may > be at over a dozen machines. > > All of this appears to put us well outside CircleCI's offering to > open-source projects. Of course, it may be worth asking whether they are > willing to extend GHC a more generous offer. However, I don't think we > can count on this and I'm not certain that Haskell.org is currently in a > position to be able to shoulder such a financial burden. Mathieu has indicated that Tweag would be willing to contribute towards those costs. (Developer time, such as yours, is so much more expensive than these subscription costs that it’ll always be more efficient to outsource to CI companies.) >> Also, Jonas could help us get things running and, I think, his >> wealth of experience would be very useful. (At least, I would be very >> grateful for his advice.) >> >> I think this route has the potential to get us to where we want to be >> quite quickly and in a manner that is very little effort to maintain >> once set up. What do you think? >> > Indeed I can see that there are many advantages to the CircleCI option. > The ease of bringing up a Linux/amd64 build environment which easily > scales and requires no administration is quite enticing. However, I am > skeptical that it will be as easy to get the full suite of builds that > we are aiming to produce. I would be quite curious to see what Jonas has > to say on the matter of non-Linux platforms. Seeing a simple > configuration which compiles and tests even a FreeBSD/amd64 build in a > reasonable amount of time may well be enough to convince me. OK, fair enough, let’s look at exactly how hard this is. 
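Ben's back-of-the-envelope estimate above can be reproduced in a few lines. All figures are the rough values from his email (200 commits and 300 differential revisions per month, 4 builds per job, 2.5 CPU-hours per build, 2 vCPUs per machine), not measurements:

```python
import math

# Rough figures from Ben's email (estimates, not measurements).
HOURS_PER_MONTH = 30 * 24      # machine-hours available per machine per month
CPUS_PER_MACHINE = 2           # CircleCI appears to use 2 vCPU instances
BUILDS_PER_JOB = 4             # Linux/i386, Linux/amd64, OS X, Windows/amd64
CPU_HOURS_PER_BUILD = 2.5      # approximate average for a full validate

def machines_needed(jobs_per_month):
    """Minimum machine count to keep up with the mean load (no queueing slack)."""
    cpu_hours = jobs_per_month * BUILDS_PER_JOB * CPU_HOURS_PER_BUILD
    machine_hours = cpu_hours / CPUS_PER_MACHINE
    return math.ceil(machine_hours / HOURS_PER_MONTH)

print(machines_needed(200))   # commits -> 2 machines
print(machines_needed(300))   # differential revisions -> 3 machines
```

As in the email, this yields 2 + 3 = 5 machines as a bare minimum, before any headroom for contributor-friendly response times.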
Cheers, Manuel -------------- next part -------------- An HTML attachment was scrubbed... URL: From manuel.chakravarty at tweag.io Tue Oct 10 07:01:48 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 10 Oct 2017 18:01:48 +1100 Subject: [GHC DevOps Group] FreeBSD in Tier 1 In-Reply-To: References: Message-ID: <1E43A635-2742-446B-9820-FF2C82922E0E@tweag.io> [RESENT MESSAGE — see https://mail.haskell.org/pipermail/ghc-devops-group/2017-October/000004.html] According to https://ghc.haskell.org/trac/ghc/wiki/TeamGHC it is Páli Gábor János aka pgj whose last commit was 20 Jul 2016: https://github.com/ghc/ghc/commit/0df3f4cdd1dfff42461e3f5c3962f1ecd7c90652 Manuel > Am 09.10.2017 um 19:02 schrieb Simon Marlow >: > > The usual requirement for a platform to be in Tier 1 is that there's an active maintainer to fix issues as they arise. Do we have a maintainer for the FreeBSD port? > > On 9 October 2017 at 06:38, Manuel M T Chakravarty > wrote: > Mathieu sunk quite a bit of time over the weekend into building GHC on FreeBSD and found that it doesn’t build and hasn’t been building for a while. Specifically, there is > > https://ghc.haskell.org/trac/ghc/ticket/14064 > > and > > https://github.com/haskell/unix/issues/102 > > Also, due to > > https://ghc.haskell.org/trac/ghc/ticket/12695 > > FreeBSD builds need to be configured specially. > > This leads to my question: > > ** Why is FreeBSD in Tier 1? ** > > (See https://ghc.haskell.org/trac/ghc/wiki/Platforms ) > > It seems that nobody is sufficiently interested in FreeBSD to fix these issues. I don’t think it is fair to hold up releases of true Tier 1 platforms only because FreeBSD issues haven’t been fixed. Moreover, as we discussed on a separate thread, FreeBSD introduces additional constraints/work in the CI setup. > > I’d like to propose to move FreeBSD to Tier 2. Is there any good reason not to? 
> > Cheers, > Manuel > > PS: If anybody is using GHC on FreeBSD in a mission-critical way, that would be a good reason, but then I would expect that party to commit some resources to help us out. > _______________________________________________ > Ghc-devops-group mailing list > Ghc-devops-group at haskell.org > https://haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group > -------------- next part -------------- An HTML attachment was scrubbed... URL: From manuel.chakravarty at tweag.io Tue Oct 10 07:02:43 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 10 Oct 2017 18:02:43 +1100 Subject: [GHC DevOps Group] Fwd: FreeBSD in Tier 1 References: <87r2ucnz23.fsf@ben-laptop.smart-cactus.org> Message-ID: <79FF0196-85DD-4FE4-BE61-97C3CF18D7D2@tweag.io> [RESENT MESSAGE — see https://mail.haskell.org/pipermail/ghc-devops-group/2017-October/000004.html] > From: Ben Gamari > Subject: Aw: [GHC DevOps Group] FreeBSD in Tier 1 > Date: 10. Oktober 2017 um 02:03:32 GMT+11 > To: Manuel M T Chakravarty , Simon Marlow > Cc: ghc-devops-group at haskell.org, Pali Gabor Janos > > Manuel M T Chakravarty writes: > >> According to >> >> https://ghc.haskell.org/trac/ghc/wiki/TeamGHC >> >> >> it is Páli Gábor János aka pgj whose last commit was 20 Jul 2016: >> >> https://github.com/ghc/ghc/commit/0df3f4cdd1dfff42461e3f5c3962f1ecd7c90652 >> >> > While Páli does not contribute many patches, I can confirm that he is > indeed active. In fact, he recently provided an explanation of the > failure that Mathieu observed in #14300. In short, the issue is that the > platform toolchain is buggy due to a tiresome licensing issue. In my > experience GHC builds without any trouble on FreeBSD 11, which has a > new, less broken toolchain. > > Cheers, > > - Ben -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From manuel.chakravarty at tweag.io Tue Oct 10 07:03:13 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 10 Oct 2017 18:03:13 +1100 Subject: [GHC DevOps Group] FreeBSD in Tier 1 In-Reply-To: References: <87r2ucnz23.fsf@ben-laptop.smart-cactus.org> Message-ID: <564177CB-1A0E-4F01-81AC-7A3DF130263F@tweag.io> [RESENT MESSAGE — see https://mail.haskell.org/pipermail/ghc-devops-group/2017-October/000004.html] Hi Gábor, thanks a lot for clarifying the situation and also for continuing to provide advice about FreeBSD where your time allows. That is much appreciated. Simon, Ben, this sounds to me very much like a Tier 2 commitment. May I suggest that we classify FreeBSD as Tier 2 until somebody else comes along who can commit to actively maintaining the platform and provide timely fixes (starting with fixing the current problems)? Manuel PS: I know it would be perfect to provide great support for many platforms, but as it stands, we are having trouble with even the core ones. > Páli Gábor János : > > Hello there, > > 2017-10-09 17:03 GMT+02:00 Ben Gamari : >> While Páli does not contribute many patches, I can confirm that he is >> indeed active. > > Thanks, Ben, for vouching for me :-) Though I do not know what the original > question was, let me just give you a brief "status report" that could > perhaps help with the answer. > > TL;DR: Yes, I am still here, and available for questions and support, > but I do not track the status of GHC-head/FreeBSD so closely and do not > make changes to it myself these days. > > I did most of my work in the FreeBSD Project where I maintained the > GHC port and ports for certain Cabal packages. I also run a GHC build > bot to monitor the health of FreeBSD builds for GHC-head, and I > requested GHC repository commit access to submit occasional > fixes or port-specific changes to upstream directly. 
I use > FreeBSD daily as a primary system, where I usually have some version > of GHC (currently 8.0.2) installed as well. > > My priorities changed a while ago: I gave up my Haskell-related > position at the university by September, and I am about to start a new > non-Haskell job in the industry soon. As a result, the machine that > served the daily FreeBSD snapshots is currently offline, I no longer > do Haskell commits to the FreeBSD ports repository directly, > and I silently acknowledged that GHC HQ now does the FreeBSD/amd64 > builds for the GHC releases. > > But I am still helping the interested FreeBSD Project committers or > contributors with reviewing patches, and I am still watching the > FreeBSD-specific GHC Trac tickets and commenting on them as my time > permits. I may be back on the ride once more, but I cannot tell > for now. > >> In my experience GHC builds without any trouble on FreeBSD 11, which has a >> new, less broken toolchain. > > We have been using the latest version of GCC and binutils from the > FreeBSD Ports Collection as binutils in the FreeBSD base system is > stuck in 2007 and the now-default LLVM-based alternative (Clang, LLDB, > LLD etc.) is not yet there on every supported release, as you may also > have experienced. > > There is a patch floating around somewhere in the FreeBSD Phabricator > to make the official FreeBSD GHC port use base Clang by default, so > it could get wider testing, but apparently it is only a viable > option on FreeBSD 11 and later. 
> > Cheers, > Gábor From manuel.chakravarty at tweag.io Tue Oct 10 07:04:16 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 10 Oct 2017 18:04:16 +1100 Subject: [GHC DevOps Group] FreeBSD in Tier 1 In-Reply-To: <871smboqtd.fsf@ben-laptop.smart-cactus.org> References: <87r2ucnz23.fsf@ben-laptop.smart-cactus.org> <871smboqtd.fsf@ben-laptop.smart-cactus.org> Message-ID: <20013719-1027-42E5-BA00-D5B680E45EF8@tweag.io> [RESENT MESSAGE — see https://mail.haskell.org/pipermail/ghc-devops-group/2017-October/000004.html] Yes, and we have got options for that, such as cross-compiling and using QEMU. We will cross that bridge when we get to it. For now, please let us focus on our immediate goal of automatically producing build artefacts to streamline the release process and providing reliable CI for the core platforms. Given the CircleCI config Mathieu already contributed, what is missing to create the release artefacts on CircleCI? Cheers, Manuel > Ben Gamari >: > Manuel M T Chakravarty > writes: >> Simon, Ben, this sounds to me very much like a Tier 2 commitment. May >> I suggest that we classify FreeBSD as Tier 2 until somebody else comes >> along who can commit to actively maintaining the platform and provide >> timely fixes (starting with fixing the current problems)? >> > Yes, I agree and am fine with moving FreeBSD to Tier 2 status. > > However, I would like to nevertheless emphasize the point I raised > earlier about cross-platform CI: if and when a maintainer picks up > FreeBSD (or any other Tier 2 platform), our infrastructure should be able > to accommodate them. > > For instance, I imagine Moritz will move to make AArch64 Tier 1 soon > after he stabilizes that platform. > > Cheers, > > - Ben _______________________________________________ Ghc-devops-group mailing list Ghc-devops-group at haskell.org https://haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From manuel.chakravarty at tweag.io Tue Oct 10 07:08:21 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 10 Oct 2017 18:08:21 +1100 Subject: [GHC DevOps Group] Phabricator -> GitHub? Message-ID: <6C1CFC7D-958E-4C3B-A9C8-D0A13282CEE4@tweag.io> [RESENT MESSAGE — see https://mail.haskell.org/pipermail/ghc-devops-group/2017-October/000004.html] I have spoken to a number of people about the question of using GitHub pull requests and code reviews instead of Phabricator for GHC development. And while some people are quite happy with Phabricator and/or prefer it due to using it at work anyway, the majority of people I talked to would prefer to use GitHub. In fact, some people (such as Neil Mitchell, Will Jones (VP Engineering @ Habito), and myself) stated that they do not contribute patches to GHC because they don’t want to deal with the overhead that Phabricator imposes. (So far, I haven’t had anybody who said they would not contribute if they have to use GitHub, but obviously my sample set is small and likely skewed.) Having said that, obviously, we will always find people preferring one tool over another. And also obviously, both tools can do the job (and for GitHub there are more sophisticated options than the basic type of code review if need be: http://reviewable.io/). Hence, I would like to offer two technical reasons and one social reason why we should replace Phabricator by GitHub (and I do mean replace, not run both side-by-side). = Technical (1) Rule One of DevOps: minimise custom infrastructure [Our resources are scarce. Why waste them on something that can be outsourced (for free)?] (2) We really need to sort out CI and integration with GitHub is easier — see also (1). = Social * Virtually every developer knows how to use GitHub and custom-anything creates friction. That learning Phabricator is little effort compared to learning to contribute to GHC is a red herring IMHO. 
Firstly, if the learning curve is steep, you don’t make it steeper. Secondly, there are people (e.g., Neil and me) who can very well contribute to GHC, but who don’t, because they don’t want to waste time on yet another random tool. Life is too short! The reason why I don’t want to run Phabricator and GitHub side-by-side is that this would fail to help with the two technical reasons. Cheers, Manuel PS: This is *not* about moving Trac to GitHub. We are only talking about pull requests and code reviews. _______________________________________________ Ghc-devops-group mailing list Ghc-devops-group at haskell.org https://haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group From manuel.chakravarty at tweag.io Tue Oct 10 07:10:06 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 10 Oct 2017 18:10:06 +1100 Subject: [GHC DevOps Group] Phabricator -> GitHub? In-Reply-To: References: Message-ID: <12D2C103-A36E-431A-BC8E-585C22B1CDFF@tweag.io> [RESENT MESSAGE — see https://mail.haskell.org/pipermail/ghc-devops-group/2017-October/000004.html] [Includes all of Simon’s message.] > Simon Marlow >: > > Personally I prefer to stay with Phabricator because it's better for code reviews, and because we already use it. Having said that, if a majority of the developer community would prefer GitHub, and there is effort available to make the transition, then we should do it. But I do have strong feelings on a couple of things: > > - We should consult the whole developer community. Changing the code review tool affects not just potential contributors, but also the people doing the reviewing. If the reviewers are less productive or less keen to spend time on reviews, then overall quality will drop. An increase in contributions can only be supported if we also have an increase in the pool of people reviewing and merging contributions. It’s about increasing the size of the developer community, not drive-by contributions. 
We will absolutely talk to more developers, but please keep in mind that projects like the Rust and Swift compilers use GitHub. These are projects that are thriving and get lots of contributions. Rust had 120 active PRs last week and Swift had 132 active PRs. I don’t quite see what is so different about GHC. > - We should use *one* code-review tool only. Making a transition to a situation where we have two different code review tools (GitHub + reviewable.io ) would be a step backwards. (after all, one of the arguments below is against learning a new random tool….) > > The alternative that we discussed in the past is to better support contributions via GitHub, which we could still do. I am not at all arguing for reviewable.io (I have never used the latter), but I don’t understand your argument. Firstly, why is GitHub + reviewable.io worse than GitHub + Phabricator as a mix? Secondly, the suggestion was only to use reviewable.io for particularly heavy PRs. Thirdly, it still is much better than Phabricator on the new random tool front because it requires no custom infrastructure and the PRs still go through GitHub as usual. In any case, I don’t want to argue for reviewable.io . Let’s just say that, if we move to GitHub, and we decide at some point that we want extra code-review power for select contributions, there are options. Cheers, Manuel > On 9 October 2017 at 07:44, Manuel M T Chakravarty > wrote: > I have spoken to a number of people about the question of using GitHub pull requests and code reviews instead of Phabricator for GHC development. And while some people are quite happy with Phabricator and/or prefer it due to using it at work anyway, the majority of people I talked to would prefer to use GitHub. In fact, some people (such as Neil Mitchell, Will Jones (VP Engineering @ Habito), and myself) stated that they do not contribute patches to GHC because they don’t want to deal with the overhead that Phabricator imposes. 
(So far, I haven’t had anybody who said they would not contribute if they have to use GitHub, but obviously my sample set is small and likely skewed.) > > Having said that, obviously, we will always find people preferring one tool over another. And also obviously, both tools can do the job (and for GitHub there are more sophisticated options than the basic type of code review if need be: http://reviewable.io/ ). > > Hence, I would like to offer two technical reasons and one social reason why we should replace Phabricator by GitHub (and I do mean replace, not run both side-by-side). > > = Technical > > (1) Rule One of DevOps: minimise custom infrastructure > [Our resources are scarce. Why waste them on something that can be outsourced (for free)?] > > (2) We really need to sort out CI and integration with GitHub is easier — see also (1). > > = Social > > * Virtually every developer knows how to use GitHub and custom-anything creates friction. That learning Phabricator is little effort compared to learning to contribute to GHC is a red herring IMHO. Firstly, if the learning curve is steep, you don’t make it steeper. Secondly, there are people (e.g., Neil and me) who can very well contribute to GHC, but who don’t, because they don’t want to waste time on yet another random tool. Life is too short! 
> > > _______________________________________________ > Ghc-devops-group mailing list > Ghc-devops-group at haskell.org > https://haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group > _______________________________________________ Ghc-devops-group mailing list Ghc-devops-group at haskell.org https://haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group -------------- next part -------------- An HTML attachment was scrubbed... URL: From manuel.chakravarty at tweag.io Tue Oct 10 07:15:02 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 10 Oct 2017 18:15:02 +1100 Subject: [GHC DevOps Group] Phabricator -> GitHub? In-Reply-To: References: Message-ID: [RESENT MESSAGE — see https://mail.haskell.org/pipermail/ghc-devops-group/2017-October/000004.html] [Includes all of Simon’s message] > Simon Peyton Jones >: > > I don’t have a well-informed opinion, but my instinct is to follow the mainstream even if a technically-better alternative exists, unless it’s a LOT better. For the reasons Manuel outlines. > > Am I right that GitHub code review has improved? It has improved since this issue was discussed last. I believe one of the main criticisms in the past was that while people could comment on individual lines of a proposed contribution (aka pull request), there was no way to tie those into a code review unit. This facility has since been added. Moreover, contributors can now request code reviews from specific reviewers and the repositories can be configured such that contributions cannot be merged until signed off by a reviewer. As Ben recently remarked, the main downside seems to be that GitHub code reviews don’t play very nicely with rebasing commits during a code review. However, some of these things are also a matter of using the right workflow. As I mentioned before, the Swift compiler uses GitHub. They deal with a much bigger throughput. 
Here are their instructions to contributors (including the code review policy): https://swift.org/contributing/#contributing-code > Would using reviewable.io impose similar socio-technical barriers to those Phab does? No, it is more lightweight. Contributors don’t need to install extra software. Contributions are still just pull requests, and it could be used only for complex contributions (i.e., most people wouldn’t have to deal with it). Manuel > From: Ghc-devops-group [mailto:ghc-devops-group-bounces at haskell.org ] On Behalf Of Simon Marlow > Sent: 09 October 2017 08:58 > To: Manuel M T Chakravarty > > Cc: ghc-devops-group at haskell.org > Subject: Re: [GHC DevOps Group] Phabricator -> GitHub? > > Personally I prefer to stay with Phabricator because it's better for code reviews, and because we already use it. Having said that, if a majority of the developer community would prefer GitHub, and there is effort available to make the transition, then we should do it. But I do have strong feelings on a couple of things: > > > > - We should consult the whole developer community. Changing the code review tool affects not just potential contributors, but also the people doing the reviewing. If the reviewers are less productive or less keen to spend time on reviews, then overall quality will drop. An increase in contributions can only be supported if we also have an increase in the pool of people reviewing and merging contributions. It's about increasing the size of the developer community, not drive-by contributions. > > > > - We should use *one* code-review tool only. Making a transition to a situation where we have two different code review tools (GitHub + reviewable.io ) would be a step backwards. (after all, one of the arguments below is against learning a new random tool....) > > > > The alternative that we discussed in the past is to better support contributions via GitHub, which we could still do. 
> > > > Cheers > > Simon > > > > > > On 9 October 2017 at 07:44, Manuel M T Chakravarty > wrote: > > I have spoken to a number of people about the question of using GitHub pull requests and code reviews instead of Phabricator for GHC development. And while some people are quite happy with Phabricator and/or prefer it due to using it at work anyway, the majority of people I talked to would prefer to use GitHub. In fact, some people (such as Neil Mitchell, Will Jones (VP Engineering @ Habito), and myself) stated that they do not contribute patches to GHC because they don’t want to deal with the overhead that Phabricator imposes. (So far, I haven’t had anybody who said they would not contribute if they have to use GitHub, but obviously my sample set is small and likely skewed.) > > Having said that, obviously, we will always find people preferring one tool over another. And also obviously, both tools can do the job (and for GitHub there are more sophisticated options than the basic type of code review if need be: http://reviewable.io/ ). > > Hence, I would like to offer two technical reasons and one social reason why we should replace Phabricator by GitHub (and I do mean replace, not run both side-by-side). > > = Technical > > (1) Rule One of DevOps: minimise custom infrastructure > [Our resources are scarce. Why waste them on something that can be outsourced (for free)?] > > (2) We really need to sort out CI and integration with GitHub is easier — see also (1). > > = Social > > * Virtually every developer knows how to use GitHub and custom-anything creates friction. That learning Phabricator is little effort compared to learning to contribute to GHC is a red herring IMHO. Firstly, if the learning curve is steep, you don’t make it steeper. Secondly, there are people (e.g., Neil and me) who can very well contribute to GHC, but who don’t, because they don’t want to waste time on yet another random tool. Life is too short! 
> > The reason why I don’t want to run Phabricator and GitHub side-by-side is that this would fail to help with the two technical reasons. > > Cheers, > Manuel > > PS: This is *not* about moving Trac to GitHub. We are only talking about pull requests and code reviews. > > > _______________________________________________ > Ghc-devops-group mailing list > Ghc-devops-group at haskell.org > https://haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group _______________________________________________ Ghc-devops-group mailing list Ghc-devops-group at haskell.org https://haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group -------------- next part -------------- An HTML attachment was scrubbed... URL: From manuel.chakravarty at tweag.io Tue Oct 10 07:16:59 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 10 Oct 2017 18:16:59 +1100 Subject: [GHC DevOps Group] Phabricator -> GitHub? Message-ID: <532B17DC-5E94-475E-990B-C67067D81E2D@tweag.io> [RESENT MESSAGE — see https://mail.haskell.org/pipermail/ghc-devops-group/2017-October/000004.html] > From: "Boespflug, Mathieu" > Subject: Aw: [GHC DevOps Group] Phabricator -> GitHub? > Date: 9. Oktober 2017 um 23:02:32 GMT+11 > To: Manuel M T Chakravarty > Cc: "ghc-devops-group at haskell.org" > > On 9 October 2017 at 13:23, Manuel M T Chakravarty > wrote: >> Simon Peyton Jones : >> >> I don’t have a well-informed opinion, but my instinct is to follow the >> mainstream even if a technically-better alternative exists, unless it’s a >> LOT better. For the reasons Manuel outlines. >> >> Am I right that GitHub code review has improved? >> >> It has improved since this issue was discussed last. I believe one of the >> main criticisms in the past was that while people could comment on >> individual lines of a proposed contribution (aka pull request), there was >> no way to tie those into a code review unit. This facility has since been >> added. 
> > To add to Manuel's comment - from a practical perspective what this > meant was that in the past if someone had 15 comments to make about > your pull request during review, you'd be bombarded with 15 emails in > your inbox. Simon M in particular pointed this out as particularly > problematic. And I agree. But as Manuel points out, GitHub has now > fixed this: a reviewer can send a bunch of comments in one batch, and > attach semantics to it (accept PR / request changes / refuse it etc). > >> Moreover, contributors can now request code reviews from specific >> reviewers and the repositories can be configured such that contributions >> cannot be merged until signed off by a reviewer. > > Indeed. There is also a new related feature, and likely one that may > prove quite useful for a large project like GHC. You can enforce > things like "all changes to template-haskell need to be reviewed by > person X", or "person Y is the gatekeeper for all type checker related > changes" etc. That said - this is just extra mechanism that large > GitHub projects (and there are many) have lived without just fine until > recently (e.g. the Nixpkgs project, with ~2k commits every month). > _______________________________________________ > Ghc-devops-group mailing list > Ghc-devops-group at haskell.org > https://haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group 
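The review-routing feature Mathieu describes is GitHub's CODEOWNERS mechanism, introduced in mid-2017. A minimal sketch of what this could look like for GHC (the paths are from the GHC source tree; the usernames are invented placeholders):

```
# .github/CODEOWNERS -- hypothetical example; the last matching pattern wins
libraries/template-haskell/   @template-haskell-reviewer
compiler/typecheck/           @typechecker-gatekeeper
```

Combined with a protected-branch rule requiring owner review, this would block merging a PR that touches those paths until the named reviewer signs off.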
-------------- next part -------------- An HTML attachment was scrubbed... URL: From manuel.chakravarty at tweag.io Tue Oct 10 07:19:17 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 10 Oct 2017 18:19:17 +1100 Subject: [GHC DevOps Group] Phabricator -> GitHub? Message-ID: [RESENT MESSAGE — see https://mail.haskell.org/pipermail/ghc-devops-group/2017-October/000004.html] > From: "Boespflug, Mathieu" > Subject: Aw: [GHC DevOps Group] Phabricator -> GitHub? > Date: 10. Oktober 2017 um 08:10:39 GMT+11 > To: Simon Marlow > Cc: Manuel M T Chakravarty , ghc-devops-group at haskell.org > > Hi Simon, > >>>> Secondly, the suggestion was only to use reviewable.io for particularly >>>> heavy PRs. >>> >>> >>> I think that would be worse, because we have some PRs going through >>> different code review tools. On Phabricator I have a single dashboard that >>> shows me which PRs are needing my attention, I don't want to have to visit >>> two different tools to see that. > > I agree with you that not having all PR's in a single dashboard would > be very inconvenient. But note that even with reviewable.io (which I > think really isn't very necessary at this point, but is at least an > option), you'd still have a single dashboard, that dashboard being the > GitHub PR dashboard you know currently. > > The essential difference is that reviewable.io is not an alternate > tool. It's an opt-in thin layer on top of GitHub. > >>>> >>>> Thirdly, it still is much better than Phabricator on the new random tool >>>> front because it requires no custom infrastructure and the PRs still go >>>> through GitHub as usual. >>> >>> >>> I do buy the custom infrastructure argument in general - setting up our >>> own CI has definitely taken a lot of Ben's time. I actually really liked >>> having Travis for my GHC fork on GitHub. That was when it used to work, >>> before our build exceeded what Travis would give us. 
So I guess that >>> illustrates two things: custom infrastructure is nice when it works, but >>> we're at the mercy of the suppliers. > > Just another note here: these kinds of limits have proven "soft" in > the past, in that the Travis CI folks have been willing to bump the > limits for a large project like GHC. I'd be surprised if they weren't > willing to bump these limits some more, especially for a highly > visible open source project like GHC. In any case, I've had very good > success setting up CI for Tweag I/O's GHC fork for linear types on top > of CircleCI, which has the option to run on faster machines, and also has > less aggressive limits. I hear the Hadrian project still uses Travis > CI to good effect. In terms of effort versus quality, for a setup that > a) uses only scalable resources, and b) is completely > reproducible by anyone, both these hosted CI options proved to have a > very good ratio indeed. > _______________________________________________ > Ghc-devops-group mailing list > Ghc-devops-group at haskell.org > https://haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group -------------- next part -------------- An HTML attachment was scrubbed... URL: From manuel.chakravarty at tweag.io Tue Oct 10 07:20:20 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 10 Oct 2017 18:20:20 +1100 Subject: [GHC DevOps Group] Phabricator -> GitHub? In-Reply-To: References: <03CFCB54-3433-44D6-AB57-8FF1A7A96FC6@tweag.io> Message-ID: <94537953-54D7-4693-A418-CBEF7759E152@tweag.io> [RESENT MESSAGE — see https://mail.haskell.org/pipermail/ghc-devops-group/2017-October/000004.html] > Simon Marlow >: > > On 9 October 2017 at 12:10, Manuel M T Chakravarty > wrote: > We will absolutely talk to more developers, but please keep in mind that projects like the Rust and Swift compilers use GitHub. These are projects that are thriving and get lots of contributions. Rust had 120 active PRs last week and Swift had 132 active PRs. I don’t quite see what is so different about GHC. > > LLVM uses Phabricator and I didn't count how many PRs they have open, but it looks like a *lot*, and these are all active: https://reviews.llvm.org/differential/query/lks1dJdapQFa/#R > > So I don't buy the argument that we have to move to GitHub to get more contributions. Phabricator is clearly not a barrier for LLVM, so why should it be a barrier for GHC? Firstly, please let me re-iterate that I do not doubt Phabricator’s utility as a review tool. As for Phabricator as a barrier to entry, LLVM and GHC are very different projects. Since GCC manoeuvred itself technically and politically into a corner, LLVM essentially has a monopoly in the open-source compiler-backend space.
(For example, students in compiler research groups usually have no choice but to build on LLVM.) In contrast, there are lots of open-source frontends to contribute to. Moreover, please keep in mind that LLVM is Chris Lattner’s first major OSS project (and when it started, GitHub didn’t even exist). Now, for his second major OSS project, the Swift compiler, he did choose GitHub over Phabricator. >> - We should use *one* code-review tool only. Making a transition to a situation where we have two different code review tools (GitHub + reviewable.io ) would be a step backwards. (after all, one of the arguments below is against learning a new random tool….) >> >> The alternative that we discussed in the past is to better support contributions via GitHub, which we could still do. > > I am not at all arguing for reviewable.io (I have never used it), but I don’t understand your argument. Firstly, why is GitHub + reviewable.io worse than GitHub + Phabricator as a mix? > > I’m saying GitHub + reviewable.io would be worse than either GitHub alone, or Phabricator alone. Sorry, maybe I misunderstood what you were saying. Maybe it depends on what we regard as "better support for contributions" through GitHub. For me, that would mean that contributions can be submitted and reviewed through either GitHub or Phabricator; hence my comment. Sorry if that wasn’t what you meant. Cheers, Manuel -------------- next part -------------- An HTML attachment was scrubbed... URL: From manuel.chakravarty at tweag.io Tue Oct 10 07:21:02 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 10 Oct 2017 18:21:02 +1100 Subject: [GHC DevOps Group] Phabricator -> GitHub?
In-Reply-To: References: <03CFCB54-3433-44D6-AB57-8FF1A7A96FC6@tweag.io> Message-ID: <44710744-359A-44D5-A219-8A65A1F7A871@tweag.io> [RESENT MESSAGE — see https://mail.haskell.org/pipermail/ghc-devops-group/2017-October/000004.html] > On 09.10.2017 at 23:08, Simon Marlow wrote: > On 9 October 2017 at 13:04, Simon Marlow > wrote: > On 9 October 2017 at 12:10, Manuel M T Chakravarty > wrote: >> > > Thirdly, it still is much better than Phabricator on the new random tool front because it requires no custom infrastructure and the PRs still go through GitHub as usual. > > I do buy the custom infrastructure argument in general - setting up our own CI has definitely taken a lot of Ben's time. I actually really liked having Travis for my GHC fork on GitHub. That was when it used to work, before our build exceeded what Travis would give us. So I guess that illustrates two things: custom infrastructure is nice when it works, but we're at the mercy of the suppliers. > > (sorry, I meant to say "outsourced infrastructure", not "custom infrastructure" above) Yes, you are right. As Mathieu wrote, the limits of Travis are why we are using CircleCI for the linear types fork of GHC. In other words, what we are proposing is something that we have tried with success. Moreover, Mathieu has indicated that Tweag would be happy to contribute to a paid CI option if that should become necessary. (Ben’s time is worth much more than CI costs.) Cheers, Manuel -------------- next part -------------- An HTML attachment was scrubbed... URL: From manuel.chakravarty at tweag.io Tue Oct 10 07:29:50 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 10 Oct 2017 18:29:50 +1100 Subject: [GHC DevOps Group] Fwd: Phabricator -> GitHub?
References: Message-ID: <05B0B319-222F-4A12-B6E7-E1AE3C8854D1@tweag.io> [RESENT MESSAGE — see https://mail.haskell.org/pipermail/ghc-devops-group/2017-October/000004.html] > From: Simon Marlow > Subject: Re: [GHC DevOps Group] Phabricator -> GitHub? > Date: 9 October 2017 at 23:08:58 GMT+11 > To: Manuel M T Chakravarty > Cc: ghc-devops-group at haskell.org > > On 9 October 2017 at 13:04, Simon Marlow > wrote: > On 9 October 2017 at 12:10, Manuel M T Chakravarty > wrote: >> Simon Marlow >: >> >> Personally I prefer to stay with Phabricator because it's better for code reviews, and because we already use it. Having said that, if a majority of the developer community would prefer GitHub, and there is effort available to make the transition, then we should do it. But I do have strong feelings on a couple of things: >> >> - We should consult the whole developer community. Changing the code review tool affects not just potential contributors, but also the people doing the reviewing. If the reviewers are less productive or less keen to spend time on reviews, then overall quality will drop. An increase in contributions can only be supported if we also have an increase in the pool of people reviewing and merging contributions. It’s about increasing the size of the developer community, not drive-by contributions. > > We will absolutely talk to more developers, but please keep in mind that projects like the Rust and Swift compilers use GitHub. These are projects that are thriving and get lots of contributions. Rust had 120 active PRs last week and Swift had 132 active PRs. I don’t quite see what is so different about GHC. > > LLVM uses Phabricator and I didn't count how many PRs they have open, but it looks like a *lot*, and these are all active: https://reviews.llvm.org/differential/query/lks1dJdapQFa/#R > > So I don't buy the argument that we have to move to GitHub to get more contributions.
Phabricator is clearly not a barrier for LLVM, so why should it be a barrier for GHC? > >> - We should use *one* code-review tool only. Making a transition to a situation where we have two different code review tools (GitHub + reviewable.io ) would be a step backwards. (after all, one of the arguments below is against learning a new random tool….) >> >> The alternative that we discussed in the past is to better support contributions via GitHub, which we could still do. > > I am not at all arguing for reviewable.io (I have never used it), but I don’t understand your argument. Firstly, why is GitHub + reviewable.io worse than GitHub + Phabricator as a mix? > > I'm saying GitHub + reviewable.io would be worse than either GitHub alone, or Phabricator alone. > > Secondly, the suggestion was only to use reviewable.io for particularly heavy PRs. > > I think that would be worse, because we'd have some PRs going through different code review tools. On Phabricator I have a single dashboard that shows me which PRs need my attention, I don't want to have to visit two different tools to see that. > > Thirdly, it still is much better than Phabricator on the new random tool front because it requires no custom infrastructure and the PRs still go through GitHub as usual. > > I do buy the custom infrastructure argument in general - setting up our own CI has definitely taken a lot of Ben's time. I actually really liked having Travis for my GHC fork on GitHub. That was when it used to work, before our build exceeded what Travis would give us. So I guess that illustrates two things: custom infrastructure is nice when it works, but we're at the mercy of the suppliers. > > (sorry, I meant to say "outsourced infrastructure", not "custom infrastructure" above) > > Cheers > Simon > > In any case, I don’t want to argue for reviewable.io .
Let’s just say that if we move to GitHub and decide at some point that we want extra code review power for select contributions, there are options. > > Cheers, > Manuel -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben at well-typed.com Tue Oct 10 15:44:00 2017 From: ben at well-typed.com (Ben Gamari) Date: Tue, 10 Oct 2017 11:44:00 -0400 Subject: [GHC DevOps Group] FreeBSD in Tier 1 In-Reply-To: References: <87r2ucnz23.fsf@ben-laptop.smart-cactus.org> Message-ID: <87a80zm2in.fsf@ben-laptop.smart-cactus.org> "Boespflug, Mathieu" writes: > Hi Ben, > Hi Mathieu, Thanks for your time on this matter! > #14300 contains useful info. But I did have a general question. That > ticket mentions specific instructions for FreeBSD in some README. But > the README in GHC HEAD does not mention any BSD in any way. Is there > some other platform-specific README I should be aware of? If so, this > should be included in the GHC repo proper. > > As it is, I did have gcc6 from ports installed before building GHC, > but clearly the standard `./boot && ./configure && gmake` instructions > were insufficient to make use of that. I later found out through > browsing GHC Trac tickets that there are these instructions: > > https://ghc.haskell.org/trac/ghc/wiki/Building/Preparation/FreeBSD > > But they are marked in bold for "developers and early adopters", not > for any user that checks out the GHC source code. > >> In my experience GHC builds without any trouble on FreeBSD 11, which has a > new, less broken toolchain. > > Not any recent checkout of GHC HEAD, surely? The commit that > introduced the first bug in Manuel's list is timestamped August 1st. > Here's the output of the current FreeBSD build of GHC on CircleCI: > > https://circleci.com/gh/tweag/ghc/41 > Touché; the last commit that I built locally dates from 19 July 2017. > Which I guess should be no surprise, since Gábor points out > down-thread that the old FreeBSD build bot is no longer in service.
So > just like the unreliable presence of the OS X build bots, let's do > away with them! (by using hosted infrastructure instead) > Just to clarify: the reliability of the OS X build bot generally hasn't been a problem. As I point out elsewhere, there really has only been one two-day outage in the two years we've used it. However, to move back to the point of cross-platform builds: My concern is that I'm not yet convinced that we have a viable plan for extending a hosted solution to non-Linux/amd64 environments. The numbers I have seen suggest that one incurs more than a 50% performance hit over even slow ARM hardware in moving to virtualisation. Even dynamic translation of amd64 on amd64 incurs a significant hit (which is the environment we would need for a non-Linux operating system on amd64). In my mind this is a concern regardless of whether FreeBSD is Tier 1 or not. If someone were to step up to maintain FreeBSD, or any other non-Linux/amd64 platform, the day after we adopt CircleCI, what would we tell them? It seems to me the response may very well be "sorry, we would love to support you but our CI infrastructure isn't up to the task." This gives me pause. I am here to be convinced, however. Cheers, - Ben [1] https://raspberrypi.stackexchange.com/questions/333/how-does-speed-of-qemu-emulation-compare-to-a-real-raspberry-pi-board -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 487 bytes Desc: not available URL: From gnezdo at google.com Tue Oct 10 17:01:42 2017 From: gnezdo at google.com (Greg Steuck (Sh-toy-k)) Date: Tue, 10 Oct 2017 17:01:42 +0000 Subject: [GHC DevOps Group] Fwd: DevOps: Next steps In-Reply-To: <65EC2CF1-B487-4006-A3A5-07EEDCB58FE4@tweag.io> References: <87fubmzypd.fsf@ben-laptop.smart-cactus.org> <87fubiy9ew.fsf@ben-laptop.smart-cactus.org> <87d16iwyrk.fsf@ben-laptop.smart-cactus.org> <0C224F82-EB38-49A3-A121-C66B6D6F8D0E@tweag.io> <1D196DC9-B9A1-462B-B688-7C3469D24EC5@tweag.io> <098E3F6A-5556-4E78-AF27-979CBF973060@tweag.io> <87mv5hv9ji.fsf@ben-laptop.smart-cactus.org> <87zi9ht63w.fsf@ben-laptop.smart-cactus.org> <3AD4D37A-65C3-46D1-A2DD-7D049EEEE0F2@tweag.io> <87wp4lt5t4.fsf@ben-laptop.smart-cactus.org> <658252EF-04F2-4DA3-BAF2-6270180BE5FA@tweag.io> <87efqirfb0.fsf@ben-laptop.smart-cactus.org> <65EC2CF1-B487-4006-A3A5-07EEDCB58FE4@tweag.io> Message-ID: Google has a Cloud product with VM offerings. We could supply them as part of X contribution. Linux and Windows are expressly supported. FreeBSD is listed as an option. I still would prefer paying CI companies rather than dealing with VMs. Thanks Greg On Mon, Oct 9, 2017 at 11:58 PM Manuel M T Chakravarty < manuel.chakravarty at tweag.io> wrote: > [RESENT MESSAGE — see > https://mail.haskell.org/pipermail/ghc-devops-group/2017-October/000004.html > ] > > > Ben, thanks for pointing out important issues in our requirements. > > And, Mathieu, thanks for moving this to the list. > > 05.10.2017, 08:31, Boespflug, Mathieu : > > Ben's response. Copying it to the list now that this list exists. 
> > > > ---------- Forwarded message ---------- From: Ben Gamari Date: 4 October 2017 at 19:30 Subject: Re: DevOps: Next steps To: Manuel M T Chakravarty Cc: Mathieu Boespflug , Jonas Pfenniger Chevalier > > Manuel M T Chakravarty writes: > When we talked on the phone, you mentioned that we need to be able to > > support all the Tier 1 platforms, and we both concluded that this > implies the need for using Jenkins and we can’t, e.g., use CircleCI as > they only support macOS and Linux. Mathieu and Jonas explained to me > that this is actually not the case. Apparently, Rust solves this issue > by building Linux and macOS artefacts on CircleCI, Windows on > Appveyor, and everything else using QEMU on CircleCI (e.g., FreeBSD > could be done that way and eventually ARM builds). > > Indeed when starting this I looked a bit at what rustc does. By my > recollection, they don't actually perform builds on anything but > Linux/amd64. Instead they build cross-compilers on x86-64, use these to > build their testsuite artifacts, and then run these under qemu (and in > some cases, e.g. FreeBSD, they don't even do this). > > While in general I would love to be able to do everything with > cross-compiled binaries from Linux/amd64, our cross-compilation story > may be a bit lacking to pull this off at the moment. Moritz Angermann has > been making great strides in this area recently but it's going to be a > while until we can really make this work. In particular, our Template > Haskell story will need quite some work before we can reliably do a full > cross-compiled testsuite run. > > In general I'm a bit skeptical of moving to a solution that relegates > non-Linux/amd64 builds to a VM. Non-Linux/amd64 platforms have > commercial users and do deserve first-class CI support.
Furthermore, > without KVM or hypervisor support (which, as far as I can tell, CircleCI > does not provide [1]) I'm not sure that virtualisation will allow us to > get where we want to be in terms of test coverage and build response > time. Without hardware support qemu > can be rather expensive. > > Sorry for not expressing myself clearly here. I didn’t want to propose to > exactly copy Rust’s approach. In particular, as you write, relying on > cross-compilation is not an option for us. (Although, from my reading of > the Rust repo, they do not build everything via QEMU.) > > In concrete terms, the proposal for GHC would be the following: > > * Linux & macOS builds: CircleCI > * Windows builds: Appveyor > * Everything else: QEMU (and maybe it is not necessary to run all the tests > on these either) > > They convinced me that this is a worthwhile direction to consider for > the following reasons: > > * Jenkins is a fickle beast: apparently scaling Jenkins to work > reliably when running tests against multiple PRs on distributed > infrastructure is hard — we ran into significant problems in a client > project recently. > > > I agree that Jenkins is a rather fickle beast; indeed it can be > positively infuriating to work with. However, I've not yet noticed the > scaling issues you describe. What in particular did you observe? > > > Jonas, could you maybe explain it? > > * All the custom setup and maintenance of build nodes etc. required by > Jenkins disappears. (Mathieu built the CircleCI setup that he > contributed recently quite quickly, so there really is little overhead > in setting this up.) > > I'm not sure that the difference here is actually so great. Yes, in the > case of Jenkins you do have physical machines to administer. However, > this typically isn't the hard part.
If you look at Rust's configuration, > they have roughly a dozen Docker environments which they had to set up > and maintain; this effort will likely far outweigh the setup cost of the > machines themselves. This has certainly been the case for Jenkins and I > suspect it would be true of CircleCI as well; this is simply the cost of > entry for cross-platform testing. > > > I misspoke earlier and Rust seems to use Travis CI together with Appveyor. > Looking at > > https://github.com/rust-lang/rust/blob/master/.travis.yml > > and > > https://github.com/rust-lang/rust/blob/master/appveyor.yml > > They only seem to do the Docker thing for their cross-compilation targets > (and I believe those are always going to be harder to set up). > > One nice thing about this, as Mathieu pointed out, is that somebody who > forks the repo can just run the same CI on their own Travis/Circle/Appveyor > accounts with little effort — just as we are doing currently with > Tweag’s linear types fork of GHC: > > https://github.com/tweag/ghc/blob/linear-types/.circleci/config.yml > > This is a powerful way of scaling. > > Moreover, we can't write off the cost of integrating with CircleCI. Of > course, if we do decide to move to GitHub then perhaps this cost shrinks > dramatically. However, until this decision is made it seems like we need > to assume that Phabricator integration will be necessary. > > > By the "re-use existing infrastructure instead of writing your own" > mantra, this is just another reason to go for GitHub. > > > * The problems we discussed with possibly not having enough Rackspace > capacity for the transition disappear. > > In some sense this is true; however, it seems like we are trading one > commodity of finite supply for another. We currently have Rackspace > credit and consequently these instances can be considered to be > essentially free.
> > While CircleCI does offer four free containers for open source > projects (and perhaps a bit more in our case if we ask), I'm skeptical > that this will be enough; currently our four build bots give us > multi-day wait times which makes development remarkably painful. The > appeal of Jenkins is that we can shorten this timescale as well as grow > our test coverage with the resources that we already have. > > Let's have a brief look at what resources we may need. > > A quick back-of-the-envelope calculation suggests that to simply keep up > with our current average commit rate (around 200 commits/month) for the > four environments that we currently build we need a bare minimum of: > > 200 commit/month > * 4 build/commit (Linux/i386, Linux/amd64, > OS X, Windows/amd64) > * 2.5 CPU-hour/build (approx. average across platforms > for a validate) > / (2 CPU-hour/machine-hour) (CircleCI appears to use 2 vCPU instances) > / (30*24 machine-hour/month) > ~ 2 machines > > note that this doesn't guarantee reasonable wait times but rather merely > ensures that we can keep up on the mean. On top of this, we see around > 300 differential revisions per month. This requires another 3 machines > to keep up. > > So, we need at least five machines but, again, this is a minimum; > modelling response times is hard but I expect we would likely need to > add at least two more machines to keep response times in the > contributor-friendly range, especially considering that under CircleCI > we will lose the ability to prioritize jobs (by contrast, with Jenkins > we can prioritize pull requests as this is the response time that we > really care about).
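[Editor's note: Ben's back-of-the-envelope machine counts can be reproduced mechanically. The sketch below uses the approximate figures quoted in the message (4 builds per commit, ~2.5 CPU-hours per validate, 2 vCPUs per instance); these are estimates from this thread, not measurements:

```shell
# Reproduce the capacity estimate above. CPU-hours are scaled to
# tenths so the arithmetic stays integral in POSIX sh.
builds_per_commit=4
cpu_tenth_hours_per_build=25   # 2.5 CPU-hours per validate
vcpus_per_machine=2
month_tenth_hours=7200         # 30 days * 24 hours, in tenths

machines_needed() {
  # ceil(jobs * builds * cpu_hours / (vcpus * hours_per_month))
  total=$(( $1 * builds_per_commit * cpu_tenth_hours_per_build ))
  denom=$(( vcpus_per_machine * month_tenth_hours ))
  echo $(( (total + denom - 1) / denom ))
}

machines_needed 200   # commits/month        -> 2
machines_needed 300   # differentials/month  -> 3
```

Rounding up at each step gives the "at least five machines" minimum in the message, before any allowance for reasonable wait times.]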
Now consider that we would like to add at least > three more platforms (FreeBSD, OpenBSD, Linux/aarch64, all of which may > be relatively slow to build due to virtualisation overhead) as well as a > few more build configurations on amd64 (LLVM, unregisterised, at least > one cross-compilation target) and a periodic slow validation, and we may > be at over a dozen machines. > > All of this appears to put us well outside CircleCI's offering to > open-source projects. Of course, it may be worth asking whether they are > willing to extend GHC a more generous offer. However, I don't think we > can count on this and I'm not certain that Haskell.org is currently in a > position to be able to shoulder such a financial burden. > > > Mathieu has indicated that Tweag would be willing to contribute towards > those costs. (Developer time, such as yours, is so much more expensive than > these subscription costs that it’ll always be more efficient to outsource > to CI companies.) > > Also, Jonas could help us get things running and, I think, his > > wealth of experience would be very useful. (At least, I would be very > grateful for his advice.) > > I think this route has the potential to get us to where we want to be > quite quickly and in a manner that requires very little effort to maintain > once set up. What do you think? > > Indeed I can see that there are many advantages to the CircleCI option. > The ease of bringing up a Linux/amd64 build environment which easily > scales and requires no administration is quite enticing. However, I am > skeptical that it will be as easy to get the full suite of builds that > we are aiming to produce. I would be quite curious to see what Jonas has > to say on the matter of non-Linux platforms. Seeing a simple > configuration which compiles and tests even a FreeBSD/amd64 build in a > reasonable amount of time may well be enough to convince me.
> > Cheers, > Manuel > > _______________________________________________ > Ghc-devops-group mailing list > Ghc-devops-group at haskell.org > https://haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 4843 bytes Desc: S/MIME Cryptographic Signature URL: From marlowsd at gmail.com Tue Oct 10 19:26:20 2017 From: marlowsd at gmail.com (Simon Marlow) Date: Tue, 10 Oct 2017 20:26:20 +0100 Subject: [GHC DevOps Group] FreeBSD in Tier 1 In-Reply-To: <87a80zm2in.fsf@ben-laptop.smart-cactus.org> References: <87r2ucnz23.fsf@ben-laptop.smart-cactus.org> <87a80zm2in.fsf@ben-laptop.smart-cactus.org> Message-ID: On 10 October 2017 at 16:44, Ben Gamari wrote: > Just to clarify: the reliability of the OS X build bot generally hasn't > been a problem. As I point out elsewhere, there really has only been one > two-day outage in the two years we've used it. > I think it has a massive backlog though: the last complete build I see is on Sept 30: https://phabricator.haskell.org/diffusion/GHC/history/master/ Cheers Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben at well-typed.com Tue Oct 10 19:33:11 2017 From: ben at well-typed.com (Ben Gamari) Date: Tue, 10 Oct 2017 15:33:11 -0400 Subject: [GHC DevOps Group] FreeBSD in Tier 1 In-Reply-To: References: <87r2ucnz23.fsf@ben-laptop.smart-cactus.org> <87a80zm2in.fsf@ben-laptop.smart-cactus.org> Message-ID: <87376qn6h4.fsf@ben-laptop.smart-cactus.org> Simon Marlow writes: > On 10 October 2017 at 16:44, Ben Gamari wrote: > >> Just to clarify: the reliability of the OS X build bot generally hasn't >> been a problem. As I point out elsewhere, there really has only been one >> two-day outage in the two years we've used it. 
>> > > I think it has a massive backlog though: the last complete build I see is > on Sept 30: https://phabricator.haskell.org/diffusion/GHC/history/master/ > I don't believe that is true; it finished 00ff02352f07 (from Oct 2) just a few minutes ago [1]. Keep in mind that Harbormaster doesn't build in any particular order (which I consider to be one of its larger flaws). Cheers, - Ben [1] https://phabricator.haskell.org/harbormaster/build/35427/ -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 487 bytes Desc: not available URL: From pali.gabor at gmail.com Mon Oct 9 16:09:49 2017 From: pali.gabor at gmail.com (Páli Gábor János) Date: Mon, 9 Oct 2017 18:09:49 +0200 Subject: [GHC DevOps Group] FreeBSD in Tier 1 In-Reply-To: <87r2ucnz23.fsf@ben-laptop.smart-cactus.org> References: <87r2ucnz23.fsf@ben-laptop.smart-cactus.org> Message-ID: Hello there, 2017-10-09 17:03 GMT+02:00 Ben Gamari : > While Páli does not contribute many patches, I can confirm that he is > indeed active. Thanks Ben for vouching for me :-) Though I do not know what the original question was, let me just give you a brief "status report" that could perhaps help with the answer. TL;DR: Yes, I am still here, and available for questions and support, but I do not track the status of GHC-head/FreeBSD so closely and do not make changes to it myself these days. I did most of my work in the FreeBSD Project, where I maintained the GHC port and ports for certain Cabal packages. I also ran a GHC build bot to monitor the health of FreeBSD builds for GHC-head, and I requested GHC repository commit access to submit occasional fixes or port-specific changes to the upstream directly. I use FreeBSD daily as a primary system, where I usually have some version of GHC (8.0.2 at the moment) installed as well.
My priorities changed a while ago: I gave up my Haskell-related position at the university in September, and I am about to start a new non-Haskell job in industry soon. As a result, the machine that served the daily FreeBSD snapshots is currently offline, I no longer commit Haskell changes to the FreeBSD ports repository directly, and I have silently acknowledged that GHC HQ now does the FreeBSD/amd64 builds for the GHC releases. But I am still helping interested FreeBSD Project committers or contributors with reviewing patches, and I am still watching the FreeBSD-specific GHC Trac tickets and commenting on them as my time permits. I may be back on the ride once more, but I cannot tell for now. > In my experience GHC builds without any trouble on FreeBSD 11, which has a > new, less broken toolchain. We have been using the latest version of GCC and binutils from the FreeBSD Ports Collection, as binutils in the FreeBSD base system is stuck in 2007 and the now-default LLVM-based alternative (Clang, LLDB, LLD etc.) is not yet there on every supported release, as you may have experienced yourself. There is a patch floating around somewhere in the FreeBSD Phabricator to make the official FreeBSD GHC port use base Clang by default, so it could get wider testing, but apparently it is only a viable option on FreeBSD 11 and later. Cheers, Gábor From pali.gabor at gmail.com Tue Oct 10 09:23:24 2017 From: pali.gabor at gmail.com (Páli Gábor János) Date: Tue, 10 Oct 2017 11:23:24 +0200 Subject: [GHC DevOps Group] FreeBSD in Tier 1 In-Reply-To: References: <87r2ucnz23.fsf@ben-laptop.smart-cactus.org> Message-ID: 2017-10-10 8:05 GMT+02:00 Boespflug, Mathieu : > the README in GHC HEAD does not mention any BSD in any way. Is there > some other platform specific README I should be aware of? When I documented this, I fell back to the traditional solution, i.e. I added a page on the GHC developer wiki about it.
That is basically the page you have already found there. > As it is, I did have gcc6 from ports installed before building GHC, > but clearly the standard `./boot && ./configure && gmake` instructions > were insufficient to make use of that. Indeed, this will probably not work. FreeBSD juggles multiple compiler toolchains and libraries: some are in the base system (i.e. available by default), some are in the Ports Collection (i.e. they have to be installed explicitly), so setting CC when calling the configure script is highly preferred. In addition to that, FreeBSD does not follow the usual GNU/Linux file system layout but aims to separate the files of the base system from the files of third-party applications (elements of the Ports Collection). As a result, it stores many non-standard headers, libraries, and binaries under /usr/local (though this could differ), which is also useful to specify. > But they are marked in bold for "developers and early adopters", not > any user that checks out the GHC source code. I considered anybody who checks out the GHC source code a "developer or early adopter", as regular users should just `pkg install ghc` (install GHC from the FreeBSD package repository) and should not care about anything else. Anyhow, the `README.md` in the GHC git repository used to have (perhaps still has) this sentence: "Before building GHC you may need to install some other tools and libraries. See, Setting up your system for building GHC [on the wiki]". > Which I guess should be no surprise, since Gábor points out > down-thread that the old FreeBSD build bot is no longer in service. So > just like the unreliable presence of the OS X build bots, let's do > away with them! (by using hosted infrastructure instead) As far as I am aware, there were already plans to run the build bots under the umbrella of haskell.org or GHC HQ.
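[For reference, the configure advice above (a ports toolchain, CC set explicitly, /usr/local passed to configure) amounts to something like the following. Package names and flags are illustrative, in the spirit of the wiki page rather than quoted from it; check the Building/Preparation/FreeBSD page for your release:

```shell
# Illustrative FreeBSD setup for building GHC from source.
pkg install gmake gcc6 autoconf automake python   # toolchain from ports, not base
./boot
CC=gcc6 ./configure \
    --with-gcc=gcc6 \
    --with-iconv-includes=/usr/local/include \
    --with-iconv-libraries=/usr/local/lib
gmake -j4
```

The key point is that nothing here is discoverable from the bare `./boot && ./configure && gmake` instructions, which is exactly Mathieu's complaint.]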
Ian Lynagh has implemented a nice CI service in Haskell; that is what I continued to use, but I agree that there should again be something that is officially maintained. If I just got access to some VM with FreeBSD that I would have to look after to keep the respective bots in good order, I would be happy to do that. In the absence of that, I had to run my own solution, but that will not really work in the long run. From manuel.chakravarty at tweag.io Tue Oct 10 23:49:12 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Wed, 11 Oct 2017 10:49:12 +1100 Subject: [GHC DevOps Group] DevOps: Next steps In-Reply-To: References: <87fubmzypd.fsf@ben-laptop.smart-cactus.org> <87fubiy9ew.fsf@ben-laptop.smart-cactus.org> <87d16iwyrk.fsf@ben-laptop.smart-cactus.org> <0C224F82-EB38-49A3-A121-C66B6D6F8D0E@tweag.io> <1D196DC9-B9A1-462B-B688-7C3469D24EC5@tweag.io> <098E3F6A-5556-4E78-AF27-979CBF973060@tweag.io> <87mv5hv9ji.fsf@ben-laptop.smart-cactus.org> <87zi9ht63w.fsf@ben-laptop.smart-cactus.org> <3AD4D37A-65C3-46D1-A2DD-7D049EEEE0F2@tweag.io> <87wp4lt5t4.fsf@ben-laptop.smart-cactus.org> <658252EF-04F2-4DA3-BAF2-6270180BE5FA@tweag.io> <87efqirfb0.fsf@ben-laptop.smart-cactus.org> <65EC2CF1-B487-4006-A3A5-07EEDCB58FE4@tweag.io> Message-ID: <00024768-37E3-4B7A-8745-CC1EB7892A84@tweag.io> Thank you very much for the offer and also for being willing to contribute to CI fees. That is very helpful. Cheers, Manuel > Greg Steuck (Sh-toy-k) : > > Google has a Cloud product with VM offerings. We could supply them as part of X contribution. Linux and Windows are expressly supported. FreeBSD is listed as an option. > > I still would prefer paying CI companies rather than dealing with VMs. > > Thanks > Greg > > On Mon, Oct 9, 2017 at 11:58 PM Manuel M T Chakravarty > wrote: > [RESENT MESSAGE — see https://mail.haskell.org/pipermail/ghc-devops-group/2017-October/000004.html ] > > > Ben, thanks for pointing out important issues in our requirements.
> > And, Mathieu, thanks for moving this to the list. > >> 05.10.2017, 08:31, Boespflug, Mathieu >: >> >> Ben's response. Copying it to the list now that this list exists. > >> >> >> ---------- Forwarded message ---------- >> From: Ben Gamari > >> Date: 4 October 2017 at 19:30 >> Subject: Re: DevOps: Next steps >> To: Manuel M T Chakravarty > >> Cc: Mathieu Boespflug >, Jonas Pfenniger Chevalier >> > >> >> Manuel M T Chakravarty > writes: >> > >>> When we talked on the phone, you mentioned that we need to be able to >>> support all the Tier 1 platforms, and we both concluded that this >>> implies the need for using Jenkins and we can’t, e.g., use CircleCI as >>> they only support macOS and Linux. Mathieu and Jonas explained to me >>> that this is actually not the case. Apparently, Rust solves this issue >>> by building Linux and macOS artefacts on CircleCI, Windows on >>> Appveyor, and everything else using QEMU on CircleCI (e.g., FreeBSD >>> could be done that way and eventually ARM builds). >>> >> Indeed when starting this I looked a bit at what rustc does. By my >> recollection, they don't actually perform builds on anything but >> Linux/amd64. Instead they build cross-compilers on x86-64, use these to >> build their testsuite artifacts, and then run these under qemu (and in >> some cases, e.g. FreeBSD, they don't even do this). >> >> While in general I would love to be able to do everything with >> cross-compiled binaries from Linux/amd64, our cross-compilation story >> may be a bit lacking to pull this off at the moment. Moritz Angermann has >> been making great strides in this area recently but it's going to be a >> while until we can really make this work. In particular, our Template >> Haskell story will need quite some work before we can reliably do a full >> cross-compiled testsuite run. >> >> In general I'm a bit skeptical of moving to a solution that relegates >> non-Linux/amd64 builds to a VM.
Non-Linux/amd64 platforms have >> commercial users and do deserve first-class CI support. Furthermore, >> without KVM or hypervisor support (which, as far as I can tell, CircleCI >> does not provide [1]) I'm not sure that virtualisation will allow us to >> get where we want to be in terms of test coverage and build response >> time due to the cost of virtualisation. Without hardware support qemu >> can be rather expensive. > > > Sorry for not expressing myself clearly here. I didn’t want to propose to exactly copy Rust’s approach. In particular, as you write, relying on cross-compilation is not an option for us. (Although, from my reading of the Rust repo, they do not build everything via QEMU.) > > In concrete terms, the proposal for GHC would be the following: > > * Linux & macOS builds: CircleCI > * Windows builds: Appveyor > * Everything else: QEMU (and maybe it is not necessary to run all the tests on these either) > >>> They convinced me that this is a worthwhile direction to consider for >>> the following reasons: >>> >>> * Jenkins is a fickle beast: apparently scaling Jenkins to work >>> reliably when running tests against multiple PRs on distributed >>> infrastructure is hard — we ran into significant problems in a client >>> project recently. >>> >> >> I agree that Jenkins is a rather fickle beast; indeed it can be >> positively infuriating to work with. However, I've not yet noticed the >> scaling issues you describe. What in particular did you observe? > > Jonas, could you maybe explain it? > >>> * All the custom setup and maintenance of build nodes etc. required by >>> Jenkins disappears. (Mathieu built the CircleCI setup that he >>> contributed recently quite quickly, so there really is little overhead >>> in setting this up.) >>> >> I'm not sure that the difference here is actually so great. Yes, in the >> case of Jenkins you do have physical machines to administer. However, >> this typically isn't the hard part.
If you look at Rust's configuration, >> they have roughly a dozen Docker environments which they had to set up >> and maintain; this effort will likely far outweigh the setup cost of the >> machines themselves. This has certainly been the case for Jenkins and I >> suspect it would be true of CircleCI as well; this is simply the cost to >> entry for cross-platform testing. > > I misspoke earlier and Rust seems to use Travis CI together with Appveyor. Looking at > > https://github.com/rust-lang/rust/blob/master/.travis.yml > > and > > https://github.com/rust-lang/rust/blob/master/appveyor.yml > > they only seem to do the Docker thing for their cross-compilation targets (and I believe those are always going to be harder to set up). > > One nice thing about this, as Mathieu pointed out, is that somebody who forks the repo can just run the same CI on their own Travis/Circle/Appveyor accounts with little effort — just as we are currently doing with Tweag’s linear types fork of GHC: > > https://github.com/tweag/ghc/blob/linear-types/.circleci/config.yml > > This is a powerful way of scaling. > >> Moreover, we can't write off the cost of integrating with CircleCI. Of >> course, if we do decide to move to GitHub then perhaps this cost shrinks >> dramatically. However, until this decision is made it seems like we need >> to assume that Phabricator integration will be necessary. > > By the "re-use existing infrastructure instead of writing your own" mantra, this is just another reason to go for GitHub. > > >>> * The problems we discussed with possibly not having enough Rackspace >>> capacity for the transition disappear. >>> >> In some sense this is true; however, it seems like we are trading one >> commodity of finite supply for another. We currently have Rackspace >> credit and consequently these instances can be considered to be >> essentially free.
>> >> While CircleCI does offer four free containers for open source >> projects (and perhaps a bit more in our case if we ask), I'm skeptical >> that this will be enough; currently our four build bots give us >> multi-day wait times which makes development remarkably painful. The >> appeal of Jenkins is that we can shorten this timescale as well as grow >> our test coverage with the resources that we already have. >> >> Let's have a brief look at what resources we may need. >> >> A quick back-of-the-envelope calculation suggests that to simply keep up >> with our current average commit rate (around 200 commits/month) for the >> four environments that we currently build we need a bare minimum of: >> >> 200 commit/month >> * 4 build/commit (Linux/i386, Linux/amd64, >> OS X, Windows/amd64) >> * 2.5 CPU-hour/build (approx. average across platforms >> for a validate) >> / (2 CPU-hour/machine-hour) (CircleCI appears to use 2 vCPU instances) >> / (30*24 machine-hour/month) >> ~ 2 machines >> >> Note that this doesn't guarantee reasonable wait times but rather merely >> ensures that we can keep up on the mean. On top of this, we see around >> 300 differential revisions per month. This requires another 3 machines >> to keep up. >> >> So, we need at least five machines but, again, this is a minimum; >> modelling response times is hard but I expect we would likely need to >> add at least two more machines to keep response times in the >> contributor-friendly range, especially considering that under CircleCI >> we will lose the ability to prioritize jobs (by contrast, with Jenkins >> we can prioritize pull requests as this is the response time that we >> really care about).
Now consider that we would like to add at least >> three more platforms (FreeBSD, OpenBSD, Linux/aarch64, all of which may >> be relatively slow to build due to virtualisation overhead) as well as a >> few more build configurations on amd64 (LLVM, unregisterised, at least >> one cross-compilation target) and a periodic slow validation and we may >> be at over a dozen machines. >> >> All of this appears to put us well outside CircleCI's offering to >> open-source projects. Of course, it may be worth asking whether they are >> willing to extend GHC a more generous offer. However, I don't think we >> can count on this and I'm not certain that Haskell.org is currently in a >> position to be able to shoulder such a financial burden. > > Mathieu has indicated that Tweag would be willing to contribute towards those costs. (Developer time, such as yours, is so much more expensive than these subscription costs that it’ll always be more efficient to outsource to CI companies.) > >>> Also, Jonas could help us get things running and, I think, his >>> wealth of experience would be very useful. (At least, I would be very >>> grateful for his advice.) >>> >>> I think, this route has the potential to get us to where we want to be >>> quite quickly and in a manner that is very little effort to maintain >>> once set up. What do you think? >>> >> Indeed I can see that there are many advantages to the CircleCI option. >> The ease of bringing up a Linux/amd64 build environment which easily >> scales and requires no administration is quite enticing. However, I am >> skeptical that it will be as easy to get the full suite of builds that >> we are aiming to produce. I would be quite curious to see what Jonas has >> to say on the matter of non-Linux platforms. Seeing a simple >> configuration which compiles and tests even a FreeBSD/amd64 build in a >> reasonable amount of time may well be enough to convince me. > > Ok, fair enough, let’s look at exactly how hard this is.
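[Ben's back-of-the-envelope figures above can be checked mechanically. This simply restates his own estimates (commit and differential rates, CPU-hours per validate, 2 vCPUs per machine); the numbers are his, not new measurements:

```shell
# Ben's estimates: 200 commits and 300 differentials per month, 4 builds
# per commit, ~2.5 CPU-hours per validate, 2 vCPUs per CircleCI machine.
awk 'BEGIN {
  hours  = 30 * 24                     # machine-hours per month
  master = 200 * 4 * 2.5 / 2 / hours   # machines to keep up with master
  diffs  = 300 * 4 * 2.5 / 2 / hours   # machines for differential revisions
  printf "master: %.2f machines, diffs: %.2f machines\n", master, diffs
}'
# Prints "master: 1.39 machines, diffs: 2.08 machines", i.e. a minimum of
# 2 + 3 = 5 machines once each figure is rounded up, as stated above.
```

This confirms the "at least five machines" floor before any allowance for response times or extra platforms.]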
> > Cheers, > Manuel From manuel.chakravarty at tweag.io Wed Oct 11 01:09:44 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Wed, 11 Oct 2017 12:09:44 +1100 Subject: [GHC DevOps Group] FreeBSD in Tier 1 In-Reply-To: <87a80zm2in.fsf@ben-laptop.smart-cactus.org> References: <87r2ucnz23.fsf@ben-laptop.smart-cactus.org> <87a80zm2in.fsf@ben-laptop.smart-cactus.org> Message-ID: > 11.10.2017, 02:44 Ben Gamari : > > However, to move back to the point of cross-platform builds: My concern > is that I'm not yet convinced that we have a viable plan for extending a > hosted solution for non-Linux/amd64 environments. The numbers I have > seen suggest that one incurs more than a 50% performance hit over even > slow ARM hardware in moving to virtualisation. Even dynamic translation > of amd64 on amd64 incurs a significant hit (which is the environment we > would need for a non-Linux operating system on amd64). > > In my mind this is a concern regardless of whether FreeBSD is Tier 1 or > not. If someone were to step up to maintain FreeBSD, or any other > non-Linux/amd64 platform, the day after we adopt CircleCI, what would we > tell them? It seems to me the response may very well be "sorry, we would > love to support you but our CI infrastructure isn't up to the task." This > gives me pause. > > I am here to be convinced, however. Firstly, as a general comment IMHO a solution that covers the most important platforms soon, reliably, and with little maintenance from our side is better than a perfectly flexible, most general solution later, which suffers from the usual DIY flakiness and costs effort on an ongoing basis. Secondly, the real problem here is fringe platforms, such as FreeBSD.
There are CI solutions for Android and iOS, of course. In other words, ARM is not the problem. ** The problem is platforms used by so few people that no CI provider offers a solution for them. ** These are exactly the platforms that are also used by only very few GHC users. So, let me be blunt here: if we expend time, effort, and money (all of us are getting paid to do this, I think) on creating and maintaining a maximally general solution whose benefit is reaped by a very small fraction of the GHC user base, we do 99% of GHC users a disservice. I think this is wrong. With the CircleCI conf created by Mathieu and the build script you linked to, we should be able to have something running pretty quickly. This seems like the quickest way to get any results to me. In any case, Mathieu is right, we should write up the requirements. I’ll take a stab at that tomorrow. Cheers, Manuel From facundo.dominguez at tweag.io Wed Oct 11 10:57:18 2017 From: facundo.dominguez at tweag.io (Facundo Domínguez) Date: Wed, 11 Oct 2017 07:57:18 -0300 Subject: [GHC DevOps Group] FreeBSD in Tier 1 In-Reply-To: References: <87r2ucnz23.fsf@ben-laptop.smart-cactus.org> <87a80zm2in.fsf@ben-laptop.smart-cactus.org> Message-ID: > If someone were to step up to maintain FreeBSD, or any other non-Linux/amd64 platform, the day after we adopt CircleCI, what would we tell them? There is the possibility of having someone contribute a machine on which the CI boxes can build the code and run the tests remotely. Facundo On Tue, Oct 10, 2017 at 10:09 PM, Manuel M T Chakravarty wrote: > 11.10.2017, 02:44 Ben Gamari : > > However, to move back to the point of cross-platform builds: My concern > is that I'm not yet convinced that we have a viable plan for extending a > hosted solution for non-Linux/amd64 environments.
The numbers I have > seen suggest that one incurs more than a 50% performance hit over even > slow ARM hardware in moving to virtualisation. Even dynamic translation > of amd64 on amd64 incurs a significant hit (which is the environment we > would need for a non-Linux operating system on amd64). > > In my mind this is a concern regardless of whether FreeBSD is Tier 1 or > not. If someone were to step up to maintain FreeBSD, or any other > non-Linux/amd64 platform, the day after we adopt CircleCI, what would we > tell them? It seems to me the response may very well be "sorry, we would > love to support you but our CI infrastructure isn't up the task." This > gives me pause. > > I am here to be convinced, however. > > > Firstly, as a general comment IMHO a solution that covers the most important > platforms soon, reliably, and with little maintenance from our side is > better than a perfectly flexible most general solution later, which suffers > from the usual DIY flakiness and costs effort on an ongoing basis. > > Secondly, the real problem here a fringe platforms, such as FreeBSD. There > are CI solutions for Android and iOS, of course. In other words, ARM is not > the problem. > > ** The problem are platforms used by so few people that no CI provider > offers a solution for it. ** > > These are exactly the platforms that are also used by only very few GHC > users. > > So, let me be blunt here: if we expend time, effort, and money (all of us > are getting paid to do this, I think) on creating and maintaining a > maximally general solution whose benefit is reaped by a very small fraction > of the GHC users base, we do 99% of GHC users a disservice. I think, this is > wrong. > > With the CircleCI conf created by Mathieu and the build script you linked > to, we should be able to have something running pretty quickly. This seems > like the quickest way to get any results to me. > > In any case, Mathieu is right, we should write up the requirements. 
I’ll > take a stab at that tomorrow. > > Cheers, > Manuel > > > _______________________________________________ > Ghc-devops-group mailing list > Ghc-devops-group at haskell.org > https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group > From marlowsd at gmail.com Wed Oct 11 11:16:06 2017 From: marlowsd at gmail.com (Simon Marlow) Date: Wed, 11 Oct 2017 12:16:06 +0100 Subject: [GHC DevOps Group] FreeBSD in Tier 1 In-Reply-To: References: <87r2ucnz23.fsf@ben-laptop.smart-cactus.org> <87a80zm2in.fsf@ben-laptop.smart-cactus.org> Message-ID: On 11 October 2017 at 02:09, Manuel M T Chakravarty < manuel.chakravarty at tweag.io> wrote: > 11.10.2017, 02:44 Ben Gamari : > > However, to move back to the point of cross-platform builds: My concern > is that I'm not yet convinced that we have a viable plan for extending a > hosted solution for non-Linux/amd64 environments. The numbers I have > seen suggest that one incurs more than a 50% performance hit over even > slow ARM hardware in moving to virtualisation. Even dynamic translation > of amd64 on amd64 incurs a significant hit (which is the environment we > would need for a non-Linux operating system on amd64). > > In my mind this is a concern regardless of whether FreeBSD is Tier 1 or > not. If someone were to step up to maintain FreeBSD, or any other > non-Linux/amd64 platform, the day after we adopt CircleCI, what would we > tell them? It seems to me the response may very well be "sorry, we would > love to support you but our CI infrastructure isn't up the task." This > gives me pause. > > I am here to be convinced, however. > > > Firstly, as a general comment IMHO a solution that covers the most > important platforms soon, reliably, and with little maintenance from our > side is better than a perfectly flexible most general solution later, which > suffers from the usual DIY flakiness and costs effort on an ongoing basis. > > Secondly, the real problem here a fringe platforms, such as FreeBSD. 
There > are CI solutions for Android and iOS, of course. In other words, ARM is not > the problem. > > ** The problem are platforms used by so few people that no CI provider > offers a solution for it. ** > > These are exactly the platforms that are also used by only very few GHC > users. > > So, let me be blunt here: if we expend time, effort, and money (all of us > are getting paid to do this, I think) on creating and maintaining a > maximally general solution whose benefit is reaped by a very small fraction > of the GHC users base, we do 99% of GHC users a disservice. I think, this > is wrong. > > With the CircleCI conf created by Mathieu and the build script you linked > to, we should be able to have something running pretty quickly. This seems > like the quickest way to get any results to me. > > In any case, Mathieu is right, we should write up the requirements. I’ll > take a stab at that tomorrow. > > I'm a little worried about having multiple different CI solutions to cover the platforms we need. The proposal is already to use AppVeyor to cover Windows, so is it possible to have any kind of unification in the ways that the different CI mechanisms communicate their results? i.e. will we get a single type of "build failed" email with aggregate results, a single build results UI in GitHub, and so on, or will these all be different? Cheers Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From m at tweag.io Wed Oct 11 11:32:34 2017 From: m at tweag.io (Boespflug, Mathieu) Date: Wed, 11 Oct 2017 13:32:34 +0200 Subject: [GHC DevOps Group] FreeBSD in Tier 1 In-Reply-To: References: <87r2ucnz23.fsf@ben-laptop.smart-cactus.org> <87a80zm2in.fsf@ben-laptop.smart-cactus.org> Message-ID: Hi Simon, > I'm a little worried about having multiple different CI solutions to cover > the platforms we need. 
The proposal is already to use AppVeyor to cover > Windows, so is it possible to have any kind of unification in the ways that > the different CI mechanisms communicate their results? i.e. will we get a > single type of "build failed" email with aggregate results, a single build > results UI in GitHub, and so on, or will these all be different? In principle what you have in the CI config is just the specification of the build environment, together with calls to entry points to each of the phases you care about (building GHC, running the test suite, copying binary distributions to some stable location), or indeed just a single call to a single entry point if you just want to build, test and prepare build artifacts in one go. So the entirety of the meat of each step should be factored out into one (or several) scripts common to all platforms. Here's an example CircleCI script that deals with two platforms (linux + cross-compile freebsd): https://github.com/tweag/ghc/blob/circleci-workflows/.circleci/config.yml As you can see, most of what it does is specify the build environment (which we have to do separately for each platform anyway). The calls to ./validate could be made shorter still by modifying ./validate. As for notifications, Appveyor will send its own email (though we can tell it not to of course). That's a tad unfortunate, I agree. But I would hope that's not a big hardship. At any rate failures are all presented in a unified way as annotations on the GitHub PR.
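[For readers without the linked config to hand, the structure being described (a per-platform environment specification plus calls to shared entry points) looks roughly like this; the image, job names, and validate flags are illustrative, not copied from the linked file:

```yaml
# Sketch of a CircleCI 2.0 config in the style described above.
version: 2
jobs:
  validate-x86_64-linux:
    docker:
      - image: haskell:8.2      # build environment specification
    steps:
      - checkout
      - run: ./boot && ./configure
      - run: ./validate --fast  # shared entry point, common to all platforms
workflows:
  version: 2
  validate:
    jobs:
      - validate-x86_64-linux
```

Adding another platform means adding another job with a different environment block; the `./validate` entry point stays the same.]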
From m at tweag.io Wed Oct 11 12:09:49 2017 From: m at tweag.io (Boespflug, Mathieu) Date: Wed, 11 Oct 2017 14:09:49 +0200 Subject: [GHC DevOps Group] DevOps: Next steps In-Reply-To: <87d15vm3hh.fsf@ben-laptop.smart-cactus.org> References: <87fubmzypd.fsf@ben-laptop.smart-cactus.org> <87fubiy9ew.fsf@ben-laptop.smart-cactus.org> <87d16iwyrk.fsf@ben-laptop.smart-cactus.org> <0C224F82-EB38-49A3-A121-C66B6D6F8D0E@tweag.io> <1D196DC9-B9A1-462B-B688-7C3469D24EC5@tweag.io> <098E3F6A-5556-4E78-AF27-979CBF973060@tweag.io> <87mv5hv9ji.fsf@ben-laptop.smart-cactus.org> <87zi9ht63w.fsf@ben-laptop.smart-cactus.org> <3AD4D37A-65C3-46D1-A2DD-7D049EEEE0F2@tweag.io> <87wp4lt5t4.fsf@ben-laptop.smart-cactus.org> <658252EF-04F2-4DA3-BAF2-6270180BE5FA@tweag.io> <87efqirfb0.fsf@ben-laptop.smart-cactus.org> <87d15vm3hh.fsf@ben-laptop.smart-cactus.org> Message-ID: I assume Ben meant to keep the list in CC on this one. Will reply shortly. -- Mathieu Boespflug Founder at http://tweag.io. On 10 October 2017 at 17:23, Ben Gamari wrote: > "Boespflug, Mathieu" writes: > >> Hi Ben, >> >> many thanks for your detailed and thoughtful reply. I won't myself >> address your points one by one (I expect Manuel will jump in), but I >> do want to ground the discussion with the following remarks: > > Oh no! Sorry for the incredibly belated response, Mathieu. I somehow > overlooked this message. > >> * What are the requirements that the current Jenkins effort is building >> towards? I seem to remember some page on the GHC wiki stating these >> and then comparing various alternatives, but I can't find it now, so >> maybe I dreamed it. The blog post [1] mentions alternatives but >> doesn't evaluate them, nor does it state the requirements. > > The requirements are briefly listed in #13716.
> > >> * A key requirement I think is not just that this kind of >> infrastructure should not take time to set up given scarce development >> resources, but more importantly that none of the maintenance be >> bottlenecked on a single person managing a custom fleet of machines >> whose state cannot be reproduced. > > Of course, it goes without saying that the state of the builders should > certainly be reproducible. > >> * Better yet if anyone that forks GHC (with a single click on GitHub) >> gets a local copy of the CI by the same token, which can then be >> modified at will. >> >> * If we can get very quick wins today for at least 3 of the 4 "Tier 1" >> platforms, that's already a step forward and we can work on the rest >> later, just like Rust has (see below). >> > My thought here is that a solution that doesn't allow code to be > tested on real target hardware isn't particularly fit to test a > compiler. Qemu is neither fast nor bug-free; the GCC project uses qemu > for their nightly builds and they have been forced to resign themselves > to ignoring entire classes of failures which are only attributable to > qemu bugs. This is something that I would like to avoid for our primary > CI system if possible. > >> I'll copy here an experience report [2] from the Rust infra authors >> from before they switched to a Travis CI backed solution: >> >>> * Our buildbot-based CI / release infrastructure cannot be maintained >>> by community members, is generally bottlenecked on Alex and myself. >> >> Sounds like this applies equally to the current Harbormaster setup. >> Perhaps to the Jenkins based one also? >> > In the future I imagine that the devops group will also have some > administrative authority over the CI infrastructure. But currently this > is quite true: our CI infrastructure is very much bottlenecked on me and > indeed can at times suffer as a consequence.
> >>> * Our buildbot configuration has reliability issues, particularly around >>> managing dynamic EC2 instances. >> >> Sounds familiar. Is any OS X automated testing happening at this >> point? I heard some time before ICFP that one or both of the OS X build >> bots had fallen off the edge of the Internet. >> > To clarify: the OS X builder (we have only one) has only been down for a > single weekend in the roughly two years that we have been using it; the > outage was due to scheduled network maintenance at the facility that > housed it. It just so happens that this was the weekend before ICFP. > >>> * Our nightly builds sometimes fail for reasons not caught during CI and >>> are down for multiple days. >> >> This matches my experience when adding CircleCI support: the tip of >> the master branch at the time had failing tests. >> > Indeed, this is a real problem and something which I have been hoping to > solve with our CI reboot. Currently we test individual differentials via > Harbormaster and I do local integration testing when I merge them. > However, this does not mean that things won't break on other platforms > after merge. > > Ideally we would do pre-merge integration testing in all of our CI > environments before a given commit becomes `master`. This is the sort of > thing that Jenkins will solve. > >>> * Packaging Rust for distribution is overly complex, involving >>> many systems and source repositories. >> >> Yup. But admittedly this is an orthogonal issue. >> >>> * The beta and stable branches do not run the test suite today. >>> With the volume of beta backports each release receives this is >>> a frightening situation. >> >> I assume this is not the case for us. But it's unclear where I'd look >> to find a declarative description of what's going on for each branch? >> Can each branch define its own way to perform CI? >> > All CI is currently performed via a single set of Harbormaster > build plans, regardless of branch. See [1].
Indeed the user can't easily > change this configuration, although this changes in Jenkins where the > pipeline configuration is in the repository. > > > [1] https://phabricator.haskell.org/harbormaster/plan/ > >>> * As certain core Rust tools mature we want to deliver them as part of >>> the Rust distribution, and this is difficult to do within the >>> current infrastructure / build system design. Distributing >>> additional tools with Rust is particularly crucial for those >>> intimately tied to compiler internals, like the RLS and clippy. >> >> Also a familiar situation, though again an orthogonal issue. >> >> So it sounds like at this crossroads we've been seeing a lot of the >> same things the Rust team has experienced. The precedent they've >> established here is pretty strong. If we want to address the very same >> problems then we need: >> >> 1. Reproducible cloud instances that are created/destroyed on-demand, >> and whose state doesn't drift over time. That way, no problems with >> build bots that eventually disappear. >> > Indeed; but CircleCI/Travis are not the only solutions which enable this > sort of reproducibility. This same sort of thing can be achieved in > Jenkins as well. > >> 2. A declarative description of the *entire infrastructure and test >> environment*, for each target platform, so that it can be replicated >> by anyone who wants to do so, in a single command. That way we're not >> blocked on any single person to make changes to it. >> > Yes, Jenkins also provides this [2]. > >> I believe in reusing existing managed CI solutions. But let's discuss. >> Just know that we'd be happy to contribute towards any paid >> subscription necessary. So that shouldn't be a barrier. >> > That is good to know; however I think we should first make sure that the > contributions that we have will amount to what is needed to make this > idea fly before taking the plunge. 
> > > To be clear, I only grudgingly find myself advocating for Jenkins; it is > in many ways terrible to work with. Furthermore, I'll be the first to > admit that the administration that it requires does carry a very real > cost. However, I think we should be careful to distinguish the > accidental complexity imposed by Jenkins from the intrinsic complexity > of testing a large project like GHC. For better or worse much of the > effort that has gone into setting up Jenkins thus far hasn't actually > been Jenkins-specific; rather it's been adapting GHC to be amenable to > the sort of end-to-end testing that we want and fixing bugs when I find > them. > > I fear that in moving to a hosted solution in place of our own > infrastructure we incur a different set of no-less-significant costs: > > * we fragment our testing infrastructure since now we need at least > CircleCI and Appveyor > > * we preclude proper testing of non-Linux/amd64 environments > > * as a substitute for proper bare-metal testing of these platforms we > instead have to write, administer, and pay for the inefficiency of > emulation-based testing > > * we lose the ability to prioritize jobs to use our resources > effectively (e.g. prioritizing patch validation over commit validation) > > As with most things in life, this is a trade-off. I'm still quite > undecided whether it's a worthwhile trade-off, but at the moment I > remain a bit skeptical. However, as I said earlier, if we can > demonstrate that it is possible to test non-Linux platforms reliably and > efficiently, then that certainly helps convince me. 
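For concreteness, the usual way to keep emulation-based testing affordable is to pay the emulation cost only where unavoidable: compile on the fast host, and run just the resulting test binaries under the emulator. Below is a minimal sketch of that split; the `EMULATOR` hook is a hypothetical convention (for a real AArch64 target it might be set to a qemu-user invocation), and the "compiled" test binary is stubbed with a shell script so the sketch runs anywhere:

```shell
#!/bin/sh
# Sketch only: the host compiles, the target (or an emulator standing in
# for it) merely executes. EMULATOR is an assumed hook, left empty here
# so this runs natively; for a real AArch64 target it might be something
# like EMULATOR="qemu-aarch64 -L /path/to/sysroot".
set -e

EMULATOR="${EMULATOR:-}"

run_on_target() {
    # Only this step pays the emulation cost.
    $EMULATOR "$@"
}

# Stand-in for the host-side cross-compilation step; no cross-compiler
# is assumed to be installed, so the "test binary" is just a script.
cat > hello-test <<'EOF'
#!/bin/sh
echo "hello from target"
EOF
chmod +x hello-test

run_on_target ./hello-test   # prints: hello from target
```

The point of the split is that the compiler itself, by far the most expensive part of a validate run, never executes under emulation; only the much cheaper test binaries do.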
> > Cheers, > > - Ben > > > [2] https://github.com/bgamari/ghc/blob/wip/jenkins/Jenkinsfile From m at tweag.io Wed Oct 11 12:15:48 2017 From: m at tweag.io (Boespflug, Mathieu) Date: Wed, 11 Oct 2017 14:15:48 +0200 Subject: [GHC DevOps Group] DevOps: Next steps In-Reply-To: References: <87fubmzypd.fsf@ben-laptop.smart-cactus.org> <87fubiy9ew.fsf@ben-laptop.smart-cactus.org> <87d16iwyrk.fsf@ben-laptop.smart-cactus.org> <0C224F82-EB38-49A3-A121-C66B6D6F8D0E@tweag.io> <1D196DC9-B9A1-462B-B688-7C3469D24EC5@tweag.io> <098E3F6A-5556-4E78-AF27-979CBF973060@tweag.io> <87mv5hv9ji.fsf@ben-laptop.smart-cactus.org> <87zi9ht63w.fsf@ben-laptop.smart-cactus.org> <3AD4D37A-65C3-46D1-A2DD-7D049EEEE0F2@tweag.io> <87wp4lt5t4.fsf@ben-laptop.smart-cactus.org> <658252EF-04F2-4DA3-BAF2-6270180BE5FA@tweag.io> <87efqirfb0.fsf@ben-laptop.smart-cactus.org> <87d15vm3hh.fsf@ben-laptop.smart-cactus.org> Message-ID: Hi Ben, On 10 October 2017 at 17:23, Ben Gamari wrote: > > [...] > >> * What are the requirements that the current Jenkins effort is building >> towards? I seem to remember some page on the GHC wiki stating these >> and then comparing various alternatives, but I can't find it now, so >> maybe I dreamed it. The blog post [1] mentions alternatives but >> doesn't evaluate them, nor does it state the requirements. > > The requirements are briefly listed in #13716. Thanks for the pointer. Let's merge those with the list you provide in another email. I think a few more regarding the following topics need to be added to that list: * infrastructure reproducibility (easy to reproduce build environments and results) * infrastructure forkability (easy for others to fork the infra, test the changes and then submit a pull request) * security (who has access, who can build etc) * one you mention in your email: prioritization? (run tests for some platforms first) Most important requirement: low maintenance overhead. 
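To make the reproducibility point concrete: with a declarative setup, drift checks become trivial, because the expected toolchain is pinned in the repository next to the CI configuration. Here is a minimal sketch in shell; the pinned version and the `ghc --version` line are invented for illustration, and the command's output is simulated rather than invoked, so the sketch runs even without GHC installed:

```shell
#!/bin/sh
# Drift check sketch: compare a version pinned in the repository against
# what the build environment actually provides. Both values below are
# hypothetical; a real check would run `ghc --version` instead of
# simulating its output.
set -e

pinned_ghc="8.2.1"   # pinned alongside the CI configuration

# Simulated output of `ghc --version`:
version_line="The Glorious Glasgow Haskell Compilation System, version 8.2.1"
actual_ghc=${version_line##* }   # keep the last word, i.e. the version

if [ "$actual_ghc" != "$pinned_ghc" ]; then
    echo "environment drift: pinned ghc-$pinned_ghc, found ghc-$actual_ghc" >&2
    exit 1
fi
echo "environment ok: ghc-$actual_ghc"
```

Because the pin lives in the repository, anyone who forks the repo gets the same check for free, which is exactly the forkability property above.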
>> * If we can get very quick wins today for at least 3 of the 4 "Tier 1" >> platforms, that's already a step forward and we can work on the rest >> later, just like Rust has (see below). >> > My thought here is that a solution that doesn't allow code to be > tested on real target hardware isn't particularly fit to test a > compiler. Qemu is neither fast nor bug-free; the GCC project uses qemu > for their nightly builds and they have been forced to resign themselves > to ignoring entire classes of failures which are only attributable to > qemu bugs. This is something that I would like to avoid for our primary > CI system if possible. References would be appreciated. My thoughts: - Emulation for non-x86 targets will be required anyway as it is (unless you have an iPhone lying around 24/7 as a build bot), if we are to include testing on them as part of CI. - The needs of GHC are not those of GCC (we are targeting far fewer platforms, with a simpler NCG). - Emulation hasn't prevented Rust (also a compiler) from being tested successfully on far more platforms than we are going to be targeting anytime soon. Either because QEMU works just fine, or because they're not fiddling with the NCG on a daily basis (since they outsource that to LLVM). > Ideally we would do pre-merge integration testing in all of our CI > environments before a given commit becomes `master`. This is the sort of > thing that Jenkins will solve. There is an important security issue here. If you build PRs from any spontaneous contributor on the Internet (as you should), then you should only do so in a sandboxed environment. But Jenkins does not give you that out-of-the-box. Without any sandboxing it's not reasonable to let users run arbitrary code on the CI server, i.e. the very same server on which later that day or many months later, a release binary distribution will be cut and sent out to thousands of users to install... 
It's possible to add other technologies into the mix to sandbox each Jenkins build (we've done it, and it took us a fair amount of time, and even then, not with the same security requirements). But by then, you've reinvented half of TravisCI/CircleCI/Appveyor/etc. Best to outsource this security aspect to providers that are *paid* by thousands of companies to get it right, I think. >>> * The beta and stable branches do not run the test suite today. >>> With the volume of beta backports each release receives this is >>> a frightening situation. >> >> I assume this is not the case for us. But it's unclear where I'd look >> to find a declarative description of what's going on for each branch? >> Can each branch define their own way to perform CI? >> > All CI is currently performed via a single set of Harbormaster > build plans, regardless of branch. See [1]. Indeed the user can't easily > change this configuration, although this changes in Jenkins where the > pipeline configuration is in the repository. Cool. >> 1. Reproducible cloud instances that are created/destroyed on-demand, >> and whose state doesn't drift over time. That way, no problems with >> build bots that eventually disappear. >> > Indeed; but CircleCI/Travis are not the only solutions which enable this > sort of reproducibility. This same sort of thing can be achieved in > Jenkins as well. True. Through mechanisms orthogonal to Jenkins. One can mitigate build drone configuration drift and get some reproducibility using configuration management tools (Ansible, SaltStack etc). Or via Dockerfiles. Or via OS images. It's just more work. > For better or worse much of the > effort that has gone into setting up Jenkins thus far hasn't actually > been Jenkins-specific; rather it's been adapting GHC to be amenable to > the sort of end-to-end testing that we want and fixing bugs when I find > them. Great! 
That's as I expected: we ought to be able to reuse a lot of existing work no matter the CI driver. :) > As with most things in life, this is a trade-off. I'm still quite > undecided whether it's a worthwhile trade-off, but at the moment I > remain a bit skeptical. However, as I said earlier, if we can > demonstrate that it is possible to test non-Linux platforms reliably and > efficiently, then that certainly helps convince me. Not that I think it's worth investing much time on this just yet (see Manuel's earlier comment), but here's a screenshot of FreeBSD running inside QEMU inside a Docker container on CircleCI: https://imgur.com/a/3YRXs -- Mathieu Boespflug Founder at http://tweag.io. On 11 October 2017 at 14:09, Boespflug, Mathieu wrote: > I assume Ben meant to keep the list in CC on this one. Will reply shortly. > -- > Mathieu Boespflug > Founder at http://tweag.io. > > > On 10 October 2017 at 17:23, Ben Gamari wrote: >> "Boespflug, Mathieu" writes: >> >>> Hi Ben, >>> >>> many thanks for your detailed and thoughtful reply. I won't myself >>> address your points one by one (I expect Manuel will jump in), but I >>> do want to ground the discussion with the following remarks: >> >> Oh no! Sorry for the incredibly belated response, Mathieu. I somehow >> overlooked this message. 
From ben at well-typed.com Wed Oct 11 15:01:18 2017 From: ben at well-typed.com (Ben Gamari) Date: Wed, 11 Oct 2017 11:01:18 -0400 Subject: [GHC DevOps Group] DevOps: Next steps In-Reply-To: References: <87fubmzypd.fsf@ben-laptop.smart-cactus.org> <87fubiy9ew.fsf@ben-laptop.smart-cactus.org> <87d16iwyrk.fsf@ben-laptop.smart-cactus.org> <0C224F82-EB38-49A3-A121-C66B6D6F8D0E@tweag.io> <1D196DC9-B9A1-462B-B688-7C3469D24EC5@tweag.io> <098E3F6A-5556-4E78-AF27-979CBF973060@tweag.io> <87mv5hv9ji.fsf@ben-laptop.smart-cactus.org> <87zi9ht63w.fsf@ben-laptop.smart-cactus.org> <3AD4D37A-65C3-46D1-A2DD-7D049EEEE0F2@tweag.io> <87wp4lt5t4.fsf@ben-laptop.smart-cactus.org> <658252EF-04F2-4DA3-BAF2-6270180BE5FA@tweag.io> <87efqirfb0.fsf@ben-laptop.smart-cactus.org> <87d15vm3hh.fsf@ben-laptop.smart-cactus.org> Message-ID: <87o9pdloe9.fsf@ben-laptop.smart-cactus.org> "Boespflug, Mathieu" writes: > Hi Ben, > > On 10 October 2017 at 17:23, Ben Gamari wrote: >> >> [...] >> >> The requirements are briefly listed in #13716. > > Thanks for the pointer. Let's merge those with the list you provide in > another email. I think a few more regarding the following topics need > to be added to that list: > > * infrastructure reproducibility (easy to reproduce build environments > and results) > * infrastructure forkability (easy for others to fork the infra, test > the changes and then submit a pull request) > * security (who has access, who can build etc) > * one you mention in your email: prioritization? (run tests for some > platforms first) > > Most important requirement: low maintenance overhead. Yes, these all sound like perfectly reasonable goals. >> My thought here is that a solution that doesn't allow code to be >> tested on real target hardware isn't particularly fit to test a >> compiler. 
Qemu is neither fast nor bug-free; the GCC project uses qemu >> for their nightly builds and they have been forced to resign themselves >> to ignoring entire classes of failures which are only attributable to >> qemu bugs. This is something that I would like to avoid for our primary >> CI system if possible. > > References would be appreciated. My thoughts: > I'm afraid I can't provide a public reference for the GCC experience; however, I can say that the source is an ARM employee who works full time on GCC. Regardless, it's not hard to find infelicities in qemu's dynamic translation layer, even in the quite "mature" x86 implementation. This isn't surprising; faithfully emulating an entire CPU architecture, memory model, and support peripherals is a quite nontrivial task. Just a quick glance through the currently open tickets reveals * https://bugs.launchpad.net/qemu/+bug/645662 * https://bugs.launchpad.net/qemu/+bug/1098729 * https://bugs.launchpad.net/qemu/+bug/902413 * https://bugs.launchpad.net/qemu/+bug/1226531 > - Emulation for non x86 targets will be required anyways as it is > (unless you have an iPhone lying around 247/7 as a build bot), if we > are to include testing on them as part of CI. There are a variety of people in the GHC community who have access to such hardware. Furthermore, programs like the OSU OSL are actively looking for open-source projects to support. Finally, if all else fails this sort of hardware is easily procured via a variety of VPS providers. > - The needs of GHC are not those of GCC (we are targeting far fewer > platforms, with a simpler NCG). > - Emulation hasn't prevented Rust (also a compiler) from being tested > successfully on far more platforms than we are be going to be > targeting anytime soon. Either because possibly QEMU works just fine, > or they're not fiddling with the NCG on a daily basis (since they > outsource that to LLVM). > Well, as GHC is not GCC, GHC is also not Rust. 
Rust has the advantage of having a strong cross-compilation story, and a testsuite which was designed to make this usage easy. GHC is behind rust in both of these areas. Yesterday I discussed this with two core members of the Rust infrastructure team; who explicitly said that (paraphrasing, albeit closely, as I didn't ask permission to quote him at the time), * making CI under qemu fast is nontrivial; their testing strategy of running the compiler on the host and running only the tests themselves on the target is critical to making the approach scale * when issues occur, debugging issues inside the emulator has proven to be quite difficult >> Ideally we would do pre-merge integration testing in all of our CI >> environments before a given commit becomes `master`. This is the sort of >> thing that Jenkins will solve. > > There is an important security issue here. If you build PR's from any > spontaneous contributor on the Internet (as you should), then you > should only do so in a sandboxed environment. But Jenkins does not > give you that out-of-the-box. Without any sandboxing it's not > reasonable to let users run arbitrary code on the CI server, i.e. the > very same server on which later that day or many months later, a > release binary distribution will be cut and sent out to thousands of > users to install... > Absolutely; this is indeed something I'm currently quite uncomfortable with under the current Harbormaster scheme. You are quite right that this problem becomes much thornier once we are building release artifacts on these machines as well. > It's possible to add in other technologies into the mix to sandbox > each Jenkins build (we've done it, and it took us a fair amount of > time, and even then, not with the same security requirements). But by > then, you've reinvented half of TravisCI/CircleCI/Appveyor/etc. > > Best to outsource this security aspect to providers that are *paid* by > thousands of companies to get it right, I think. 
> Indeed this is a fair point. In order to keep complexity at bay my plan in the Jenkins infrastructure was to simply spin up new instances for releases (using automation, of course). You are quite right that a general solution of this problem is quite hard to get right and Jenkins offers very little help in this area. This is one area where hosted services win hands down. [...] >>> 1. Reproducible cloud instances that are created/destroyed on-demand, >>> and whose state doesn't drift over time. That way, no problems with >>> build bots that eventually disappear. >>> >> Indeed; but CircleCI/Travis are not the only solution which enable this >> sort of reproducibility. This same sort of thing can be achieved in >> Jenkins as well. > > True. Through mechanisms orthogonal to Jenkins. One can mitigate build > drone configurations drift and get some reproducibility using > configuration management tools (Ansible, SaltStack etc). Or via > Dockerfiles. Or via OS images. It's just more work. > Yes, it is indeed more work. However, I would argue it is the only sane way to deploy Jenkins. >> For better or worse much of the effort that has gone into setting up >> Jenkins thusfar hasn't actually been Jenkins-specific; rather it's >> been adapting GHC to be amenable to the sort of end-to-end testing >> that we want and fixing bugs when I find them. > > Great! That's as I expected: we ought to be able to reuse a lot of > existing work no matter the CI driver. :) > Right, this work should be applicable regardless of which CI solution we use. >> As with most things in life, this is a trade-off. I'm still quite >> undecided whether it's a worthwhile trade-off, but at the moment I >> remain a bit skeptical. However, as I said earlier, if we can >> demonstrate that it is possible to test non-Linux platforms reliably and >> efficiently, then that certainly helps convince me. 
> > Not that I think it's worth investing much time on this just yet (see > Manuel's earlier comment), but here's a screenshot of FreeBSD running > inside QEMU inside a Docker container on CircleCI: > To be clear, I'm not claiming that it is impossible to run qemu inside CircleCI. I'm simply worried that it will be prohibitively slow. It would be nice to see some evidence that this is not the case before committing to this path, but I can understand if getting a minimal viable solution takes priority. At this point I'm fairly close to agreeing with you that CircleCI is the right path forward. My primary reservation continues to be non-Linux platforms. Cheers, - Ben -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 483 bytes Desc: not available URL: From m at tweag.io Wed Oct 11 16:13:22 2017 From: m at tweag.io (Boespflug, Mathieu) Date: Wed, 11 Oct 2017 18:13:22 +0200 Subject: [GHC DevOps Group] DevOps: Next steps In-Reply-To: <87o9pdloe9.fsf@ben-laptop.smart-cactus.org> References: <87fubmzypd.fsf@ben-laptop.smart-cactus.org> <87fubiy9ew.fsf@ben-laptop.smart-cactus.org> <87d16iwyrk.fsf@ben-laptop.smart-cactus.org> <0C224F82-EB38-49A3-A121-C66B6D6F8D0E@tweag.io> <1D196DC9-B9A1-462B-B688-7C3469D24EC5@tweag.io> <098E3F6A-5556-4E78-AF27-979CBF973060@tweag.io> <87mv5hv9ji.fsf@ben-laptop.smart-cactus.org> <87zi9ht63w.fsf@ben-laptop.smart-cactus.org> <3AD4D37A-65C3-46D1-A2DD-7D049EEEE0F2@tweag.io> <87wp4lt5t4.fsf@ben-laptop.smart-cactus.org> <658252EF-04F2-4DA3-BAF2-6270180BE5FA@tweag.io> <87efqirfb0.fsf@ben-laptop.smart-cactus.org> <87d15vm3hh.fsf@ben-laptop.smart-cactus.org> <87o9pdloe9.fsf@ben-laptop.smart-cactus.org> Message-ID: Hi Ben, On 11 October 2017 at 17:01, Ben Gamari wrote: > > [...] > > Well, as GHC is not GCC, GHC is also not Rust. Rust has the advantage of > having a strong cross-compilation story, and a testsuite which was > designed to make this usage easy. 
GHC is behind rust in both of these > areas. Arguably so, yes. My experience is that there are more or less shallow bugs that stand in the way of a good cross-compilation story, but nothing we can't address in due time. My experiments with FreeBSD this weekend uncovered the following as-yet-unresolved issues (which I do still need to open tickets for): - ./validate --build-only does execute some command compiled for the target, not the host. And therefore fails. This happens very late in the build process. - make binary-dist works, but for some reason the RPATHs for libHS*.so libraries don't get set the same as they do when being built natively (symptom: make install fails when running commands that dynamically load such libraries before they go to /usr/local/lib/). > Yesterday I discussed this with two core members of the Rust > infrastructure team; who explicitly said that (paraphrasing, albeit > closely, as I didn't ask permission to quote him at the time), > > * making CI under qemu fast is nontrivial; their testing strategy of > running the compiler on the host and running only the tests > themselves on the target is critical to making the approach scale > > * when issues occur, debugging issues inside the emulator has proven to > be quite difficult It's great to have first-hand experience reports from the Rust folks. Thanks for having reached out to them already! I can very much believe that building a cross compiler first on the host saves quite a bit of time. > At this point I'm fairly close to agreeing with you that CircleCI is the > right path forward. My primary reservation continues to be non-Linux > platforms. Cool. :) I suggest pushing forward with drafting the list of requirements first. To make sure we identify anything important that may or may not have come up in the discussion so far. And then Manuel will want to see a proposal put forth to the group. 
Given this current landscape, - there are 3 major desktop OS'es (Windows, macOS, Linux) - there are 2 major mobile OS'es (iOS, Android), I think we'll want something that works for the 3 major desktop platforms now (aka the current "Tier 1"). And maybe later consider how to best support the major mobile platforms, or indeed other desktop platforms if the maintainership resources are there. As Facundo mentions - in reality even the choice of how we run tests for the mobile platforms is independent of the CI driver. Remote drones are also *possible* using hosted CI services (at some complexity cost that might not outweigh those of straight up emulation). We've done that before with AWS drones. Best, Mathieu From ben at well-typed.com Wed Oct 11 17:03:19 2017 From: ben at well-typed.com (Ben Gamari) Date: Wed, 11 Oct 2017 13:03:19 -0400 Subject: [GHC DevOps Group] DevOps: Next steps In-Reply-To: References: <87fubmzypd.fsf@ben-laptop.smart-cactus.org> <87d16iwyrk.fsf@ben-laptop.smart-cactus.org> <0C224F82-EB38-49A3-A121-C66B6D6F8D0E@tweag.io> <1D196DC9-B9A1-462B-B688-7C3469D24EC5@tweag.io> <098E3F6A-5556-4E78-AF27-979CBF973060@tweag.io> <87mv5hv9ji.fsf@ben-laptop.smart-cactus.org> <87zi9ht63w.fsf@ben-laptop.smart-cactus.org> <3AD4D37A-65C3-46D1-A2DD-7D049EEEE0F2@tweag.io> <87wp4lt5t4.fsf@ben-laptop.smart-cactus.org> <658252EF-04F2-4DA3-BAF2-6270180BE5FA@tweag.io> <87efqirfb0.fsf@ben-laptop.smart-cactus.org> <87d15vm3hh.fsf@ben-laptop.smart-cactus.org> <87o9pdloe9.fsf@ben-laptop.smart-cactus.org> Message-ID: <87d15tliqw.fsf@ben-laptop.smart-cactus.org> "Boespflug, Mathieu" writes: > Hi Ben, > > On 11 October 2017 at 17:01, Ben Gamari wrote: >> >> [...] >> >> Well, as GHC is not GCC, GHC is also not Rust. Rust has the advantage of >> having a strong cross-compilation story, and a testsuite which was >> designed to make this usage easy. GHC is behind rust in both of these >> areas. > > Arguably so, yes. 
> My experience is that there are more or less shallow
> bugs that stand in the way of a good cross compilation story, but
> nothing we can't address in due time. My experiments with FreeBSD this
> weekend uncovered the following as-yet-unresolved issues (which I do
> still need to open tickets for):
>
Some are shallow; some are less so. For instance, Template Haskell is one of the larger issues at the moment. However, happily, it's possible Angerman will be able to fix this for 8.4 (see D3608).

> - ./validate --build-only does execute some command compiled for
> the target, not the host. And therefore fails. This happens very late
> in the build process.
> - make binary-dist works, but for some reason the RPATHs for
> libHS*.so libraries don't get set the same as they do when being built
> natively (symptom: make install fails when running commands that
> dynamically load such libraries before they go to /usr/local/lib/).
>
Thanks for pointing these out. Indeed, having tickets for these would be quite helpful when you get a chance. Also, do note that Moritz has been doing work in this area. D4058 is somewhat relevant. [1]

[1] https://phabricator.haskell.org/D4058

[...]

>> At this point I'm fairly close to agreeing with you that CircleCI is the
>> right path forward. My primary reservation continues to be non-Linux
>> platforms.
>
> Cool. :) I suggest pushing forward with drafting the list of
> requirements first. To make sure we identify anything important that
> may or may not have come up in the discussion so far. And then Manuel will
> want to see a proposal put forth to the group. Given this current
> landscape,
>
> - there are 3 major desktop OS'es (Windows, macOS, Linux)
> - there are 2 major mobile OS'es (iOS, Android),
>
> I think we'll want something that works for the 3 major desktop
> platforms now (aka the current "Tier 1").
> And maybe later consider how
> to best support the major mobile platforms, or indeed other desktop
> platforms if the maintainership resources are there.
>
> As Facundo mentions - in reality even the choice of how we run tests
> for the mobile platforms is independent of the CI driver. Remote
> drones are also *possible* using hosted CI services (at some
> complexity cost that might not outweigh those of straight up
> emulation). We've done that before with AWS drones.

Manuel no doubt has something to add here, but for what it's worth I have generally written off testing GHC on Android/iOS. Everything I have heard about automated testing on these platforms makes it sound absolutely terrible.

Instead, I think it's significantly easier to just test a standard Linux distribution running on ARM and AArch64. When I have done work on ARM in the past this was how I tested (using any one of a number of inexpensive development boards).

Cheers,

- Ben

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 487 bytes Desc: not available URL: From manuel.chakravarty at tweag.io Thu Oct 12 03:06:57 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Thu, 12 Oct 2017 14:06:57 +1100 Subject: [GHC DevOps Group] DevOps: Next steps In-Reply-To: <87d15tliqw.fsf@ben-laptop.smart-cactus.org> References: <87fubmzypd.fsf@ben-laptop.smart-cactus.org> <87d16iwyrk.fsf@ben-laptop.smart-cactus.org> <0C224F82-EB38-49A3-A121-C66B6D6F8D0E@tweag.io> <1D196DC9-B9A1-462B-B688-7C3469D24EC5@tweag.io> <098E3F6A-5556-4E78-AF27-979CBF973060@tweag.io> <87mv5hv9ji.fsf@ben-laptop.smart-cactus.org> <87zi9ht63w.fsf@ben-laptop.smart-cactus.org> <3AD4D37A-65C3-46D1-A2DD-7D049EEEE0F2@tweag.io> <87wp4lt5t4.fsf@ben-laptop.smart-cactus.org> <658252EF-04F2-4DA3-BAF2-6270180BE5FA@tweag.io> <87efqirfb0.fsf@ben-laptop.smart-cactus.org> <87d15vm3hh.fsf@ben-laptop.smart-cactus.org> <87o9pdloe9.fsf@ben-laptop.smart-cactus.org> <87d15tliqw.fsf@ben-laptop.smart-cactus.org> Message-ID: > Ben Gamari : > "Boespflug, Mathieu" > writes: >>> At this point I'm fairly close to agreeing with you that CircleCI is the >>> right path forward. My primary reservation continues to be non-Linux >>> platforms. >> >> Cool. :) I suggest pushing forward with drafting the list of >> requirements first. To make sure we identify anything important that >> may or not have come up in the discussion so far. And then Manuel will >> want to see a proposal put forth to the group. Given this current >> landscape, >> >> - there are 3 major desktop OS'es (Windows, macOS, Linux) >> - there are 2 major mobile OS'es (iOS, Android), >> >> I think we'll want something that works for the 3 major desktop >> platforms now (aka the current "Tier 1"). And maybe later consider how >> to best support the major mobile platforms, or indeed other desktop >> platforms if the maintainership resources are there. 
>> As Facundo mentions - in reality even the choice of how we run tests
>> for the mobile platforms is independent of the CI driver. Remote
>> drones are also *possible* using hosted CI services (at some
>> complexity cost that might not outweigh those of straight up
>> emulation). We've done that before with AWS drones.
>
> Manuel no doubt has something to add here, but for what it's worth I
> have generally written off testing GHC on Android/iOS. Everything I have
> heard of automated testing on these platforms make it sound absolutely
> terrible.
>
> Instead, I think it's significantly easier to just test a standard Linux
> distribution running on ARM and AArch64. When I have done work on ARM in
> the past this was how I tested (using any one of a number of inexpensive
> development boards).

I am not sure what’s hard about testing iOS… provided you are willing to go with a paid service. A popular one among iOS devs is Buildkite: https://buildkite.com/

Maybe Danny Greg’s choice quote from the Buildkite website should inspire us too ;)

> I managed to set up our entire CI rig in one afternoon after battling for weeks with Xcode bots and Jenkins

(and Danny, ex-GitHub, knows what he is doing)

Manuel

PS: Not that I want to further complicate this discussion, but Buildkite provides another alternative in the design space: run your own build servers, but let somebody else do the work of organising the builds (and pay them money in exchange).

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From manuel.chakravarty at tweag.io Thu Oct 12 06:13:25 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Thu, 12 Oct 2017 17:13:25 +1100 Subject: [GHC DevOps Group] CI Message-ID: As promised, I have taken a first cut at listing the requirements and the pros and cons of the main contenders on a Trac page: https://ghc.haskell.org/trac/ghc/wiki/ContinuousIntegration Maybe I am biased, but is there any advantage to Jenkins other than that we can run builds and tests on exotic platforms? Manuel From ben at well-typed.com Thu Oct 12 11:23:49 2017 From: ben at well-typed.com (Ben Gamari) Date: Thu, 12 Oct 2017 07:23:49 -0400 Subject: [GHC DevOps Group] FreeBSD in Tier 1 In-Reply-To: References: <87r2ucnz23.fsf@ben-laptop.smart-cactus.org> <87a80zm2in.fsf@ben-laptop.smart-cactus.org> Message-ID: <87tvz4k3sq.fsf@ben-laptop.smart-cactus.org> Facundo Domínguez writes: >> If someone were to step up to maintain FreeBSD, or any other non-Linux/amd64 platform, the day after we adopt CircleCI, what would we tell them? > > There is the possibility to have someone contribute a machine where > the CI boxes can build the code and run the tests remotely. > If that is true then that certainly makes for a much nicer story. However, I was under the impression that CircleCI doesn't allow for this sort of usage. Perhaps I am mistaken? Cheers, - Ben -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 487 bytes Desc: not available URL: From m at tweag.io Thu Oct 12 11:31:43 2017 From: m at tweag.io (Boespflug, Mathieu) Date: Thu, 12 Oct 2017 13:31:43 +0200 Subject: [GHC DevOps Group] FreeBSD in Tier 1 In-Reply-To: <87tvz4k3sq.fsf@ben-laptop.smart-cactus.org> References: <87r2ucnz23.fsf@ben-laptop.smart-cactus.org> <87a80zm2in.fsf@ben-laptop.smart-cactus.org> <87tvz4k3sq.fsf@ben-laptop.smart-cactus.org> Message-ID:

> However, I was under the impression that CircleCI doesn't allow for this sort of usage. Perhaps I am mistaken?

In their ToS, you mean? Not that I've seen. What we did was spawn an AWS machine at the start of a job, start a watchdog timer for the machine to self-destruct, and then run commands on the remote AWS machine from CircleCI via SSH. Very low-tech. Not particularly robust. There are other ways to do this, hopefully more robust.

-- Mathieu Boespflug Founder at http://tweag.io.

On 12 October 2017 at 13:23, Ben Gamari wrote:
> Facundo Domínguez writes:
>
>>> If someone were to step up to maintain FreeBSD, or any other non-Linux/amd64 platform, the day after we adopt CircleCI, what would we tell them?
>>
>> There is the possibility to have someone contribute a machine where
>> the CI boxes can build the code and run the tests remotely.
>>
> If that is true then that certainly makes for a much nicer story.
> However, I was under the impression that CircleCI doesn't allow for this
> sort of usage. Perhaps I am mistaken?
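For concreteness, the low-tech drone scheme Mathieu describes (spawn a machine, arm a watchdog, drive the build over SSH) might be sketched as a CI job script like the following. Everything here is an illustrative assumption rather than an actual GHC or Tweag configuration: the AMI id, instance type, host name, and the 120-minute self-destruct timeout are made up, and a real job would additionally need to wait for the instance to boot and tear it down on success.

```shell
#!/bin/sh
# Sketch of the CircleCI -> AWS "drone" scheme. DRY_RUN=1 (the default
# here) prints the commands instead of executing them, since the real
# commands need AWS credentials and a reachable drone.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# 1. Spawn a build machine at the start of the job.
run aws ec2 run-instances --image-id ami-0abcd1234 --instance-type c4.4xlarge

# 2. Arm a watchdog so the drone self-destructs even if CI dies mid-job.
run ssh builder@drone.example.com "echo 'sudo shutdown -h now' | at now + 120 minutes"

# 3. Run the build remotely; the log streams back into the CircleCI job.
run ssh builder@drone.example.com "cd ghc && ./boot && ./configure && gmake -j16"
```

Funnelling every command through a single `run` wrapper keeps the script explicit about what executes locally versus on the drone, and makes the whole job trivially dry-runnable.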
> > Cheers, > > - Ben > > _______________________________________________ > Ghc-devops-group mailing list > Ghc-devops-group at haskell.org > https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group > From ben at well-typed.com Thu Oct 12 12:04:51 2017 From: ben at well-typed.com (Ben Gamari) Date: Thu, 12 Oct 2017 08:04:51 -0400 Subject: [GHC DevOps Group] DevOps: Next steps In-Reply-To: References: <87fubmzypd.fsf@ben-laptop.smart-cactus.org> <0C224F82-EB38-49A3-A121-C66B6D6F8D0E@tweag.io> <1D196DC9-B9A1-462B-B688-7C3469D24EC5@tweag.io> <098E3F6A-5556-4E78-AF27-979CBF973060@tweag.io> <87mv5hv9ji.fsf@ben-laptop.smart-cactus.org> <87zi9ht63w.fsf@ben-laptop.smart-cactus.org> <3AD4D37A-65C3-46D1-A2DD-7D049EEEE0F2@tweag.io> <87wp4lt5t4.fsf@ben-laptop.smart-cactus.org> <658252EF-04F2-4DA3-BAF2-6270180BE5FA@tweag.io> <87efqirfb0.fsf@ben-laptop.smart-cactus.org> <87d15vm3hh.fsf@ben-laptop.smart-cactus.org> <87o9pdloe9.fsf@ben-laptop.smart-cactus.org> <87d15tliqw.fsf@ben-laptop.smart-cactus.org> Message-ID: <87po9sk1wc.fsf@ben-laptop.smart-cactus.org> Manuel M T Chakravarty writes: >> Ben Gamari : >> >> Instead, I think it's significantly easier to just test a standard Linux >> distribution running on ARM and AArch64. When I have done work on ARM in >> the past this was how I tested (using any one of a number of inexpensive >> development boards). > > I am not sure what’s hard about testing iOS…provided you are willing > to go with a payed service. I see. Thanks for the reference! Cheers, - Ben -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 487 bytes Desc: not available URL: From ben at well-typed.com Thu Oct 12 13:18:24 2017 From: ben at well-typed.com (Ben Gamari) Date: Thu, 12 Oct 2017 09:18:24 -0400 Subject: [GHC DevOps Group] CI In-Reply-To: References: Message-ID: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org>

Manuel M T Chakravarty writes:

> As promised, I have taken a first cut at listing the requirements and
> the pros and cons of the main contenders on a Trac page:
>
> https://ghc.haskell.org/trac/ghc/wiki/ContinuousIntegration
>
I think this list is being a bit generous to the hosted option.

Other costs of this approach might include:

* Under this heterogeneous scheme we will have to maintain two or more distinct CI systems, each requiring some degree of setup and maintenance.

* Using qemu for building on/for non-Linux/amd64 platforms requires a non-negligible amount of additional complexity (see rust's CI implementation [1]).

* It's unclear whether testing GHC via qemu is even practical given computational constraints.

* We lose the ability to prioritize jobs, requiring more hardware to maintain similar build turnaround.

* We are utterly dependent on our CI service(s) to behave well; for instance, here are a few examples that the Rust infrastructure team related to me:

  * They have been struggling to keep the tail of their Travis build turnaround time distribution in check, with some builds taking over 8 hours to complete. Despite raising the issue with Travis customer support, and despite being a paying customer, they are still having trouble.

  * They have noticed that Travis has a tendency to simply drop builds in mid-flight, losing hours of work.
Again, despite working with upstream, they haven't been able to resolve the problem.

  * They have been strongly affected by apparent instability in Travis' OS X infrastructure, which goes down, to quote, "*a lot*".

Of course, these examples are picking on Travis in particular as that is the example we have available. However, in general the message here is that by giving up our own infrastructure we are at the mercy of the services that we use. Unfortunately, sometimes those services are not accustomed to testing projects of the scale of GHC or rustc. At this point you have little recourse but to minimize the damage.

We avoid all of this by self-hosting (at, of course, the expense of administration time). Furthermore, we continue to benefit from hardware provided by a multitude of sources including users, Rackspace (and other VPS providers if we wanted), and programs like OSU OSL. It is important to remember that until recently we were operating under the assumption that these were the only resources available to us for testing.

It's still quite unclear to me what a CircleCI/Appveyor solution will ultimately cost, but it will almost certainly not be free. Assuming there are users who are willing to foot that bill, this is of course fine. However, it's quite contrary to the assumptions we have been working with for much of this process.

Lastly: if I understand the point correctly, the "setup is not forkable" con of Jenkins is not accurate. Under Jenkins the build configuration resides in the repository being tested. A user can easily modify it and submit a PR, which will be tested just like any other change.

[1] https://github.com/rust-lang/rust/tree/master/src/ci

> Maybe I am biased, but is there any advantage to Jenkins other than
> that we can run builds and tests on exotic platforms?
Some of these "exotic" platforms might also be called "the most populous architecture in the world" (ARM), "the operating system that feeds a third of the world's Internet traffic" (FreeBSD), and "the operating system that powers much of the world's financial system" (AIX). I'm not sure that the "exotic" label really does these platforms justice.

More importantly, all of these platforms have contributors working on their support in GHC. Historically, GHC HQ has tried to recognize their efforts by allowing porters to submit binary distributions which are distributed alongside GHC HQ distributions. Recently I have tried to pursue a different model, handling some of these binary builds myself in the name of consistency and reduced release overhead (as previously we incurred a full round-trip through binary build contributors every time we released).

The desire to scale our release process up to handle the breadth of platforms that GHC supports, with either Tier 1 or what is currently Tier 2 support, was one motivation for the new CI effort. While I don't consider testing any one of these platforms to be a primary goal, I do think it is important to have a viable plan by which they might be covered in the future for this reason.

To be clear, I am supportive of the CI-as-a-service direction. However, I want to recognize the trade-offs where they exist and have answers to some of the thorny questions, including those surrounding platform support, before committing.

Cheers,

- Ben

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 487 bytes Desc: not available URL: From m at tweag.io Thu Oct 12 16:01:39 2017 From: m at tweag.io (Boespflug, Mathieu) Date: Thu, 12 Oct 2017 18:01:39 +0200 Subject: [GHC DevOps Group] DevOps: Next steps In-Reply-To: <87d15tliqw.fsf@ben-laptop.smart-cactus.org> References: <87fubmzypd.fsf@ben-laptop.smart-cactus.org> <87d16iwyrk.fsf@ben-laptop.smart-cactus.org> <0C224F82-EB38-49A3-A121-C66B6D6F8D0E@tweag.io> <1D196DC9-B9A1-462B-B688-7C3469D24EC5@tweag.io> <098E3F6A-5556-4E78-AF27-979CBF973060@tweag.io> <87mv5hv9ji.fsf@ben-laptop.smart-cactus.org> <87zi9ht63w.fsf@ben-laptop.smart-cactus.org> <3AD4D37A-65C3-46D1-A2DD-7D049EEEE0F2@tweag.io> <87wp4lt5t4.fsf@ben-laptop.smart-cactus.org> <658252EF-04F2-4DA3-BAF2-6270180BE5FA@tweag.io> <87efqirfb0.fsf@ben-laptop.smart-cactus.org> <87d15vm3hh.fsf@ben-laptop.smart-cactus.org> <87o9pdloe9.fsf@ben-laptop.smart-cactus.org> <87d15tliqw.fsf@ben-laptop.smart-cactus.org> Message-ID: Hi Ben, On 11 October 2017 at 19:03, Ben Gamari wrote: > "Boespflug, Mathieu" writes: > >> Hi Ben, >> >> On 11 October 2017 at 17:01, Ben Gamari wrote: >>> >>> [...] >>> >>> Well, as GHC is not GCC, GHC is also not Rust. Rust has the advantage of >>> having a strong cross-compilation story, and a testsuite which was >>> designed to make this usage easy. GHC is behind rust in both of these >>> areas. >> >> Arguably so, yes. My experience is that there are more or less shallow >> bugs that stand in the way of a good cross compilation story, but >> nothing we can't address in due time. My experiments with FreeBSD this >> weekend uncovered the following as-yet-unresolved issues (which I do >> still need to open tickets for): >> > Some are shallow; some are less so. For instance, Template Haskell is > one of the larger issues at the moment. However, happily, it's possible > Angerman will be able to fix this for 8.4 (see D3608). Right. 
I was speaking to Moritz last week (I'm including him in CC), who forwarded the experience from Sergei Trofimovich on cross compiling GHC. He did mention this ongoing template-haskell work. But this doesn't apply to the specific thing we're trying to do, right? Since we're cross compiling GHC itself and then we'd be running GHC in the target environment? Best, Mathieu From ben at well-typed.com Thu Oct 12 16:15:44 2017 From: ben at well-typed.com (Ben Gamari) Date: Thu, 12 Oct 2017 12:15:44 -0400 Subject: [GHC DevOps Group] DevOps: Next steps In-Reply-To: References: <87fubmzypd.fsf@ben-laptop.smart-cactus.org> <0C224F82-EB38-49A3-A121-C66B6D6F8D0E@tweag.io> <1D196DC9-B9A1-462B-B688-7C3469D24EC5@tweag.io> <098E3F6A-5556-4E78-AF27-979CBF973060@tweag.io> <87mv5hv9ji.fsf@ben-laptop.smart-cactus.org> <87zi9ht63w.fsf@ben-laptop.smart-cactus.org> <3AD4D37A-65C3-46D1-A2DD-7D049EEEE0F2@tweag.io> <87wp4lt5t4.fsf@ben-laptop.smart-cactus.org> <658252EF-04F2-4DA3-BAF2-6270180BE5FA@tweag.io> <87efqirfb0.fsf@ben-laptop.smart-cactus.org> <87d15vm3hh.fsf@ben-laptop.smart-cactus.org> <87o9pdloe9.fsf@ben-laptop.smart-cactus.org> <87d15tliqw.fsf@ben-laptop.smart-cactus.org> Message-ID: <878tggjqa7.fsf@ben-laptop.smart-cactus.org> "Boespflug, Mathieu" writes: > Hi Ben, > > On 11 October 2017 at 19:03, Ben Gamari wrote: >> >> Some are shallow; some are less so. For instance, Template Haskell is >> one of the larger issues at the moment. However, happily, it's possible >> Angerman will be able to fix this for 8.4 (see D3608). > > Right. I was speaking to Moritz last week (I'm including him in CC), > who forwarded the experience from Sergei Trofimovich on cross > compiling GHC. He did mention this ongoing template-haskell work. But > this doesn't apply to the specific thing we're trying to do, right? > Since we're cross compiling GHC itself and then we'd be running GHC in > the target environment? > Depending upon how you approach it, perhaps not. 
I had assumed that you would want to do as much compilation as possible on the host and then run the resulting binaries on the target. As I understand it this is what the Rustaceans do wherever possible to make testing under qemu feasible. Of course, if you want to run the whole testsuite, including all compilation, under qemu, then naturally aren't affected by the TH issue. I would be very curious to know just how slow this is. Cheers, - Ben -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 487 bytes Desc: not available URL: From gnezdo at google.com Thu Oct 12 16:25:32 2017 From: gnezdo at google.com (Greg Steuck (Sh-toy-k)) Date: Thu, 12 Oct 2017 16:25:32 +0000 Subject: [GHC DevOps Group] CI In-Reply-To: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> References: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> Message-ID: Thanks Manuel for putting together the requirements. Is there a way to make the choice more data-driven? I realize many things are very hard to estimate in advance, order of magnitude swags could still be informative. Off the cuff it would be useful to have the numbers of builds and their latencies, numbers of binary package downloads, prices of different options, expected human operations time, how long the solution is required to work before it is to be reconsidered. Given such numbers one could pose an optimization problem and compare the options. Thanks Greg On Thu, Oct 12, 2017 at 6:18 AM Ben Gamari wrote: > Manuel M T Chakravarty writes: > > > As promised, I have taken a first cut at listing the requirements and > > the pros and cons of the main contenders on a Trac page: > > > > https://ghc.haskell.org/trac/ghc/wiki/ContinuousIntegration > > > I think this list is being a bit generous to the hosted option. 
> > Other costs of this approach might include: > > * Under this heterogeneous scheme we will have to maintain two or more > distinct CI systems, each requiring some degree of setup and > maintenance. > > * Using qemu for building on/for a non-Linux/amd64 platforms requires a > non-negligible amount of additional complexity (see rust's CI > implementation [1]) > > * It's unclear whether testing GHC via qemu is even practical given > computational constraints. > > * We lose the ability to prioritize jobs, requiring more hardware to > maintain similar build turnaround > > * We are utterly dependent on our CI service(s) to behave well; for > instance, here are two examples that the Rust infrastructure team > related to me, > > * They have been struggling to keep Travis the tail of their build > turnaround time distribution in check, with some builds taking > over 8 hours to complete. Despite raising the issue with Travis > customer support they are still having trouble, despite being a > paying customer. > > * They have noticed that Travis has a tendency to simply drop builds > in mid-flight, losing hours of work. Again, despite working with > upstream they haven't been able to resolve the problem > > * They have been strongly affected by apparent instability in > Travis' OS X infrastructure which goes down, to quote, "*a lot*" > > Of course, both of these are picking on Travis in particular as that > is the example we have available. However, in general the message > here is that by giving up our own infrastructure we are at the mercy > of the services that we use. Unfortunately, sometimes those services > are not accustomed to testing projects of the scale of GHC or rustc. > At this point you have little recourse but to minimize the damage. > > We avoid all of this by self-hosting (at, of course, the expense of > administration time). 
Furthermore, we continue to benefit from hardware > provided by a multitude of sources including users, Rackspace (and other > VPS providers if we wanted), and programs like OSU OSL. It is important > to remember that until recently we were operating under the assumption > that these were the only resources available to us for testing. > > It's still quite unclear to me what a CircleCI/Appveyor solution will > ultimately cost, but will almost certainly not be free. Assuming there > are users who are willing to foot that bill, this is of course fine. > However, it's quite contrary to the assumptions we have been working > with for much of this process. > > > Lastly: If I understand the point correctly, the "the set up is not > forkable" "con" of Jenkins is not accurate. Under Jenkins the build > configuration resides in the repository being tested. A user can easily > modify it and submit a PR, which will be tested just like any other > change. > > > [1] https://github.com/rust-lang/rust/tree/master/src/ci > > > > Maybe I am biased, but is there any advantage to Jenkins other than > > that we can run builds and tests on exotic platforms? > > Some of these "exotic" platforms might also be called "the most populous > architecture in the world" (ARM), "the operating system that feeds a > third of the world's Internet traffic (FreeBSD), and "the operating > system that powers much of the world's financial system" (AIX). I'm not > sure that the "exotic" label really does these platforms justice. > > More importantly, all of these platforms have contributors working on > their support in GHC. Historically, GHC HQ has tried to recognize their > efforts by allowing porters to submit binary distributions which are > distributed alongside GHC HQ distributions. 
Recently I have tried to > pursue a different model, handling some of these binary builds myself in > the name of consistency and reduced release overhead (as previously we > incurred a full round-trip through binary build contributors every time > we released). > > The desire to scale our release process up to handle the breadth of > platforms that GHC supports, with either Tier 1 or what is currently > Tier 2 support, was one motivation for the new CI effort. While I don't > consider testing any one of these platforms to be a primary goal, I do > think it is important to have a viable plan by which they might be > covered in the future for this reason. > > > To be clear, I am supportive of the CI-as-a-service direction. However, > I want to recognize the trade-offs where they exist and have answers to > some of the thorny questions, including those surrounding platform > support, before committing. > > Cheers, > > - Ben > _______________________________________________ > Ghc-devops-group mailing list > Ghc-devops-group at haskell.org > https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 4843 bytes Desc: S/MIME Cryptographic Signature URL: From m at tweag.io Thu Oct 12 16:27:27 2017 From: m at tweag.io (Boespflug, Mathieu) Date: Thu, 12 Oct 2017 18:27:27 +0200 Subject: [GHC DevOps Group] CI In-Reply-To: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> References: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> Message-ID: Some extra points by Jonas, that I'm just forwarding: - CircleCI & Appveyor Cons: dependent on third-party for feature development. - Funding: on one side we have free RackSpace servers, on the other side CircleCI might be willing to fund as well. 
- Incident report regarding Jenkins: even with sandboxing, the security record of Jenkins has been patchy. One of our clients (before we started managing the instance) ended up with a Bitcoin miner on their rig. The security flaw they exploited was this one: https://www.cvedetails.com/cve/CVE-2016-0792/. For our client, firewalling Jenkins was possible, but in this case anyone will want direct access to the build logs. These are comments about solutions though, not requirements. -- Mathieu Boespflug Founder at http://tweag.io. On 12 October 2017 at 15:18, Ben Gamari wrote: > Manuel M T Chakravarty writes: > >> As promised, I have taken a first cut at listing the requirements and >> the pros and cons of the main contenders on a Trac page: >> >> https://ghc.haskell.org/trac/ghc/wiki/ContinuousIntegration >> > I think this list is being a bit generous to the hosted option. > > Other costs of this approach might include: > > * Under this heterogeneous scheme we will have to maintain two or more > distinct CI systems, each requiring some degree of setup and > maintenance. > > * Using qemu for building on/for a non-Linux/amd64 platforms requires a > non-negligible amount of additional complexity (see rust's CI > implementation [1]) > > * It's unclear whether testing GHC via qemu is even practical given > computational constraints. > > * We lose the ability to prioritize jobs, requiring more hardware to > maintain similar build turnaround > > * We are utterly dependent on our CI service(s) to behave well; for > instance, here are two examples that the Rust infrastructure team > related to me, > > * They have been struggling to keep Travis the tail of their build > turnaround time distribution in check, with some builds taking > over 8 hours to complete. Despite raising the issue with Travis > customer support they are still having trouble, despite being a > paying customer. 
> > * They have noticed that Travis has a tendency to simply drop builds > in mid-flight, losing hours of work. Again, despite working with > upstream they haven't been able to resolve the problem > > * They have been strongly affected by apparent instability in > Travis' OS X infrastructure which goes down, to quote, "*a lot*" > > Of course, both of these are picking on Travis in particular as that > is the example we have available. However, in general the message > here is that by giving up our own infrastructure we are at the mercy > of the services that we use. Unfortunately, sometimes those services > are not accustomed to testing projects of the scale of GHC or rustc. > At this point you have little recourse but to minimize the damage. > > We avoid all of this by self-hosting (at, of course, the expense of > administration time). Furthermore, we continue to benefit from hardware > provided by a multitude of sources including users, Rackspace (and other > VPS providers if we wanted), and programs like OSU OSL. It is important > to remember that until recently we were operating under the assumption > that these were the only resources available to us for testing. > > It's still quite unclear to me what a CircleCI/Appveyor solution will > ultimately cost, but will almost certainly not be free. Assuming there > are users who are willing to foot that bill, this is of course fine. > However, it's quite contrary to the assumptions we have been working > with for much of this process. > > > Lastly: If I understand the point correctly, the "the set up is not > forkable" "con" of Jenkins is not accurate. Under Jenkins the build > configuration resides in the repository being tested. A user can easily > modify it and submit a PR, which will be tested just like any other > change. 
> > > [1] https://github.com/rust-lang/rust/tree/master/src/ci > > >> Maybe I am biased, but is there any advantage to Jenkins other than >> that we can run builds and tests on exotic platforms? > > Some of these "exotic" platforms might also be called "the most populous > architecture in the world" (ARM), "the operating system that feeds a > third of the world's Internet traffic (FreeBSD), and "the operating > system that powers much of the world's financial system" (AIX). I'm not > sure that the "exotic" label really does these platforms justice. > > More importantly, all of these platforms have contributors working on > their support in GHC. Historically, GHC HQ has tried to recognize their > efforts by allowing porters to submit binary distributions which are > distributed alongside GHC HQ distributions. Recently I have tried to > pursue a different model, handling some of these binary builds myself in > the name of consistency and reduced release overhead (as previously we > incurred a full round-trip through binary build contributors every time > we released). > > The desire to scale our release process up to handle the breadth of > platforms that GHC supports, with either Tier 1 or what is currently > Tier 2 support, was one motivation for the new CI effort. While I don't > consider testing any one of these platforms to be a primary goal, I do > think it is important to have a viable plan by which they might be > covered in the future for this reason. > > > To be clear, I am supportive of the CI-as-a-service direction. However, > I want to recognize the trade-offs where they exist and have answers to > some of the thorny questions, including those surrounding platform > support, before committing. 
> > Cheers, > > - Ben > > _______________________________________________ > Ghc-devops-group mailing list > Ghc-devops-group at haskell.org > https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group > From m at tweag.io Thu Oct 12 16:31:54 2017 From: m at tweag.io (Boespflug, Mathieu) Date: Thu, 12 Oct 2017 18:31:54 +0200 Subject: [GHC DevOps Group] DevOps: Next steps In-Reply-To: <878tggjqa7.fsf@ben-laptop.smart-cactus.org> References: <87fubmzypd.fsf@ben-laptop.smart-cactus.org> <0C224F82-EB38-49A3-A121-C66B6D6F8D0E@tweag.io> <1D196DC9-B9A1-462B-B688-7C3469D24EC5@tweag.io> <098E3F6A-5556-4E78-AF27-979CBF973060@tweag.io> <87mv5hv9ji.fsf@ben-laptop.smart-cactus.org> <87zi9ht63w.fsf@ben-laptop.smart-cactus.org> <3AD4D37A-65C3-46D1-A2DD-7D049EEEE0F2@tweag.io> <87wp4lt5t4.fsf@ben-laptop.smart-cactus.org> <658252EF-04F2-4DA3-BAF2-6270180BE5FA@tweag.io> <87efqirfb0.fsf@ben-laptop.smart-cactus.org> <87d15vm3hh.fsf@ben-laptop.smart-cactus.org> <87o9pdloe9.fsf@ben-laptop.smart-cactus.org> <87d15tliqw.fsf@ben-laptop.smart-cactus.org> <878tggjqa7.fsf@ben-laptop.smart-cactus.org> Message-ID: Oh, I was thinking of a middle ground where you build a target compiler, on the host, then build the test suite (not the compiler itself) from inside the target environment, using the compiler you built on the host, and run the test suite. -- Mathieu Boespflug Founder at http://tweag.io. On 12 October 2017 at 18:15, Ben Gamari wrote: > "Boespflug, Mathieu" writes: > >> Hi Ben, >> >> On 11 October 2017 at 19:03, Ben Gamari wrote: >>> >>> Some are shallow; some are less so. For instance, Template Haskell is >>> one of the larger issues at the moment. However, happily, it's possible >>> Angerman will be able to fix this for 8.4 (see D3608). >> >> Right. I was speaking to Moritz last week (I'm including him in CC), >> who forwarded the experience from Sergei Trofimovich on cross >> compiling GHC. He did mention this ongoing template-haskell work. 
But >> this doesn't apply to the specific thing we're trying to do, right? >> Since we're cross compiling GHC itself and then we'd be running GHC in >> the target environment? >> > Depending upon how you approach it, perhaps not. I had assumed > that you would want to do as much compilation as possible on the host > and then run the resulting binaries on the target. As I understand it > this is what the Rustaceans do wherever possible to make testing under > qemu feasible. > > Of course, if you want to run the whole testsuite, including all > compilation, under qemu, then you naturally aren't affected by the TH issue. > I would be very curious to know just how slow this is. > > Cheers, > > - Ben From ben at well-typed.com Thu Oct 12 17:02:29 2017 From: ben at well-typed.com (Ben Gamari) Date: Thu, 12 Oct 2017 13:02:29 -0400 Subject: [GHC DevOps Group] DevOps: Next steps In-Reply-To: References: <87fubmzypd.fsf@ben-laptop.smart-cactus.org> <098E3F6A-5556-4E78-AF27-979CBF973060@tweag.io> <87mv5hv9ji.fsf@ben-laptop.smart-cactus.org> <87zi9ht63w.fsf@ben-laptop.smart-cactus.org> <3AD4D37A-65C3-46D1-A2DD-7D049EEEE0F2@tweag.io> <87wp4lt5t4.fsf@ben-laptop.smart-cactus.org> <658252EF-04F2-4DA3-BAF2-6270180BE5FA@tweag.io> <87efqirfb0.fsf@ben-laptop.smart-cactus.org> <87d15vm3hh.fsf@ben-laptop.smart-cactus.org> <87o9pdloe9.fsf@ben-laptop.smart-cactus.org> <87d15tliqw.fsf@ben-laptop.smart-cactus.org> <878tggjqa7.fsf@ben-laptop.smart-cactus.org> Message-ID: <87y3ogi9ju.fsf@ben-laptop.smart-cactus.org> "Boespflug, Mathieu" writes: > Oh, I was thinking of a middle ground where you build a target > compiler, on the host, then build the test suite (not the compiler > itself) from inside the target environment, using the compiler you > built on the host, and run the test suite. Yes, that is another approach. You are right that it would likely be a fair bit cheaper; I'd be curious to see how much. Cheers, - Ben -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 487 bytes Desc: not available URL: From ben at well-typed.com Thu Oct 12 17:07:02 2017 From: ben at well-typed.com (Ben Gamari) Date: Thu, 12 Oct 2017 13:07:02 -0400 Subject: [GHC DevOps Group] CI In-Reply-To: References: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> Message-ID: <87vajki9c9.fsf@ben-laptop.smart-cactus.org> "Boespflug, Mathieu" writes: > Some extra points by Jonas, that I'm just forwarding: > > - CircleCI & Appveyor Cons: dependent on third-party for feature development. > - Funding: on one side we have free RackSpace servers, on the other > side CircleCI might be willing to fund as well. Right, I think one of us should get in touch with CircleCI to see whether they would be willing to offer us an extension of their usual open-source offer. That would certainly change the equation a bit. Perhaps Manuel or I could do this? > - Incident report regarding Jenkins: even with sandboxing, the > security record of Jenkins has been patchy. This is very true. Since we started looking at Jenkins I've been subscribed to their security list. The number and severity of issues that I see announced on that list does indeed give me pause. Cheers, - Ben -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 487 bytes Desc: not available URL: From m at tweag.io Thu Oct 12 17:22:52 2017 From: m at tweag.io (Boespflug, Mathieu) Date: Thu, 12 Oct 2017 19:22:52 +0200 Subject: [GHC DevOps Group] CI In-Reply-To: <87vajki9c9.fsf@ben-laptop.smart-cactus.org> References: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> <87vajki9c9.fsf@ben-laptop.smart-cactus.org> Message-ID: On 12 October 2017 at 19:07, Ben Gamari wrote: > "Boespflug, Mathieu" writes: > >> Some extra points by Jonas, that I'm just forwarding: >> >> - CircleCI & Appveyor Cons: dependent on third-party for feature development. 
>> - Funding: on one side we have free RackSpace servers, on the other >> side CircleCI might be willing to fund as well. > > Right, I think one of us should get in touch with CircleCI to see > whether they would be willing to offer us an extension of their usual > open-source offer. That would certainly change the equation a bit. > Perhaps Manuel or I could do this? To Greg's point, if we have specific data that would justify a specific level of build job parallelism that we'd like to achieve, then that might be worth a shot, yes. From marlowsd at gmail.com Sun Oct 15 09:49:38 2017 From: marlowsd at gmail.com (Simon Marlow) Date: Sun, 15 Oct 2017 10:49:38 +0100 Subject: [GHC DevOps Group] FreeBSD in Tier 1 In-Reply-To: References: <87r2ucnz23.fsf@ben-laptop.smart-cactus.org> <87a80zm2in.fsf@ben-laptop.smart-cactus.org> <87tvz4k3sq.fsf@ben-laptop.smart-cactus.org> Message-ID: On 12 October 2017 at 12:31, Boespflug, Mathieu wrote: > > However, I was under the impression that CircleCI doesn't allow for this > sort of usage. Perhaps I am mistaken? > > In their ToS you mean? Not that I've seen. What we did was: spawn an > AWS machine at the start of a job, start a watchdog timer for the > machine to self-destruct and then run commands on the remote > AWS machine from CircleCI via SSH. Very low-tech. Not particularly > robust. There are other ways to do this, hopefully more robust. > So we could have CircleCI call Jenkins? :) Cheers Simon > -- > Mathieu Boespflug > Founder at http://tweag.io. > > > On 12 October 2017 at 13:23, Ben Gamari wrote: > > Facundo Domínguez writes: > > > >>> If someone were to step up to maintain FreeBSD, or any other > non-Linux/amd64 platform, the day after we adopt CircleCI, what would we > tell them? > >> > >> There is the possibility to have someone contribute a machine where > >> the CI boxes can build the code and run the tests remotely. > >> > > If that is true then that certainly makes for a much nicer story. 
> > However, I was under the impression that CircleCI doesn't allow for this > > sort of usage. Perhaps I am mistaken? > > > > Cheers, > > > > - Ben > > > > _______________________________________________ > > Ghc-devops-group mailing list > > Ghc-devops-group at haskell.org > > https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group > > > _______________________________________________ > Ghc-devops-group mailing list > Ghc-devops-group at haskell.org > https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group > -------------- next part -------------- An HTML attachment was scrubbed... URL: From manuel.chakravarty at tweag.io Mon Oct 16 04:32:56 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Mon, 16 Oct 2017 15:32:56 +1100 Subject: [GHC DevOps Group] CI In-Reply-To: References: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> <87vajki9c9.fsf@ben-laptop.smart-cactus.org> Message-ID: <26DB94DD-BD45-453E-BB0C-5D287F8637AB@tweag.io> Ben in an earlier email had a rough estimate, which I added to the Trac page: https://ghc.haskell.org/trac/ghc/wiki/ContinuousIntegration#Usageestimate > 13.10.2017, 04:22 Boespflug, Mathieu : > > On 12 October 2017 at 19:07, Ben Gamari wrote: >> "Boespflug, Mathieu" writes: >> >>> Some extra points by Jonas, that I'm just forwarding: >>> >>> - CircleCI & Appveyor Cons: dependent on third-party for feature development. >>> - Funding: on one side we have free RackSpace servers, on the other >>> side CircleCI might be willing to fund as well. >> >> Right, I think one of us should get in touch with CircleCI to see >> whether they would be willing to offer us an extension of their usual >> open-source offer. That would certainly change the equation a bit. >> Perhaps Manuel or I could do this? > > To Greg's point, if we have specific data that would justify a > specific level of build job parallelism that we'd like to achieve, > then that might be worth a shot, yes. 
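The remote-builder arrangement Mathieu describes earlier in the thread (spawn a machine at the start of a job, arm a self-destruct watchdog, then drive the build over SSH) could be sketched as a CircleCI 2.0 config fragment along the following lines. This is a hypothetical illustration, not an existing GHC setup: the `ci/spawn-builder.sh` helper, the `ci` user, and the timings are all invented for the sketch.

```yaml
version: 2
jobs:
  validate-remote:
    docker:
      - image: buildpack-deps:xenial
    steps:
      - checkout
      - run:
          name: Provision remote builder
          # Hypothetical helper: boots a cloud instance and arms a
          # self-destruct timer so an orphaned machine shuts itself
          # down even if this job dies. Writes the host address to a file.
          command: ./ci/spawn-builder.sh --self-destruct-after 3h > builder-host
      - run:
          name: Build and validate over SSH
          no_output_timeout: 30m
          command: |
            BUILDER_HOST=$(cat builder-host)
            ssh "ci@$BUILDER_HOST" 'cd ghc && ./boot && ./configure && gmake && gmake test'
```

As Mathieu notes, this is not particularly robust: if the SSH connection drops, the job fails even though the remote machine may still be building, and the watchdog is the only thing that stops orphaned instances from running forever.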
From manuel.chakravarty at tweag.io Mon Oct 16 04:39:12 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Mon, 16 Oct 2017 15:39:12 +1100 Subject: [GHC DevOps Group] CI In-Reply-To: References: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> Message-ID: <1453499E-BA4E-456D-9B67-276D8678ACBB@tweag.io> I added the first point as a con to the hosted solution and noted the security concerns with Jenkins even with sandboxing. Re funding, Ben, please correct me if I am wrong, but I don’t think it is clear that the RackSpace servers are sufficient for all the pre-merge testing. > 13.10.2017, 03:27 Boespflug, Mathieu : > > Some extra points by Jonas, that I'm just forwarding: > > - CircleCI & Appveyor Cons: dependent on third-party for feature development. > - Funding: on one side we have free RackSpace servers, on the other > side CircleCI might be willing to fund as well. > - Incident report regarding Jenkins: even with sandboxing, the > security record of Jenkins has been patchy. One of our clients (before > we started managing the instance) ended up with a Bitcoin miner on > their rig. The security flaw they exploited was this one: > https://www.cvedetails.com/cve/CVE-2016-0792/. For our client, > firewalling Jenkins was possible, but in this case anyone will want > direct access to the build logs. > > These are comments about solutions though, not requirements. > -- > Mathieu Boespflug > Founder at http://tweag.io. > > > On 12 October 2017 at 15:18, Ben Gamari wrote: >> Manuel M T Chakravarty writes: >> >>> As promised, I have taken a first cut at listing the requirements and >>> the pros and cons of the main contenders on a Trac page: >>> >>> https://ghc.haskell.org/trac/ghc/wiki/ContinuousIntegration >>> >> I think this list is being a bit generous to the hosted option. 
>> >> Other costs of this approach might include: >> >> * Under this heterogeneous scheme we will have to maintain two or more >> distinct CI systems, each requiring some degree of setup and >> maintenance. >> >> * Using qemu for building on/for non-Linux/amd64 platforms requires a >> non-negligible amount of additional complexity (see rust's CI >> implementation [1]) >> >> * It's unclear whether testing GHC via qemu is even practical given >> computational constraints. >> >> * We lose the ability to prioritize jobs, requiring more hardware to >> maintain similar build turnaround >> >> * We are utterly dependent on our CI service(s) to behave well; for >> instance, here are two examples that the Rust infrastructure team >> related to me, >> >> * They have been struggling to keep the tail of their Travis build >> turnaround time distribution in check, with some builds taking >> over 8 hours to complete. Despite raising the issue with Travis >> customer support they are still having trouble, despite being a >> paying customer. >> >> * They have noticed that Travis has a tendency to simply drop builds >> in mid-flight, losing hours of work. Again, despite working with >> upstream they haven't been able to resolve the problem >> >> * They have been strongly affected by apparent instability in >> Travis' OS X infrastructure which goes down, to quote, "*a lot*" >> >> Of course, both of these are picking on Travis in particular as that >> is the example we have available. However, in general the message >> here is that by giving up our own infrastructure we are at the mercy >> of the services that we use. Unfortunately, sometimes those services >> are not accustomed to testing projects of the scale of GHC or rustc. >> At this point you have little recourse but to minimize the damage. >> >> We avoid all of this by self-hosting (at, of course, the expense of >> administration time). 
Furthermore, we continue to benefit from hardware >> provided by a multitude of sources including users, Rackspace (and other >> VPS providers if we wanted), and programs like OSU OSL. It is important >> to remember that until recently we were operating under the assumption >> that these were the only resources available to us for testing. >> >> It's still quite unclear to me what a CircleCI/Appveyor solution will >> ultimately cost, but it will almost certainly not be free. Assuming there >> are users who are willing to foot that bill, this is of course fine. >> However, it's quite contrary to the assumptions we have been working >> with for much of this process. >> >> >> Lastly: If I understand the point correctly, the "the set up is not >> forkable" "con" of Jenkins is not accurate. Under Jenkins the build >> configuration resides in the repository being tested. A user can easily >> modify it and submit a PR, which will be tested just like any other >> change. >> >> >> [1] https://github.com/rust-lang/rust/tree/master/src/ci >> >> >>> Maybe I am biased, but is there any advantage to Jenkins other than >>> that we can run builds and tests on exotic platforms? >> >> Some of these "exotic" platforms might also be called "the most populous >> architecture in the world" (ARM), "the operating system that feeds a >> third of the world's Internet traffic" (FreeBSD), and "the operating >> system that powers much of the world's financial system" (AIX). I'm not >> sure that the "exotic" label really does these platforms justice. >> >> More importantly, all of these platforms have contributors working on >> their support in GHC. Historically, GHC HQ has tried to recognize their >> efforts by allowing porters to submit binary distributions which are >> distributed alongside GHC HQ distributions. 
Recently I have tried to >> pursue a different model, handling some of these binary builds myself in >> the name of consistency and reduced release overhead (as previously we >> incurred a full round-trip through binary build contributors every time >> we released). >> >> The desire to scale our release process up to handle the breadth of >> platforms that GHC supports, with either Tier 1 or what is currently >> Tier 2 support, was one motivation for the new CI effort. While I don't >> consider testing any one of these platforms to be a primary goal, I do >> think it is important to have a viable plan by which they might be >> covered in the future for this reason. >> >> >> To be clear, I am supportive of the CI-as-a-service direction. However, >> I want to recognize the trade-offs where they exist and have answers to >> some of the thorny questions, including those surrounding platform >> support, before committing. >> >> Cheers, >> >> - Ben >> >> _______________________________________________ >> Ghc-devops-group mailing list >> Ghc-devops-group at haskell.org >> https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group >> From manuel.chakravarty at tweag.io Mon Oct 16 06:11:59 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Mon, 16 Oct 2017 17:11:59 +1100 Subject: [GHC DevOps Group] CI In-Reply-To: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> References: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> Message-ID: > 13.10.2017, 00:18 schrieb Ben Gamari : > Manuel M T Chakravarty writes: > >> As promised, I have taken a first cut at listing the requirements and >> the pros and cons of the main contenders on a Trac page: >> >> https://ghc.haskell.org/trac/ghc/wiki/ContinuousIntegration >> > I think this list is being a bit generous to the hosted option. 
> > Other costs of this approach might include: > > * Under this heterogeneous scheme we will have to maintain two or more > distinct CI systems, each requiring some degree of setup and > maintenance. As Mathieu mentioned in an earlier post, most of the code is the same. It is essentially just the CI-specific config files that vary. Given how quickly Mathieu wrote the one for CircleCI, I doubt that this is much of an overhead. Anyway, I added a point about having to deal with two CI providers. > * Using qemu for building on/for non-Linux/amd64 platforms requires a > non-negligible amount of additional complexity (see rust's CI > implementation [1]) > > * It's unclear whether testing GHC via qemu is even practical given > computational constraints. This is part of the biggest disadvantage of hosted CI — i.e., part of the first con of hosted CI. > * We lose the ability to prioritize jobs, requiring more hardware to > maintain similar build turnaround I am not sure. Is that inherently so? > * We are utterly dependent on our CI service(s) to behave well; for > instance, here are two examples that the Rust infrastructure team > related to me, > > * They have been struggling to keep the tail of their Travis build > turnaround time distribution in check, with some builds taking > over 8 hours to complete. Despite raising the issue with Travis > customer support they are still having trouble, despite being a > paying customer. > > * They have noticed that Travis has a tendency to simply drop builds > in mid-flight, losing hours of work. Again, despite working with > upstream they haven't been able to resolve the problem > > * They have been strongly affected by apparent instability in > Travis' OS X infrastructure which goes down, to quote, "*a lot*" > > Of course, both of these are picking on Travis in particular as that > is the example we have available. 
However, in general the message > here is that by giving up our own infrastructure we are at the mercy > of the services that we use. Unfortunately, sometimes those services > are not accustomed to testing projects of the scale of GHC or rustc. > At this point you have little recourse but to minimize the damage. I think the issues with large, long-running jobs are why Mathieu proposed CircleCI over Travis. But you are right, of course, if we outsource work, we need to trust the people who we outsource to to do a good job. On the other hand, I assume that CircleCI has a response team that jumps in when bad things happen. In contrast, I don’t think we want to hand you a pager so we can notify you if some urgent maintenance is needed in the middle of the night. > We avoid all of this by self-hosting (at, of course, the expense of > administration time). Furthermore, we continue to benefit from hardware > provided by a multitude of sources including users, Rackspace (and other > VPS providers if we wanted), and programs like OSU OSL. It is important > to remember that until recently we were operating under the assumption > that these were the only resources available to us for testing. > > It's still quite unclear to me what a CircleCI/Appveyor solution will > ultimately cost, but will almost certainly not be free. Assuming there > are users who are willing to foot that bill, this is of course fine. > However, it's quite contrary to the assumptions we have been working > with for much of this process. Yes, you are right. That we have sponsors for the CI costs changes the situation wrt the previous planning. And it is precisely one of the reasons why we founded the GHC DevOps group: to unlock new resources. I am sorry that this comes in the middle of the existing effort. I can see how this is annoying. However, all the work on getting GHC’s build in shape and the scripts to generate artefacts are all still needed. 
> Lastly: If I understand the point correctly, the "the set up is not > forkable" "con" of Jenkins is not accurate. Under Jenkins the build > configuration resides in the repository being tested. A user can easily > modify it and submit a PR, which will be tested just like any other > change. That is not what I mean by forkable, because this still requires the user to use the central infrastructure. Forkable here means that they can run CI on, e.g., their own CircleCI account. That makes things more scalable as the user doesn’t count towards our limits and doesn’t put stress on our infrastructure (including cluttering things with PRs, which may not really be for integration (yet), but just for testing). A user can even experiment with varying the CI setup on their own without involving us. With Jenkins that is much harder, because they need to recreate the CI infrastructure. > [1] https://github.com/rust-lang/rust/tree/master/src/ci > > >> Maybe I am biased, but is there any advantage to Jenkins other than >> that we can run builds and tests on exotic platforms? > > Some of these "exotic" platforms might also be called "the most populous > architecture in the world" (ARM), You keep mentioning ARM. I don’t understand. We can run Android and iOS CI on CircleCI. (Any other OS on ARM, I would categorise as exotic, though.) > "the operating system that feeds a > third of the world's Internet traffic" (FreeBSD), and "the operating > system that powers much of the world's financial system" (AIX). I'm not > sure that the "exotic" label really does these platforms justice. AFAIK virtually nobody runs GHC on those — i.e., wrt this specific discussion, these are exotic platforms. > More importantly, all of these platforms have contributors working on > their support in GHC. Historically, GHC HQ has tried to recognize their > efforts by allowing porters to submit binary distributions which are > distributed alongside GHC HQ distributions. 
Recently I have tried to > pursue a different model, handling some of these binary builds myself in > the name of consistency and reduced release overhead (as previously we > incurred a full round-trip through binary build contributors every time > we released). It is nice to support all contributors, but I think we shouldn’t do it at the expense of the main platforms. I think we all agree that we need really proper CI and we need fully automatic release builds. Putting those in place as quickly and with as little effort as possible ought to be our main goal IMHO. > The desire to scale our release process up to handle the breadth of > platforms that GHC supports, with either Tier 1 or what is currently > Tier 2 support, was one motivation for the new CI effort. While I don't > consider testing any one of these platforms to be a primary goal, I do > think it is important to have a viable plan by which they might be > covered in the future for this reason. > > > To be clear, I am supportive of the CI-as-a-service direction. However, > I want to recognize the trade-offs where they exist and have answers to > some of the thorny questions, including those surrounding platform > support, before committing. We absolutely want to make a rational choice on the basis of all the facts. However, I strongly think that some considerations have to have more weight than others. Simply what I have learnt about Jenkins security and the amount of *your* time that a Jenkins setup appears to cost gives me pause. I will happily incur more complexity for building and testing exotic platforms in exchange for that. Cheers, Manuel From m at tweag.io Mon Oct 16 08:58:40 2017 From: m at tweag.io (Boespflug, Mathieu) Date: Mon, 16 Oct 2017 10:58:40 +0200 Subject: [GHC DevOps Group] CI In-Reply-To: References: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> Message-ID: Hi Manuel, On 16 October 2017 at 08:11, Manuel M T Chakravarty wrote: > > [...] 
> >> * We lose the ability to prioritize jobs, requiring more hardware to >> maintain similar build turnaround > > I am not sure. Is that inherently so? It is. To some extent. That said, we do need to think carefully about what our requirements really are and why. CircleCI has a notion of "workflow". This is extremely powerful. But what it means in this context is that you can always run, e.g., the Linux 64-bit validate job before the 32-bit one, and decide to run the latter only if the former succeeds. Currently macOS has not been integrated into this workflow support (it's a new feature), so macOS builds will trigger in parallel to Linux builds. We'll want to do Windows builds on Appveyor, I think. Those won't be part of any CircleCI workflow, not without some hacking. Does this matter? Well, why have prioritization in the first place? To avoid tying up multiple build resources if the build is very likely to fail anyways, and therefore run the build on just one instance first to save on resource usage? Early feedback to the user? The user will get to know the build failed as soon as it fails on any platform, so I don't think prioritization of one particular platform helps. So that leaves us with resource usage minimization. macOS has its own quota of parallel builds, separate from Linux. And separate from Windows as well. The exact quota depends on our plan (free plans allow 2-4 parallel builds, paid plans allow for more). We could actually try to play games to keep failing builds from clogging the build queue. But this is yet another case of: just throw more money at it to get more slots and keep it simple. That way, no prioritization necessary. KISS saves human time, hence saves money overall. 
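The gating described above (run the Linux 64-bit validate job first, and start the 32-bit one only if it succeeds) is what a CircleCI 2.0 workflow expresses with a `requires` edge. A minimal sketch, with hypothetical job names, images, and commands rather than anything taken from an actual GHC config:

```yaml
version: 2
jobs:
  validate-x86_64-linux:
    docker:
      - image: buildpack-deps:xenial
    steps:
      - checkout
      - run: ./validate           # 64-bit build acts as a cheap smoke test
  validate-i386-linux:
    docker:
      - image: buildpack-deps:xenial
    steps:
      - checkout
      - run: setarch i386 ./validate
workflows:
  version: 2
  validate:
    jobs:
      - validate-x86_64-linux
      # Only starts once the 64-bit job has succeeded, so an obviously
      # broken commit never ties up a second container.
      - validate-i386-linux:
          requires:
            - validate-x86_64-linux
```

Jobs without a `requires` clause start as soon as a worker is free, so adding or removing an edge is all the "prioritization" this scheme offers.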
>> * We are utterly dependent on our CI service(s) to behave well; for >> instance, here are two examples that the Rust infrastructure team >> related to me, >> >> * They have been struggling to keep the tail of their Travis build >> turnaround time distribution in check, with some builds taking >> over 8 hours to complete. Despite raising the issue with Travis >> customer support they are still having trouble, despite being a >> paying customer. >> >> * They have noticed that Travis has a tendency to simply drop builds >> in mid-flight, losing hours of work. Again, despite working with >> upstream they haven't been able to resolve the problem >> >> * They have been strongly affected by apparent instability in >> Travis' OS X infrastructure which goes down, to quote, "*a lot*" >> >> Of course, both of these are picking on Travis in particular as that >> is the example we have available. However, in general the message >> here is that by giving up our own infrastructure we are at the mercy >> of the services that we use. Unfortunately, sometimes those services >> are not accustomed to testing projects of the scale of GHC or rustc. >> At this point you have little recourse but to minimize the damage. > > I think the issues with large, long-running jobs are why Mathieu proposed CircleCI over Travis. But you are right, of course, if we outsource work, we need to trust the people who we outsource to to do a good job. > > On the other hand, I assume that CircleCI has a response team that jumps in when bad things happen. In contrast, I don’t think we want to hand you a pager so we can notify you if some urgent maintenance is needed in the middle of the night. FWIW, we've had projects with 200+ builds a month (per project) on CircleCI for some time without these kinds of issues. Our experience with Travis CI isn't as extensive as the Rust team's. But I get the impression Travis prioritizes non-paying open source projects lower than projects with paid plans. 
For me the main reason for CircleCI is mainly the availability of faster build hardware (and yes, lower queuing times). But Travis CI should work just fine too (let's work out the math). We won't yet be at Rust's scale any time soon (3 supported platforms vs 35+ platforms supported). Best, Mathieu From ben at well-typed.com Mon Oct 16 16:11:51 2017 From: ben at well-typed.com (Ben Gamari) Date: Mon, 16 Oct 2017 12:11:51 -0400 Subject: [GHC DevOps Group] CI In-Reply-To: <1453499E-BA4E-456D-9B67-276D8678ACBB@tweag.io> References: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> <1453499E-BA4E-456D-9B67-276D8678ACBB@tweag.io> Message-ID: <878tgbrs1k.fsf@ben-laptop.smart-cactus.org> Manuel M T Chakravarty writes: > I added the first point as a con to the hosted solution and noted the > security concerns with Jenkins even with sandboxing. > > Re funding, Ben, please correct me if I am wrong, but I don’t think it > is clear that the RackSpace server are sufficient for all the > pre-merge testing. > That is true; we may indeed need more capacity if we were to go that route. Cheers, - Ben -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 487 bytes Desc: not available URL: From ben at well-typed.com Thu Oct 19 01:29:24 2017 From: ben at well-typed.com (Ben Gamari) Date: Wed, 18 Oct 2017 21:29:24 -0400 Subject: [GHC DevOps Group] CI In-Reply-To: References: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> Message-ID: <87zi8oorgr.fsf@ben-laptop.smart-cactus.org> Manuel M T Chakravarty writes: >> 13.10.2017, 00:18 schrieb Ben Gamari : >> >> I think this list is being a bit generous to the hosted option. >> >> Other costs of this approach might include: >> >> * Under this heterogeneous scheme we will have to maintain two or more >> distinct CI systems, each requiring some degree of setup and >> maintenance. > > As Mathieu mentioned in an earlier post, most of the code is the same. 
> It is essentially just the CI-specific config files that vary. Given > how quickly Mathieu wrote the one for CircleCI, I doubt that this is > much of an overhead. Anyway, I added a point about having to deal with > two CI providers. > Indeed we shall see. You may very well be right. > I think the issues with large, long-running jobs are why Mathieu > proposed CircleCI over Travis. But you are right, of course, if we > outsource work, we need to trust the people who we outsource to to do > a good job. > > On the other hand, I assume that CircleCI has a response team that > jumps in when bad things happen. Indeed, in speaking to the Rust folks they said that they were generally fairly impressed with CircleCI and have been considering moving. > In contrast, I don’t think we want to hand you a pager so we can > notify you if some urgent maintenance is needed in the middle of the > night. I appreciate that; I wouldn't want that either :) >> It's still quite unclear to me what a CircleCI/Appveyor solution will >> ultimately cost, but will almost certainly not be free. Assuming there >> are users who are willing to foot that bill, this is of course fine. >> However, it's quite contrary to the assumptions we have been working >> with for much of this process. > > Yes, you are right. That we have sponsors for the CI costs changes the > situation wrt the previous planning. And it is precisely one of the > reasons why we founded the GHC DevOps group: to unlock new resources. > > I am sorry that this comes in the middle of the existing effort. I can > see how this is annoying. However, all the work on getting GHC’s build > in shape and the scripts to generate artefacts are all still needed. Indeed the timing was slightly suboptimal but I'm nevertheless very glad you brought this up. You have definitely helped me better understand the trade-offs at play and I think at this point I can say that, despite the costs, moving to CircleCI/Appveyor is the right decision. 
This is especially true in light of today's news that Rackspace will be ending their open source support program at the end of this year, which makes sustaining any self-hosted solution significantly harder resource-wise. Let's figure out how to make this happen. This week is turning out to be a bit busy for me, but perhaps this weekend or next week I'll see what can be done to hook up Phabricator and try to get the build automation in place for i386 and CentOS. However, if you have time do feel free to start plugging away at this yourself in the meantime. Cheers, - Ben -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 487 bytes Desc: not available URL: From manuel.chakravarty at tweag.io Thu Oct 19 02:02:03 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Thu, 19 Oct 2017 13:02:03 +1100 Subject: [GHC DevOps Group] CI In-Reply-To: <87zi8oorgr.fsf@ben-laptop.smart-cactus.org> References: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> <87zi8oorgr.fsf@ben-laptop.smart-cactus.org> Message-ID: <3659802E-AC0B-4086-94F0-CF7E4BA9C9CF@tweag.io> > 19.10.2017, 12:29 schrieb Ben Gamari : > Manuel M T Chakravarty writes: >>> 13.10.2017, 00:18 schrieb Ben Gamari : >>> It's still quite unclear to me what a CircleCI/Appveyor solution will >>> ultimately cost, but will almost certainly not be free. Assuming there >>> are users who are willing to foot that bill, this is of course fine. >>> However, it's quite contrary to the assumptions we have been working >>> with for much of this process. >> >> Yes, you are right. That we have sponsors for the CI costs changes the >> situation wrt to the previous planning. And it is precisely one of the >> reasons why we founded the GHC DevOps group: to unlock new resources. >> >> I am sorry that this comes in the middle of the existing effort. I can >> see how this is annoying. 
However, all the work on getting GHC’s build >> in shape and the scripts to generate artefacts are all still needed. > > Indeed the timing was slightly suboptimal but I'm nevertheless very glad > you brought this up. You have definitely helped me better understand the > trade-offs at play and I think at this point I can say that, despite the > costs, moving to CircleCI/Appveyor is the right decision. Ok, great. Thank you for pushing us to think through all the various issues. > This is especially true in light of today's news that Rackspace will be > ending their open source support program at the end of this year, which > makes sustaining any self-hosted solution significantly harder > resource-wise. Oh, I didn’t know that. Will that affect haskell.org as well? > Let's figure out how to make this happen. This week is turning out to be > a bit busy for me, but perhaps this weekend or next week I'll see what > can be done to hook up Phabricator and try to get the build automation > in place for i386 and CentOS. However, if you have time do feel free to > start plugging away at this yourself in the meantime. Jonas spent some time on the macOS build on CircleCI. It might be useful to make a list of all the things that need to be done and add it to the CI Trac page. Should we take a stab at that next week (when your schedule calms down a bit)? Cheers, Manuel -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ben at well-typed.com Thu Oct 19 02:33:28 2017 From: ben at well-typed.com (Ben Gamari) Date: Wed, 18 Oct 2017 22:33:28 -0400 Subject: [GHC DevOps Group] CI In-Reply-To: <3659802E-AC0B-4086-94F0-CF7E4BA9C9CF@tweag.io> References: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> <87zi8oorgr.fsf@ben-laptop.smart-cactus.org> <3659802E-AC0B-4086-94F0-CF7E4BA9C9CF@tweag.io> Message-ID: <87wp3rq32f.fsf@ben-laptop.smart-cactus.org> Manuel M T Chakravarty writes: >> 19.10.2017, 12:29 schrieb Ben Gamari : > >> This is especially true in light of today's news that Rackspace will be >> ending their open source support program at the end of this year, which >> makes sustaining any self-hosted solution significantly harder >> resource-wise. > > Oh, I didn’t know that. Will that affect haskell.org > as well? > I'm afraid so. >> Let's figure out how to make this happen. This week is turning out to be >> a bit busy for me, but perhaps this weekend or next week I'll see what >> can be done to hook up Phabricator and try to get the build automation >> in place for i386 and CentOS. However, if you have time do feel free to >> start plugging away at this yourself in the meantime. > > Jonas spent some time on the macOS build on CircleCI. > > It might be useful to make a list of all the things that need to be > done and add it to the CI Trac page. Should we take a stab at that > next week (when your schedule calms down a bit)? > Let's make sure that there are no objections from the Simons (or anyone else on the list, for that matter). If not, then sure, let's do it. Cheers, - Ben -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 487 bytes Desc: not available URL: From manuel.chakravarty at tweag.io Thu Oct 19 03:02:43 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Thu, 19 Oct 2017 14:02:43 +1100 Subject: [GHC DevOps Group] CI In-Reply-To: <87wp3rq32f.fsf@ben-laptop.smart-cactus.org> References: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> <87zi8oorgr.fsf@ben-laptop.smart-cactus.org> <3659802E-AC0B-4086-94F0-CF7E4BA9C9CF@tweag.io> <87wp3rq32f.fsf@ben-laptop.smart-cactus.org> Message-ID: <2BEE73BA-0A02-4CD2-B433-0C11A8A8EE3C@tweag.io> > 19.10.2017, 13:33 Ben Gamari : > Manuel M T Chakravarty writes: >>> 19.10.2017, 12:29 schrieb Ben Gamari : >>> Let's figure out how to make this happen. This week is turning out to be >>> a bit busy for me, but perhaps this weekend or next week I'll see what >>> can be done to hook up Phabricator and try to get the build automation >>> in place for i386 and CentOS. However, if you have time do feel free to >>> start plugging away at this yourself in the meantime. >> >> Jonas spent some time on the macOS build on CircleCI. >> >> It might be useful to make a list of all the things that need to be >> done and add it to the CI Trac page. Should we take a stab at that >> next week (when your schedule calms down a bit)? >> > Let's make sure that there are no objections from the Simons (or anyone > else on the list, for that matter). If not, then sure, let's do it. Yes, absolutely, let’s make it explicit. Are there any remaining objections to us going forward with the hosted option? 
Manuel From simonpj at microsoft.com Thu Oct 19 08:06:55 2017 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Thu, 19 Oct 2017 08:06:55 +0000 Subject: [GHC DevOps Group] CI In-Reply-To: <87wp3rq32f.fsf@ben-laptop.smart-cactus.org> References: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> <87zi8oorgr.fsf@ben-laptop.smart-cactus.org> <3659802E-AC0B-4086-94F0-CF7E4BA9C9CF@tweag.io> <87wp3rq32f.fsf@ben-laptop.smart-cactus.org> Message-ID: | > It might be useful to make a list of all the things that need to be | > done and add it to the CI Trac page. Should we take a stab at that | > next week (when your schedule calms down a bit)? | > | Let's make sure that there are no objections from the Simons (or | anyone else on the list, for that matter). If not, then sure, let's do | it. Yes... I lack the expertise and capacity to follow this debate, but once you have come to a collective view it would be very helpful to write a summary of the proposed course of action, and reasoning. I can say in advance that I'm extremely unlikely to object, but it's a good discipline, if only to share the thinking with a broader audience. Simon | -----Original Message----- | From: Ghc-devops-group [mailto:ghc-devops-group-bounces at haskell.org] | On Behalf Of Ben Gamari | Sent: 19 October 2017 03:33 | To: Manuel M T Chakravarty | Cc: ghc-devops-group at haskell.org | Subject: Re: [GHC DevOps Group] CI | | Manuel M T Chakravarty writes: | | >> 19.10.2017, 12:29 schrieb Ben Gamari : | > | >> This is especially true in light of today's news that Rackspace | will | >> be ending their open source support program at the end of this | year, | >> which makes sustaining any self-hosted solution significantly | harder | >> resource-wise. | > | > Oh, I didn’t know that. Will that affect haskell.org | > | as well? | > | I'm afraid so. | >> Let's figure out how to make this happen. 
This week is turning out | to | >> be a bit busy for me, but perhaps this weekend or next week I'll | see | >> what can be done to hook up Phabricator and try to get the build | >> automation in place for i386 and CentOS. However, if you have time | do | >> feel free to start plugging away at this yourself in the meantime. | > | > Jonas spent some time on the macOS build on CircleCI. | > | > It might be useful to make a list of all the things that need to be | > done and add it to the CI Trac page. Should we take a stab at that | > next week (when your schedule calms down a bit)? | > | Let's make sure that there are no objections from the Simons (or | anyone else on the list, for that matter). If not, then sure, let's do | it. | | Cheers, | | - Ben From marlowsd at gmail.com Thu Oct 19 09:41:01 2017 From: marlowsd at gmail.com (Simon Marlow) Date: Thu, 19 Oct 2017 10:41:01 +0100 Subject: [GHC DevOps Group] CI In-Reply-To: <2BEE73BA-0A02-4CD2-B433-0C11A8A8EE3C@tweag.io> References: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> <87zi8oorgr.fsf@ben-laptop.smart-cactus.org> <3659802E-AC0B-4086-94F0-CF7E4BA9C9CF@tweag.io> <87wp3rq32f.fsf@ben-laptop.smart-cactus.org> <2BEE73BA-0A02-4CD2-B433-0C11A8A8EE3C@tweag.io> Message-ID: On 19 October 2017 at 04:02, Manuel M T Chakravarty < manuel.chakravarty at tweag.io> wrote: > > 19.10.2017, 13:33 Ben Gamari : > > Manuel M T Chakravarty writes: > >>> 19.10.2017, 12:29 schrieb Ben Gamari : > >>> Let's figure out how to make this happen. This week is turning out to > be > >>> a bit busy for me, but perhaps this weekend or next week I'll see what > >>> can be done to hook up Phabricator and try to get the build automation > >>> in place for i386 and CentOS. However, if you have time do feel free to > >>> start plugging away at this yourself in the meantime. > >> > >> Jonas spent some time on the macOS build on CircleCI. 
> >> > >> It might be useful to make a list of all the things that need to be > >> done and add it to the CI Trac page. Should we take a stab at that > >> next week (when your schedule calms down a bit)? > >> > > Let's make sure that there are no objections from the Simons (or anyone > > else on the list, for that matter). If not, then sure, let's do it. > > Yes, absolutely, let’s make it explicit. Are there any remaining > objections to us going forward with the hosted option? Not from me, but just to point out that CircleCI has Phabricator integration (https://circleci.com/docs/1.0/phabricator/) and it looks pretty straightforward to set up. I don't know if AppVeyor has something similar, but if it does then we can decouple the choice of CI from code-review tool, and move more slowly on the latter issue. I think that's important, because changing existing workflows will have more impact on existing developers, and we need a longer period to discuss that. Also there are a number of technical issues with adopting GitHub that haven't been discussed yet (existing commit hooks in particular). Cheers Simon Manuel > > _______________________________________________ > Ghc-devops-group mailing list > Ghc-devops-group at haskell.org > https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m at tweag.io Thu Oct 19 10:20:05 2017 From: m at tweag.io (Boespflug, Mathieu) Date: Thu, 19 Oct 2017 12:20:05 +0200 Subject: [GHC DevOps Group] CI In-Reply-To: References: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> <87zi8oorgr.fsf@ben-laptop.smart-cactus.org> <3659802E-AC0B-4086-94F0-CF7E4BA9C9CF@tweag.io> <87wp3rq32f.fsf@ben-laptop.smart-cactus.org> <2BEE73BA-0A02-4CD2-B433-0C11A8A8EE3C@tweag.io> Message-ID: On 19 October 2017 at 11:41, Simon Marlow wrote: > > [...] 
> > Not from me, but just to point out that CircleCI has Phabricator integration > (https://circleci.com/docs/1.0/phabricator/) and it looks pretty > straightforward to set up. I don't know if AppVeyor has something similar, > but if it does then we can decouple the choice of CI from code-review tool, > and move more slowly on the latter issue. Quite possibly, yes. Though the page you mention is for CircleCI 1.0, when we'll probably want to be using the 2.0 infrastructure. What this would amount to doing is swap out HarborMaster, but pretty much not any of the rest. From manuel.chakravarty at tweag.io Fri Oct 20 00:45:38 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Fri, 20 Oct 2017 11:45:38 +1100 Subject: [GHC DevOps Group] CI In-Reply-To: References: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> <87zi8oorgr.fsf@ben-laptop.smart-cactus.org> <3659802E-AC0B-4086-94F0-CF7E4BA9C9CF@tweag.io> <87wp3rq32f.fsf@ben-laptop.smart-cactus.org> <2BEE73BA-0A02-4CD2-B433-0C11A8A8EE3C@tweag.io> Message-ID: I had a quick look at the CircleCI docs and it seems as if the CircleCI 1.0 documentation might still be applicable as Phabricator integration is based on the CircleCI API on the CircleCI side. That API is v1.1 right now and supposedly works with both CircleCI 1.0 and 2.0 as per https://circleci.com/docs/api/ Manuel > 19.10.2017, 21:20 Boespflug, Mathieu : > > On 19 October 2017 at 11:41, Simon Marlow wrote: >> >> [...] >> >> Not from me, but just to point out that CircleCI has Phabricator integration >> (https://circleci.com/docs/1.0/phabricator/) and it looks pretty >> straightforward to set up. I don't know if AppVeyor has something similar, >> but if it does then we can decouple the choice of CI from code-review tool, >> and move more slowly on the latter issue. > > Quite possibly, yes. Though the page you mention is for CircleCI 1.0, > when we'll probably want to be using the 2.0 infrastructure. 
What this > would amount to doing is swap out HarborMaster, but pretty much not > any of the rest. > _______________________________________________ > Ghc-devops-group mailing list > Ghc-devops-group at haskell.org > https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group -------------- next part -------------- An HTML attachment was scrubbed... URL: From manuel.chakravarty at tweag.io Fri Oct 20 03:09:03 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Fri, 20 Oct 2017 14:09:03 +1100 Subject: [GHC DevOps Group] CI In-Reply-To: References: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> <87zi8oorgr.fsf@ben-laptop.smart-cactus.org> <3659802E-AC0B-4086-94F0-CF7E4BA9C9CF@tweag.io> <87wp3rq32f.fsf@ben-laptop.smart-cactus.org> Message-ID: <968F7F67-F526-4E74-B413-7FC65BEE2F60@tweag.io> I summarised at https://ghc.haskell.org/trac/ghc/wiki/ContinuousIntegration#Discussionsummary Please let me know if I got anything wrong. Thanks, Manuel > 19.10.2017, 19:06 Simon Peyton Jones : > > | > It might be useful to make a list of all the things that need to be > | > done and add it to the CI Trac page. Should we take a stab at that > | > next week (when your schedule calms down a bit)? > | > > | Let's make sure that there are no objections from the Simons (or > | anyone else on the list, for that matter). If not, then sure, let's do > | it. > > Yes... I lack the expertise and capacity to follow this debate, but once you have come to a collective view it would be very helpful to write a summary of the proposed course of action, and reasoning. I can say in advance that I'm extremely unlikely to object, but it's a good discipline, if only to share the thinking with a broader audience. 
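[Editorial aside: the exchanges above turn on the point that only the thin, CI-specific config files vary between providers, with the Phabricator side driven through the CircleCI API. For readers unfamiliar with what such a file looks like, the following is a purely hypothetical sketch of a CircleCI 2.0 configuration for a GHC-style `./boot && ./configure && make` build. The image name, job layout, and step commands are illustrative assumptions, not the actual GHC or Tweag configuration.]

```yaml
# Hypothetical .circleci/config.yml sketch -- not the real GHC setup.
version: 2
jobs:
  build:
    docker:
      # Placeholder image; a real setup would pin an image with a GHC
      # bootstrap compiler and all build dependencies preinstalled.
      - image: haskell:8.0
    steps:
      - checkout
      - run: ./boot
      - run: ./configure
      - run: make -j2
      - run: make test
workflows:
  version: 2
  build:
    jobs:
      - build
```

[An AppVeyor setup would carry an analogous `appveyor.yml`; the build logic itself would live in shared scripts so that only this thin per-provider wrapper differs.]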
> > Simon > > | -----Original Message----- > | From: Ghc-devops-group [mailto:ghc-devops-group-bounces at haskell.org] > | On Behalf Of Ben Gamari > | Sent: 19 October 2017 03:33 > | To: Manuel M T Chakravarty > | Cc: ghc-devops-group at haskell.org > | Subject: Re: [GHC DevOps Group] CI > | > | Manuel M T Chakravarty writes: > | > | >> 19.10.2017, 12:29 schrieb Ben Gamari : > | > > | >> This is especially true in light of today's news that Rackspace > | will > | >> be ending their open source support program at the end of this > | year, > | >> which makes sustaining any self-hosted solution significantly > | harder > | >> resource-wise. > | > > | > Oh, I didn’t know that. Will that affect haskell.org as well? > | > > | I'm afraid so. > | >> Let's figure out how to make this happen. This week is turning out > | to > | >> be a bit busy for me, but perhaps this weekend or next week I'll > | see > | >> what can be done to hook up Phabricator and try to get the build > | >> automation in place for i386 and CentOS. However, if you have time > | do > | >> feel free to start plugging away at this yourself in the meantime. > | > > | > Jonas spent some time on the macOS build on CircleCI. > | > > | > It might be useful to make a list of all the things that need to be > | > done and add it to the CI Trac page. Should we take a stab at that > | > next week (when your schedule calms down a bit)? > | > > | Let's make sure that there are no objections from the Simons (or > | anyone else on the list, for that matter). If not, then sure, let's do > | it. > | > | Cheers, > | > | - Ben -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ben at well-typed.com Fri Oct 20 11:52:40 2017 From: ben at well-typed.com (Ben Gamari) Date: Fri, 20 Oct 2017 07:52:40 -0400 Subject: [GHC DevOps Group] CI In-Reply-To: <968F7F67-F526-4E74-B413-7FC65BEE2F60@tweag.io> References: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> <87zi8oorgr.fsf@ben-laptop.smart-cactus.org> <3659802E-AC0B-4086-94F0-CF7E4BA9C9CF@tweag.io> <87wp3rq32f.fsf@ben-laptop.smart-cactus.org> <968F7F67-F526-4E74-B413-7FC65BEE2F60@tweag.io> Message-ID: <87fuaeox2v.fsf@ben-laptop.smart-cactus.org> Manuel M T Chakravarty writes: > I summarised at > > https://ghc.haskell.org/trac/ghc/wiki/ContinuousIntegration#Discussionsummary > > Please let me know if I got anything wrong. > Thanks Manuel! Yes, this looks fairly comprehensive. Cheers, - Ben -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 487 bytes Desc: not available URL: From manuel.chakravarty at tweag.io Mon Oct 23 05:31:01 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Mon, 23 Oct 2017 16:31:01 +1100 Subject: [GHC DevOps Group] CI In-Reply-To: <87fuaeox2v.fsf@ben-laptop.smart-cactus.org> References: <87mv4wjyhr.fsf@ben-laptop.smart-cactus.org> <87zi8oorgr.fsf@ben-laptop.smart-cactus.org> <3659802E-AC0B-4086-94F0-CF7E4BA9C9CF@tweag.io> <87wp3rq32f.fsf@ben-laptop.smart-cactus.org> <968F7F67-F526-4E74-B413-7FC65BEE2F60@tweag.io> <87fuaeox2v.fsf@ben-laptop.smart-cactus.org> Message-ID: > Ben Gamari : > Manuel M T Chakravarty writes: > >> I summarised at >> >> https://ghc.haskell.org/trac/ghc/wiki/ContinuousIntegration#Discussionsummary >> >> Please let me know if I got anything wrong. >> > Thanks Manuel! Yes, this looks fairly comprehensive. I took a first very rough stab at planning out the concrete steps to be taken: https://ghc.haskell.org/trac/ghc/wiki/ContinuousIntegration#Todolist Could you have a look at and add/change/improve whatever comes to your mind? 
If you think we should have a call about this, let me know. Cheers, Manuel -------------- next part -------------- An HTML attachment was scrubbed... URL: From manuel.chakravarty at tweag.io Mon Oct 23 05:49:04 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Mon, 23 Oct 2017 16:49:04 +1100 Subject: [GHC DevOps Group] New release process Message-ID: One of the stated goals of our effort is to move to calendar-based 6-monthly releases of GHC. To this end, I had a conversation with Ben a while ago, where we discussed the following schedule for v6.4 & v6.6: 6.4 release planned for Feb - Branch in Nov 6.6 release ought to then be in Aug - Branch in May Pre-release schedule - On cutting the branch, alpha release - Then, a beta every two weeks until 4 weeks before the targeted release date - RC1 four weeks before the targeted release date - RC2 two weeks before the targeted release date For this to be realistic, we do need to have the automatic release artefact building in place by the time of cutting the v6.4 branch. This doesn’t leave us much time to get this up and running. Ben, this also requires us to settle https://github.com/snowleopard/hadrian/issues/440 soon. We need to discuss a policy of what can go into the release branch after it has been cut. IMHO, it cannot be major features, but only small changes and fixes until RC1. Then, only fixes. What do you all think? Cheers, Manuel -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael at snoyman.com Mon Oct 23 08:55:51 2017 From: michael at snoyman.com (Michael Snoyman) Date: Mon, 23 Oct 2017 11:55:51 +0300 Subject: [GHC DevOps Group] New release process In-Reply-To: References: Message-ID: I'm likely going to be a contrary opinion here: I'd prefer _less_ frequent GHC releases, not more frequent. 
Or at the very least: if we're talking about major releases with intentional breakage, I see multiple problems: * Many libraries are subscribing to a "3 versions" support policy. Having a breaking release every 6 months means that libraries will be phasing out support for a GHC release after only 1.5 years. * From a Stackage perspective, I see how long it takes for the ecosystem to catch up to a new GHC release. Depending on what it means for the ecosystem to have caught up, I'd put that number anywhere from 2-4 months. Personally, I'd be much more interested in seeing: * More frequent point releases, with more guarantees of backwards compatibility between releases. We've had situations in the past where point releases have broken a significant number of packages, and it would be great if we could work towards avoiding that with automated testing. * Perhaps some plans allowing for introducing new functionality in point releases without breaking backwards compatibility. From the standpoints of a package author, a tooling maintainer, and someone helping companies with commercial rollouts of Haskell code, I've grown to fear GHC releases. I'd rather we fix those problems before increasing frequency. On Mon, Oct 23, 2017 at 8:49 AM, Manuel M T Chakravarty < manuel.chakravarty at tweag.io> wrote: > One of the stated goals of our effort is to move to calendar-based > 6-monthly releases of GHC. 
To this end, I had a conversation with Ben a > while ago, where discussed the following schedule for v6.4 & v6.6: > > 6.4 release planned for Feb > - Branch in Nov > > 6.6 release ought to then be in Aug > - Branch in May > > Pre-release schedule > - On cutting the branch, alpha release > - Then, a beta every two weeks until 4 weeks before the targeted release > date > - RC1 four weeks before the targeted release date > - RC2 two weeks before the targeted release date > > For this to be realistic, we do need to have the automatic release > artefact building in place by the time of cutting the v6.4 branch. This > doesn’t leave us much time to get this up and running. > > Ben, this also requires us to settle > > https://github.com/snowleopard/hadrian/issues/440 > > soon. > > We need to discuss a policy of what can go into the release branch after > it has been cut. IMHO, it cannot be major features, but only small changes > and fixes until RC1. Then, only fixes. > > What do you all think? > > Cheers, > Manuel > > > _______________________________________________ > Ghc-devops-group mailing list > Ghc-devops-group at haskell.org > https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m at tweag.io Mon Oct 23 22:12:27 2017 From: m at tweag.io (Boespflug, Mathieu) Date: Tue, 24 Oct 2017 00:12:27 +0200 Subject: [GHC DevOps Group] New release process In-Reply-To: References: Message-ID: Hi Manuel, whoa 6.4... throwback from the past! ;) The main point of more frequent releases is to get new features in the hand of users faster. This in turn leads to a more reliable process, because there is less pressure to integrate features late in the release cycle. If I understand the proposed release calendar, the lifecycle of a new feature is: * Get a Diff reviewed and merged into master. * Merge needs to happen 3-4 months (according to the dates you mention) before the next release. 
* If the merge happens immediately after the release branch is cut, then the feature technically has to wait out up to 10 months (if on a 6 month release cycle) before making it into a release. Am I understanding this right? It would be nice to have a fleshed out branching model on a wiki page. On 23 October 2017 at 07:49, Manuel M T Chakravarty wrote: > One of the stated goals of our effort is to move to calendar-based 6-monthly > releases of GHC. To this end, I had a conversation with Ben a while ago, > where discussed the following schedule for v6.4 & v6.6: > > 6.4 release planned for Feb > - Branch in Nov > > 6.6 release ought to then be in Aug > - Branch in May > > Pre-release schedule > - On cutting the branch, alpha release > - Then, a beta every two weeks until 4 weeks before the targeted release > date > - RC1 four weeks before the targeted release date > - RC2 two weeks before the targeted release date > > For this to be realistic, we do need to have the automatic release artefact > building in place by the time of cutting the v6.4 branch. This doesn’t leave > us much time to get this up and running. > > Ben, this also requires us to settle > > https://github.com/snowleopard/hadrian/issues/440 > > soon. > > We need to discuss a policy of what can go into the release branch after it > has been cut. IMHO, it cannot be major features, but only small changes and > fixes until RC1. Then, only fixes. > > What do you all think? > > Cheers, > Manuel > > > _______________________________________________ > Ghc-devops-group mailing list > Ghc-devops-group at haskell.org > https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group > From m at tweag.io Mon Oct 23 22:14:22 2017 From: m at tweag.io (Boespflug, Mathieu) Date: Tue, 24 Oct 2017 00:14:22 +0200 Subject: [GHC DevOps Group] New release process In-Reply-To: References: Message-ID: > It would be nice to have a fleshed out branching model on a wiki page. And I forgot to mention... 
one that explains what the branching model and schedule mean for backwards compatibility. -- Mathieu Boespflug Founder at http://tweag.io. On 24 October 2017 at 00:12, Boespflug, Mathieu wrote: > Hi Manuel, > > whoa 6.4... throwback from the past! ;) > > The main point of more frequent releases is to get new features in the > hand of users faster. This in turn leads to a more reliable process, > because there is less pressure to integrate features late in the > release cycle. > > If I understand the proposed release calendar, the lifecycle of a new > feature is: > > * Get a Diff reviewed and merged into master. > * Merge needs to happen 3-4 months (according to the dates you > mention) before the next release. > * If the merge happens immediately after the release branch is cut, > then the feature technically has to wait out up to 10 months (if on a > 6 month release cycle) before making it into a release. > > Am I understanding this right? > > It would be nice to have a fleshed out branching model on a wiki page. > > > On 23 October 2017 at 07:49, Manuel M T Chakravarty > wrote: >> One of the stated goals of our effort is to move to calendar-based 6-monthly >> releases of GHC. To this end, I had a conversation with Ben a while ago, >> where discussed the following schedule for v6.4 & v6.6: >> >> 6.4 release planned for Feb >> - Branch in Nov >> >> 6.6 release ought to then be in Aug >> - Branch in May >> >> Pre-release schedule >> - On cutting the branch, alpha release >> - Then, a beta every two weeks until 4 weeks before the targeted release >> date >> - RC1 four weeks before the targeted release date >> - RC2 two weeks before the targeted release date >> >> For this to be realistic, we do need to have the automatic release artefact >> building in place by the time of cutting the v6.4 branch. This doesn’t leave >> us much time to get this up and running. 
>> >> Ben, this also requires us to settle >> >> https://github.com/snowleopard/hadrian/issues/440 >> >> soon. >> >> We need to discuss a policy of what can go into the release branch after it >> has been cut. IMHO, it cannot be major features, but only small changes and >> fixes until RC1. Then, only fixes. >> >> What do you all think? >> >> Cheers, >> Manuel >> >> >> _______________________________________________ >> Ghc-devops-group mailing list >> Ghc-devops-group at haskell.org >> https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group >> From simonpj at microsoft.com Mon Oct 23 22:43:57 2017 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Mon, 23 Oct 2017 22:43:57 +0000 Subject: [GHC DevOps Group] New release process In-Reply-To: References: Message-ID: | * Get a Diff reviewed and merged into master. | * Merge needs to happen 3-4 months (according to the dates you | mention) before the next release. | * If the merge happens immediately after the release branch is cut, then | the feature technically has to wait out up to 10 months (if on a | 6 month release cycle) before making it into a release. 10 months is long. I thought we were aiming for 6 months max, so as to reduce pressure to "just get this in"? If we fork 4 months before release, does that mean that nothing makes it into the release branch except bug fixes? If we have better CI do we really need 4 months to stabilise? S | -----Original Message----- | From: Ghc-devops-group [mailto:ghc-devops-group-bounces at haskell.org] On | Behalf Of Boespflug, Mathieu | Sent: 23 October 2017 23:12 | To: Manuel M T Chakravarty | Cc: ghc-devops-group at haskell.org | Subject: Re: [GHC DevOps Group] New release process | | Hi Manuel, | | whoa 6.4... throwback from the past! ;) | | The main point of more frequent releases is to get new features in the | hand of users faster. 
This in turn leads to a more reliable process, | because there is less pressure to integrate features late in the release | cycle. | | If I understand the proposed release calendar, the lifecycle of a new | feature is: | | * Get a Diff reviewed and merged into master. | * Merge needs to happen 3-4 months (according to the dates you | mention) before the next release. | * If the merge happens immediately after the release branch is cut, then | the feature technically has to wait out up to 10 months (if on a | 6 month release cycle) before making it into a release. | | Am I understanding this right? | | It would be nice to have a fleshed out branching model on a wiki page. | | | On 23 October 2017 at 07:49, Manuel M T Chakravarty | wrote: | > One of the stated goals of our effort is to move to calendar-based | > 6-monthly releases of GHC. To this end, I had a conversation with Ben | > a while ago, where we discussed the following schedule for v6.4 & v6.6: | > | > 6.4 release planned for Feb | > - Branch in Nov | > | > 6.6 release ought to then be in Aug | > - Branch in May | > | > Pre-release schedule | > - On cutting the branch, alpha release | > - Then, a beta every two weeks until 4 weeks before the targeted | > release date | > - RC1 four weeks before the targeted release date | > - RC2 two weeks before the targeted release date | > | > For this to be realistic, we do need to have the automatic release | > artefact building in place by the time of cutting the v6.4 branch. | > This doesn’t leave us much time to get this up and running. | > | > Ben, this also requires us to settle | > | > https://github.com/snowleopard/hadrian/issues/440 | > | > soon. 
| > | > We need to discuss a policy of what can go into the release branch | > after it has been cut. IMHO, it cannot be major features, but only | > small changes and fixes until RC1. Then, only fixes. | > | > What do you all think? | > | > Cheers, | > Manuel | > | > | > _______________________________________________ | > Ghc-devops-group mailing list | > Ghc-devops-group at haskell.org | > https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group | > | _______________________________________________ | Ghc-devops-group mailing list | Ghc-devops-group at haskell.org | https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group From m at tweag.io Mon Oct 23 22:46:23 2017 From: m at tweag.io (Boespflug, Mathieu) Date: Tue, 24 Oct 2017 00:46:23 +0200 Subject: [GHC DevOps Group] New release process In-Reply-To: References: Message-ID: Hi Michael, we frequently have to resort to awkwardly pinning builds to custom versions of GHC because the latest major release broke things but at the same time contains essential new functionality. So I agree that more frequent point releases would be desirable. A compounding factor is that many bugs that could have been caught before a release are not. Presumably as for you, the last few major releases have all broken at least one of our packages (even the Stackage ones). That is why, like you, we fear GHC major releases. At the same time, the current release model is that no new functionality makes it into point releases. That makes sense: many new features make pervasive changes to the compiler that interact with others in non-trivial ways, so releasing them in batch at known intervals saves time. It's also easier for downstream packages to track which compiler versions they're compatible with. That is why Manuel presented a PoC at HIW this year for adding an extra channel in Stackage that tracks GHC HEAD, to detect breakage in the ecosystem early, before the release. 
Releasing more frequently means changes between major versions are more incremental, easier to adapt to, and amount to "some plans allowing for introducing new functionality [earlier] without breaking backwards compatibility", *depending on how we amend the BC policy*. Leveraging Stackage means BC issues can be caught very early in a change's lifecycle - long before a new release is even cut. Best, -- Mathieu Boespflug Founder at http://tweag.io. On 23 October 2017 at 10:55, Michael Snoyman wrote: > I'm likely going to be a contrary opinion here: I'd prefer _less_ frequent > GHC releases, not more frequent. Or at the very least: if we're talking > about major releases with intentional breakage, I see multiple problems: > > * Many libraries are subscribing to a "3 versions" support policy. Having a > breaking release every 6 months means that libraries will be phasing out > support for a GHC release after only 1.5 years. > * From a Stackage perspective, I see how long it takes for the ecosystem to > catch up to a new GHC release. Depending on what it means for the ecosystem > to have caught up, I'd put that number anywhere from 2-4 months. > > Personally, I'd be much more interested in seeing: > > * More frequent point releases, with more guarantees of backwards > compatibility between releases. We've had situations in the past where point > releases have broken a significant number of packages, and it would be great > if we could work towards avoiding that with automated testing. > * Perhaps some plans allowing for introducing new functionality in point > releases without breaking backwards compatibility. > > From the standpoints of a package author, a tooling maintainer, and someone > helping companies with commercial rollouts of Haskell code, I've grown to > fear GHC releases. I'd rather we fix those problems before increasing > frequency. 
> > On Mon, Oct 23, 2017 at 8:49 AM, Manuel M T Chakravarty > wrote: >> >> One of the stated goals of our effort is to move to calendar-based >> 6-monthly releases of GHC. To this end, I had a conversation with Ben a >> while ago, where discussed the following schedule for v6.4 & v6.6: >> >> 6.4 release planned for Feb >> - Branch in Nov >> >> 6.6 release ought to then be in Aug >> - Branch in May >> >> Pre-release schedule >> - On cutting the branch, alpha release >> - Then, a beta every two weeks until 4 weeks before the targeted release >> date >> - RC1 four weeks before the targeted release date >> - RC2 two weeks before the targeted release date >> >> For this to be realistic, we do need to have the automatic release >> artefact building in place by the time of cutting the v6.4 branch. This >> doesn’t leave us much time to get this up and running. >> >> Ben, this also requires us to settle >> >> https://github.com/snowleopard/hadrian/issues/440 >> >> soon. >> >> We need to discuss a policy of what can go into the release branch after >> it has been cut. IMHO, it cannot be major features, but only small changes >> and fixes until RC1. Then, only fixes. >> >> What do you all think? >> >> Cheers, >> Manuel >> >> >> _______________________________________________ >> Ghc-devops-group mailing list >> Ghc-devops-group at haskell.org >> https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group >> > > > _______________________________________________ > Ghc-devops-group mailing list > Ghc-devops-group at haskell.org > https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group > From m at tweag.io Mon Oct 23 22:52:00 2017 From: m at tweag.io (Boespflug, Mathieu) Date: Tue, 24 Oct 2017 00:52:00 +0200 Subject: [GHC DevOps Group] New release process In-Reply-To: References: Message-ID: On 24 October 2017 at 00:43, Simon Peyton Jones wrote: > > [...] > > 10 months is long. 
I thought we were aiming for 6 months max, so as to reduce pressure to "just get this in"? > > If we fork 4 months before release, does that mean that nothing makes it into the release branch except bug fixes? If we have better CI do we really need 4 months to stablise? My thoughts exactly. I think a key ingredient here will be having a Stackage channel that tracks HEAD, as Manuel and I proposed at HIW, and Herbert's head.hackage, as we've heard about recently. From manuel.chakravarty at tweag.io Tue Oct 24 02:01:23 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 24 Oct 2017 13:01:23 +1100 Subject: [GHC DevOps Group] New release process In-Reply-To: References: Message-ID: Sorry, I meant to write 8.4 and 8.6… > Manuel M T Chakravarty : > > One of the stated goals of our effort is to move to calendar-based 6-monthly releases of GHC. To this end, I had a conversation with Ben a while ago, where discussed the following schedule for v6.4 & v6.6: > > 6.4 release planned for Feb > - Branch in Nov > > 6.6 release ought to then be in Aug > - Branch in May > > Pre-release schedule > - On cutting the branch, alpha release > - Then, a beta every two weeks until 4 weeks before the targeted release date > - RC1 four weeks before the targeted release date > - RC2 two weeks before the targeted release date > > For this to be realistic, we do need to have the automatic release artefact building in place by the time of cutting the v6.4 branch. This doesn’t leave us much time to get this up and running. > > Ben, this also requires us to settle > > https://github.com/snowleopard/hadrian/issues/440 > > soon. > > We need to discuss a policy of what can go into the release branch after it has been cut. IMHO, it cannot be major features, but only small changes and fixes until RC1. Then, only fixes. > > What do you all think? 
> > Cheers, > Manuel > > _______________________________________________ > Ghc-devops-group mailing list > Ghc-devops-group at haskell.org > https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group -------------- next part -------------- An HTML attachment was scrubbed... URL: From manuel.chakravarty at tweag.io Tue Oct 24 03:08:55 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 24 Oct 2017 14:08:55 +1100 Subject: [GHC DevOps Group] New release process In-Reply-To: References: Message-ID: <59B061EE-C959-4F91-ABBD-4E9F423A33C6@tweag.io> Hi Michael, Thanks for bringing up these points. As you, and also Mathieu highlight, we need to think carefully about the exact process and the guarantees that releases come with. The overarching goal here is to get more predictable GHC releases, both in terms of when releases happen and how solid they are — and, we should add to that, as you rightly point out, what changes they bring. This is why I initiated this discussion, to ensure that we end up with something that helps everybody, GHC developers, package authors, and GHC users. > 23.10.2017, 19:55, Michael Snoyman : > > I'm likely going to be a contrary opinion here: I'd prefer _less_ frequent GHC releases, not more frequent. Or at the very least: if we're talking about major releases with intentional breakage, I see multiple problems: > > * Many libraries are subscribing to a "3 versions" support policy. Having a breaking release every 6 months means that libraries will be phasing out support for a GHC release after only 1.5 years. Yes, Ben raised this point earlier as well. If we do more releases, we need to adapt this policy such that we again end up with about 3 years support; e.g., for a release every 6 month, we would need to have a 6 versions support policy. > * From a Stackage perspective, I see how long it takes for the ecosystem to catch up to a new GHC release. 
Depending on what it means for the ecosystem to have caught up, I’d put that number anywhere from 2-4 months. Yes, and that is something that I would really like to see change. Given that you have the greatest insight into Stackage of all of us, I’d be really keen to hear your opinion on the reasons for that. I actually think that the three main factors are (1) the long time between GHC releases, (2) the number of problems with RCs and the first release, and (3) the lack of CI, in general, and of automated testing against packages, in particular. Re (1), this leads to more fundamental and more numerous changes in GHC; i.e., more breakage per package and release. This leads to dread on the package authors’ side. Re (2), for many package authors, the RCs are unusable, due to problems with the RCs and because packages they depend on don’t build. Unfortunately, even the first release often has similar problems. Re (3), feedback about what things a GHC release breaks gets to package authors way too late. This is compounded by the currently very long time between releases. What do you think? > Personally, I'd be much more interested in seeing: > > * More frequent point releases, with more guarantees of backwards compatibility between releases. We've had situations in the past where point releases have broken a significant number of packages, and it would be great if we could work towards avoiding that with automated testing. > * Perhaps some plans allowing for introducing new functionality in point releases without breaking backwards compatibility. > > From the standpoints of a package author, a tooling maintainer, and someone helping companies with commercial rollouts of Haskell code, I've grown to fear GHC releases. I’d rather we fix those problems before increasing frequency. You write that you “fear GHC releases”; so, let’s work towards changing the GHC release process such that you don’t fear them anymore. 
I guess we all agree that we want more automated testing to avoid a whole range of breakage. This includes breaking changes in point releases, as you write, and also having initial release candidates that are of such high quality that package authors will test against them without dread. In particular, as Mathieu has mentioned, we have been working on a scheme to use Stackage for regression testing of GHC during development. (It’s one thing for packages to break because of a genuine and planned change of the language or libraries, but any accidental breakage is ultimately avoidable.) However, for the moment, we first need to get the new CI story (which we discussed in a separate thread) set up. This is something we all want and we will push it forward. If we can take the dread out of GHC releases, I think more frequent releases will help. Here is why. There will be fewer significant changes per release. Hence, fewer packages break and fewer broken package dependencies will have authors wait for fixes in dependencies before they can fix their packages. This, coupled with having release candidates earlier in the process, makes me hopeful that we can have a stable package ecosystem on top of a new compiler shortly after its release. (Once we regression test against Stackage, maybe even at the same time as the GHC release.) Cheers, Manuel PS: BTW, somewhat related to this, there is currently a discussion on the GHC Steering Committee about regulating the stability of LANGUAGE extensions. I think this has a clear relation to our discussion here: https://github.com/ghc-proposals/ghc-proposals/pull/85 (please do voice your opinion on this proposal thread) > On Mon, Oct 23, 2017 at 8:49 AM, Manuel M T Chakravarty > wrote: > One of the stated goals of our effort is to move to calendar-based 6-monthly releases of GHC. 
To this end, I had a conversation with Ben a while ago, where discussed the following schedule for v6.4 & v6.6: > > 6.4 release planned for Feb > - Branch in Nov > > 6.6 release ought to then be in Aug > - Branch in May > > Pre-release schedule > - On cutting the branch, alpha release > - Then, a beta every two weeks until 4 weeks before the targeted release date > - RC1 four weeks before the targeted release date > - RC2 two weeks before the targeted release date > > For this to be realistic, we do need to have the automatic release artefact building in place by the time of cutting the v6.4 branch. This doesn’t leave us much time to get this up and running. > > Ben, this also requires us to settle > > https://github.com/snowleopard/hadrian/issues/440 > > soon. > > We need to discuss a policy of what can go into the release branch after it has been cut. IMHO, it cannot be major features, but only small changes and fixes until RC1. Then, only fixes. > > What do you all think? > > Cheers, > Manuel > > > _______________________________________________ > Ghc-devops-group mailing list > Ghc-devops-group at haskell.org > https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From manuel.chakravarty at tweag.io Tue Oct 24 03:11:27 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 24 Oct 2017 14:11:27 +1100 Subject: [GHC DevOps Group] New release process In-Reply-To: References: Message-ID: <40BA990C-FC40-42F2-A82E-59FF8611E544@tweag.io> Yes, I will start a wiki page on that. Manuel > Boespflug, Mathieu : > > Hi Manuel, > > whoa 6.4... throwback from the past! ;) > > The main point of more frequent releases is to get new features in the > hand of users faster. This in turn leads to a more reliable process, > because there is less pressure to integrate features late in the > release cycle. 
> > If I understand the proposed release calendar, the lifecycle of a new > feature is: > > * Get a Diff reviewed and merged into master. > * Merge needs to happen 3-4 months (according to the dates you > mention) before the next release. > * If the merge happens immediately after the release branch is cut, > then the feature technically has to wait out up to 10 months (if on a > 6 month release cycle) before making it into a release. > > Am I understanding this right? > > It would be nice to have a fleshed out branching model on a wiki page. > > > On 23 October 2017 at 07:49, Manuel M T Chakravarty > wrote: >> One of the stated goals of our effort is to move to calendar-based 6-monthly >> releases of GHC. To this end, I had a conversation with Ben a while ago, >> where discussed the following schedule for v6.4 & v6.6: >> >> 6.4 release planned for Feb >> - Branch in Nov >> >> 6.6 release ought to then be in Aug >> - Branch in May >> >> Pre-release schedule >> - On cutting the branch, alpha release >> - Then, a beta every two weeks until 4 weeks before the targeted release >> date >> - RC1 four weeks before the targeted release date >> - RC2 two weeks before the targeted release date >> >> For this to be realistic, we do need to have the automatic release artefact >> building in place by the time of cutting the v6.4 branch. This doesn’t leave >> us much time to get this up and running. >> >> Ben, this also requires us to settle >> >> https://github.com/snowleopard/hadrian/issues/440 >> >> soon. >> >> We need to discuss a policy of what can go into the release branch after it >> has been cut. IMHO, it cannot be major features, but only small changes and >> fixes until RC1. Then, only fixes. >> >> What do you all think? 
>> >> Cheers, >> Manuel >> >> >> _______________________________________________ >> Ghc-devops-group mailing list >> Ghc-devops-group at haskell.org >> https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group >> > _______________________________________________ > Ghc-devops-group mailing list > Ghc-devops-group at haskell.org > https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group From manuel.chakravarty at tweag.io Tue Oct 24 03:27:00 2017 From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty) Date: Tue, 24 Oct 2017 14:27:00 +1100 Subject: [GHC DevOps Group] New release process In-Reply-To: References: Message-ID: <7785A69F-3796-48C1-8F7B-072312E1027F@tweag.io> > Simon Peyton Jones : > > | * Get a Diff reviewed and merged into master. > | * Merge needs to happen 3-4 months (according to the dates you > | mention) before the next release. > | * If the merge happens immediately after the release branch is cut, then > | the feature technically has to wait out up to 10 months (if on a > | 6 month release cycle) before making it into a release. > > 10 months is long. I thought we were aiming for 6 months max, so as to reduce pressure to ”just get this in"? I think, the plan was to branch 3 months before the release, so 9 months. This is significantly better than the current situation if you include the current release delays. > If we fork 4 months before release, does that mean that nothing makes it into the release branch except bug fixes? If we have better CI do we really need 4 months to stablise? This is something we need to discuss and fix. There are various options. A common one is to have beta releases for a while before the first RC. Different projects have different policies, but during beta releases new features are often still allowed in if they don’t destabilise too much. (Usually these are only features that were previously already planned for that release, but didn’t quite make it in time for the branching.) 
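Spelling out the arithmetic behind these figures (a hypothetical back-of-the-envelope sketch; the 6-month cycle and 3-month branch lead are simply the numbers under discussion in this thread, not settled policy):

```haskell
-- Back-of-the-envelope model of feature latency under a calendar-based
-- release cycle.  The constants mirror the numbers in this thread:
-- a release every 6 months, with the release branch cut 3 months out.

cycleMonths :: Int
cycleMonths = 6        -- months between successive releases

branchLeadMonths :: Int
branchLeadMonths = 3   -- months from branch cut to the release it produces

-- | Best case: a feature merged just before the branch is cut waits only
-- out the stabilisation period of that branch.
bestCaseWait :: Int
bestCaseWait = branchLeadMonths

-- | Worst case: a feature merged just after the branch is cut must wait a
-- full cycle for the next branch, plus that branch's lead time.
worstCaseWait :: Int
worstCaseWait = cycleMonths + branchLeadMonths
```

With a 3-month lead this gives 3 and 9 months respectively; with the 4-month lead mentioned earlier in the thread, 4 and 10 months, which reconciles the two quoted figures.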
However, under no circumstances will a release be delayed only to wait for a new feature. Then, during RCs, it is strictly only bug fixing. In other words, a rewritten type checker really ought to be in *before* the release branch is cut. However, a new language extension that has no impact when it is not enabled, could well land during beta. We need to determine the exact rules. As Mathieu suggested, I will start a Wiki page, where we refine this until we are all happy. Manuel > S > > | -----Original Message----- > | From: Ghc-devops-group [mailto:ghc-devops-group-bounces at haskell.org] On > | Behalf Of Boespflug, Mathieu > | Sent: 23 October 2017 23:12 > | To: Manuel M T Chakravarty > | Cc: ghc-devops-group at haskell.org > | Subject: Re: [GHC DevOps Group] New release process > | > | Hi Manuel, > | > | whoa 6.4... throwback from the past! ;) > | > | The main point of more frequent releases is to get new features in the > | hand of users faster. This in turn leads to a more reliable process, > | because there is less pressure to integrate features late in the release > | cycle. > | > | If I understand the proposed release calendar, the lifecycle of a new > | feature is: > | > | * Get a Diff reviewed and merged into master. > | * Merge needs to happen 3-4 months (according to the dates you > | mention) before the next release. > | * If the merge happens immediately after the release branch is cut, then > | the feature technically has to wait out up to 10 months (if on a > | 6 month release cycle) before making it into a release. > | > | Am I understanding this right? > | > | It would be nice to have a fleshed out branching model on a wiki page. > | > | > | On 23 October 2017 at 07:49, Manuel M T Chakravarty > | wrote: > | > One of the stated goals of our effort is to move to calendar-based > | > 6-monthly releases of GHC. 
To this end, I had a conversation with Ben > | > a while ago, where discussed the following schedule for v6.4 & v6.6: > | > > | > 6.4 release planned for Feb > | > - Branch in Nov > | > > | > 6.6 release ought to then be in Aug > | > - Branch in May > | > > | > Pre-release schedule > | > - On cutting the branch, alpha release > | > - Then, a beta every two weeks until 4 weeks before the targeted > | > release date > | > - RC1 four weeks before the targeted release date > | > - RC2 two weeks before the targeted release date > | > > | > For this to be realistic, we do need to have the automatic release > | > artefact building in place by the time of cutting the v6.4 branch. > | > This doesn’t leave us much time to get this up and running. > | > > | > Ben, this also requires us to settle > | > > | > > | > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithu > | > b.com%2Fsnowleopard%2Fhadrian%2Fissues%2F440&data=02%7C01%7Csimonpj%40 > | > microsoft.com%7C29cefad3d8874c85cfcf08d51a632d28%7C72f988bf86f141af91a > | > b2d7cd011db47%7C1%7C0%7C636443935617660899&sdata=VtDfZIKncpArPK0etDYFM > | > YCHiqM9LbwAnnco0TuCV7o%3D&reserved=0 > | > > | > soon. > | > > | > We need to discuss a policy of what can go into the release branch > | > after it has been cut. IMHO, it cannot be major features, but only > | > small changes and fixes until RC1. Then, only fixes. > | > > | > What do you all think? > | > > | > Cheers, > | > Manuel > | > > | > > | > _______________________________________________ > | > Ghc-devops-group mailing list > | > Ghc-devops-group at haskell.org > | > https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group > | > > | _______________________________________________ > | Ghc-devops-group mailing list > | Ghc-devops-group at haskell.org > | https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group