From ben at well-typed.com Sat Feb 3 17:28:47 2018
From: ben at well-typed.com (Ben Gamari)
Date: Sat, 03 Feb 2018 12:28:47 -0500
Subject: [GHC DevOps Group] CI effort status
In-Reply-To: <87o9lb87wz.fsf@smart-cactus.org>
References: <87y3kk9kqo.fsf@smart-cactus.org> <4E897B05-1CC5-4BFA-AC53-E8D9DFB4DA3D@tweag.io> <87o9lb87wz.fsf@smart-cactus.org>
Message-ID: <87k1vu0ypi.fsf@smart-cactus.org>

Ben Gamari writes:

> Manuel M T Chakravarty writes:
>
>> Hi Ben,
>>
>> I meant to post on https://github.com/appveyor/ci/issues/517
>> to request an increased
>> limit, but didn’t get around to it yet. If you’d be able to put a
>> request on that issue, that’d be great.
>>
> Sure, I'm on it.
>
I have created a new Appveyor project, GHCAppveyor [1] (the name is shortened due to length constraints), and configured it to pull from the ghc/ghc GitHub mirror. I have also requested, and was granted, the typical build time limit extension to 90 minutes.

Unfortunately, it seems that even 90 minutes is not enough to finish a build, much less run the testsuite, in Appveyor's build environment. Judging from where the build was terminated, I would guess that it needs at least another 10 minutes of compilation to reach the testsuite. On top of this, the testsuite will require another ~35 minutes (it is quite heavy on process spawning, which is very expensive on Windows).

I haven't yet inquired as to whether a further build time extension would be possible. However, I am not hopeful that our plan of using Appveyor will be feasible without purchasing build time.

On the CircleCI front, I have been continuing work to clear up the remaining build failures. At this point only two remain:

 * I have a patch (D4360) to fix T11489 by running our build jobs as an unprivileged user.

 * scc01 appears to be slightly non-deterministic; I am investigating this.

Unfortunately, the CircleCI infrastructure is still exhibiting a fair amount of flakiness. See, for instance, this build, which is shown as "Cancelled" despite having finished (and having apparently been run at least twice). Judging from the build history, this seems to be a fairly regular occurrence. I have contacted CircleCI about this but have not yet heard back.

I am also occasionally seeing rather extreme variance in test times. In particular, the linux-llvm target usually completes in around 4 hours 20 minutes, but sometimes takes over 5 hours, causing the build to time out. It appears that the build hangs during the testsuite run (e.g. [2]). It's not impossible that this is due to a bug in the testsuite driver, but I have been unable to reproduce it either locally or remotely on CircleCI infrastructure, so it has proved to be a tough nut to crack.

Cheers,

- Ben

[1] https://ci.appveyor.com/project/GHCAppveyor/ghc
[2] https://circleci.com/gh/ghc/ghc/1558

From gershomb at gmail.com Mon Feb 5 02:41:36 2018
From: gershomb at gmail.com (Gershom B)
Date: Sun, 4 Feb 2018 21:41:36 -0500
Subject: [GHC DevOps Group] CI effort status
In-Reply-To: <87k1vu0ypi.fsf@smart-cactus.org>
References: <87y3kk9kqo.fsf@smart-cactus.org> <4E897B05-1CC5-4BFA-AC53-E8D9DFB4DA3D@tweag.io> <87o9lb87wz.fsf@smart-cactus.org> <87k1vu0ypi.fsf@smart-cactus.org>
Message-ID:

A question from an observer here -- my understanding was that part of the plan with the shift in CI infrastructure was that the burden would be lifted from Ben's shoulders alone, and there would be a greater division of labor, made possible in part by using shared standard services rather than self-hosted solutions. But at the moment I mostly see reports of Ben continuing to resolve issues and move this plan forward on his own.
Is there still some medium-term plan to make this transition a more collective effort?

--Gershom

From manuel.chakravarty at tweag.io Mon Feb 5 03:12:32 2018
From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty)
Date: Mon, 5 Feb 2018 14:12:32 +1100
Subject: [GHC DevOps Group] CI effort status
In-Reply-To: <87k1vu0ypi.fsf@smart-cactus.org>
References: <87y3kk9kqo.fsf@smart-cactus.org> <4E897B05-1CC5-4BFA-AC53-E8D9DFB4DA3D@tweag.io> <87o9lb87wz.fsf@smart-cactus.org> <87k1vu0ypi.fsf@smart-cactus.org>
Message-ID:

Hi Ben,

Thanks a lot for the summary.

> 04.02.2018 04:28 Ben Gamari :
>
> I have created a new GHCAppveyor (due to name length constraints)
> Appveyor [1] project and configured it to pull from the ghc/ghc GitHub
> mirror. I have also requested, and was granted, the typical build time
> limit extension to 90 minutes.
>
> Unfortunately, it seems that even 90 minutes is insufficient to even
> finish a build, much less run the testsuite, under Appveyor's build
> environment. Given where the build was terminated, I would guess that
> it would need at least another 10 minutes of compilation to make it to
> the testsuite. On top of this the testsuite will require another ~35
> minutes (as it is quite heavy on process spawning, which is very
> expensive on Windows).
>
> I haven't yet inquired as to whether a further build time extension
> would be possible. However, I am not hopeful that our plan of using
> Appveyor will be feasible without purchasing build time.

As we previously discussed, if we need to purchase build time, then so be it.

However, I have been wondering about the following approach. Given that we eventually want to run the testsuite against a vanilla distribution produced by the build process, would it be feasible to split building and running the testsuite into two separate runs?

> On the CircleCI front, I have been continuing work to clear up the
> remaining build failures. At this point only two remain:
>
> * I have a patch (D4360) to fix T11489 by running our build jobs as an
>   unprivileged user
>
> * scc01 appears to be slightly non-deterministic; I am investigating
>   this.
>
> Unfortunately the CircleCI infrastructure is still exhibiting a fair
> amount of flakiness. See, for instance, this build which is shown to be
> "Cancelled" despite having finished (and having apparently been run at
> least twice). Judging from the build history, this seems to be a fairly
> regular occurrence. I have contacted CircleCI about this but have not
> yet heard back.

I’ll ask our devs what their experience has been lately.

Cheers,
Manuel

From manuel.chakravarty at tweag.io Mon Feb 5 03:35:36 2018
From: manuel.chakravarty at tweag.io (Manuel M T Chakravarty)
Date: Mon, 5 Feb 2018 14:35:36 +1100
Subject: [GHC DevOps Group] CI effort status
In-Reply-To:
References: <87y3kk9kqo.fsf@smart-cactus.org> <4E897B05-1CC5-4BFA-AC53-E8D9DFB4DA3D@tweag.io> <87o9lb87wz.fsf@smart-cactus.org> <87k1vu0ypi.fsf@smart-cactus.org>
Message-ID:

Hi Gershom,

Ben is surely the main actor, and he has put considerable effort into this. However, we (Tweag) did help out initially by writing some of the original CI configurations.

Having said that, it would be absolutely fabulous if other developers could help out. Please let Ben and me know if you know anybody who would be happy to help!

Cheers,
Manuel

From gershomb at gmail.com Mon Feb 5 03:57:31 2018
From: gershomb at gmail.com (Gershom B)
Date: Sun, 4 Feb 2018 21:57:31 -0600
Subject: [GHC DevOps Group] CI effort status
In-Reply-To:
References: <87y3kk9kqo.fsf@smart-cactus.org> <4E897B05-1CC5-4BFA-AC53-E8D9DFB4DA3D@tweag.io> <87o9lb87wz.fsf@smart-cactus.org> <87k1vu0ypi.fsf@smart-cactus.org>
Message-ID:

Let me articulate my question a bit more clearly. Looking at the devops group charter (https://ghc.haskell.org/trac/ghc/wiki/DevOpsGroupCharter), it says the following about the goals:

"The mission of the GHC DevOps Group is to

 * to take leadership of the devops aspects of GHC,
 * to resource it better, and
 * to broaden the sense of community ownership and control of GHC."

Further, it says under “Resources”:

"The GHC DevOps Group identifies the ongoing and one-off devops requirements of GHC.
It develops and manages the strategies and projects to implement the needed tools, processes, and documentation to meet those requirements. To that end and on the basis of actionable project plans, it seeks to obtain the necessary resources from organisations that rely on GHC as a production-ready tool. By doing this, we aim to unlock more resources than are currently available. At the same time, we seek broad community ownership to minimise the load on any single contributor and to avoid a single point of failure."

My concern is that, at the moment, there has been discussion regarding devops aspects, and perhaps a broadened sense of community ownership and control.

But I do not see better resourcing, although the initial contributions of CI configurations were certainly a good kickstart. As such, I do not see community ownership in the sense of the latter paragraph, i.e. in the sense that it will “minimise the load on any single contributor” and thus “avoid a single point of failure.”

The way this works, as I understand it, is a quid pro quo. In order to accomplish goals with regard to the regularity of GHC releases, streamlined processes, etc., there needs to be at least some infusion of resources, presumably “unlocked” from "organisations that rely on GHC as a production-ready tool”.

Otherwise this quickly turns into expecting a variety of new work from the same cast of characters, just with more voices on a mailing list chiming in with proposals as to what they would like to see accomplished.

I am well aware that assembling resources and pulling them together is _hard_, and many attempts to do so founder. I’ve been a participant in any number of foundered attempts myself over the years, or in attempts that accomplished a few useful things but fell far short of even the modest initial goals they set out with.

But I do not want this aspect of the DevOps Group charter to fade from consciousness: getting these resources is not automatic. It requires constant shaking of tree branches, constant attempts to reformulate problems and break them down in ways more amenable to collaboration, and not-infrequent follow-up on past partial commitments (or indications of them), to try to pin down their concrete implementation.

What I am seeing right now is a danger of settling into a “new status quo” with no new resources. I think that would be bad for the future prospects of the DevOps Group, and would probably quickly lead to it becoming yet another stillborn effort.

I am not offering anything here at the moment; I have nothing _to_ offer. But this is my attempt to give a gentle “poke” to all those on the list who thought they might be able to play a role in this, asking them to come forward with some proposals as to how they might help. One important aspect to bear in mind is that at this point, it seems to me that money is _not_ the issue. That is to say, a volunteer offering some skilled ops time, with experience wrangling CI, would be ideal. But if someone offered such time from, e.g., an associated contractor who would need to be paid to help out, then it would probably be feasible to fund that as well, from any number of sources.

The plan for migrating CI is only partially complete. The working hypothesis is that, once it is complete, it will mean less work in the long run. But in my opinion, dealing with someone else’s flaky boxes (e.g. those of CircleCI) is not much better than dealing with your own flaky boxes, except that now you have to bother other people to figure out more stuff for you. So if we could get someone with experience to be a deputy CircleCI ombudsman or the like, and take charge of some aspect of this work, I think we would have a much greater chance of A) success, and B) genuinely distributing the workload more widely.
Best,
Gershom

From simonpj at microsoft.com Tue Feb 6 13:31:02 2018
From: simonpj at microsoft.com (Simon Peyton Jones)
Date: Tue, 6 Feb 2018 13:31:02 +0000
Subject: [GHC DevOps Group] CI effort status
In-Reply-To:
References: <87y3kk9kqo.fsf@smart-cactus.org> <4E897B05-1CC5-4BFA-AC53-E8D9DFB4DA3D@tweag.io> <87o9lb87wz.fsf@smart-cactus.org> <87k1vu0ypi.fsf@smart-cactus.org>
Message-ID:

Thanks Gershom. I think of the devops group as

1 Broadening "ownership" of GHC's development and release processes, so that a larger group of people feel that they can influence and contribute to GHC's development, and hence feel more comfortable making GHC mission-critical to their business or other plans.

2 Making it more likely that what we do with GHC actually matches what GHC's users want.

3 Broadening and deepening the pool of stakeholders who are willing to contribute time and/or money to making GHC into the solidly reliable tool that they need. (Currently we have Microsoft, Facebook, and IOHK contributing directly, I think.)

I think Gershom's message is really about (3). To me, progress on (1) and (2) will help to make the case for (3). But I don't want to lose sight of (3). The factor that precipitated the devops group's formation was a sudden awareness of how vulnerable we are, as a community, to a very small number of supporters.
Discussion at ICFP made me think that several other companies would consider making donations, if (a) we had a compelling case that it'd be money well spent, and (b) the actual process worked. For (b) I think some would prefer a central fund; others might prefer a specific task or set of tasks to fund. The discussion on mechanism is a bit stalled I think. We don't currently have a crisis. But I think there may already be things for which application of money might help: e.g. paying for CircleCI cycles rather than spending Ben's time trying to shoehorn everything into the for-free limits. Maybe Appveyor is similar. So I would very much welcome it if the Devops group could take, as an important task (even if it is not day-to-day urgent) task, working out a sustainable model for GHC's maintenance, support, CI, and releases. Simon | -----Original Message----- | From: Ghc-devops-group [mailto:ghc-devops-group-bounces at haskell.org] | On Behalf Of Gershom B | Sent: 05 February 2018 03:58 | To: Manuel Chakravarty | Cc: ghc-devops-group at haskell.org | Subject: Re: [GHC DevOps Group] CI effort status | | Let my articulate my question a bit more clearly. Looking at the | devops group charter | (https://ghc.haskell.org/trac/ghc/wiki/DevOpsGroupCharter), it says | the following about the goals: | | | The mission of the GHC DevOps Group is to | | * to take leadership of the devops aspects of GHC, | * to resource it better, and | * to broaden the sense of community ownership and control of GHC. | | | Further it says under “Resources”: | | "The GHC DevOps Group identifies the ongoing and one-off devops | requirements of GHC. It develops and manages the strategies and | projects to implement the needed tools, processes, and documentation | to meet those requirements. To that end and on the basis of actionable | project plans, it seeks to obtain the necessary resources from | organisations that rely on GHC as a production-ready tool. 
By doing | this, we aim to unlock more resources than are currently available. At | the same time, we seek broad community ownership to minimise the load | on any single contributor and to avoid a single point of failure." | | My concern is at the moment there has been discussion regarding devops | aspects, and perhaps a broadened sense of community ownership and | control. | | But I do not see better resourcing, although the initial contributions | of CI configurations were certainly a good kickstart. As such, I do | not see community ownership in the sense of the latter paragraph — | i.e. in the sense that it will “minimise the load on any single | contributor” and thus “avoid a single point of failure.” | | The way this works, as I understand it, is a quid-pro-quo. In order to | accomplish goals with regards to regularity of GHC releases, | streamlined processes, etc., there needs to be at least some infusion | of resources, presumably “unlocked” from "organisations that rely on | GHC as a production-ready tool”. | | Otherwise this quickly becomes expecting a variety of new work from | the same cast of characters, just with more voices on a mailinglist | chiming in with proposals as to what they would like see accomplished. | | I am well aware that assembling resources and pulling them together is | _hard_, and many attempts to do so founder. I’ve been participant in | any number of foundered attempts myself over the years, or attempts | that have accomplished a few useful things, but far from even the | modest initial goals they set out with. | | But I do not want this aspect of the DevOps Group charter to fade from | consciousness — getting these resources is not automatic. 
It requires | constant shaking of tree branches, and constant attempts to | reformulate problems and break them down in ways that make more | collaboration amenable — as well as not-infrequent followup on partial | commitments or indications towards such in the past, to try to pin | down their concrete implementation. | | What I am seeing right now is that there is a danger of settling into | a “new status quo” with no new resources, and I think that would be a | not good thing for the future prospects of the DevOps Group, and | probably quickly lead to it being yet another stillborn effort. | | I am not offering anything at the moment here — I have nothing _to_ | offer. But this is my attempt to provide a gentle “poke” to all those | on the list who thought they might have some ability to play a role in | this to please be forthcoming with some proposals as to how they might | help. One important aspect to bear in mind is that at this point, it | seems to me that money is _not_ the issue. That is to say, if there | was a volunteer of some skilled-ops time which had experience | wrangling with CI, that would be idea. But if there was a proffer of | some such time, but from e.g. some associated contractor who would | need to be paid to help out, then it would probably be feasible to | fund that as well, from any number of sources. | | The plan for migrating CI is only partially complete. The working | hypothesis is that when it is complete, it will mean less work in the | long-run. But in my opinion, dealing with someone else’s flaky boxes | (e.g. those of CircleCI) is not much better than dealing with your own | flaky boxes, except you have to now bother other people to figure out | more stuff for you. So if we could get someone with experience to be a | deputy CircleCI ombudsman or the like, and take charge of some aspect | of this work, I think we would have a much greater chance of A) | success, and B) genuinely distributing the workload more widely. 
| | Best, | Gershom | | | On February 4, 2018 at 10:19:51 PM, Manuel Chakravarty | (mchakravarty at me.com(mailto:mchakravarty at me.com)) wrote: | | > Hi Gershom, | > | > Ben is surely the main actor and he has put considerable effort into | this. However, we (Tweag) did help out initially writing some of the | original CI configurations. | > | > Having said that, it would be absolutely fabulous if other | developers could help out. Please let Ben and me know if you know | anybody who would be happy to help! | > | > Cheers, | > Manuel | > | > > 05.02.2018 13:41 Gershom B : | > > | > > A question from an observer here -- my understanding was that part | > > of the plan with the shift in CI infrastructure was that the | burden | > > would be lifted from Ben's exclusive shoulders here and there | would | > > be some greater division of labor, which is made possible in part | by | > > using shared standard services rather than self-hosted solutions. | > > But at the moment I see reports largely of Ben continuing to try | to | > > resolve issues and move this plan forward on his own. Is there | still | > > some medium-term plan have a more collective effort in this | > > transition? | > > | > > --Gershom | > > | > > | > > On Sat, Feb 3, 2018 at 12:28 PM, Ben Gamari wrote: | > >> Ben Gamari writes: | > >> | > >>> Manuel M T Chakravarty writes: | > >>> | > >>>> Hi Ben, | > >>>> | > >>>> I meant to post on | > >>>> | https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2F | > >>>> | github.com%2Fappveyor%2Fci%2Fissues%2F517&data=02%7C01%7Csimonpj% | > >>>> | 40microsoft.com%7C9ccf53542ebf4d6fde6b08d56c4c9ff3%7Cee3303d7fb73 | > >>>> | 4b0c8589bcd847f1c277%7C1%7C0%7C636533998742980043&sdata=THCgzdMyj | > >>>> CjJH554JKyJZ%2FLyHSYhpH7NRBlCQI%2BiFlM%3D&reserved=0 | > >>>> to request an increased | > >>>> limit, but didn’t get around to it yet. If you’d be able to put | a | > >>>> request on that issue, that’d be great. | > >>>> | > >>> Sure, I'm on it. 
| _______________________________________________ | Ghc-devops-group mailing list | Ghc-devops-group at haskell.org | https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group From manuel.chakravarty at tweag.io Wed Feb 7 02:34:20 2018 From: manuel.chakravarty at tweag.io (Manuel M T
Chakravarty) Date: Wed, 7 Feb 2018 13:34:20 +1100 Subject: [GHC DevOps Group] CI effort status In-Reply-To: References: <87y3kk9kqo.fsf@smart-cactus.org> <4E897B05-1CC5-4BFA-AC53-E8D9DFB4DA3D@tweag.io> <87o9lb87wz.fsf@smart-cactus.org> <87k1vu0ypi.fsf@smart-cactus.org> Message-ID: <6D861F35-0C50-4BCA-8CF1-0D3EC5871DEA@tweag.io> I would more than welcome concrete offers of resources or suggestions on how to get more resources. Mathieu and I have worked towards getting additional resources since we announced the group, but these things (apparently) take time. We surely could use the help of everybody involved in this group! Cheers, Manuel PS: Just a gentle reminder that several Tweag people (including me) have spent Tweag time on this effort. This certainly doesn’t match the investment of Facebook or Microsoft, but it serves as constructive proof that this is not just for large firms. > 07.02.2018 00:31 Simon Peyton Jones : > > Thanks Gershom. > > I think of the devops group as > > 1 Broadening "ownership" of GHC's development and release processes, so that a larger group of people feel that they can influence and contribute to GHC's development, and hence feel more comfortable making GHC mission-critical to their business or other plans > > 2 Making it more likely that what we do with GHC actually matches what GHC's users want > > 3 Broadening and deepening the pool of stakeholders who are willing to contribute time and/or money to making GHC into the solidly reliable tool that they need. (Currently we have Microsoft, Facebook, IOHK contributing directly, I think.) > > I think Gershom's message is really about (3). To me, progress on (1) and (2) will help to make the case for (3). But I don’t want to lose sight of (3). The factor that precipitated the devops group's formation was a sudden awareness of how vulnerable we are, as a community, to a very small number of supporters.
> > Discussion at ICFP made me think that several other companies would consider making donations, if (a) we had a compelling case that it'd be money well spent, and (b) the actual process worked. For (b) I think some would prefer a central fund; others might prefer a specific task or set of tasks to fund. The discussion on mechanism is a bit stalled, I think. > > We don't currently have a crisis. But I think there may already be things for which money might help: e.g. paying for CircleCI cycles rather than spending Ben's time trying to shoehorn everything into the free-tier limits. Maybe Appveyor is similar. > > So I would very much welcome it if the DevOps group could take on, as an important (even if not day-to-day urgent) task, working out a sustainable model for GHC's maintenance, support, CI, and releases. > > Simon > > > | -----Original Message----- > | From: Ghc-devops-group [mailto:ghc-devops-group-bounces at haskell.org] On Behalf Of Gershom B > | Sent: 05 February 2018 03:58 > | To: Manuel Chakravarty > | Cc: ghc-devops-group at haskell.org > | Subject: Re: [GHC DevOps Group] CI effort status > | > | Let me articulate my question a bit more clearly. Looking at the devops group charter (https://ghc.haskell.org/trac/ghc/wiki/DevOpsGroupCharter), it says the following about the goals: > | > | > | The mission of the GHC DevOps Group is to > | > | * to take leadership of the devops aspects of GHC, > | * to resource it better, and > | * to broaden the sense of community ownership and control of GHC. > | > | > | Further it says under “Resources”: > | > | "The GHC DevOps Group identifies the ongoing and one-off devops requirements of GHC. It develops and manages the strategies and projects to implement the needed tools, processes, and documentation to meet those requirements.
To that end and on the basis of actionable project plans, it seeks to obtain the necessary resources from organisations that rely on GHC as a production-ready tool. By doing this, we aim to unlock more resources than are currently available. At the same time, we seek broad community ownership to minimise the load on any single contributor and to avoid a single point of failure." > | > | My concern is that, at the moment, there has been discussion regarding devops aspects, and perhaps a broadened sense of community ownership and control. > | > | But I do not see better resourcing, although the initial contributions of CI configurations were certainly a good kickstart. As such, I do not see community ownership in the sense of the latter paragraph — i.e. in the sense that it will “minimise the load on any single contributor” and thus “avoid a single point of failure.” > | > | The way this works, as I understand it, is a quid pro quo. In order to accomplish goals with regard to regularity of GHC releases, streamlined processes, etc., there needs to be at least some infusion of resources, presumably “unlocked” from “organisations that rely on GHC as a production-ready tool”. > | > | Otherwise this quickly becomes expecting a variety of new work from the same cast of characters, just with more voices on a mailing list chiming in with proposals as to what they would like to see accomplished. > | > | I am well aware that assembling resources and pulling them together is _hard_, and many attempts to do so founder. I’ve been a participant in any number of foundered attempts myself over the years, or attempts that have accomplished a few useful things, but fallen far short of even the modest initial goals they set out with. > | > | But I do not want this aspect of the DevOps Group charter to fade from consciousness — getting these resources is not automatic.
It requires constant shaking of tree branches, and constant attempts to reformulate problems and break them down in ways that make them more amenable to collaboration — as well as not-infrequent followup on partial commitments or indications towards such in the past, to try to pin down their concrete implementation. > | > | What I am seeing right now is that there is a danger of settling into a “new status quo” with no new resources, and I think that would not be a good thing for the future prospects of the DevOps Group, and would probably quickly lead to it being yet another stillborn effort. > | > | I am not offering anything at the moment here — I have nothing _to_ offer. But this is my attempt to provide a gentle “poke” to all those on the list who thought they might have some ability to play a role in this to please be forthcoming with some proposals as to how they might help. One important aspect to bear in mind is that at this point, it seems to me that money is _not_ the issue. That is to say, if there was a volunteer of some skilled-ops time which had experience wrangling with CI, that would be ideal. But if there was a proffer of some such time, but from e.g. some associated contractor who would need to be paid to help out, then it would probably be feasible to fund that as well, from any number of sources. > | > | The plan for migrating CI is only partially complete. The working hypothesis is that when it is complete, it will mean less work in the long run. But in my opinion, dealing with someone else’s flaky boxes (e.g. those of CircleCI) is not much better than dealing with your own flaky boxes, except you now have to bother other people to figure out more stuff for you.
So if we could get someone with experience to be a deputy CircleCI ombudsman or the like, and take charge of some aspect of this work, I think we would have a much greater chance of A) success, and B) genuinely distributing the workload more widely. > | > | Best, > | Gershom From simonpj at microsoft.com Wed Feb 7 08:39:25 2018 From: simonpj at microsoft.com (Simon Peyton Jones) Date: Wed, 7 Feb 2018 08:39:25 +0000 Subject: [GHC DevOps Group] CI effort status In-Reply-To: <6D861F35-0C50-4BCA-8CF1-0D3EC5871DEA@tweag.io> References: <87y3kk9kqo.fsf@smart-cactus.org>
<4E897B05-1CC5-4BFA-AC53-E8D9DFB4DA3D@tweag.io> <87o9lb87wz.fsf@smart-cactus.org> <87k1vu0ypi.fsf@smart-cactus.org> <6D861F35-0C50-4BCA-8CF1-0D3EC5871DEA@tweag.io> Message-ID: Manuel, my apologies: I should certainly have included Tweag in the list, which I typed too hurriedly. It has been a huge relief to me to have a new /proactive/ source of leadership on GHC. That's a really big contribution. Thank you Tweag! My main point was that we should seek to widen the group that contributes, whether it's leadership, time, compute resources, or money, so that the burden does not fall too heavily on a few, such as yourself. Simon

From facundo.dominguez at tweag.io  Fri Feb 16 16:23:05 2018
From: facundo.dominguez at tweag.io (Facundo Domínguez)
Date: Fri, 16 Feb 2018 13:23:05 -0300
Subject: [GHC DevOps Group] CI status update and questions
Message-ID: 

Hello,

Increasing the number of threads in CircleCI brings the build plus
testsuite run to about 2.5 hours, so we don't need to split the build in
CircleCI. Now we need to fix the failing tests.

Appveyor is more limited. We can build the compiler with the quick
flavor within the 90 minutes. Is this enough, or do we really need to
build an optimized compiler?

Second question: should Appveyor run all the tests that we already run
in CircleCI?

Best,
Facundo

From ben at well-typed.com  Fri Feb 16 22:26:36 2018
From: ben at well-typed.com (Ben Gamari)
Date: Fri, 16 Feb 2018 17:26:36 -0500
Subject: [GHC DevOps Group] CI status update and questions
In-Reply-To: 
References: 
Message-ID: <87r2pkvaeg.fsf@smart-cactus.org>

Facundo Domínguez writes:

> Hello,
>
> Increasing the number of threads in CircleCI brings the build plus
> testsuite run to about 2.5 hours, so we don't need to split the build in
> CircleCI. Now we need to fix the failing tests.
>
> Appveyor is more limited. We can build the compiler with the quick
> flavor within the 90 minutes. Is this enough, or do we really need to
> build an optimized compiler?
>
If we are going to use our CI infrastructure to build binary
distributions, then it is necessary to build an optimized compiler.
Moreover, I would be quite wary of running our usual CI in anything
other than the configuration in which it will be deployed.

> Second question: should Appveyor run all the tests that we already run
> in CircleCI?
>
Yes. A large part of the motivation for this rework is to improve our CI
coverage. Unfortunately, compilers are notoriously environment-sensitive
(see, for instance, #14675), so taking shortcuts here would be quite
harmful to that goal.

Cheers,

- Ben