[GHC DevOps Group] CI effort status
Simon Peyton Jones
simonpj at microsoft.com
Tue Feb 6 13:31:02 UTC 2018
Thanks Gershom.
I think of the devops group as
1 Broadening "ownership" of GHC's development and release processes,
so that a larger group of people feel that they can influence and
contribute to GHC's development, and hence feel more comfortable
making GHC mission-critical to their business or other plans
2 Making it more likely that what we do with GHC actually matches
what GHC's users want
3 Broadening and deepening the pool of stakeholders who are
willing to contribute time and/or money to making GHC into the
solidly reliable tool that they need. (Currently we have
Microsoft, Facebook, IOHK contributing directly, I think.)
I think Gershom's message is really about (3). To me, progress on
(1) and (2) will help to make the case for (3). But I don’t want
to lose sight of (3). The factor that precipitated the devops group's
formation was a sudden awareness about how vulnerable we are, as a
community, to a very small number of supporters.
Discussion at ICFP made me think that several other companies would
consider making donations, if (a) we had a compelling case that it'd
be money well spent, and (b) the actual process worked. For (b) I think
some would prefer a central fund; others might prefer a specific task
or set of tasks to fund. The discussion on mechanism is a bit stalled
I think.
We don't currently have a crisis. But I think there may already be
things for which application of money might help: e.g. paying for CircleCI
cycles rather than spending Ben's time trying to shoehorn everything into
the for-free limits. Maybe Appveyor is similar.
So I would very much welcome it if the Devops group could take, as an
important task (even if it is not day-to-day urgent), working out a
sustainable model for GHC's maintenance, support, CI, and releases.
Simon
| -----Original Message-----
| From: Ghc-devops-group [mailto:ghc-devops-group-bounces at haskell.org]
| On Behalf Of Gershom B
| Sent: 05 February 2018 03:58
| To: Manuel Chakravarty <mchakravarty at me.com>
| Cc: ghc-devops-group at haskell.org
| Subject: Re: [GHC DevOps Group] CI effort status
|
| Let me articulate my question a bit more clearly. Looking at the
| devops group charter
| (https://ghc.haskell.org/trac/ghc/wiki/DevOpsGroupCharter), it says
| the following about the goals:
|
|
| The mission of the GHC DevOps Group is to
|
| * to take leadership of the devops aspects of GHC,
| * to resource it better, and
| * to broaden the sense of community ownership and control of GHC.
|
|
| Further it says under “Resources”:
|
| "The GHC DevOps Group identifies the ongoing and one-off devops
| requirements of GHC. It develops and manages the strategies and
| projects to implement the needed tools, processes, and documentation
| to meet those requirements. To that end and on the basis of actionable
| project plans, it seeks to obtain the necessary resources from
| organisations that rely on GHC as a production-ready tool. By doing
| this, we aim to unlock more resources than are currently available. At
| the same time, we seek broad community ownership to minimise the load
| on any single contributor and to avoid a single point of failure."
|
| My concern is at the moment there has been discussion regarding devops
| aspects, and perhaps a broadened sense of community ownership and
| control.
|
| But I do not see better resourcing, although the initial contributions
| of CI configurations were certainly a good kickstart. As such, I do
| not see community ownership in the sense of the latter paragraph —
| i.e. in the sense that it will “minimise the load on any single
| contributor” and thus “avoid a single point of failure.”
|
| The way this works, as I understand it, is a quid-pro-quo. In order to
| accomplish goals with regards to regularity of GHC releases,
| streamlined processes, etc., there needs to be at least some infusion
| of resources, presumably “unlocked” from "organisations that rely on
| GHC as a production-ready tool”.
|
| Otherwise this quickly becomes expecting a variety of new work from
| the same cast of characters, just with more voices on a mailinglist
| chiming in with proposals as to what they would like to see accomplished.
|
| I am well aware that assembling resources and pulling them together is
| _hard_, and many attempts to do so founder. I’ve been a participant in
| any number of foundered attempts myself over the years, or attempts
| that have accomplished a few useful things, but far from even the
| modest initial goals they set out with.
|
| But I do not want this aspect of the DevOps Group charter to fade from
| consciousness — getting these resources is not automatic. It requires
| constant shaking of tree branches, and constant attempts to
| reformulate problems and break them down in ways that make more
| collaboration amenable — as well as not-infrequent followup on partial
| commitments or indications towards such in the past, to try to pin
| down their concrete implementation.
|
| What I am seeing right now is that there is a danger of settling into
| a “new status quo” with no new resources, and I think that would be a
| not good thing for the future prospects of the DevOps Group, and
| probably quickly lead to it being yet another stillborn effort.
|
| I am not offering anything at the moment here — I have nothing _to_
| offer. But this is my attempt to provide a gentle “poke” to all those
| on the list who thought they might have some ability to play a role in
| this to please be forthcoming with some proposals as to how they might
| help. One important aspect to bear in mind is that at this point, it
| seems to me that money is _not_ the issue. That is to say, if there
| was a volunteer of some skilled-ops time which had experience
| wrangling with CI, that would be ideal. But if there was a proffer of
| some such time, but from e.g. some associated contractor who would
| need to be paid to help out, then it would probably be feasible to
| fund that as well, from any number of sources.
|
| The plan for migrating CI is only partially complete. The working
| hypothesis is that when it is complete, it will mean less work in the
| long-run. But in my opinion, dealing with someone else’s flaky boxes
| (e.g. those of CircleCI) is not much better than dealing with your own
| flaky boxes, except you have to now bother other people to figure out
| more stuff for you. So if we could get someone with experience to be a
| deputy CircleCI ombudsman or the like, and take charge of some aspect
| of this work, I think we would have a much greater chance of A)
| success, and B) genuinely distributing the workload more widely.
|
| Best,
| Gershom
|
|
| On February 4, 2018 at 10:19:51 PM, Manuel Chakravarty
| (mchakravarty at me.com) wrote:
|
| > Hi Gershom,
| >
| > Ben is surely the main actor and he has put considerable effort into
| > this. However, we (Tweag) did help out initially writing some of the
| > original CI configurations.
| >
| > Having said that, it would be absolutely fabulous if other
| > developers could help out. Please let Ben and me know if you know
| > anybody who would be happy to help!
| >
| > Cheers,
| > Manuel
| >
| > > 05.02.2018 13:41 Gershom B :
| > >
| > > A question from an observer here -- my understanding was that part
| > > of the plan with the shift in CI infrastructure was that the burden
| > > would be lifted from Ben's exclusive shoulders here and there would
| > > be some greater division of labor, which is made possible in part by
| > > using shared standard services rather than self-hosted solutions.
| > > But at the moment I see reports largely of Ben continuing to try to
| > > resolve issues and move this plan forward on his own. Is there still
| > > some medium-term plan to have a more collective effort in this
| > > transition?
| > >
| > > --Gershom
| > >
| > >
| > > On Sat, Feb 3, 2018 at 12:28 PM, Ben Gamari wrote:
| > >> Ben Gamari writes:
| > >>
| > >>> Manuel M T Chakravarty writes:
| > >>>
| > >>>> Hi Ben,
| > >>>>
| > >>>> I meant to post on
| > >>>> https://github.com/appveyor/ci/issues/517
| > >>>> to request an increased
| > >>>> limit, but didn’t get around to it yet. If you’d be able to put a
| > >>>> request on that issue, that’d be great.
| > >>>>
| > >>> Sure, I'm on it.
| > >>>
| > >> I have created a new GHCAppveyor (due to name length constraints)
| > >> Appveyor [1] project and configured it to pull from the ghc/ghc
| > >> GitHub mirror. I have also requested, and was granted, the typical
| > >> build time limit extension to 90 minutes.
| > >>
| > >> Unfortunately, it seems that even 90 minutes is insufficient to
| > >> even finish a build, much less run the testsuite, under Appveyor's
| > >> build environment. Given where the build was terminated, I would
| > >> guess that it would need at least another 10 minutes of compilation
| > >> to make it to the testsuite. On top of this the testsuite will
| > >> require another ~35 minutes (as it is quite heavy on process
| > >> spawning, which is very expensive on Windows).
| > >>
| > >> I haven't yet inquired as to whether a further build time extension
| > >> would be possible. However, I am not hopeful that our plan of using
| > >> Appveyor will be feasible without purchasing build time.
| > >>
| > >>
| > >> On the CircleCI front, I have been continuing work to clear up the
| > >> remaining build failures. At this point only two remain:
| > >>
| > >> * I have a patch (D4360) to fix T11489 by running our build jobs as
| > >> an unprivileged user
| > >>
| > >> * scc01 appears to be slightly non-deterministic; I am
| > >> investigating this.
| > >>
| > >> Unfortunately the CircleCI infrastructure is still exhibiting a
| > >> fair amount of flakiness. See, for instance, this build which is
| > >> shown to be "Cancelled" despite having finished (and having
| > >> apparently been run at least twice). Judging from the build
| > >> history, this seems to be a fairly regular occurrence. I have
| > >> contacted CircleCI about this but have not yet heard back.
| > >>
| > >> I am also occasionally seeing rather extreme variance in test times.
| > >> In particular the linux-llvm target usually completes in around 4
| > >> hours 20 minutes, but sometimes takes over 5 hours, resulting in the
| > >> build timing out. It appears that the build hangs during the
| > >> testsuite run (e.g. [2]); it's not impossible that this is due to a
| > >> bug in the testsuite driver but I have been able to reproduce this
| > >> neither locally nor remotely on CircleCI infrastructure so it has
| > >> proved to be a tough nut to crack.
| > >>
| > >> Cheers,
| > >>
| > >> - Ben
| > >>
| > >> [1] https://ci.appveyor.com/project/GHCAppveyor/ghc
| > >> [2] https://circleci.com/gh/ghc/ghc/1558
| > >>
| > >> _______________________________________________
| > >> Ghc-devops-group mailing list
| > >> Ghc-devops-group at haskell.org
| > >> https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group
| > >>