[GHC DevOps Group] CI effort status

Simon Peyton Jones simonpj at microsoft.com
Tue Feb 6 13:31:02 UTC 2018


Thanks Gershom.

I think of the devops group as

1 Broadening "ownership" of GHC's development and release processes,
  so that a larger group of people feel that they can influence and
  contribute to GHC's development, and hence feel more comfortable
  making GHC mission-critical to their business or other plans

2 Making it more likely that what we do with GHC actually matches
  what GHC's users want

3 Broadening and deepening the pool of stakeholders who are
  willing to contribute time and/or money to making GHC into the
  solidly reliable tool that they need.   (Currently we have
  Microsoft, Facebook, IOHK contributing directly, I think.)

I think Gershom's message is really about (3).  To me, progress on
(1) and (2) will help to make the case for (3).  But I don’t want
to lose sight of (3).  The factor that precipitated the devops group's
formation was a sudden awareness about how vulnerable we are, as a 
community, to a very small number of supporters.

Discussion at ICFP made me think that several other companies would
consider making donations, if (a) we had a compelling case that it'd
be money well spent, and (b) the actual process worked. For (b) I think
some would prefer a central fund; others might prefer a specific task
or set of tasks to fund.  The discussion on mechanism is a bit stalled
I think.

We don't currently have a crisis.  But I think there may already be
things for which application of money might help: e.g. paying for CircleCI
cycles rather than spending Ben's time trying to shoehorn everything into
the for-free limits.  Maybe Appveyor is similar.

So I would very much welcome it if the Devops group could take on, as an
important task (even if it is not day-to-day urgent), working out a
sustainable model for GHC's maintenance, support, CI, and releases.

Simon
 

|  -----Original Message-----
|  From: Ghc-devops-group [mailto:ghc-devops-group-bounces at haskell.org]
|  On Behalf Of Gershom B
|  Sent: 05 February 2018 03:58
|  To: Manuel Chakravarty <mchakravarty at me.com>
|  Cc: ghc-devops-group at haskell.org
|  Subject: Re: [GHC DevOps Group] CI effort status
|  
|  Let me articulate my question a bit more clearly. Looking at the
|  devops group charter
|  (https://ghc.haskell.org/trac/ghc/wiki/DevOpsGroupCharter), it says
|  the following about the goals:
|  
|  
|  The mission of the GHC DevOps Group is to
|  
|  * to take leadership of the devops aspects of GHC,
|  * to resource it better, and
|  * to broaden the sense of community ownership and control of GHC.
|  
|  
|  Further it says under “Resources”:
|  
|  "The GHC DevOps Group identifies the ongoing and one-off devops
|  requirements of GHC. It develops and manages the strategies and
|  projects to implement the needed tools, processes, and documentation
|  to meet those requirements. To that end and on the basis of actionable
|  project plans, it seeks to obtain the necessary resources from
|  organisations that rely on GHC as a production-ready tool. By doing
|  this, we aim to unlock more resources than are currently available. At
|  the same time, we seek broad community ownership to minimise the load
|  on any single contributor and to avoid a single point of failure."
|  
|  My concern is that at the moment there has been discussion regarding
|  devops aspects, and perhaps a broadened sense of community ownership
|  and control.
|  
|  But I do not see better resourcing, although the initial contributions
|  of CI configurations were certainly a good kickstart. As such, I do
|  not see community ownership in the sense of the latter paragraph —
|  i.e. in the sense that it will “minimise the load on any single
|  contributor” and thus “avoid a single point of failure.”
|  
|  The way this works, as I understand it, is a quid-pro-quo. In order to
|  accomplish goals with regards to regularity of GHC releases,
|  streamlined processes, etc., there needs to be at least some infusion
|  of resources, presumably “unlocked” from “organisations that rely on
|  GHC as a production-ready tool”.
|  
|  Otherwise this quickly becomes expecting a variety of new work from
|  the same cast of characters, just with more voices on a mailinglist
|  chiming in with proposals as to what they would like to see accomplished.
|  
|  I am well aware that assembling resources and pulling them together is
|  _hard_, and many attempts to do so founder. I’ve been a participant in
|  any number of foundered attempts myself over the years, or attempts
|  that have accomplished a few useful things, but far from even the
|  modest initial goals they set out with.
|  
|  But I do not want this aspect of the DevOps Group charter to fade from
|  consciousness — getting these resources is not automatic. It requires
|  constant shaking of tree branches, and constant attempts to
|  reformulate problems and break them down in ways that make more
|  collaboration amenable — as well as not-infrequent followup on partial
|  commitments or indications towards such in the past, to try to pin
|  down their concrete implementation.
|  
|  What I am seeing right now is that there is a danger of settling into
|  a “new status quo” with no new resources, and I think that would not
|  be a good thing for the future prospects of the DevOps Group, and
|  probably quickly lead to it being yet another stillborn effort.
|  
|  I am not offering anything at the moment here — I have nothing _to_
|  offer. But this is my attempt to provide a gentle “poke” to all those
|  on the list who thought they might have some ability to play a role in
|  this to please be forthcoming with some proposals as to how they might
|  help. One important aspect to bear in mind is that at this point, it
|  seems to me that money is _not_ the issue. That is to say, if there
|  was a volunteer of some skilled-ops time which had experience
|  wrangling with CI, that would be ideal. But if there was a proffer of
|  some such time, but from e.g. some associated contractor who would
|  need to be paid to help out, then it would probably be feasible to
|  fund that as well, from any number of sources.
|  
|  The plan for migrating CI is only partially complete. The working
|  hypothesis is that when it is complete, it will mean less work in the
|  long-run. But in my opinion, dealing with someone else’s flaky boxes
|  (e.g. those of CircleCI) is not much better than dealing with your own
|  flaky boxes, except you have to now bother other people to figure out
|  more stuff for you. So if we could get someone with experience to be a
|  deputy CircleCI ombudsman or the like, and take charge of some aspect
|  of this work, I think we would have a much greater chance of A)
|  success, and B) genuinely distributing the workload more widely.
|  
|  Best,
|  Gershom
|  
|  
|  On February 4, 2018 at 10:19:51 PM, Manuel Chakravarty
|  (mchakravarty at me.com) wrote:
|  
|  > Hi Gershom,
|  >
|  > Ben is surely the main actor and he has put considerable effort into
|  > this. However, we (Tweag) did help out initially writing some of the
|  > original CI configurations.
|  >
|  > Having said that, it would be absolutely fabulous if other
|  > developers could help out. Please let Ben and me know if you know
|  > anybody who would be happy to help!
|  >
|  > Cheers,
|  > Manuel
|  >
|  > > 05.02.2018 13:41 Gershom B :
|  > >
|  > > A question from an observer here -- my understanding was that part
|  > > of the plan with the shift in CI infrastructure was that the burden
|  > > would be lifted from Ben's exclusive shoulders here and there would
|  > > be some greater division of labor, which is made possible in part by
|  > > using shared standard services rather than self-hosted solutions.
|  > > But at the moment I see reports largely of Ben continuing to try to
|  > > resolve issues and move this plan forward on his own. Is there still
|  > > some medium-term plan to have a more collective effort in this
|  > > transition?
|  > >
|  > > --Gershom
|  > >
|  > >
|  > > On Sat, Feb 3, 2018 at 12:28 PM, Ben Gamari wrote:
|  > >> Ben Gamari writes:
|  > >>
|  > >>> Manuel M T Chakravarty writes:
|  > >>>
|  > >>>> Hi Ben,
|  > >>>>
|  > >>>> I meant to post on
|  > >>>> https://github.com/appveyor/ci/issues/517
|  > >>>> to request an increased
|  > >>>> limit, but didn’t get around to it yet. If you’d be able to put a
|  > >>>> request on that issue, that’d be great.
|  > >>>>
|  > >>> Sure, I'm on it.
|  > >>>
|  > >> I have created a new GHCAppveyor (due to name length constraints)
|  > >> Appveyor [1] project and configured it to pull from the ghc/ghc
|  > >> GitHub mirror. I have also requested, and was granted, the typical
|  > >> build time limit extension to 90 minutes.
|  > >>
|  > >> Unfortunately, it seems that even 90 minutes is insufficient to
|  > >> even finish a build, much less run the testsuite, under Appveyor's
|  > >> build environment. Given where the build was terminated, I would
|  > >> guess that it would need at least another 10 minutes of compilation
|  > >> to make it to the testsuite. On top of this the testsuite will
|  > >> require another ~35 minutes (as it is quite heavy on process
|  > >> spawning, which is very expensive on Windows).
|  > >>
|  > >> I haven't yet inquired as to whether a further build time extension
|  > >> would be possible. However, I am not hopeful that our plan of using
|  > >> Appveyor will be feasible without purchasing build time.
|  > >>
|  > >>
|  > >> On the CircleCI front, I have been continuing work to clear up the
|  > >> remaining build failures. At this point only two remain:
|  > >>
|  > >> * I have a patch (D4360) to fix T11489 by running our build jobs as
|  > >> an unprivileged user
|  > >>
|  > >> * scc01 appears to be slightly non-deterministic; I am
|  > >> investigating this.
|  > >>
|  > >> Unfortunately the CircleCI infrastructure is still exhibiting a
|  > >> fair amount of flakiness. See, for instance, this build which is
|  > >> shown to be "Cancelled" despite having finished (and having
|  > >> apparently been run at least twice). Judging from the build
|  > >> history, this seems to be a fairly regular occurrence. I have
|  > >> contacted CircleCI about this but have not yet heard back.
|  > >>
|  > >> I am also occasionally seeing rather extreme variance in test times.
|  > >> In particular the linux-llvm target usually completes in around 4
|  > >> hours 20 minutes, but sometimes takes over 5 hours, resulting in the
|  > >> build timing out. It appears that the build hangs during the
|  > >> testsuite run (e.g. [2]); it's not impossible that this is due to a
|  > >> bug in the testsuite driver but I have been able to reproduce this
|  > >> neither locally nor remotely on CircleCI infrastructure so it has
|  > >> proved to be a tough nut to crack.
|  > >>
|  > >> Cheers,
|  > >>
|  > >> - Ben
|  > >>
|  > >> [1] https://ci.appveyor.com/project/GHCAppveyor/ghc
|  > >> [2] https://circleci.com/gh/ghc/ghc/1558
|  > >>
|  > >> _______________________________________________
|  > >> Ghc-devops-group mailing list
|  > >> Ghc-devops-group at haskell.org
|  > >> https://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devops-group
|  > >>

