[GHC DevOps Group] CI effort status

Ben Gamari ben at well-typed.com
Sat Feb 3 17:28:47 UTC 2018


Ben Gamari <ben at well-typed.com> writes:

> Manuel M T Chakravarty <manuel.chakravarty at tweag.io> writes:
>
>> Hi Ben,
>>
>> I meant to post on https://github.com/appveyor/ci/issues/517 to
>> request an increased limit, but didn’t get around to it yet. If you’d
>> be able to put a request on that issue, that’d be great.
>>
> Sure, I'm on it.
>
I have created a new Appveyor project [1], named GHCAppveyor due to name
length constraints, and configured it to pull from the ghc/ghc GitHub
mirror. I have also requested, and been granted, the typical build time
limit extension to 90 minutes.

Unfortunately, it seems that even 90 minutes is insufficient to finish
a build, much less run the testsuite, under Appveyor's build
environment. Given where the build was terminated, I would guess that
it would need at least another 10 minutes of compilation to make it to
the testsuite. On top of this, the testsuite will require another ~35
minutes (as it is quite heavy on process spawning, which is very
expensive on Windows).
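
For a rough sense of the gap, the estimates above can simply be added
up. The following is a minimal sketch in Python; the constant and
function names are hypothetical, and the numbers are the rough guesses
from the paragraph above rather than measurements, so the result is at
best a lower bound.

```python
# Figures taken from the estimates in this message. Splitting them into
# named stages is an illustrative assumption, not Appveyor's accounting.
GRANTED_LIMIT_MIN = 90       # extended Appveyor build time limit
REMAINING_COMPILE_MIN = 10   # "at least another 10 minutes" of compilation
TESTSUITE_MIN = 35           # "~35 minutes" for the testsuite


def required_minutes(limit_min, remaining_compile_min, testsuite_min):
    """Lower bound on wall-clock minutes for a full build plus testsuite.

    The build was cut off at the limit, so the time already spent equals
    the limit itself; the remaining stages are added on top of it.
    """
    return limit_min + remaining_compile_min + testsuite_min


total = required_minutes(GRANTED_LIMIT_MIN, REMAINING_COMPILE_MIN, TESTSUITE_MIN)
shortfall = total - GRANTED_LIMIT_MIN
print(f"estimated total: at least {total} min")
print(f"shortfall against the current limit: at least {shortfall} min")
```

Under these assumptions a complete run would need roughly 50% more
time than the extended limit currently allows.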

I haven't yet inquired as to whether a further build time extension
would be possible. However, I am not hopeful that our plan of using
Appveyor will be feasible without purchasing build time.


On the CircleCI front, I have been continuing work to clear up the
remaining build failures. At this point only two remain:

 * I have a patch (D4360) to fix T11489 by running our build jobs as an
   unprivileged user.

 * scc01 appears to be slightly non-deterministic; I am investigating
   this.

Unfortunately the CircleCI infrastructure is still exhibiting a fair
amount of flakiness. See, for instance, this build which is shown to be
"Cancelled" despite having finished (and having apparently been run at
least twice). Judging from the build history, this seems to be a fairly
regular occurrence. I have contacted CircleCI about this but have not
yet heard back.

I am also occasionally seeing rather extreme variance in test times.
In particular, the linux-llvm target usually completes in around 4 hours
20 minutes, but sometimes takes over 5 hours, resulting in the build
timing out. It appears that the build hangs during the testsuite run
(e.g. [2]). It's not impossible that this is due to a bug in the
testsuite driver, but I have been able to reproduce it neither locally
nor remotely on CircleCI infrastructure, so it has proved to be a tough
nut to crack.

Cheers,

- Ben

[1] https://ci.appveyor.com/project/GHCAppveyor/ghc
[2] https://circleci.com/gh/ghc/ghc/1558