Post-mortem for last night's GitLab outage
ben at well-typed.com
Fri Mar 19 13:45:41 UTC 2021
It appears that gitlab.haskell.org's GitLab services went down around
nine hours ago (around midnight EST). Surprisingly, the outage appears
to be entirely unrelated to yesterday's upgrade. Rather, the problem was
merely that the docker repository had grown to fill the entirety of the
server's data volume. I have fixed this (and prevented future
occurrences of the same issue) by moving our Docker images to a new
volume. Services should now be fully operational again.
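
A full volume of this kind can be caught before services fail by polling
disk usage. The following is only a minimal sketch, not what is deployed on
gitlab.haskell.org: the function names, the 90% threshold, and the "/" path
are illustrative assumptions, and it relies solely on Python's standard
shutil.disk_usage.

```python
import shutil


def volume_fill_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        # Guard against pseudo-filesystems that report no capacity.
        return 0.0
    return usage.used / usage.total


def is_nearly_full(path, threshold=0.9):
    """True if the filesystem holding `path` is at or above `threshold`.

    The 0.9 default is an arbitrary illustrative value, not a recommendation.
    """
    return volume_fill_fraction(path) >= threshold


if __name__ == "__main__":
    # "/" is a placeholder; substitute the mount point of the data volume.
    path = "/"
    fraction = volume_fill_fraction(path)
    status = "WARNING" if is_nearly_full(path) else "ok"
    print(f"{status}: {path} is {fraction:.0%} full")
```

Run periodically (from cron, for example), a check like this would flag a
growing Docker repository well before the volume is exhausted.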
Disk usage is something that we have struggled with in the past, in part
due to the relatively small local disk capacity of our servers and the previous
unreliability of our hosting provider's iSCSI block storage
infrastructure. The latter has previously prompted us to avoid using
iSCSI volumes for operation-critical data, while the former has meant
that we had to keep bulk data, such as Docker images, in careful check,
lest we run out of local storage.
At this point, it has been over half a year since we have experienced
any trouble with iSCSI. For this reason, I have moved the Docker images
back to iSCSI. This should eliminate this failure mode in the future.
Meanwhile the GitLab database remains on local storage, also minimizing
the potential for downtime due to future iSCSI failures.