[GHC] #13194: Concurrent modification of package.cache is not safe
GHC
ghc-devs at haskell.org
Fri Jan 27 16:42:38 UTC 2017
#13194: Concurrent modification of package.cache is not safe
-------------------------------------+-------------------------------------
Reporter: arybczak | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: ghc-pkg | Version: 8.0.1
Keywords: | Operating System: Unknown/Multiple
Architecture: Unknown/Multiple | Type of failure: None/Unknown
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Rev(s): | Wiki Page:
-------------------------------------+-------------------------------------
There are a couple of different issues here.

1. On Linux, running `ghc-pkg register` for multiple packages in parallel
can lose updates to the package database because of how the
`registerPackage` function works: it reads the existing package
databases, picks the one to modify, checks that the package info for the
package being registered is valid, and then replaces the package database
with what was read at the beginning plus the new package info.

Therefore, if updates interleave, process1 may read the database, process2
may then update it while process1 still holds the old version, and
process1 later writes its update based on that stale copy, so the update
made by process2 is lost.
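To make the race concrete, here is a minimal sketch of that
read-modify-write pattern. The database type and helper names are
invented for illustration; this is not ghc-pkg's actual code:

{{{
import qualified Data.Map as Map

type PackageId   = String
type PackageInfo = String
type PackageDb   = Map.Map PackageId PackageInfo

-- Each concurrent `ghc-pkg register` effectively performs these steps,
-- given some way to read and write the database file:
registerSketch :: IO PackageDb -> (PackageDb -> IO ())
               -> PackageId -> PackageInfo -> IO ()
registerSketch readDb writeDb pkgId pkgInfo = do
  db <- readDb                           -- 1. read the whole database
  let db' = Map.insert pkgId pkgInfo db  -- 2. add the new package locally
  writeDb db'                            -- 3. write the whole database back
-- If process1 performs step 1, then process2 performs steps 1-3, and
-- then process1 performs steps 2-3, the database process1 writes is
-- derived from its stale copy, so process2's registration is lost.
}}}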
2. On Windows, an update to the package database may fail outright: GHC
attempts to perform the update using the rename trick, which fails
whenever any other process has the file to be replaced open for reading.
Combine that with the fact that GHC reads the package database while
compiling packages and you get problems in both Stack
(https://github.com/commercialhaskell/stack/issues/2617) and Cabal
(https://github.com/haskell/cabal/issues/4005).

By the way, the rename trick (used for atomic database updates) not only
doesn't work on Windows, it's also not atomic on e.g. NFS
(https://stackoverflow.com/questions/41362016/rename-atomicity-and-nfs).
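For reference, the rename trick looks roughly like the sketch below. It
is a simplified illustration of the "write to a temporary file, then
rename over the target" pattern, not the actual `writeFileAtomic` code;
the helper name and temp-file naming are made up:

{{{
import System.Directory (renameFile)
import System.FilePath ((<.>))
import qualified Data.ByteString.Lazy as BSL

-- Simplified sketch of the "write to temp file, then rename" trick.
-- The real writeFileAtomic picks a unique temporary name; ".tmp" is
-- used here only for illustration.
writeFileAtomicSketch :: FilePath -> BSL.ByteString -> IO ()
writeFileAtomicSketch target content = do
  let tmp = target <.> "tmp"
  BSL.writeFile tmp content
  renameFile tmp target
  -- On a local POSIX filesystem the rename atomically replaces `target`.
  -- On Windows it fails if another process has `target` open for
  -- reading, and on NFS the rename is not guaranteed to be atomic.
}}}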
The solution to both problems is to use OS-specific features to lock the
database file (in shared mode when reading and in exclusive mode when
writing). On Windows this can be done with LockFileEx. Unfortunately, for
POSIX things are a bit more complicated.

There are two ways to lock a file on Linux:

1. Using fcntl(F_SETLK) (the POSIX API)
2. Using flock (the BSD API)
However, fcntl locks have a serious limitation (quoting the fcntl(2) man
page):

  The record locks described above are associated with the process
  (unlike the open file description locks described below). This has
  some unfortunate consequences:

  * If a process closes any file descriptor referring to a file, then
    all of the process's locks on that file are released, regardless
    of the file descriptor(s) on which the locks were obtained. This
    is bad: it means that a process can lose its locks on a file such
    as /etc/passwd or /etc/mtab when for some reason a library
    function decides to open, read, and close the same file.

  * The threads in a process share locks. In other words, a
    multithreaded program can't use record locking to ensure that
    threads don't simultaneously access the same region of a file.
flock, on the other hand, is not guaranteed to work with NFS. According
to https://en.wikipedia.org/wiki/File_locking#Problems:

  Whether and how flock locks work on network filesystems, such as NFS,
  is implementation dependent. On BSD systems, flock calls on a file
  descriptor open to a file on an NFS-mounted partition are successful
  no-ops. On Linux prior to 2.6.12, flock calls on NFS files would act
  only locally. Kernel 2.6.12 and above implement flock calls on NFS
  files using POSIX byte-range locks. These locks will be visible to
  other NFS clients that implement fcntl-style POSIX locks, but
  invisible to those that do not.[4]
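If we went the fcntl route, the unix package already exposes the relevant
primitives, so a whole-file shared/exclusive lock could look roughly like
the sketch below. This is only an illustration (the wrapper names are
made up), and the process-association caveat quoted above still applies:

{{{
import System.Posix.IO
  (LockRequest(ReadLock, WriteLock, Unlock), setLock, waitToSetLock)
import System.Posix.Types (Fd)
import System.IO (SeekMode(AbsoluteSeek))

data DbLockMode = SharedDbLock | ExclusiveDbLock

-- Lock the whole file (offset 0, length 0 means "to end of file"),
-- blocking until the fcntl lock is granted.
lockPackageDb :: Fd -> DbLockMode -> IO ()
lockPackageDb fd mode = waitToSetLock fd (request, AbsoluteSeek, 0, 0)
  where
    request = case mode of
      SharedDbLock    -> ReadLock   -- shared lock for readers
      ExclusiveDbLock -> WriteLock  -- exclusive lock for the writer

unlockPackageDb :: Fd -> IO ()
unlockPackageDb fd = setLock fd (Unlock, AbsoluteSeek, 0, 0)
}}}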
Assuming that the solution is to go with locking the database, we would
need to:

1. In `registerPackage`, lock all databases that are read in shared mode,
except for the database that will later be modified, which has to be
locked in exclusive mode. The handle would also need to be kept open,
passed on to `changeDB`, and used in `GHC.PackageDb.writePackageDb` to
rewrite the database with the updated version, instead of
`writeFileAtomic` (which, as demonstrated above, is not actually
unconditionally atomic).

2. `GHC.PackageDb.decodeFromFile` would lock the file in the appropriate
mode and return the handle to the open file where appropriate.

3. Add support for locking a file. This should be fairly easy to do in
GHC.IO.Handle.FD by extending the `openFile'` function with appropriate
parameters and then adding a wrapper function such as `openLockedFile`.
We could add both blocking and non-blocking locking so that ghc-pkg can
report that it is waiting for a locked package database where
appropriate.

Alternatively, we could add a function similar to the following: `hLock ::
Handle -> LockMode -> Bool{-block-} -> IO Bool`. However, that requires
extracting the file descriptor from the Handle, which as far as I can see
is problematic.
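To illustrate the shape of that alternative, here is a sketch of how
ghc-pkg might consume such an API. `LockMode`, `hLock` and `hUnlock` are
hypothetical (left as stubs below), and `withLockedPackageDb` is an
invented wrapper, not existing code:

{{{
import Control.Exception (bracket)
import System.IO (Handle, IOMode(ReadMode, ReadWriteMode), hClose, openFile)

-- Hypothetical API along the lines proposed above; none of this exists
-- in base today, and the bodies are stubs.
data LockMode = SharedLock | ExclusiveLock

hLock :: Handle -> LockMode -> Bool {- block? -} -> IO Bool
hLock _h _mode _block = undefined  -- fcntl/flock on POSIX, LockFileEx on Windows

hUnlock :: Handle -> IO ()
hUnlock _h = undefined

-- How ghc-pkg could use it: open package.cache, hold the lock for the
-- duration of the read or rewrite, then release it and close the handle.
withLockedPackageDb :: FilePath -> LockMode -> (Handle -> IO a) -> IO a
withLockedPackageDb path mode act =
  bracket acquire release act
  where
    acquire = do
      h <- openFile path (case mode of
                            SharedLock    -> ReadMode
                            ExclusiveLock -> ReadWriteMode)
      _granted <- hLock h mode True  -- True: block until the lock is granted
      return h
    release h = hUnlock h >> hClose h
}}}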
Is going with locking an acceptable solution here?
--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/13194>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler