openFile gives "file is locked" error on Linux when creating a non-existing file

Harendra Kumar harendra.kumar at gmail.com
Mon Jul 7 00:12:30 UTC 2025


It is likely this GHC issue: https://gitlab.haskell.org/ghc/ghc/-/issues/18832 .

On Sat, 28 Jun 2025 at 17:56, Harendra Kumar <harendra.kumar at gmail.com> wrote:
>
> We instrumented the GHC locking mechanism as suggested by Viktor and
> deployed the instrumented GHC in CI. We got our first crash after 4 months!
>
> Summary from the preliminary investigation of the crash:
>
> * The problem is not filesystem dependent or inode reuse related.
> * After understanding it better, now I am able to reproduce it on the
> local machine.
> * The problem may be related to hClose getting interrupted by an async
> exception.
>
> The GHC panic message from the instrumented code is as follows:
>
> 2025-06-27T09:11:41.8177372Z lockFile: first lock: pid 21804, tid
> 21806, id 17 dev 2065 ino 262152 write 1
> 2025-06-27T09:11:41.8178053Z FileSystem.Event: internal error:
> close: lock exists: pid 21804, tid 21806, fd 17
> 2025-06-27T09:11:41.8178492Z
> 2025-06-27T09:11:41.8178604Z     (GHC version 9.2.8 for x86_64_unknown_linux)
> 2025-06-27T09:11:41.8179160Z     Please report this as a GHC bug:
> https://www.haskell.org/ghc/reportabug
>
> The message is coming from the "close" system call wrapper that I
> created. It means we are closing the fd without releasing the lock first.
>
> Based on the test logs I figured that the "close" call in the panic message is
> happening from this code:
>
> createFile :: FilePath -> FilePath -> IO ()
> createFile file parent = openFile (parent </> file) WriteMode >>= hClose
>
> Now the interesting thing is that in this particular test, we are
> running two threads. In one thread, we are watching for file system
> events related to this file, and in the other thread we are running
> the createFile code snippet given above. As soon as the file gets
> created we receive a file CREATED event in the first thread and we use
> throwTo to send a ThreadAbort async exception to the createFile thread to
> kill it.
>
> My theory is that if the exception is delivered at a particular point
> during hClose, it closes the file descriptor but returns without
> releasing the lock. To test that theory I rewrote the createFile
> thread as shown below, doing open and close in a loop to make it more
> likely that the exception interrupts hClose:
>
> createFile :: FilePath -> FilePath -> IO ()
> createFile file parent = go
>     where
>     go =
>         do
>             h <- openFile (parent </> file) WriteMode
>             hClose h
>             go
>
> Voila! With this code I am able to reproduce it locally now, though it
> takes many runs (in a loop) and more than a few minutes to reproduce.
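>
> For anyone else wanting to reproduce this, the loop above can be
> driven from a single program using only base. This is a sketch under
> my own assumptions (the scratch path, the delays, and the use of
> killThread are mine, not from our test suite); killThread delivers a
> ThreadKilled async exception, playing the role of the throwTo in the
> real test:

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Exception (IOException, try)
import Control.Monad (forever)
import System.IO

main :: IO ()
main = do
    let path = "repro-lock-test.tmp"   -- hypothetical scratch file
    -- Open and close in a tight loop, as in the createFile snippet.
    tid <- forkIO $ forever $ do
        h <- openFile path WriteMode
        hClose h
    threadDelay 1000   -- let the loop spin for a moment
    killThread tid     -- async ThreadKilled, like the throwTo in the test
    -- With the bug, this later open can fail with "file is locked" even
    -- though no handle is visibly open. It can also fail benignly if the
    -- kill landed between openFile and hClose and the dangling handle
    -- has not been finalized yet.
    r <- try (openFile path WriteMode >>= hClose)
    putStrLn $ case (r :: Either IOException ()) of
        Left _  -> "locked"
        Right _ -> "ok"
```

> A single run usually prints "ok"; the point is to run it many times,
> as described above, and watch for a "locked" with no handle left open.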
>
> Any ideas on whether the problem is in hClose or somewhere else?
> Next, I will review the hClose code and add further instrumentation to
> narrow down the problem.
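>
> If the cause really is an async exception splitting the cleanup into
> "close the fd" and "release the lock", the usual defence is to run
> that two-step cleanup under uninterruptibleMask_. A minimal sketch of
> the pattern, with the two steps stubbed out as prints (the names and
> delays are mine, purely illustrative, not GHC's actual code):

```haskell
import Control.Concurrent
import Control.Exception

main :: IO ()
main = do
    done <- newEmptyMVar
    tid <- forkIO $ do
        r <- try $ uninterruptibleMask_ $ do
            putStrLn "close fd"      -- step 1: close the descriptor
            threadDelay 100000       -- window in which the throwTo arrives
            putStrLn "release lock"  -- step 2: still runs under the mask
        putMVar done (r :: Either SomeException ())
    threadDelay 10000
    throwTo tid ThreadKilled  -- blocks until the masked block completes
    r <- takeMVar done
    putStrLn $ either (const "killed after cleanup") (const "no exception") r
```

> Under uninterruptibleMask_ the exception is held back until both steps
> have run; with plain mask_, an interruptible operation between the two
> steps could still let the exception in, which is the hazard I suspect.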
>
> Thanks,
> Harendra
>
> On Sat, 16 Nov 2024 at 13:02, Viktor Dukhovni <ietf-dane at dukhovni.org> wrote:
> >
> > On Fri, Nov 15, 2024 at 06:45:40PM +0530, Harendra Kumar wrote:
> >
> > > Coming back to this issue after a break. I reviewed the code carefully
> > > and I cannot find anything where we are doing something in the
> > > application code that affects the RTS locking mechanism. Let me walk
> > > through the steps of the test up to failure and what we are doing in
> > > the code. The test output is like this:
> >
> > It is indeed not immediately clear where in your code or in some
> > dependency (including base, GHC, ...) a descriptor that contributes to
> > the RTS file reader/writer count (indexed by (dev, ino)) might be closed
> > without adjusting the count by calling the RTS `unlockFile()` function
> > (via GHC.IO.FD.release).
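> >
> > For readers following along: that count lives in the RTS, keyed by
> > (dev, ino). A rough Haskell model of the bookkeeping (illustrative
> > only; the real implementation is C in the RTS, and the names below
> > mirror it loosely), using the convention that a positive count means
> > that many readers and -1 means a single writer:

```haskell
import Data.IORef
import qualified Data.Map.Strict as Map

-- Keyed by (device, inode); n > 0 means n readers, -1 means one writer.
type LockTable = Map.Map (Integer, Integer) Int

lockFile :: IORef LockTable -> (Integer, Integer) -> Bool -> IO Bool
lockFile ref key forWriting = atomicModifyIORef' ref $ \tbl ->
    case Map.lookup key tbl of
        Nothing -> (Map.insert key (if forWriting then -1 else 1) tbl, True)
        Just n | not forWriting && n > 0
                   -> (Map.insert key (n + 1) tbl, True)
               | otherwise -> (tbl, False)   -- "file is locked"

unlockFile :: IORef LockTable -> (Integer, Integer) -> IO ()
unlockFile ref key = atomicModifyIORef' ref $ \tbl ->
    (Map.update dec key tbl, ())
  where
    dec n | n > 1     = Just (n - 1)  -- drop one reader
          | otherwise = Nothing       -- last reader, or the writer

main :: IO ()
main = do
    ref <- newIORef Map.empty
    let key = (2065, 262152)       -- the (dev, ino) from the panic above
    ok1 <- lockFile ref key True   -- first writer lock: succeeds
    ok2 <- lockFile ref key True   -- second writer on same file: refused
    unlockFile ref key             -- what a clean close must do
    ok3 <- lockFile ref key True   -- succeeds again
    print (ok1, ok2, ok3)
```

> > If a descriptor is closed without the unlockFile step, the writer
> > entry survives and every later open of that (dev, ino) is refused,
> > which is exactly the "file is locked" symptom reported above.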
> >
> > It may be worth noting that GHC does not *reliably* prevent simultaneous
> > open handles for the same underlying file, because handles returned by
> > hDuplicate do not contribute to the count:
> >
> >     demo.hs:
> >         import GHC.IO.Handle (hDuplicate)
> >         import System.IO
> >
> >         main :: IO ()
> >         main = do
> >             fh1 <- dupOpen "/tmp/foo"
> >             fh2 <- dupOpen "/tmp/foo"
> >             writeNow fh1 "abc\n"
> >             writeNow fh2 "def\n"
> >             readFile "/tmp/foo" >>= putStr
> >             hClose fh1
> >             hClose fh2
> >          where
> >             -- Look Mom, no lock!
> >             dupOpen path = do
> >                 fh <- openFile path WriteMode
> >                 hDuplicate fh <* hClose fh
> >
> >             writeNow fh s = hPutStr fh s >> hFlush fh
> >
> >     $ ghc -O demo.hs
> >     [1 of 2] Compiling Main             ( demo.hs, demo.o )
> >     [2 of 2] Linking demo
> >     $ ./demo
> >     def
> >
> > I am not sure that Haskell really should be holding the application's
> > hand in this area; corrupting output files via concurrent writers can
> > just as easily happen by running two independent processes.  But letting
> > go of this guardrail would IIRC be a deviation from the Haskell report,
> > and there are likely applications that depend on it (and don't use
> > hDuplicate or an equivalent to break the reader/writer tracking).
> >
> > --
> >     Viktor.
> > _______________________________________________
> > ghc-devs mailing list
> > ghc-devs at haskell.org
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs


More information about the ghc-devs mailing list