[Git][ghc/ghc][master] Handle exceptions from IO manager backend
Marge Bot (@marge-bot)
gitlab at gitlab.haskell.org
Fri Oct 11 07:52:25 UTC 2024
Marge Bot pushed to branch master at Glasgow Haskell Compiler / GHC
Commits:
69960230 by Fabian Thorand at 2024-10-10T19:03:59+00:00
Handle exceptions from IO manager backend
If an IO manager backend throws, it will not actually have registered
the file descriptor. However, at that point, the IO manager state was
already updated to assume the file descriptor is being tracked, leading
to errors and an eventual deadlock down the line as documented in the
issue #21969.
The fix for this is to undo the IO manager state change in case the
backend throws (just as we already do when the backend signals that the
file type is not supported). The exception then bubbles up to user code.
That way we make sure that
1. the bookkeeping state of the IO manager is consistent with the
actions taken by the backend, even in the presence of unexpected
failures, and
2. the error is not silent and visible to user code, making failures
easier to debug.
- - - - -
8 changed files:
- libraries/ghc-internal/src/GHC/Internal/Conc/IO.hs
- libraries/ghc-internal/src/GHC/Internal/Event/Internal.hs
- libraries/ghc-internal/src/GHC/Internal/Event/Internal/Types.hs
- libraries/ghc-internal/src/GHC/Internal/Event/Manager.hs
- libraries/ghc-internal/src/GHC/Internal/Event/Thread.hs
- + testsuite/tests/concurrent/should_run/T21969.hs
- + testsuite/tests/concurrent/should_run/T21969.stdout
- testsuite/tests/concurrent/should_run/all.T
Changes:
=====================================
libraries/ghc-internal/src/GHC/Internal/Conc/IO.hs
=====================================
@@ -176,14 +176,22 @@ threadWaitWriteSTM fd
let killAction = Sync.killThread t
return (waitAction, killAction)
--- | Close a file descriptor in a concurrency-safe way (GHC only). If
--- you are using 'threadWaitRead' or 'threadWaitWrite' to perform
--- blocking I\/O, you /must/ use this function to close file
--- descriptors, or blocked threads may not be woken.
+-- | Close a file descriptor in a concurrency-safe way as far as the runtime
+-- system is concerned (GHC only). If you are using 'threadWaitRead' or
+-- 'threadWaitWrite' to perform blocking I\/O, you /must/ use this function
+-- to close file descriptors, or blocked threads may not be woken.
--
-- Any threads that are blocked on the file descriptor via
-- 'threadWaitRead' or 'threadWaitWrite' will be unblocked by having
-- IO exceptions thrown.
+--
+-- Note that on systems that reuse file descriptors (such as Linux),
+-- using this function on a file descriptor while other threads can still
+-- potentially use it is always prone to race conditions without further
+-- synchronization.
+--
+-- It is recommended to only call @'closeFdWith'@ once no other threads can
+-- use the given file descriptor anymore.
closeFdWith :: (Fd -> IO ()) -- ^ Low-level action that performs the real close.
-> Fd -- ^ File descriptor to close.
-> IO ()
=====================================
libraries/ghc-internal/src/GHC/Internal/Event/Internal.hs
=====================================
@@ -41,6 +41,14 @@ data Backend = forall a. Backend {
-- | Register, modify, or unregister interest in the given events
-- on the given file descriptor.
+ --
+ -- Returns 'True' if the modification succeeded.
+ -- Returns 'False' if this backend does not support
+ -- event notifications on this type of file.
+ --
+ -- If this function throws, the IO manager assumes that the registration
+ -- of the file descriptor failed, so the backend must not throw if the
+ -- registration was successful.
, _beModifyFd :: a
-> Fd -- file descriptor
-> Event -- old events to watch for ('mempty' for new)
@@ -49,6 +57,14 @@ data Backend = forall a. Backend {
-- | Register interest in new events on a given file descriptor, set
-- to be deactivated after the first event.
+ --
+ -- Returns 'True' if the modification succeeded.
+ -- Returns 'False' if this backend does not support
+ -- event notifications on this type of file.
+ --
+ -- If this function throws, the IO manager assumes that the registration
+ -- of the file descriptor failed, so the backend must not throw if the
+ -- registration was successful.
, _beModifyFdOnce :: a
-> Fd -- file descriptor
-> Event -- new events to watch
=====================================
libraries/ghc-internal/src/GHC/Internal/Event/Internal/Types.hs
=====================================
@@ -61,6 +61,11 @@ evtWrite = Event 2
{-# INLINE evtWrite #-}
-- | Another thread closed the file descriptor.
+--
+-- This event is only meant to be used by @'closeFdWith'@ to signal other
+-- threads currently waiting on the same file descriptor that it was closed.
+-- It is not meant to be waited on directly and intentionally not exposed
+-- in the external interface (only @'evtRead'@ and @'evtWrite'@ are).
evtClose :: Event
evtClose = Event 4
{-# INLINE evtClose #-}
=====================================
libraries/ghc-internal/src/GHC/Internal/Event/Manager.hs
=====================================
@@ -312,6 +312,11 @@ step mgr at EventManager{..} = do
-- platform's @select@ or @epoll@ system call, which tend to vary in
-- what sort of fds are permitted. For instance, waiting on regular files
-- is not allowed on many platforms.
+--
+-- This function rethrows exceptions originating from the underlying backend,
+-- for instance due to concurrently closing a file descriptor while it is
+-- just being registered. In that case, it assumes that the registration was
+-- not successful. See #21969.
registerFd_ :: EventManager -> IOCallback -> Fd -> Event -> Lifetime
-> IO (FdKey, Bool)
registerFd_ mgr@(EventManager{..}) cb fd evs lt = do
@@ -327,13 +332,20 @@ registerFd_ mgr@(EventManager{..}) cb fd evs lt = do
el' :: EventLifetime
el' = prevEvs `mappend` el
+
+ -- Used for restoring the old state if registering the FD
+ -- in the backend failed, due to either
+ -- 1. that file type not being supported, or
+ -- 2. the backend throwing an exception
+ undoRegistration = IT.reset fd' oldFdd tbl
case I.elLifetime el' of
-- All registrations want one-shot semantics and this is supported
OneShot | haveOneShot -> do
ok <- I.modifyFdOnce emBackend fd (I.elEvent el')
+ `onException` undoRegistration
if ok
then return (False, True)
- else IT.reset fd' oldFdd tbl >> return (False, False)
+ else undoRegistration >> return (False, False)
-- We don't want or don't support one-shot semantics
_ -> do
@@ -342,10 +354,11 @@ registerFd_ mgr@(EventManager{..}) cb fd evs lt = do
then let newEvs = I.elEvent el'
oldEvs = I.elEvent prevEvs
in I.modifyFd emBackend fd oldEvs newEvs
+ `onException` undoRegistration
else return True
if ok
then return (modify, True)
- else IT.reset fd' oldFdd tbl >> return (False, False)
+ else undoRegistration >> return (False, False)
-- this simulates behavior of old IO manager:
-- i.e. just call the callback if the registration fails.
when (not ok) (cb reg evs)
=====================================
libraries/ghc-internal/src/GHC/Internal/Event/Thread.hs
=====================================
@@ -112,6 +112,13 @@ threadWaitWrite = threadWait evtWrite
-- Any threads that are blocked on the file descriptor via
-- 'threadWaitRead' or 'threadWaitWrite' will be unblocked by having
-- IO exceptions thrown.
+--
+-- Closing file descriptors on one thread while they are still being
+-- used by other threads is always prone to race conditions (since e.g.
+-- on Linux file descriptors can be immediately reused after closing).
+--
+-- It is recommended to only call @'closeFdWith'@ when the file descriptor
+-- can no longer be used by other threads.
closeFdWith :: (Fd -> IO ()) -- ^ Action that performs the close.
-> Fd -- ^ File descriptor to close.
-> IO ()
@@ -154,6 +161,10 @@ closeFdWith close fd = close_loop
close fd `finally` sequence_ (zipWith3 finish mgrs tables cbApps)
pure (pure ())
+-- | Wait for an event on a file descriptor.
+--
+-- The given @'Event'@ may only be (a combination of) @'evtRead'@ or
+-- @'evtWrite'@, but not @'evtClose'@. See @'evtClose'@ for more details.
threadWait :: Event -> Fd -> IO ()
threadWait evt fd = mask_ $ do
m <- newEmptyMVar
=====================================
testsuite/tests/concurrent/should_run/T21969.hs
=====================================
@@ -0,0 +1,100 @@
+{-# LANGUAGE NumericUnderscores #-}
+{-# LANGUAGE ScopedTypeVariables #-}
+module Main where
+
+import Control.Concurrent (ThreadId, forkIO, killThread,
+ threadDelay)
+import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
+import Control.Exception (Exception (..), SomeAsyncException,
+ SomeException, bracket, catch, handle,
+ throwIO)
+import Control.Monad (forM_, unless)
+import GHC.Conc.IO (threadWaitRead, threadWaitWrite)
+
+import qualified Data.ByteString as BS
+import GHC.IO.FD (FD (..))
+import GHC.IO.Handle.FD (handleToFd)
+import System.Environment (getArgs)
+import System.IO (BufferMode (NoBuffering), Handle,
+ hClose, hSetBuffering)
+import System.IO.Error (isResourceVanishedError)
+import System.Posix (Fd (..))
+import System.Posix.IO (createPipe, fdToHandle)
+
+main :: IO ()
+main = do
+ [iterations] <- getArgs
+ -- Usually 1000-2000 tries are enough to lock up, do even more just in case
+ forM_ [1 :: Int .. read iterations] $ \_ -> do
+ bracket setupPipes closePipes $ \(readH, writeH) -> do
+ let
+ handler e
+ | Just as <- fromException e = throwIO (as :: SomeAsyncException)
+ | otherwise = pure ()
+
+ withThread (writer writeH) $ \_ ->
+ reader readH `catch` handler
+
+
+-- | Return a @(read, write)@ handle pair for an anonymous pipe.
+setupPipes :: IO (Handle, Handle)
+setupPipes = do
+ (readFd, writeFd) <- createPipe
+ (,)
+ <$> fdToHandle readFd -- fdToHandle' readFd Nothing True ("read fd " <> show readFd) ReadMode True
+ <*> fdToHandle writeFd -- fdToHandle' writeFd Nothing True ("write fd " <> show writeFd) WriteMode True
+
+-- | Close the handles returned by 'setupPipes'.
+closePipes :: (Handle, Handle) -> IO ()
+closePipes (readH, writeH) = do
+ dropResourceVanishedError $ hClose readH
+ dropResourceVanishedError $ hClose writeH
+
+reader :: Handle -> IO ()
+reader readHandle = do
+ let
+ drain = do
+ fd <- handleToFd readHandle
+ threadWaitRead (Fd (fdFD fd))
+ msg <- BS.hGetSome readHandle 1024
+ unless (BS.null msg) drain
+
+ -- The MVar ensures we more or less simultaneously start reading and closing, increasing the
+ -- chance of hitting the race condition
+ readingBarrier <- newEmptyMVar
+ _ <- forkIO $ do
+ takeMVar readingBarrier
+ hClose readHandle `catch` \(_ :: SomeException) -> pure ()
+
+ putMVar readingBarrier ()
+ drain
+
+withThread :: IO () -> (ThreadId -> IO r) -> IO r
+withThread bgAction mainAction = do
+ bracket
+ (forkIO bgAction)
+ killThread
+ mainAction
+
+-- | Something to keep the reader busy reading.
+writer :: Handle -> IO ()
+writer writeHandle = do
+ hSetBuffering writeHandle NoBuffering
+ let
+ loop = do
+ fd <- handleToFd writeHandle
+ threadWaitWrite (Fd (fdFD fd))
+ BS.hPut writeHandle $ BS.replicate 1024 65
+ -- We need a short delay so that the reader actually needs to wait for data to be present.
+ -- Only then can we trigger the epoll registration race condition.
+ threadDelay 10_000
+ loop
+
+ dropResourceVanishedError loop
+
+-- | Ignore broken pipe errors
+dropResourceVanishedError :: IO () -> IO ()
+dropResourceVanishedError = handle $ \err ->
+ if isResourceVanishedError err
+ then pure ()
+ else throwIO err
=====================================
testsuite/tests/concurrent/should_run/T21969.stdout
=====================================
=====================================
testsuite/tests/concurrent/should_run/all.T
=====================================
@@ -265,6 +265,16 @@ test('T21651',
],
compile_and_run, [''])
+test('T21969',
+ [ only_ways(['threaded1','threaded2', 'nonmoving_thr', 'profthreaded']),
+ when(opsys('mingw32'),skip), # uses POSIX pipes
+ extra_run_opts('50000'),
+ run_timeout_multiplier(0.3), # default timeout seems to be 300, but lockups happen quickly
+ req_target_smp,
+ req_ghc_smp
+ ],
+ compile_and_run, [''])
+
test('hs_try_putmvar001',
[
when(opsys('mingw32'),skip), # uses pthread APIs in the C code
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/69960230cc2c04a8b554ae8b7ebb85626749bf45
--
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/69960230cc2c04a8b554ae8b7ebb85626749bf45
You're receiving this email because of your account on gitlab.haskell.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20241011/f13b4f8b/attachment-0001.html>
More information about the ghc-commits
mailing list