Behaviour of readProcess, and code simplicity (or lack thereof)

Fri Nov 15 21:59:46 UTC 2013

Hi possibly-interested souls,

In this patch to System.Process.readProcess from last year:

https://ghc.haskell.org/trac/ghc/changeset/b5ee908863882d18e4110d96b43ccc1bb5068ceb/process

Bas changed things so that if an async exception is received by the
thread calling readProcess (or readProcessWithExitCode) then the
external process gets forcibly terminated. I do mean forcibly, it's
TerminateProcess() on Windows and SIGKILL (not SIGTERM) on Unix.

I was wondering if this is really desirable behaviour. There are
certainly programs out there that will leave stale lock files (or worse)
if forcibly terminated, that can shut down reasonably cleanly if asked
more nicely.

I have to say, it does make me a bit nervous.

That said, terminating programs cleanly is a tricky business. While
there is SIGTERM on Unix, there's no simple way to do it on Windows;
there's no softer equivalent of TerminateProcess().

But if we think it *is* the right behaviour then I think we ought to be
doing it consistently. There are other System.Process functions that run
processes synchronously that should have the same behaviour.

And it turns out that cleaning up is rather subtle. Takano recently sent
in a patch to fix a deadlock due to the order in which the cleanup
happens:

https://ghc.haskell.org/trac/ghc/changeset/32223a9ab174c22e939c81e24b6f48223c7cb020/process

So if it is desirable behaviour and is so subtle we ought to provide the
users with some help to do it (or indeed to do alternatives if we think
no one policy fits all).

For example, we could provide:

-- | A 'bracketOnError'-style resource handler for 'createProcess'.
--
-- In normal operation it adds nothing, you are still responsible for
-- waiting for (or forcing) process termination and closing any
-- 'Handle's. It only does automatic cleanup if there is an exception.
-- If there is an exception in the body then it ensures that the process
-- gets terminated and any 'CreatePipe' 'Handle's are closed. In
-- particular this means that if the Haskell thread is killed (e.g.
-- 'killThread'), that the external process is also terminated.
--
withCreateProcess
  :: CreateProcess
  -> (Maybe Handle -> Maybe Handle -> Maybe Handle
                   -> ProcessHandle -> IO a)
  -> IO a
withCreateProcess c action =
    bracketOnError
      (createProcess c) cleanupProcess
      (\(m_in, m_out, m_err, ph) -> action m_in m_out m_err ph)

cleanupProcess
  :: (Maybe Handle, Maybe Handle, Maybe Handle, ProcessHandle)
  -> IO ExitCode
cleanupProcess (mb_stdin, mb_stdout, mb_stderr, ph) = do
  -- We rely on the fact that terminateProcess really does terminate it
  -- and not just ask it nicely. Otherwise we could deadlock if there
  -- are other threads holding the lock on the stdin/out/err Handles.
  terminateProcess ph
  maybe (return ()) hClose mb_stdin
  maybe (return ()) hClose mb_stdout
  maybe (return ()) hClose mb_stderr
  waitForProcess ph

As it is, the docs say:

        'readProcess' and 'readProcessWithExitCode' are fairly simple
        wrappers around 'createProcess'.  Constructing variants of these
        functions is quite easy: follow the link to the source code to
        see how 'readProcess' is implemented.

but if you do look at the source you see that while it may originally
have been true, it certainly is not anymore! Providing and using a
withCreateProcess-style would help quite a bit.

Also, the ignoring EPIPE stuff is rather unsatisfactory and
non-portable. It's not used consistently and it's impossible for anyone
else to implement it themselves without using GHC internals. It's almost
possible in a nice way, since you could check if it's ResourceVanished
rather than errno = EPIPE, but actually ResourceVanished is not exposed
via System.IO.Error either.

I think it's worth if at all possible making it true that people can
write useful and reliable special cases of createProcess, but we'd need
to provide more help.