Lost output in multithreaded program
Simon Marlow
simonmar@microsoft.com
Thu, 31 Jul 2003 10:39:49 +0100
I haven't investigated in detail, but I believe your problem might be
caused by a known bug in the implementation of forkProcess, namely that
when forking from a child thread it doesn't kill the main thread. The
workaround is to fork (in your case call popen) from the main thread,
not a child thread.
Cheers,
Simon
> I am puzzled by the behaviour of this (stripped-down,=20
> uglified) program.=20
> It is supposed to run a bunch of shell commands simultaneously,=20
> collate their standard output and error, and print their output as=20
> though they had run sequentially.
>=20
> module Main where
>=20
> import Control.Concurrent
> import Data.Maybe
> import GHC.Handle
> import GHC.IO
> import Posix (popen)
>=20
> main =3D do
> cmds <- getContents >>=3D return.lines
> outCh <- newChan
> mainThr <- myThreadId
> forkIO $ do
> mapM_ (startCommand outCh) cmds
> writeChan outCh $ killThread mainThr
> getChanContents outCh >>=3D sequence_
>=20
> startCommand :: Chan (IO ()) -> String -> IO ()
> startCommand ch cmd =3D do
> -- Prevent lazy reads after forking
> length cmd `seq` return ()
> (out,err,pid) <- popen "/bin/sh" ["-c",cmd] Nothing
> -- Prevent deadlock (waiting to read stdout
> -- while the child waits to write stderr).
> forkIO $ length err `seq` return () -- culprit line?
> writeChan ch $ hPutStr stdout out
> writeChan ch $ hFlush stdout
> writeChan ch $ hPutStr stderr err
> writeChan ch $ hFlush stderr
>=20
> Most of the time it works but every so often a chunk of child output=20
> gets lost. (I'm sure I've seen it duplicating chunks of=20
> output too, but=20
> I haven't been able to isolate a test case.) Corruption of=20
> stderr seems=20
> more common than corruption of stdout. The corruption=20
> doesn't get more=20
> predictable if I generate a list of commands with fixed=20
> 'random' sleeps=20
> and use that (so I don't think it's a gross timing issue). The lost=20
> data is not necessarily a prefix of the output string, nor a suffix.=20
> The lost data does not reappear later in the program's=20
> output. The lost=20
> data *does* appear to correspond to byte sequences that are=20
> written in a=20
> single write() by the child process.
>=20
> Example:
>=20
> $ cat test.sh
> #!/bin/sh
> echo -n "std" ; sleep 1
> echo -n "STD" >&2 ; sleep 1
> echo "out" ; sleep 1
> echo "ERR" >&2 ; sleep 1
> $ for i in `seq 100`; do echo "sleep $((RANDOM % 10));=20
> ./test.sh";=20
> done | ./a.out
> <snip>
> stdout
> STDERR
> stdout
> ERR
> stdout
> STDERR
> <snip>
> stdout
> STDERR
> stdout
> STDstdout
> STDERR
> stdout
> STDERR
> <snip>
>=20
> If I comment the 'culprit line' above then the program runs without=20
> error. This confounds me: I believe that that line should have no=20
> effect except to suck the child's stderr into the program as soon as=20
> possible. (Am I wrong? And is there a better way of doing this?)
>=20
> Platform: ghc-6.0-7 / RedHat 8.0 / x86.
> Compilation: ghc -package posix test.hs
>=20
> Where did my output go? Have I run into some gotcha like the 'lazy=20
> reads after forking' thing? Can someone tell me what I'm doing wrong?
>=20
> Hopeful thanks,
> // David
> --=20
> David Hughes
> UNIX sysadmin, Serco SA -+- Tel.: +41 22 767 8997
> Computing Centre, CERN -+- David.W.Hughes@cern.ch
>=20
> This message expresses my own opinions and should not be construed
> as the opinions of Serco (who employ me) or of CERN (where I work).
>=20
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users@haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>=20
>=20