Lost output in multithreaded program
David Hughes
David.W.Hughes@cern.ch
Tue, 29 Jul 2003 12:15:27 +0200
Dear list,
I am puzzled by the behaviour of this (stripped-down, uglified) program.
It is supposed to run a bunch of shell commands simultaneously,
collate their standard output and error, and print their output as
though they had run sequentially.
module Main where
import Control.Concurrent
import Data.Maybe
import GHC.Handle
import GHC.IO
import Posix (popen)
main = do
cmds <- getContents >>= return.lines
outCh <- newChan
mainThr <- myThreadId
forkIO $ do
mapM_ (startCommand outCh) cmds
writeChan outCh $ killThread mainThr
getChanContents outCh >>= sequence_
startCommand :: Chan (IO ()) -> String -> IO ()
startCommand ch cmd = do
-- Prevent lazy reads after forking
length cmd `seq` return ()
(out,err,pid) <- popen "/bin/sh" ["-c",cmd] Nothing
-- Prevent deadlock (waiting to read stdout
-- while the child waits to write stderr).
forkIO $ length err `seq` return () -- culprit line?
writeChan ch $ hPutStr stdout out
writeChan ch $ hFlush stdout
writeChan ch $ hPutStr stderr err
writeChan ch $ hFlush stderr
Most of the time it works but every so often a chunk of child output
gets lost. (I'm sure I've seen it duplicating chunks of output too, but
I haven't been able to isolate a test case.) Corruption of stderr seems
more common than corruption of stdout. The corruption doesn't get more
predictable if I generate a list of commands with fixed 'random' sleeps
and use that (so I don't think it's a gross timing issue). The lost
data is not necessarily a prefix of the output string, nor a suffix.
The lost data does not reappear later in the program's output. The lost
data *does* appear to correspond to byte sequences that are written in a
single write() by the child process.
Example:
$ cat test.sh
#!/bin/sh
echo -n "std" ; sleep 1
echo -n "STD" >&2 ; sleep 1
echo "out" ; sleep 1
echo "ERR" >&2 ; sleep 1
$ for i in `seq 100`; do echo "sleep $((RANDOM % 10)); ./test.sh";
done | ./a.out
<snip>
stdout
STDERR
stdout
ERR
stdout
STDERR
<snip>
stdout
STDERR
stdout
STDstdout
STDERR
stdout
STDERR
<snip>
If I comment the 'culprit line' above then the program runs without
error. This confounds me: I believe that that line should have no
effect except to suck the child's stderr into the program as soon as
possible. (Am I wrong? And is there a better way of doing this?)
Platform: ghc-6.0-7 / RedHat 8.0 / x86.
Compilation: ghc -package posix test.hs
Where did my output go? Have I run into some gotcha like the 'lazy
reads after forking' thing? Can someone tell me what I'm doing wrong?
Hopeful thanks,
// David
--
David Hughes
UNIX sysadmin, Serco SA -+- Tel.: +41 22 767 8997
Computing Centre, CERN -+- David.W.Hughes@cern.ch
This message expresses my own opinions and should not be construed
as the opinions of Serco (who employ me) or of CERN (where I work).