Bug in IO libraries when sending data through a pipe?

Jens Petersen petersen@redhat.com
21 Mar 2002 23:20:17 +0900


Hi Volker,

Volker Wysk <post@volker-wysk.de> writes:

> On Mit, 2002-03-20 at 07:00, Jens Petersen wrote:
> > Jens Petersen <petersen@redhat.com> writes:
> > 
> > > > The problem is that the child process doesn't
> > > > receive all the data which the parent sends. It's as
> > > > if "hPutStr vonh txt" sends the data lazily somehow,
> > > > and "hClose vonh" closes the pipe prematurely.
> > > > 
> > > > It varies from run to run exactly which data gets
> > > > through. If I cause the child process to read all
> > > > its input immediately, the problem doesn't seem to
> > > > occur. Normally, it does so gradually, which takes a
> > > > few seconds.
> > > > 
> > > > I'm using GHC 5.02.2
> > > 
> > > Quite possibly it could be a bug.  Lazy IO is rather subtle I
> > > think, especially when done across pipes.  I faced a similar
> > > problem in POpen recently.  You can see how I solved it (worked
> > > round it?) by comparing the latest release 1.00 with the previous
> > > one 0.00.1:
> > > 
> > >         http://www.01.246.ne.jp/~juhp/haskell/popenhs/
> 
> POpen-1.0.0 contains the same bug that I made. It doesn't ensure that
> the values which are needed after the call of forkProcess, and before
> that of executeFile, are fully evaluated. So, if they are read lazily
> from a stream, the newly spawned child process reads data from a stream
> which it shares with its parent, making it disappear from the parent's
> input. That surely isn't intended in this situation.

Perhaps you could give an explicit example?

> Inserting the following lines just before the line "pid <- forkProcess",
> in POpen.hs, would force the corresponding values to be evaluated, so no
> data will be lost.
> 
>     seq (length path) $ seq (sum (map length args)) $ return ()
>     when (isJust env) $ seq (sum (map (\(a,b) -> length a + length b) 
>                                       (fromJust env))) $ return ()
>     when (isJust dir) $ seq (length (fromJust dir)) $ return ()

Hmmm, I don't really see why this is necessary.  Don't the
lazy values of "path", "env" and "dir" just get evaluated
here when they're needed, as normal?  (If what you say is
true, though, it would be simpler just to use "$!" or "!"s
for strict evaluation, I guess.)
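
Something like this, perhaps (untested sketch only; forceString and
forceStrings are just made-up helpers here, not anything in POpen):

    -- Force a string to be read right to its end before forking, so the
    -- child never demands it from a handle it shares with the parent.
    forceString :: String -> IO ()
    forceString s = length s `seq` return ()

    forceStrings :: [String] -> IO ()
    forceStrings = mapM_ forceString

    -- then, just before "pid <- forkProcess":
    --   forceString path
    --   forceStrings args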

I would be more worried about the input stream string not
being complete when the input handle is closed.
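
If that were the problem, forcing the whole string before the handle
is closed would be one way to guard against it, e.g. (again only a
sketch; writeInput is a made-up helper, not part of POpen):

    -- force the whole string before writing it and closing the handle
    writeInput :: Handle -> String -> IO ()
    writeInput h s = do
        length s `seq` hPutStr h s
        hClose h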

> I'm also not sure what this part is supposed to do:
> 
>     inr <- if (isJust inpt) then
> 	     do
> 	     (inr', inw) <- createPipe
> 	     hin <- fdToHandle inw
> 	     hPutStr hin $ fromJust inpt
> 	     hClose hin
> 	     return $ Just inr'
> 	    else
> 	    return Nothing

It returns the read end of a pipe containing the input
string, if one is given.

> Doesn't it write the input data to a pipe which no process reads
> from..??

Nope, "doTheBusiness" dup2's it to the stdin of the subprocess:

      (outr, outw) <- createPipe
      (errr, errw) <- createPipe
      pid <- forkProcess
      case pid of
        Nothing -> doTheBusiness inr outw errw   -- ***
        Just p -> 
	  do
	  -- close other end of pipes in here
	  when (isJust inr) $
	       fdClose $ fromJust inr
	  fdClose outw
	  fdClose errw
	  hout <- fdToHandle outr
	  outstrm <- hGetContents hout
	  herr <- fdToHandle errr
	  errstrm <- hGetContents herr
	  return (outstrm, errstrm, p)
    where
    doTheBusiness :: 
	Maybe Fd	    -- stdin
	-> Fd		    -- stdout
	-> Fd		    -- stderr
        -> IO (String, String, ProcessID)    -- (stdout, stderr, pid)
    doTheBusiness inr outw errw = 
	do
	maybeChangeWorkingDirectory dir
	when (isJust inr) $
	     dupTo (fromJust inr) stdInput   -- ***
	dupTo outw stdOutput
	dupTo errw stdError
	executeFile path True args env
	-- for typing, should never actually run
	error "executeFile failed!"

Jens