[Haskell-cafe] Re: sendfile leaking descriptors on Linux?

Thu Feb 11 18:28:23 EST 2010

On Feb 11, 2010, at 1:57 PM, Bardur Arantsson wrote:
>
>> 2. the remote client has terminated the connection as far as it is
>> concerned but not notified the server -- when you try to send data  
>> it will
>> reject it, and send/write/sendfile/etc will raise sigPIPE.
>> Looking at your debug output, we are seeing the sigPIPE / Broken  
>> Pipe error
>> most of the time. But then there is the case where we get stuck on  
>> the
>> threadWaitWrite.
>> threadWaitWrite is ultimately implemented by passing the file  
>> descriptor to
>> the list of write descriptors in a call to select(). It seems,  
>> however, that
>> select() is not waking up just because calling write() on a file  
>> descriptor
>> *would* cause sigPIPE.
>
> That's what I expect select() with an "errfd" FDSET would do.

Nope. The expectfds are only trigger in esoteric conditions. For TCP  
sockets, I think it only occurs if there is out-of-band data available  
to be read via recv() with the MSG_OOB flag.

http://uw714doc.sco.com/en/SDK_netapi/sockC.OoBdata.html

>> The easiest way to confirm this case is probably to write a small,  
>> pure C
>> program and see what really happens.
>> If this is the case, then it means the only way to tell if the  
>> client has
>> abruptly dropped the connection is to actually try sending the data  
>> and see
>> if the sending function calls sigPIPE. And that means doing some  
>> sort of
>> polling/timeout?
>
> Correct, but the trouble is deciding how often to poll and/or how  
> long the timeout should be.
>
> I don't see any easy answer to that. That's why my suggested  
> "solution" is to simply punt it to the OS (by using portable mode)  
> and suck up the extra overhead of the portable solution. Hopefully  
> the new GHC I/O manager will make it possible to have a proper  
> solution.

The whole point of the sendfile library is to use sendfile(), so not  
using sendfile() seems like the wrong solution. I am also not  
convinced that the new GHC I/O manager will do anything new to make it  
possible to have a proper solution. I believe we would be seeing the  
same error even in pure C, so we need to know the work around that  
works in pure C as well. I am not convinced we are punting to the OS  
by using portable mode either (more below).

>> I do not have a good explanation as to why the portable version  
>> does not
>> fail. Except maybe it is just so slow that it does not ever fill up  
>> the
>> buffer, and hence does not get stuck in threadWaitWrite?
>
> The portable version doesn't call threadWaitWrite. It simply turns  
> the Socket into a handle (which causes it to become blocking)  and  
> so the kernel is tasked with handling all the gritty details.

The portable version does not directly call threadWaitWrite, but it  
still calls it.

Data.ByteString.Char8.hPutStr calls
Data.ByteString.hPut which calls
Data.ByteString.hPutBuf which calls
System.IO.hPutBuf which calls
GHC.IO.Handle.Text.hPutBuf which calls
GHC.IO.Handle.bufWrite.Text which calls
GHC.IO.Device.write which calls
GHC.IO.FD.fdWrite which calls
GHC.IO.FD.writeRawBufferPtr which calls

which is defined as:

writeRawBufferPtr :: String -> FD -> Ptr Word8 -> Int -> CSize -> IO  
CInt
writeRawBufferPtr loc !fd buf off len
   | isNonBlocking fd = unsafe_write -- unsafe is ok, it can't block
   | otherwise   = do r <- unsafe_fdReady (fdFD fd) 1 0 0
                      if r /= 0
                         then write
                         else do threadWaitWrite (fromIntegral (fdFD  
fd)); write
   where
     do_write call = fromIntegral `fmap`
                       throwErrnoIfMinus1RetryMayBlock loc call
                         (threadWaitWrite (fromIntegral (fdFD fd)))
     write         = if threaded then safe_write else unsafe_write
     unsafe_write  = do_write (c_write (fdFD fd) (buf `plusPtr` off)  
len)
     safe_write    = do_write (c_safe_write (fdFD fd) (buf `plusPtr`  
off) len)

According to the following test program, I expect that 'isNonBlocking  
fd' will be 'True'. So it seems like the portable solution should be  
vulnerable to the same condition. Perhaps the portable version is just  
so slow that the OS buffers never fill up so EAGAIN is never raised?

-------------------------------------------------------------------------------------------------------

{-# LANGUAGE RecordWildCards #-}
module Main where

import Control.Concurrent (forkIO)
import Control.Monad (forever)
import Network (PortID(PortNumber), Socket, listenOn)
import Network.Socket (accept, socketToHandle)
import System.IO
import qualified GHC.IO.FD as FD
import GHC.IO.Handle.Internals (withHandle, flushWriteBuffer)
import GHC.IO.Handle.Types (Handle__(..), HandleType(..))
import qualified GHC.IO.FD as FD
import System.Posix.Types (Fd(..))
import System.IO.Error
import GHC.IO.Exception
import Data.Typeable (cast)
import GHC.IO.Handle.Internals (wantWritableHandle)

main =
   listen (PortNumber (toEnum 2525)) $ \s ->
      do h <- socketToHandle s ReadWriteMode
         wantWritableHandle "main" h $ \h_ -> showBlocking h_

showBlocking :: Handle__ -> IO ()
showBlocking h_ at Handle__{..} =
   case cast haDevice of
     Nothing -> return ()
     Just fd -> case (FD.fdIsNonBlocking fd) of
                   1 -> putStrLn "is NonBlocking"
                   0 -> putStrLn "is not NonBlocking"

listen :: PortID -> (Socket -> IO ()) -> IO ()
listen port handler =
   do socket <- listenOn port
      forever $ do (s,sa) <- accept socket
                   forkIO $ handler s

-------------------------------------------------------------------------------------------------------

>> Any way, the fundamental question is:
>> When your write buffer is full, and you call select() on that file
>> descriptor, will select() return in the case where calling write()  
>> again
>> would raise sigPIPE?
>
> I believe so, *if* you give it the FD in the exceptfds FD_SET  
> parameter. Let's face it, any other behavior doesn't make any sense  
> since it's the equivalent of forcing all timeout handling onto the  
> user, just like threadWaitWrite currently does. I've written my fair  
> share of networking code in various languages (including C/C++) and  
> I've never seen this problem of "missing wakeups" before.

All my reading seems to indicate that exceptfds won't help anything --  
it is a seldom used feature and doesn't do what people wish it  
actually did. For example, see answer 1 on this page:

http://stackoverflow.com/questions/1342712/nix-select-and-exceptfds-errorfds-semantics

There is some evidence that when you are doing select() on a readfds,  
and the connection is closed, select() will indicate that the fds is  
ready to be read, but when you read it, you get 0-bytes. That  
indicates that a disconnect has happened. However, if you are only  
doing read()/recv(), I expect that only happens in the event of a  
proper disconnect, because if you are just listening for packets,  
there is no way to tell the difference between the sender just not  
saying anything, and the sender dying:

http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html#advanced
http://channel9.msdn.com/forums/TechOff/434466-TcpClient-Test-for-Disconnected/
http://bytes.com/topic/python/answers/40278-detecting-shutdown-remot

But I can not find any clear information on what happens when you do  
select() on a write socket, and the remote end abruptly disconnects.  
Consider first the case that does not use select() at all:

1. write() to a non-blocking socket. That copies the data into the OS  
buffers, and then returns successfully.
2. But then the OS tries to send the data, and the connection has been  
reset. It can't notify you that the write() failed, because it that  
call to write() already returned.
3. you try to do a second write(), that is when you get sigPIPE.

Now let's say you do:

write()
select()
write ()

I believe that select() will not wakeup if it is just monitoring the  
ability to write to the socket and the remote end abruptly drops the  
connection. (I am certain that under linux, if another thread  
explicitly closes the socket, that does not cause select() to wake up  
either.) However,  there is some evidence that if you monitor the  
socket for both reads and writes, that when the first write fails,  
select will wakeup and tell you that there is data available to  
read(). If you read() the data, you will find out that there are 0- 
bytes available, meaning the connection was closed. Alternatively, if  
you tried the second write, then you would get sigPIPE.

http://www.developerweb.net/forum/showthread.php?t=2956
http://stackoverflow.com/questions/180095/how-to-handle-a-broken-pipe-sigpipe-in-python

The tricky part is that if you wake up for a read() and there is data  
available, we don't want to actually read it in then sendfile  
function, (because that data is destined for somewhere else). So, you  
need to just peek at the data with out actually reading it to see if  
there is at least 1 byte available.

If this method of detection is correct, then what we need is a  
threadWaitReadWrite, that will notify us if the socket can be read or  
written. The IO manager does not currently provide a function like  
that.. but we could fake it like this: (untested):

import Control.Concurrent
import Control.Concurrent.MVar
import System.Posix.Types

data RW = Read | Write

threadWaitReadWrite :: Fd -> IO RW
threadWaitReadWrite fd =
   do m <- newEmptyMVar
      rid <- forkIO $ threadWaitRead fd  >> putMVar m Read
      wid <- forkIO $ threadWaitWrite fd >> putMVar m Write
      r <- takeMVar m
      killThread rid
      killThread wid
      return r

Of course, in the case where the client disconnects because someone  
turns off the power or pulls the ethernet cable, we have no way of  
knowing what is going on -- so there is still the possibility that  
dead connections will be left open for a long time.

And, there is also the concern that even the portable version may have  
this issue. My research indicates that it should. In fact, any  
application which tries to send data over the network could be  
vulnerable to this bug. So, I am a little disturbed as to why the  
portable version does not appear to have issues..

- jeremy
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20100211/7199cdbb/attachment.html