[Haskell-cafe] Re: sendfile leaking descriptors on Linux?
Jeremy Shaw
jeremy at n-heptane.com
Thu Feb 11 18:28:23 EST 2010
On Feb 11, 2010, at 1:57 PM, Bardur Arantsson wrote:
>
>> 2. the remote client has terminated the connection as far as it is
>> concerned but not notified the server -- when you try to send data
>> it will
>> reject it, and send/write/sendfile/etc will raise sigPIPE.
>> Looking at your debug output, we are seeing the sigPIPE / Broken
>> Pipe error
>> most of the time. But then there is the case where we get stuck on
>> the
>> threadWaitWrite.
>> threadWaitWrite is ultimately implemented by passing the file
>> descriptor to
>> the list of write descriptors in a call to select(). It seems,
>> however, that
>> select() is not waking up just because calling write() on a file
>> descriptor
>> *would* cause sigPIPE.
>
> That's what I expect select() with an "errfd" FDSET would do.
Nope. The expectfds are only trigger in esoteric conditions. For TCP
sockets, I think it only occurs if there is out-of-band data available
to be read via recv() with the MSG_OOB flag.
http://uw714doc.sco.com/en/SDK_netapi/sockC.OoBdata.html
>> The easiest way to confirm this case is probably to write a small,
>> pure C
>> program and see what really happens.
>> If this is the case, then it means the only way to tell if the
>> client has
>> abruptly dropped the connection is to actually try sending the data
>> and see
>> if the sending function calls sigPIPE. And that means doing some
>> sort of
>> polling/timeout?
>
> Correct, but the trouble is deciding how often to poll and/or how
> long the timeout should be.
>
> I don't see any easy answer to that. That's why my suggested
> "solution" is to simply punt it to the OS (by using portable mode)
> and suck up the extra overhead of the portable solution. Hopefully
> the new GHC I/O manager will make it possible to have a proper
> solution.
The whole point of the sendfile library is to use sendfile(), so not
using sendfile() seems like the wrong solution. I am also not
convinced that the new GHC I/O manager will do anything new to make it
possible to have a proper solution. I believe we would be seeing the
same error even in pure C, so we need to know the work around that
works in pure C as well. I am not convinced we are punting to the OS
by using portable mode either (more below).
>> I do not have a good explanation as to why the portable version
>> does not
>> fail. Except maybe it is just so slow that it does not ever fill up
>> the
>> buffer, and hence does not get stuck in threadWaitWrite?
>
> The portable version doesn't call threadWaitWrite. It simply turns
> the Socket into a handle (which causes it to become blocking) and
> so the kernel is tasked with handling all the gritty details.
The portable version does not directly call threadWaitWrite, but it
still calls it.
Data.ByteString.Char8.hPutStr calls
Data.ByteString.hPut which calls
Data.ByteString.hPutBuf which calls
System.IO.hPutBuf which calls
GHC.IO.Handle.Text.hPutBuf which calls
GHC.IO.Handle.bufWrite.Text which calls
GHC.IO.Device.write which calls
GHC.IO.FD.fdWrite which calls
GHC.IO.FD.writeRawBufferPtr which calls
which is defined as:
writeRawBufferPtr :: String -> FD -> Ptr Word8 -> Int -> CSize -> IO
CInt
writeRawBufferPtr loc !fd buf off len
| isNonBlocking fd = unsafe_write -- unsafe is ok, it can't block
| otherwise = do r <- unsafe_fdReady (fdFD fd) 1 0 0
if r /= 0
then write
else do threadWaitWrite (fromIntegral (fdFD
fd)); write
where
do_write call = fromIntegral `fmap`
throwErrnoIfMinus1RetryMayBlock loc call
(threadWaitWrite (fromIntegral (fdFD fd)))
write = if threaded then safe_write else unsafe_write
unsafe_write = do_write (c_write (fdFD fd) (buf `plusPtr` off)
len)
safe_write = do_write (c_safe_write (fdFD fd) (buf `plusPtr`
off) len)
According to the following test program, I expect that 'isNonBlocking
fd' will be 'True'. So it seems like the portable solution should be
vulnerable to the same condition. Perhaps the portable version is just
so slow that the OS buffers never fill up so EAGAIN is never raised?
-------------------------------------------------------------------------------------------------------
{-# LANGUAGE RecordWildCards #-}
module Main where
import Control.Concurrent (forkIO)
import Control.Monad (forever)
import Network (PortID(PortNumber), Socket, listenOn)
import Network.Socket (accept, socketToHandle)
import System.IO
import qualified GHC.IO.FD as FD
import GHC.IO.Handle.Internals (withHandle, flushWriteBuffer)
import GHC.IO.Handle.Types (Handle__(..), HandleType(..))
import qualified GHC.IO.FD as FD
import System.Posix.Types (Fd(..))
import System.IO.Error
import GHC.IO.Exception
import Data.Typeable (cast)
import GHC.IO.Handle.Internals (wantWritableHandle)
main =
listen (PortNumber (toEnum 2525)) $ \s ->
do h <- socketToHandle s ReadWriteMode
wantWritableHandle "main" h $ \h_ -> showBlocking h_
showBlocking :: Handle__ -> IO ()
showBlocking h_ at Handle__{..} =
case cast haDevice of
Nothing -> return ()
Just fd -> case (FD.fdIsNonBlocking fd) of
1 -> putStrLn "is NonBlocking"
0 -> putStrLn "is not NonBlocking"
listen :: PortID -> (Socket -> IO ()) -> IO ()
listen port handler =
do socket <- listenOn port
forever $ do (s,sa) <- accept socket
forkIO $ handler s
-------------------------------------------------------------------------------------------------------
>> Any way, the fundamental question is:
>> When your write buffer is full, and you call select() on that file
>> descriptor, will select() return in the case where calling write()
>> again
>> would raise sigPIPE?
>
> I believe so, *if* you give it the FD in the exceptfds FD_SET
> parameter. Let's face it, any other behavior doesn't make any sense
> since it's the equivalent of forcing all timeout handling onto the
> user, just like threadWaitWrite currently does. I've written my fair
> share of networking code in various languages (including C/C++) and
> I've never seen this problem of "missing wakeups" before.
All my reading seems to indicate that exceptfds won't help anything --
it is a seldom used feature and doesn't do what people wish it
actually did. For example, see answer 1 on this page:
http://stackoverflow.com/questions/1342712/nix-select-and-exceptfds-errorfds-semantics
There is some evidence that when you are doing select() on a readfds,
and the connection is closed, select() will indicate that the fds is
ready to be read, but when you read it, you get 0-bytes. That
indicates that a disconnect has happened. However, if you are only
doing read()/recv(), I expect that only happens in the event of a
proper disconnect, because if you are just listening for packets,
there is no way to tell the difference between the sender just not
saying anything, and the sender dying:
http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html#advanced
http://channel9.msdn.com/forums/TechOff/434466-TcpClient-Test-for-Disconnected/
http://bytes.com/topic/python/answers/40278-detecting-shutdown-remot
But I can not find any clear information on what happens when you do
select() on a write socket, and the remote end abruptly disconnects.
Consider first the case that does not use select() at all:
1. write() to a non-blocking socket. That copies the data into the OS
buffers, and then returns successfully.
2. But then the OS tries to send the data, and the connection has been
reset. It can't notify you that the write() failed, because it that
call to write() already returned.
3. you try to do a second write(), that is when you get sigPIPE.
Now let's say you do:
write()
select()
write ()
I believe that select() will not wakeup if it is just monitoring the
ability to write to the socket and the remote end abruptly drops the
connection. (I am certain that under linux, if another thread
explicitly closes the socket, that does not cause select() to wake up
either.) However, there is some evidence that if you monitor the
socket for both reads and writes, that when the first write fails,
select will wakeup and tell you that there is data available to
read(). If you read() the data, you will find out that there are 0-
bytes available, meaning the connection was closed. Alternatively, if
you tried the second write, then you would get sigPIPE.
http://www.developerweb.net/forum/showthread.php?t=2956
http://stackoverflow.com/questions/180095/how-to-handle-a-broken-pipe-sigpipe-in-python
The tricky part is that if you wake up for a read() and there is data
available, we don't want to actually read it in then sendfile
function, (because that data is destined for somewhere else). So, you
need to just peek at the data with out actually reading it to see if
there is at least 1 byte available.
If this method of detection is correct, then what we need is a
threadWaitReadWrite, that will notify us if the socket can be read or
written. The IO manager does not currently provide a function like
that.. but we could fake it like this: (untested):
import Control.Concurrent
import Control.Concurrent.MVar
import System.Posix.Types
data RW = Read | Write
threadWaitReadWrite :: Fd -> IO RW
threadWaitReadWrite fd =
do m <- newEmptyMVar
rid <- forkIO $ threadWaitRead fd >> putMVar m Read
wid <- forkIO $ threadWaitWrite fd >> putMVar m Write
r <- takeMVar m
killThread rid
killThread wid
return r
Of course, in the case where the client disconnects because someone
turns off the power or pulls the ethernet cable, we have no way of
knowing what is going on -- so there is still the possibility that
dead connections will be left open for a long time.
And, there is also the concern that even the portable version may have
this issue. My research indicates that it should. In fact, any
application which tries to send data over the network could be
vulnerable to this bug. So, I am a little disturbed as to why the
portable version does not appear to have issues..
- jeremy
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20100211/7199cdbb/attachment.html
More information about the Haskell-Cafe
mailing list