[Haskell-cafe] Re: sendfile leaking descriptors on Linux?

Jeremy Shaw jeremy at n-heptane.com
Wed Feb 17 16:54:29 EST 2010


On Wed, Feb 17, 2010 at 1:27 PM, Bardur Arantsson <spam at scientician.net> wrote:

> > (Obviously, if people are using sendfile with something other than
> > happstack, it does not help them, but it sounds like trying to fix
> > things in sendfile is misguided anyway.)
>
> How so? As a user I expect sendfile to work and not semi-randomly block
> threads indefinitely.

Because it only addresses *one* of the cases in which this type of blocking
can happen.

Won't hPut and friends also block indefinitely, since they also use
threadWaitWrite? If so, what good is fixing just sendfile, when all other
network I/O will still block indefinitely?

If things are 'fixed' at a higher level, by using SO_KEEPALIVE, then does
sendfile really need a hack to deal with it?
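For reference, turning on SO_KEEPALIVE from Haskell might look like the
following minimal sketch. It assumes the network package (version 3 or
later), and `enableKeepAlive` is a hypothetical helper name. With keepalive
on, the kernel probes a connection that has been idle for a while and
reports an error on the socket if the peer never answers. On Linux the
default idle time before the first probe is two hours
(net.ipv4.tcp_keepalive_time = 7200), so on its own this is a slow safety
net rather than prompt detection.

```haskell
import Network.Socket

-- | Hypothetical helper: enable SO_KEEPALIVE on a socket. The kernel will
-- then send keepalive probes once the connection has been idle and report
-- an error on the socket if the peer never answers them.
enableKeepAlive :: Socket -> IO ()
enableKeepAlive sock = setSocketOption sock KeepAlive 1
```

It would be applied to each accepted connection before any data is sent.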

With your proposed fix, if the user unplugs the network cable, won't you
get a polling loop that never terminates? That doesn't sound any better
than the current situation.
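One way to keep a polling fix from spinning forever is to bound it. The
sketch below is hypothetical and is not the patch under discussion:
`pollBounded` retries a readiness check a fixed number of times, sleeping
between attempts, and then gives up, so the loop always terminates.

```haskell
import Control.Concurrent (threadDelay)

-- | Hypothetical helper: run 'ready' up to 'maxTries' times, sleeping
-- 'delayMicros' microseconds between attempts. Returns True as soon as
-- 'ready' succeeds, or False once the attempts are used up.
pollBounded :: Int -> Int -> IO Bool -> IO Bool
pollBounded maxTries delayMicros ready = go maxTries
  where
    go n
      | n <= 0 = pure False
      | otherwise = do
          ok <- ready
          if ok
            then pure True
            else threadDelay delayMicros >> go (n - 1)
```

The price of the bound is that a peer which comes back after the last
attempt is treated as gone.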

You said that you have not seen this issue with the code that uses hPut,
only with the code that uses sendfile(). But my research indicates that we
*should* see the error there too. So I am not very comfortable fixing just
sendfile and ignoring the fact that all network I/O might be borked.

I am also not 100% pleased by the SO_KEEPALIVE solution. There are really
two errors which can occur:

  1. the remote end drops the connection in such a manner that we are
notified immediately: select() reports the socket as readable, but a read
returns 0 bytes (end-of-file). This happens because the remote end sent us
a notification (a TCP FIN) that it has terminated the connection.

  2. the remote end drops off the network (for example, the network cable is
disconnected). In this case, we will not get any notification via a read
select(), because the remote end is not there to send one. The only
solution is to eventually time out.
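The two failure modes can be told apart by what a read with a timeout
returns. This is a hypothetical, pure sketch: `PeerState` and
`classifyRead` are illustrative names, and the input is assumed to be
Nothing when the timeout fired or Just n when read() returned n bytes.

```haskell
-- | Hypothetical outcome of waiting on the peer, mirroring cases 1 and 2.
data PeerState
  = DataAvailable Int -- ^ readable, and the read returned n > 0 bytes
  | PeerClosed        -- ^ case 1: readable, but the read returned 0 (EOF)
  | PeerSilent        -- ^ case 2: nothing arrived before the timeout
  deriving (Eq, Show)

-- | Classify a read attempted under a timeout: Nothing means the timeout
-- fired, Just n is the byte count the read returned.
classifyRead :: Maybe Int -> PeerState
classifyRead Nothing = PeerSilent
classifyRead (Just 0) = PeerClosed
classifyRead (Just n) = DataAvailable n
```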

By using a timeout to handle #2, we implicitly handle #1 too, but in a very
untimely manner: we only notice the closed connection once the timeout
expires.

Ideally, we would like to handle these two cases separately. In case #1, we
know immediately that the connection is dead, and can therefore clean
things up. In case #2, the remote client might actually come back online
(someone plugs the cable back in) and the transfer resumes. Perhaps in some
applications we want infinite timeouts for case #2. That does not mean we do
not want case #1 handled.
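Keeping the cases separate could mean making the case #2 timeout a
per-application choice. A minimal sketch, assuming a hypothetical
`SilencePolicy` type and built on `System.Timeout.timeout` from base. Case
#1 (EOF) is deliberately not covered: it should tear the connection down
immediately whatever the policy says.

```haskell
import System.Timeout (timeout)

-- | Hypothetical per-application policy for case 2 (a silent peer).
data SilencePolicy
  = WaitForever     -- ^ e.g. allow the cable to be plugged back in
  | GiveUpAfter Int -- ^ abandon the wait after this many microseconds
  deriving (Eq, Show)

-- | Run a blocking network action under the chosen policy.
-- Returns Nothing only if the policy's timeout expired.
withSilencePolicy :: SilencePolicy -> IO a -> IO (Maybe a)
withSilencePolicy WaitForever act = Just <$> act
withSilencePolicy (GiveUpAfter micros) act = timeout micros act
```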

However, right now I do not see a good way of handling #1 that works for
all network code, not just sendfile.

The issue seems to be that select() was designed as a way to *avoid* using
threads. The network code seems to assume that you will select() on both
the read and the write side of the socket, and that when select() returns
you will look at what happened and take the appropriate action.

But, in Haskell, we are using multiple threads. So the code that is looking
to read data and the code that is looking to write data don't really know
about each other. So even if the read thread detects the closed socket, it
has no idea that some other thread needs to be killed.
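One way for a reader that sees EOF to stop a writer in another thread is
to supervise the writer and kill it explicitly. This is a hypothetical
sketch using only base: `withCloseWatcher` is an illustrative name, and
`waitForClose` stands in for whatever code detects that the peer closed
the socket (for example, a read loop that returns on EOF). It does not
solve the general problem, because every writer has to be wrapped this
way.

```haskell
import Control.Concurrent
  (forkFinally, forkIO, killThread, newEmptyMVar, putMVar, takeMVar, threadDelay)

-- | Hypothetical sketch: run 'writer' in its own thread while
-- 'waitForClose' runs in another. If 'waitForClose' returns first, the
-- writer is killed; if the writer finishes first, the watcher is killed.
-- Nothing means the writer was interrupted or threw an exception.
-- (threadDelay is imported only for trying the function out.)
withCloseWatcher :: IO () -> IO a -> IO (Maybe a)
withCloseWatcher waitForClose writer = do
  done <- newEmptyMVar
  -- forkFinally fills 'done' even if the writer is killed.
  writerTid <- forkFinally writer (putMVar done)
  watcherTid <- forkIO (waitForClose >> killThread writerTid)
  outcome <- takeMVar done
  killThread watcherTid
  pure (either (const Nothing) Just outcome)
```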

So, what to do? Perhaps it is wrong to use a socket from more than one
thread? Obviously, having multiple threads trying to read from the same
socket, or write to the same socket, would be a mess. So why do we expect
it to be OK to have one thread reading and a different thread writing? But
even if we restrict ourselves to accessing a socket from only one thread at
a time, we still have the issue that every place which uses threadWaitWrite
needs to handle the disconnect case. We could, of course, write a wrapper
function that does the check and call that instead. But we still have not
really solved the problem. The code in the I/O libraries that eventually
implements hPut calls threadWaitWrite, but it has no idea that the file
descriptor it is waiting on is a socket with special requirements. That
code is also used for writing to plain old files, etc., so it probably
wouldn't make sense for it to behave that way by default.
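A wrapper of the kind described above might look like this minimal sketch,
where the "check" is simply a timeout. `threadWaitWriteFor` and
`demoEmptyPipe` are hypothetical names, and the code assumes the unix
package for `createPipe` and `closeFd`. As noted, library code that calls
threadWaitWrite directly (such as the hPut path) would not go through it.

```haskell
import Control.Concurrent (threadWaitWrite)
import System.Posix.IO (closeFd, createPipe)
import System.Posix.Types (Fd)
import System.Timeout (timeout)

-- | Hypothetical wrapper: like threadWaitWrite, but gives up after
-- 'micros' microseconds. True means the descriptor became writable;
-- False means the timeout fired. It bounds the wait (case #2) but cannot
-- tell that the peer has closed the connection (case #1).
threadWaitWriteFor :: Int -> Fd -> IO Bool
threadWaitWriteFor micros fd =
  maybe False (const True) <$> timeout micros (threadWaitWrite fd)

-- | Example: the write end of a fresh, empty pipe is writable at once.
demoEmptyPipe :: IO Bool
demoEmptyPipe = do
  (readEnd, writeEnd) <- createPipe
  ok <- threadWaitWriteFor 1000000 writeEnd
  closeFd readEnd
  closeFd writeEnd
  pure ok
```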

- jeremy