[Haskell-cafe] Re: sendfile leaking descriptors on Linux?

Bardur Arantsson spam at scientician.net
Sun Feb 21 02:21:00 EST 2010


Jeremy Shaw wrote:
> Hello,
> 
> I think to make progress on this bug we really need a failing test case that
> other people can reproduce.
> 
> I have hacked up small server that should reproduce the error (using fdWrite
> instead of sendfile). And a small C client which is intended to reproduce
> the error -- but doesn't.
> 
> I have attached both.
> 
> The server tries to write a whole lot of 'a' characters to the client. The
> client does not consume any of them. This causes the server to block on the
> threadWaitWrite.
> 
> No matter how I kill the client, threadWaitWrite always wakes up.

Are you running the client and server on different physical machines? If 
so, have you tried simply yanking the connection?

Your client isn't dropping the connection hard -- if you kill the client 
(even with kill -9), your OS cleans up any open sockets it has. On 
well-behaved OSes that cleanup usually involves properly shutting down 
the connection somehow. Different OSes have different ideas about what 
constitutes "properly shutting down the connection" -- some simply don't 
shut it down at all.

My hypothesis is that the PS3 doesn't properly shut down the connection, 
but simply sends a RST (or maybe a FIN) and drops any further packets. 
I'll do a Wireshark dump after posting this to see if I can see what 
it's doing at the TCP level -- I'm not optimistic about seeing the exact 
moment when the "leak" occurs, but maybe the general pattern can yield 
some useful ideas.

I have no idea how to test this without using an actual PS3.
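
For anyone trying to approximate this without a PS3, one partial stand-in 
is a client that aborts its connection instead of closing it in an orderly 
way. The sketch below is in C, like the test client; connect_to and 
abortive_close are hypothetical helper names, not part of the attached 
code. It relies on the standard SO_LINGER socket option: with lingering 
enabled and a zero timeout, close() resets the connection with a RST 
rather than sending a FIN. This only covers the RST half of the 
hypothesis. It does not reproduce a peer that silently stops answering, 
which would need a pulled cable or a firewall rule that drops the packets.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP connection to ip:port (IPv4 only).
 * Returns the connected socket, or -1 on failure. */
int connect_to(const char *ip, uint16_t port)
{
    struct sockaddr_in addr = {0};
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: abortive close. SO_LINGER with l_onoff = 1 and
 * l_linger = 0 makes close() discard unsent data and reset the
 * connection (RST) instead of performing the normal FIN shutdown.
 * Returns 0 on success, -1 on failure. */
int abortive_close(int fd)
{
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };

    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A test client would call connect_to() with the server's address, leave 
the incoming data unread until the server blocks in threadWaitWrite, and 
then call abortive_close() on the socket.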

> So, we
> need to figure out exactly what the PS3 is doing differently that causes
> threadWaitWrite to not wake up.

Does it matter? I can reproduce this reliably within a few minutes of 
testing.

Note that this doesn't happen *every* time the PS3 disconnects and 
reconnects, it just happens some of the time. It's enough to eat up 
MAX_FDs file descriptors in a few hours of playing media normally. If I 
do a lot of seeking (forces a disconnect+reconnect) through the movie, 
at least one file descriptor usually leaks within a few minutes.
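
To put a number on the leak rather than waiting for the server to run out 
of descriptors, the open descriptors of the server process can be counted 
periodically. A minimal sketch in C, assuming Linux with /proc mounted; 
count_open_fds is a hypothetical helper name, and passing 0 to mean "the 
calling process" is a convention of this sketch only.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: count the entries in /proc/<pid>/fd (Linux only).
 * A pid of 0 or less means the calling process (/proc/self/fd).
 * Returns the number of open descriptors, or -1 if the directory cannot
 * be read (no such process, or insufficient permissions).
 * Note: when a process counts its own descriptors, the directory handle
 * opened here is included in the result. */
int count_open_fds(pid_t pid)
{
    char path[64];
    struct dirent *entry;
    int count = 0;
    DIR *dir;

    if (pid <= 0)
        snprintf(path, sizeof path, "/proc/self/fd");
    else
        snprintf(path, sizeof path, "/proc/%ld/fd", (long)pid);

    dir = opendir(path);
    if (dir == NULL)
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        /* Skip "." and ".."; every other entry is one descriptor. */
        if (entry->d_name[0] != '.')
            count++;
    }
    closedir(dir);
    return count;
}
```

Sampling this for the media server's pid before and after a burst of 
seeking should show whether the count creeps upward over time.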

> If we don't know why it is failing, then I
> don't think we can properly fix it.

I'm more pragmatic: If, after applying a fix, I cannot reproduce this 
problem within a few hours (or so) of running my media server, I'd say 
it's fixed. As long as the modifications to the sendfile library don't 
change its behavior in other ways, I don't see the problem.

P.S. Does anyone else out there have a PS3 to test with?


