[Haskell-cafe] Exceeding OS limits for simultaneous socket connections

Rob Stewart robstewart57 at gmail.com
Mon Jan 30 14:22:37 CET 2012


Hi,

I'm experiencing the "accept: resource exhausted (Too many open
files)" exception when trying to use sockets in my Haskell program.

The situation:
- Around a dozen Linux machines running my Haskell program,
transmitting thousands of messages to each other, sometimes within a
short period of time.
- I'm using the Network.Socket.ByteString.Lazy module to send and
receive lazy bytestrings
- The socket passed to getMsg is bound and listening, and is not
closed until the program exits. getMsg is called in a loop to receive
a lazy bytestring from remote nodes. The socket is initialized with:

 sock <- Network.Socket.socket (addrFamily myAddr) Stream defaultProtocol
 bindSocket sock (addrAddress myAddr)
 listen sock 10

Here's the code:

sendMsg :: Maybe HostName -> Int -> Lazy.ByteString -> IO ()
sendMsg dest sckt msg = do
  result <- try $ withSocketsDo $ do
    addrinfos <- getAddrInfo Nothing dest (Just (show sckt))
    let serveraddr = head addrinfos
    sock <- socket (addrFamily serveraddr) Stream defaultProtocol
    connect sock (addrAddress serveraddr)
    sendAll sock msg
    sClose sock
  case result of
    Left (ex::IOException) -> return () -- permit send failure
    Right _ -> return ()


getMsg :: Socket -> IO Lazy.ByteString
getMsg sock = do
  result <- try $ withSocketsDo $ do
    (conn, addr) <- accept sock
    getContents conn
  case result of
    Left (ex::IOException) -> putStrLn (show ex) >> getMsg sock
    Right msg -> return msg

The current topology is a master/slave setup. For some programs that
use these functions above, `sendMsg' is called thousands of times in
quick succession on the remote nodes, where the destination of the
`sendAll' function is the master node. Here is the per-process limit
on open file descriptors, and therefore on simultaneously open
sockets, on my Linux machines:

$ ulimit -n
1024

Indeed, when I experience the "accept: resource exhausted (Too many
open files)" exception, I check the number of open sockets, which
exceeds 1024, by looking at the contents of the directory:
ls -lah /proc/<pid>/fd
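
The same check can be made from inside the program. The following is a minimal sketch, assuming Linux (it reads /proc/self/fd) and the `directory' and `unix' packages; the names `countOpenFds' and `openFdSoftLimit' are hypothetical helpers, not part of my program:

```haskell
import System.Directory (listDirectory)
import System.Posix.Resource
  ( Resource (ResourceOpenFiles)
  , ResourceLimit (..)
  , ResourceLimits (softLimit)
  , getResourceLimit
  )

-- | Number of file descriptors currently open in this process.
-- Linux-specific: reads /proc/self/fd. Listing the directory briefly
-- opens a descriptor of its own, so the result may be one too high.
countOpenFds :: IO Int
countOpenFds = length <$> listDirectory "/proc/self/fd"

-- | Soft limit on open descriptors (the value `ulimit -n` reports),
-- or Nothing if the limit is infinite or unknown.
openFdSoftLimit :: IO (Maybe Integer)
openFdSoftLimit = do
  limits <- getResourceLimit ResourceOpenFiles
  return $ case softLimit limits of
    ResourceLimit n -> Just n
    _               -> Nothing
```

Logging both values whenever `accept' fails would show how close the process is to the limit at the moment of the exception.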

It is within the getContents function that, once the lazy bytestring
is fully received, the socket is shut down ( http://goo.gl/B6XcV ):
shutdown sock ShutdownReceive

There seems to be no way of limiting the number of permitted
connection requests from remote nodes. What I am perhaps looking for
is a mailbox implementation on top of sockets, or another way to avoid
this error. I am looking to scale up to hundreds of nodes, which makes
more than 1024 simultaneous socket connections to one node more
likely. Merely increasing the ulimit feels like a temporary
measure. Part of the dilemma is that the `connect' call in `sendMsg'
does not throw an error, even though it causes one on the receiving
node by pushing the number of open connections to the same socket on
the master node beyond the 1024 limit permitted by the OS.
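
To make concrete the kind of limiting I have in mind, here is a minimal sketch using a counting semaphore from base. It is only an assumption about how such a cap could be built, not something the network library provides; `withSlot' is a hypothetical helper name:

```haskell
import Control.Concurrent.QSem (QSem, newQSem, signalQSem, waitQSem)
import Control.Exception (bracket_)

-- | Hypothetical helper: run an action while holding one of a fixed
-- number of slots. The slot is released when the action finishes,
-- whether it returns normally or throws an exception.
withSlot :: QSem -> IO a -> IO a
withSlot sem = bracket_ (waitQSem sem) (signalQSem sem)
```

With a semaphore created once by `newQSem 512' (the number is arbitrary), the receiver would wrap each accept-and-read in `withSlot sem', holding the slot until the message has been fully read and the connection's descriptor released. Once all slots are taken, the receiver stops calling `accept', so further connection attempts wait in the kernel's listen backlog instead of consuming descriptors in my process.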

Am I missing something? One would have thought such a problem occurs
frequently with Haskell web servers and the like?

--
Rob Stewart


