[Haskell-cafe] Amazonka, conduit and sockets not closing

Will Yager will.yager at gmail.com
Sat Nov 28 17:42:58 UTC 2020


Linux has kernel params you can tweak for socket reuse. Also look up SO_REUSEADDR for background. 
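For reference, here is a minimal sketch of setting SO_REUSEADDR from Haskell. It assumes version 3.x of the network package, and openReusableSocket and reuseAddrEnabled are hypothetical helper names. SO_REUSEADDR relaxes the rules for bind()ing a local address, for example one still held by a connection in TIME_WAIT. It does not release sockets stuck in CLOSE_WAIT, which stay open until the owning process closes the descriptor.

#+begin_src haskell
import Network.Socket

-- | Hypothetical helper: open an unbound IPv4 TCP socket with SO_REUSEADDR
-- enabled. The option only affects a later 'bind', so it is set first.
openReusableSocket :: IO Socket
openReusableSocket = do
  sock <- socket AF_INET Stream defaultProtocol
  setSocketOption sock ReuseAddr 1
  pure sock

-- | Hypothetical helper: read the option back; non-zero means enabled.
reuseAddrEnabled :: Socket -> IO Bool
reuseAddrEnabled sock = (/= 0) <$> getSocketOption sock ReuseAddr
#+end_src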

> On Nov 28, 2020, at 8:44 AM, Bryan Richter <b at chreekat.net> wrote:
> 
> I thought CLOSE_WAIT *is* one of the "closed" states. TCP sockets
> stick around for a few minutes after use, right? You may simply be
> generating sockets faster than the operating system can handle. Find
> some way to reuse existing sockets, perhaps?
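A socket in CLOSE_WAIT has received the peer's FIN but has not yet been closed by the local process. Unlike TIME_WAIT, it does not expire on a timer. The sketch below reproduces the state on the loopback interface. It assumes version 3.x of the network package, and peerClosesFirst is a hypothetical helper. After the server side closes, the client's next recv returns an empty ByteString (EOF). The client socket stays in CLOSE_WAIT until the caller closes it.

#+begin_src haskell
import qualified Data.ByteString as BS
import Network.Socket
import qualified Network.Socket.ByteString as NSB

-- | Hypothetical helper: the "server" end closes first, the "client" end
-- does not. Returns the still-open client socket (now in CLOSE_WAIT) and
-- the result of its next read.
peerClosesFirst :: IO (Socket, BS.ByteString)
peerClosesFirst = do
  let loopback = tupleToHostAddress (127, 0, 0, 1)
  listener <- socket AF_INET Stream defaultProtocol
  bind listener (SockAddrInet 0 loopback) -- port 0: the kernel picks one
  listen listener 1
  port <- socketPort listener
  client <- socket AF_INET Stream defaultProtocol
  connect client (SockAddrInet port loopback)
  (server, _) <- accept listener
  close server   -- the server sends FIN ...
  close listener
  eof <- NSB.recv client 4096 -- ... which the client reads as EOF (empty)
  pure (client, eof)
#+end_src

Until close is called on the returned client socket, lsof would list it as CLOSE_WAIT, much like the entries in the output quoted below.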
> 
>> On Thu, Nov 26, 2020 at 3:13 PM Magnus Therning <magnus at therning.org> wrote:
>> 
>> I've run into a problem with running out of file descriptors. The
>> following snippet is a trimmed down version of what I'm doing:
>> 
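>> #+begin_src haskell
>> {-# LANGUAGE LambdaCase #-}
>> 
>> import Conduit ((.|))
>> import qualified Conduit as C
>> import Control.Lens (traversed, (%~), (&), (?~), (^.))
>> import Control.Monad (void)
>> import Control.Monad.Trans.Resource (runResourceT)
>> import Data.Maybe (catMaybes)
>> import qualified Data.Text as T
>> import Network.AWS
>> import Network.AWS.SQS
>> 
>> -- queueUrl :: T.Text is defined elsewhere (elided from this snippet)
>> 
>> main :: IO ()
>> main = do
>>  awsEnv <- newEnv Discover
>>  runAWSCond awsEnv $
>>    sqsSource queueUrl
>>      .| C.mapC snd
>>      .| sqsDeleteSink queueUrl
>>  where
>>    runAWSCond awsEnv = runResourceT . runAWS awsEnv . within Frankfurt . C.runConduit
>> 
>> -- Poll the queue forever, yielding (body, receipt handle) pairs.
>> sqsSource :: MonadAWS m => T.Text -> C.ConduitT () (T.Text, T.Text) m ()
>> sqsSource queueUrl = do
>>  (_, msgs) <- C.lift $ recvSQS queueUrl
>>  C.yieldMany msgs
>>  sqsSource queueUrl
>> 
>> -- Delete each incoming message by its receipt handle.
>> sqsDeleteSink :: MonadAWS m => T.Text -> C.ConduitT T.Text o m ()
>> sqsDeleteSink queueUrl = do
>>  C.await >>= \case
>>    Nothing -> pure ()
>>    Just receiptHandle -> do
>>      void $ C.lift $ delSQS queueUrl receiptHandle
>>      sqsDeleteSink queueUrl
>> 
>> -- Receive up to 10 messages, dropping any without a body or receipt handle.
>> recvSQS :: MonadAWS m => T.Text -> m (Int, [(T.Text, T.Text)])
>> recvSQS queueUrl = do
>>  let rm = receiveMessage queueUrl & rmMaxNumberOfMessages ?~ 10
>>  rmrs <- send rm
>>  let status = rmrs ^. rmrsResponseStatus
>>      msgs = rmrs ^. rmrsMessages & traversed %~ extract
>>  pure (status, catMaybes msgs)
>>  where
>>    extract msg = do
>>      body <- msg ^. mBody
>>      rh <- msg ^. mReceiptHandle
>>      pure (body, rh)
>> 
>> delSQS :: MonadAWS m => T.Text -> T.Text -> m DeleteMessageResponse
>> delSQS queueUrl receiptHandle = do
>>  let dm = deleteMessage queueUrl receiptHandle
>>  send dm
>> #+end_src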
>> 
>> This works fine for a while, but given a queue with enough messages it will fail
>> with something like
>> 
>> #+begin_example
>> TransportError (HttpExceptionRequest Request {
>>  host                 = "sqs.eu-central-1.amazonaws.com"
>>  port                 = 443
>>  secure               = True
>>  requestHeaders       = [("Host","sqs.eu-central-1.amazonaws.com"),("X-Amz-Date","20201126T101659Z"),("X-Amz-Content-SHA256","2e4bdf20a857a1416f218b1218670cf019ff53268d0adb34fe06402a62f3271d"),("Content-Type","application/x-www-form-urlencoded; charset=utf-8"),("Authorization","<REDACTED>")]
>>  path                 = "/"
>>  queryString          = ""
>>  method               = "POST"
>>  proxy                = Nothing
>>  rawBody              = False
>>  redirectCount        = 0
>>  responseTimeout      = ResponseTimeoutMicro 70000000
>>  requestVersion       = HTTP/1.1
>> }
>> (ConnectionFailure Network.Socket.getAddrInfo (called with preferred socket type/protocol: AddrInfo {addrFlags = [AI_ADDRCONFIG], addrFamily = AF_UNSPEC, addrSocketType = Stream, addrProtocol = 0, addrAddress = <assumed to be undefined>, addrCanonName = <assumed to be undefined>}, host name: Just "sqs.eu-central-1.amazonaws.com", service name: Just "443"): does not exist (System error)))
>> #+end_example
>> 
>> After some detours I found out that it's actually not a network issue, but
>> rather that the process runs out of file descriptors. Using =lsof= I can see that
>> it doesn't seem to close /any/ sockets at all; instead they get stuck in a
>> =CLOSE_WAIT= state:
>> 
>> #+begin_example
>> COMMAND    PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
>> wd-stats 88674 magnus   23u  IPv4 815196      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:60624->52.119.188.213:https (CLOSE_WAIT)
>> wd-stats 88674 magnus   24u  IPv4 811362      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:43482->52.119.189.184:https (CLOSE_WAIT)
>> wd-stats 88674 magnus   25u  IPv4 811386      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:60628->52.119.188.213:https (CLOSE_WAIT)
>> wd-stats 88674 magnus   26u  IPv4 813527      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:43486->52.119.189.184:https (CLOSE_WAIT)
>> ...
>> #+end_example
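To watch the leak from inside the process rather than with lsof, a Linux-only sketch can count the entries in /proc/self/fd. It assumes the directory package, version 1.3.1 or later, which provides getSymbolicLinkTarget. countOpenFds and countOpenSockets are hypothetical helpers. Socket descriptors appear there as links of the form socket:[inode].

#+begin_src haskell
import Control.Exception (IOException, try)
import Data.List (isPrefixOf)
import System.Directory (getSymbolicLinkTarget, listDirectory)

-- | Hypothetical helper (Linux only): number of open descriptors. The count
-- includes the descriptor used to read /proc/self/fd itself.
countOpenFds :: IO Int
countOpenFds = length <$> listDirectory "/proc/self/fd"

-- | Hypothetical helper (Linux only): number of descriptors that are sockets.
-- An entry can vanish between listing and reading its link (the directory
-- descriptor is already closed by then), so such failures are skipped.
countOpenSockets :: IO Int
countOpenSockets = do
  fds <- listDirectory "/proc/self/fd"
  targets <- mapM readTarget fds
  pure (length [t | Right t <- targets, "socket:" `isPrefixOf` t])
  where
    readTarget :: FilePath -> IO (Either IOException FilePath)
    readTarget fd = try (getSymbolicLinkTarget ("/proc/self/fd/" ++ fd))
#+end_src

Logging countOpenSockets once per received batch would show whether the socket count keeps growing while the conduit runs.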
>> 
>> Am I using Amazonka and/or Conduit in a way that results in this? How should I use them?
>> 
>> Or, is it an issue somewhere "below" my code? What can I do to address that?
>> 
>> Thanks for any insights or help
>> /M
>> 
>> --
>> Magnus Therning              OpenPGP: 0x927912051716CE39
>> email: magnus at therning.org
>> twitter: magthe              http://magnus.therning.org/
>> 
>> Action is the foundational key to all success.
>>     — Pablo Picasso
>> _______________________________________________
>> Haskell-Cafe mailing list
>> To (un)subscribe, modify options or view archives go to:
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>> Only members subscribed via the mailman list are allowed to post.