[Haskell-cafe] Amazonka, conduit and sockets not closing
Will Yager
will.yager at gmail.com
Sat Nov 28 17:42:58 UTC 2020
Linux has kernel params you can tweak for socket reuse. Also look up SO_REUSEADDR for background.
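For reference, here is a minimal sketch of setting SO_REUSEADDR from Haskell. It assumes the `network` package (3.x); `reuseAddrEnabled` is a made-up name for illustration and is not part of Amazonka or the code quoted below. The option governs rebinding a local address, so treat this as background reading rather than a confirmed fix for sockets sitting in CLOSE_WAIT.

```haskell
import Control.Exception (bracket)
import Network.Socket

-- | Open a TCP socket, enable SO_REUSEADDR on it, and report whether
-- the kernel now shows the option as set. The socket is always closed,
-- even if one of the calls throws.
reuseAddrEnabled :: IO Bool
reuseAddrEnabled =
  bracket (socket AF_INET Stream defaultProtocol) close $ \sock -> do
    setSocketOption sock ReuseAddr 1          -- 1 = enable
    (/= 0) <$> getSocketOption sock ReuseAddr -- non-zero = enabled

main :: IO ()
main = do
  ok <- reuseAddrEnabled
  putStrLn ("SO_REUSEADDR enabled: " ++ show ok)
```

The `bracket` wrapper is the relevant habit here: the descriptor is released on every code path, which is what keeps a long-running process from accumulating open sockets.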
> On Nov 28, 2020, at 8:44 AM, Bryan Richter <b at chreekat.net> wrote:
>
> I thought CLOSE_WAIT *is* one of the "closed" states. TCP sockets
> stick around for a few minutes after use, right? You may simply be
> generating sockets faster than one operating system can handle. Find
> some way to reuse existing sockets, perhaps?
>
>> On Thu, Nov 26, 2020 at 3:13 PM Magnus Therning <magnus at therning.org> wrote:
>>
>> I've run into a problem with running out of file descriptors. The
>> following snippet is a trimmed-down version of what I'm doing:
>>
>> #+begin_src haskell
>> main :: IO ()
>> main = do
>>   awsEnv <- newEnv Discover
>>   runAWSCond awsEnv $
>>     sqsSource queueUrl
>>       .| C.mapC snd
>>       .| sqsDeleteSink queueUrl
>>   where
>>     runAWSCond awsEnv = runResourceT . runAWS awsEnv . within Frankfurt . C.runConduit
>>
>> sqsSource :: MonadAWS m => T.Text -> C.ConduitT () (T.Text, T.Text) m ()
>> sqsSource queueUrl = do
>>   (_, msgs) <- C.lift $ recvSQS queueUrl
>>   C.yieldMany msgs
>>   sqsSource queueUrl
>>
>> sqsDeleteSink :: MonadAWS m => T.Text -> C.ConduitT T.Text o m ()
>> sqsDeleteSink queueUrl = do
>>   C.await >>= \case
>>     Nothing -> pure ()
>>     Just receiptHandle -> do
>>       void $ C.lift $ delSQS queueUrl receiptHandle
>>       sqsDeleteSink queueUrl
>>
>> recvSQS queueUrl = do
>>   let rm = receiveMessage queueUrl & rmMaxNumberOfMessages ?~ 10
>>   rmrs <- send rm
>>   let status = rmrs ^. rmrsResponseStatus
>>       msgs = rmrs ^. rmrsMessages & traversed %~ extract
>>   pure (status, catMaybes msgs)
>>   where
>>     extract msg = do
>>       body <- msg ^. mBody
>>       rh <- msg ^. mReceiptHandle
>>       pure (body, rh)
>>
>> delSQS queueUrl receiptHandle = do
>>   let dm = deleteMessage queueUrl receiptHandle
>>   send dm
>> #+end_src
>>
>> This works fine for a while, but given a queue with enough messages it will fail
>> with something like
>>
>> #+begin_example
>> TransportError (HttpExceptionRequest Request {
>> host = "sqs.eu-central-1.amazonaws.com"
>> port = 443
>> secure = True
>> requestHeaders = [("Host","sqs.eu-central-1.amazonaws.com"),("X-Amz-Date","20201126T101659Z"),("X-Amz-Content-SHA256","2e4bdf20a857a1416f218b1218670cf019ff53268d0adb34fe06402a62f3271d"),("Content-Type","application/x-www-form-urlencoded; charset=utf-8"),("Authorization","<REDACTED>")]
>> path = "/"
>> queryString = ""
>> method = "POST"
>> proxy = Nothing
>> rawBody = False
>> redirectCount = 0
>> responseTimeout = ResponseTimeoutMicro 70000000
>> requestVersion = HTTP/1.1
>> }
>> (ConnectionFailure Network.Socket.getAddrInfo (called with preferred socket type/protocol: AddrInfo {addrFlags = [AI_ADDRCONFIG], addrFamily = AF_UNSPEC, addrSocketType = Stream, addrProtocol = 0, addrAddress = <assumed to be undefined>, addrCanonName = <assumed to be undefined>}, host name: Just "sqs.eu-central-1.amazonaws.com", service name: Just "443"): does not exist (System error)))
>> #+end_example
>>
>> After some detours I found out that it's actually not a network issue, but
>> rather that the process runs out of file descriptors. Using =lsof= I can see that
>> it doesn't seem to close /any/ sockets at all; instead they get stuck in a
>> =CLOSE_WAIT= state:
>>
>> #+begin_example
>> COMMAND    PID   USER  FD TYPE DEVICE SIZE/OFF NODE NAME
>> wd-stats 88674 magnus 23u IPv4 815196      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:60624->52.119.188.213:https (CLOSE_WAIT)
>> wd-stats 88674 magnus 24u IPv4 811362      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:43482->52.119.189.184:https (CLOSE_WAIT)
>> wd-stats 88674 magnus 25u IPv4 811386      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:60628->52.119.188.213:https (CLOSE_WAIT)
>> wd-stats 88674 magnus 26u IPv4 813527      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:43486->52.119.189.184:https (CLOSE_WAIT)
>> ...
>> #+end_example
>>
>> Am I using Amazonka and/or Conduit in a way that results in this? How should I use them?
>>
>> Or is it an issue somewhere "below" my code? What can I do to address that?
>>
>> Thanks for any insights or help
>> /M
>>
>> --
>> Magnus Therning OpenPGP: 0x927912051716CE39
>> email: magnus at therning.org
>> twitter: magthe http://magnus.therning.org/
>>
>> Action is the foundational key to all success.
>> — Pablo Picasso
>> _______________________________________________
>> Haskell-Cafe mailing list
>> To (un)subscribe, modify options or view archives go to:
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>> Only members subscribed via the mailman list are allowed to post.