[Haskell-cafe] Amazonka, conduit and sockets not closing

Magnus Therning magnus at therning.org
Thu Nov 26 13:12:49 UTC 2020

I've run into a problem with running out of file descriptors. The
following snippet is a trimmed-down version of what I'm doing:

#+begin_src haskell
main :: IO ()
main = do
  awsEnv <- newEnv Discover
  runAWSCond awsEnv $
    sqsSource queueUrl
      .| C.mapC snd
      .| sqsDeleteSink queueUrl
  where
    runAWSCond awsEnv = runResourceT . runAWS awsEnv . within Frankfurt . C.runConduit

sqsSource :: MonadAWS m => T.Text -> C.ConduitT () (T.Text, T.Text) m ()
sqsSource queueUrl = do
  (_, msgs) <- C.lift $ recvSQS queueUrl
  C.yieldMany msgs
  sqsSource queueUrl

sqsDeleteSink :: MonadAWS m => T.Text -> C.ConduitT T.Text o m ()
sqsDeleteSink queueUrl = do
  C.await >>= \case
    Nothing -> pure ()
    Just receiptHandle -> do
      void $ C.lift $ delSQS queueUrl receiptHandle
      sqsDeleteSink queueUrl

recvSQS queueUrl = do
  let rm = receiveMessage queueUrl & rmMaxNumberOfMessages ?~ 10
  rmrs <- send rm
  let status = rmrs ^. rmrsResponseStatus
      msgs = rmrs ^. rmrsMessages & traversed %~ extract
  pure (status, catMaybes msgs)
  where
    extract msg = do
      body <- msg ^. mBody
      rh <- msg ^. mReceiptHandle
      pure (body, rh)

delSQS queueUrl receiptHandle = do
  let dm = deleteMessage queueUrl receiptHandle
  send dm
#+end_src

This works fine for a while, but given a queue with enough messages it
eventually fails with something like:

TransportError (HttpExceptionRequest Request {
  host                 = "sqs.eu-central-1.amazonaws.com"
  port                 = 443
  secure               = True
  requestHeaders       = [("Host","sqs.eu-central-1.amazonaws.com"),("X-Amz-Date","20201126T101659Z"),("X-Amz-Content-SHA256","2e4bdf20a857a1416f218b1218670cf019ff53268d0adb34fe06402a62f3271d"),("Content-Type","application/x-www-form-urlencoded; charset=utf-8"),("Authorization","<REDACTED>")]
  path                 = "/"
  queryString          = ""
  method               = "POST"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 0
  responseTimeout      = ResponseTimeoutMicro 70000000
  requestVersion       = HTTP/1.1
}
 (ConnectionFailure Network.Socket.getAddrInfo (called with preferred socket type/protocol: AddrInfo {addrFlags = [AI_ADDRCONFIG], addrFamily = AF_UNSPEC, addrSocketType = Stream, addrProtocol = 0, addrAddress = <assumed to be undefined>, addrCanonName = <assumed to be undefined>}, host name: Just "sqs.eu-central-1.amazonaws.com", service name: Just "443"): does not exist (System error)))

After some detours I found out that it's actually not a network issue, but
rather that the process runs out of file descriptors. Using =lsof= I can see
that it doesn't seem to close /any/ sockets at all; instead they get stuck in
the =CLOSE_WAIT= state:

wd-stats 88674 magnus   23u  IPv4 815196      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:60624-> (CLOSE_WAIT)
wd-stats 88674 magnus   24u  IPv4 811362      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:43482-> (CLOSE_WAIT)
wd-stats 88674 magnus   25u  IPv4 811386      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:60628-> (CLOSE_WAIT)
wd-stats 88674 magnus   26u  IPv4 813527      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:43486-> (CLOSE_WAIT)
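
The same growth can also be watched from inside the process rather than
through =lsof=. The following is only a minimal sketch and not part of the
trimmed program above: it assumes Linux (where =/proc/self/fd= has one entry
per open descriptor) and the =directory= package, and =countOpenFds= is a name
made up for illustration. The count includes the descriptor used to read the
directory itself, so only the difference between two readings is meaningful.

#+begin_src haskell
import Control.Exception (bracket)
import System.Directory (listDirectory)
import System.IO (IOMode (ReadMode), hClose, openFile)

-- | Number of file descriptors currently open in this process.
-- Assumption: Linux, where /proc/self/fd lists one entry per descriptor.
countOpenFds :: IO Int
countOpenFds = length <$> listDirectory "/proc/self/fd"

main :: IO ()
main = do
  before <- countOpenFds
  -- Hold one extra file open while counting, then close it again.
  during <- bracket (openFile "/proc/self/status" ReadMode) hClose (const countOpenFds)
  after <- countOpenFds
  putStrLn $
    "before: " <> show before
      <> ", during: " <> show during
      <> ", after: " <> show after
#+end_src

Logging such a count every so often from the receive loop would show whether
the number of descriptors rises with each request or only now and then.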

Am I using Amazonka and/or Conduit in a way that results in this? How should I use them?

Or is it an issue somewhere "below" my code? What can I do to address that?

Thanks for any insights or help

Magnus Therning
email: magnus at therning.org
twitter: magthe              http://magnus.therning.org/

Action is the foundational key to all success.
     — Pablo Picasso
