[Haskell-cafe] Amazonka, conduit and sockets not closing

Viktor Dukhovni ietf-dane at dukhovni.org
Sat Nov 28 18:34:16 UTC 2020


On Thu, Nov 26, 2020 at 02:12:49PM +0100, Magnus Therning wrote:

> After some detours I found out that it's actually not a network issue, but
> rather that the process runs out of filedescriptors. Using =lsof= I can see that
> it doesn't seem to close /any/ sockets at all, instead they get stuck in a
> =CLOSE_WAIT= state:
> 
> #+begin_example
> COMMAND    PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
> wd-stats 88674 magnus   23u  IPv4 815196      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:60624->52.119.188.213:https (CLOSE_WAIT)
> wd-stats 88674 magnus   24u  IPv4 811362      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:43482->52.119.189.184:https (CLOSE_WAIT)
> wd-stats 88674 magnus   25u  IPv4 811386      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:60628->52.119.188.213:https (CLOSE_WAIT)
> wd-stats 88674 magnus   26u  IPv4 813527      0t0  TCP ip-192-168-0-9.eu-central-1.compute.internal:43486->52.119.189.184:https (CLOSE_WAIT)
> ...
> #+end_example

How many such open file descriptors did you find?
(Running "lsof -n -P -i tcp -a -p $pid" produces the output
faster, because -n and -P skip the host name and port name
lookups, and it reports only TCP sockets.)

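Contrary to other replies, the sockets above are indeed NOT closed in
your process. If they were, they would no longer be associated with a
file descriptor, and would show up only in "netstat" output, not in
"lsof" output. CLOSE_WAIT means the remote end has closed its side of
the connection, while the local process has yet to close the socket.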
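
To watch the descriptor count from inside the process, something along
these lines may help. It is a minimal sketch, assuming Linux (it reads
/proc/self/fd) and the "directory" package; the name "countOpenFds" is
made up for illustration.

```haskell
import System.Directory (listDirectory)

-- | Number of entries in /proc/self/fd (Linux only). The count may
-- include the descriptor opened to read the directory itself.
countOpenFds :: IO Int
countOpenFds = length <$> listDirectory "/proc/self/fd"

main :: IO ()
main = do
  n <- countOpenFds
  putStrLn ("open file descriptors: " ++ show n)
```

Logging this number periodically would show whether it grows with
every request or levels off at some bound.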

I don't know what happens inside Amazonka, but typically clients making
many concurrent HTTPS calls employ a TlsManager that maintains a
connection pool. The pool avoids opening too many concurrent
connections, while keeping a limited number of connections open for
reuse by later requests.
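
As a rough illustration of such a pool, here is a sketch that assumes
the "TlsManager" above corresponds to the "Manager" type from the
http-client package, built from "tlsManagerSettings" in
http-client-tls. Whether Amazonka configures its manager this way is
not something I have checked. The URL and the limits are placeholders.
As far as I know, "managerConnCount" bounds the idle connections kept
alive per host rather than capping concurrent ones.

```haskell
import Control.Monad (forM_)
import Network.HTTP.Client
import Network.HTTP.Client.TLS (tlsManagerSettings)
import Network.HTTP.Types.Status (statusCode)

-- | Pool settings; the numbers are arbitrary placeholders.
poolSettings :: ManagerSettings
poolSettings = tlsManagerSettings
  { managerConnCount = 10            -- idle connections kept per host
  , managerIdleConnectionCount = 64  -- idle connections kept in total
  }

main :: IO ()
main = do
  -- Create the manager once and share it across all requests.
  manager <- newManager poolSettings
  request <- parseRequest "https://example.com/"  -- placeholder URL
  forM_ [1 :: Int .. 3] $ \i -> do
    code <- withResponse request manager $ \resp -> do
      -- Read the whole body before leaving the response scope.
      _ <- brConsume (responseBody resp)
      pure (statusCode (responseStatus resp))
    putStrLn (show i ++ ": " ++ show code)
```

The point of the sketch is the single shared manager: creating a new
manager per request would give every request its own pool.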

How many open connections did you find?  I don't know whether
TlsManager aggregates connections by name or by IP address.  If the
latter, perhaps (very speculatively, without looking at the underlying
code, ...) Amazon's IP addresses are changing quickly (short or 0 TTL),
breaking the connection pool's per-destination connection limits.  This
is a wild guess; more evidence is needed to make it plausible or to
rule it out.
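
One way to gather that evidence is to count the CLOSE_WAIT sockets per
remote address. The sketch below assumes lsof output in the layout
quoted above (NAME column of the form "local->remote:port", followed by
the state in parentheses) and handles only IPv4-style endpoints. The
function names are made up for illustration.

```haskell
import qualified Data.Map.Strict as Map

-- | Remote host of an lsof line whose socket is in CLOSE_WAIT.
closeWaitRemote :: String -> Maybe String
closeWaitRemote line =
  case reverse (words line) of
    ("(CLOSE_WAIT)" : name : _) -> remoteHost name
    _                           -> Nothing

-- | Text after "->" in the NAME column, with the trailing port removed.
remoteHost :: String -> Maybe String
remoteHost ('-' : '>' : rest) = Just (dropPort rest)
remoteHost (_ : rest)         = remoteHost rest
remoteHost []                 = Nothing

-- | Drop everything from the last ':' onwards, if there is one.
dropPort :: String -> String
dropPort ep =
  case break (== ':') (reverse ep) of
    (_, ':' : revHost) -> reverse revHost
    _                  -> ep

-- | Number of CLOSE_WAIT sockets per remote host.
countByRemote :: String -> Map.Map String Int
countByRemote out =
  Map.fromListWith (+) [ (h, 1) | Just h <- map closeWaitRemote (lines out) ]

main :: IO ()
main = getContents >>= mapM_ print . Map.toList . countByRemote
```

Feeding it the lsof output on standard input prints one (address,
count) pair per remote host. Many distinct addresses with a few sockets
each would be consistent with the guess above; a few addresses with
very many sockets each would argue against it.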

-- 
    Viktor.

