[Haskell-cafe] Long pauses / idle time in a Haskell TCP server

Alex Iliev alex.iliev at gmail.com
Mon Oct 8 10:10:42 CEST 2012


Hello,

I'm seeing long pauses in a server based on the 'scalable-server'
package from Hackage, version 0.2.2 [1]. It normally performs very
well, about 100-150 micro-secs client latency over the loopback
interface, but a fair number of requests are much slower, 38-40
milli-secs, making the mean latency and throughput quite bad.

The server is a simple version of a memory cache, with operations to
"put" a key-value mapping, and to "get" the value for a key, using a
data map. I've tried with the strict versions of
'unordered-containers' and the 'containers' package for the data map
implementation, with similar results.

I've tried it with ghc 7.4.1 and with 7.6.1 with similar results.

Runtime info I got shows:
- GC report: GC time is much less than the total time of the slow
responses, so does not explain them.
- CPU profile: Idle time is very high, and would explain the slow
responses. By idle time I mean the part of wall time which is not
included in the profile report's "total time". For example, when
running a test for 5 minutes, the profile report has total time = 0.05
secs. 'top' output is consistent - quite low CPU usage during such a
test. Apart from this there is nothing notable in the profile, most
time is spent in the network input and output processing
(network-enumerator etc.).

One observation is that the pauses seem correlated with overwriting
entries in the data map. If I'm mostly adding new keys (as well as
reading), the throughput is much better, and the idle time in the CPU
profile is lower (though the time spent in GC increases a lot).

I've removed all concurrency from the program (including my version of
scalable-server) to eliminate that factor, but the problem persists. I
tried it on two fairly different Linux machines, with similar results.
The client I use to benchmark is a simple C program, so I don't
suspect it of causing the pauses. Looking at the timestamps on the
network traffic confirms that the pauses are at the server.

Any idea what might be causing the idle time? The long pauses are
consistently 38-40ms, maybe that points to some aspect of CPU
scheduling, leaving the program idle for some time?

I can put the code on github if it would help.

Many thanks!
Alex

[1] http://hackage.haskell.org/package/scalable-server-0.2.2



More information about the Haskell-Cafe mailing list