[Haskell-cafe] Does anyone know the reason for differeng
measurements in profiling under -threaded?
Jesper Louis Andersen
jesper.louis.andersen at gmail.com
Thu Apr 22 18:14:29 EDT 2010
Hi,
I am asking if anyone has seen the following behaviour from GHC under
-threaded and heavy use of File I/O, network I/O and STM use. After
having run Combinatorrent for a while and then terminating it, we get
the following output from the RTS in GC statistics:
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.22s ( 60.06s elapsed)
GC time 0.05s ( 0.07s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.26s ( 60.12s elapsed)
%GC time 18.2% (0.1% elapsed)
Alloc rate 431,635,651 bytes per MUT second
Productivity 80.3% of total user, 0.4% of total elapsed
Note that I am running it for exactly 60 seconds of wall-clock time.
Yet, the profile information reports:
total time = 11.20 secs (560 ticks @ 20 ms)
total alloc = 78,696,568 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
ticks bytes
MAIN MAIN 80.4 0.2
450 20033
PeerControl Process.Peer 11.2 1.0
63 95272
Sender Process.Peer.Sender 6.2 0.0
35 387
sha1_digest Digest 1.8 1.4
10 136377
checkPiece FS 0.2 92.3
1 9083658
TorrentManager Process.TorrentManager 0.2 1.8
1 179587
which is clearly wrong. Total time is nowhere near 11.20 secs with 560
ticks here. I am almost certain that the 450 ticks attributed to MAIN
is wrong as well. I have SCCs at each possible forkIO-point in the
program. -prof -auto-all -caf-all doesn't make any difference either,
I've tried. Has anyone seen this kind of problem and in that case,
what was the fix? Barring nobody knows, the only thing I have left to
try is to dive into the GHC RTS and figure out what is going on in
there.
As an aside: Before writing combinatorrent, I wrote etorrent, A
BitTorrent client in Erlang. The two clients follow more or less the
same basic structure and largely does the same ting. Combinatorrent
has better memory usage right now, mostly due to representation
choices: Haskell can do much better data representation than erlang.
On the other hand, etorrent still beats out combinatorrent in the CPU
department, using less CPU. And that somewhat perplexes me. This is
why I am looking at performance profiles and trying to find the
culprit. But with the above problem, I can't find it, since I don't
trust the profiler anymore to give me the right results. All logic
suggest that Haskell is ripping apart Erlang in all parts that has no
communication, mostly due to being compiled and having a static type
system (The erlang code is interpreted, even though they have a HiPE
JIT). In practice however, my guess is that the binary handling in
Erlang is quite effective, as is the communication between processes,
so this might turn the tides in the favor of the erlang-RTS.
As a fun thing, I tried downloading the same file in the test
environment and running strace -cf on both programs. Here are the top
winners. I don't think you can trust all the columns:
Erlang:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
95.57 285.220815 2241 127300 10518 futex
2.16 6.453726 271 23828 9838 read
2.15 6.420512 279153 23 wait4
0.07 0.198851 3 64073 1 epoll_wait
0.02 0.048004 4000 12 poll
0.01 0.039131 2 16297 write
0.01 0.026917 0 88057 37842 recvfrom
0.01 0.018174 0 75803 36 epoll_ctl
0.00 0.008001 667 12 1 select
0.00 0.007133 1 5138 3 lseek
0.00 0.001931 64 30 6 execve
0.00 0.001912 0 5535 writev
0.00 0.001334 191 7 vfork
0.00 0.001209 0 2684 1995 open
0.00 0.001128 0 3094 2257 stat
0.00 0.000942 6 153 munmap
Haskell/GHC:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
32.20 15834.075616 176163 89883 10065 futex
21.72 10682.355809 176009 60692 26672 recvfrom
12.22 6009.064699 176023 34138 18 select
10.25 5038.622913 176009 28627 1 write
8.45 4153.652812 176010 23599 1 read
4.84 2380.702238 176009 13526 1 writev
3.59 1765.107536 175965 10031 rt_sigprocmask
1.95 961.185526 176009 5461 fstat
1.94 956.257032 176009 5433 lseek
1.80 883.829682 175922 5024 getrusage
0.85 419.781465 176009 2385 107 rt_sigreturn
0.04 19.713008 176009 112 fcntl
0.03 12.320630 176009 70 mmap
0.02 9.680495 176009 55 socket
0.02 9.504619 176011 54 53 connect
0.02 8.976459 176009 51 getsockopt
0.01 5.456279 176009 31 close
Apart from Erlang using epoll, there is not too much difference from
the perspective of the kernel, which I think is rather good :)
--
J.
More information about the Haskell-Cafe
mailing list