FW: [Haskell-cafe] The RTSOPTS "-qm" flag's impact on runtime
Simon Peyton-Jones
simonpj
Tue Oct 1 09:46:14 UTC 2013
Simon: did you see this? A factor of 50 in runtime seems pretty significant!
Simon
-----Original Message-----
From: Haskell-Cafe [mailto:haskell-cafe-bounces at haskell.org] On Behalf Of Iustin Pop
Sent: 30 September 2013 23:14
To: Haskell Cafe
Subject: [Haskell-cafe] The RTSOPTS "-qm" flag's impact on runtime
Hi all,
I found an interesting case where the rtsopts -qm flag makes a
significant difference in runtime (~50x). This is using GHC 7.6.3, llvm 3.4, program
compiled with "-threaded -O2 -fllvm" and a couple of language extensions.
Source is at
http://benchmarksgame.alioth.debian.org/u64q/benchmark.php?test=chameneosredux&lang=ghc&id=4&data=u64q,
on the language shootout benchmarks.
Running the code without -N results (on my computer) in around 4 seconds
of runtime:
$ time ./orig 6000000
…
real 0m3.919s
user 0m3.903s
sys 0m0.010s
This is reasonably consistent. Running -N4 (this is an 8-core machine)
results in the surprising:
$ time ./orig 6000000 +RTS -N4
…
real 1m15.154s
user 1m38.790s
sys 2m7.947s
The cores are all used very erratically (continuously changing
5%-20%-40%) and the overall cpu usage is ~27-28%. Note the surprising
2m7s of sys usage, which means the kernel is involved a lot?
Note that removing the explicit forkOn and running with -N4 results in
somewhat worse performance:
real 2m6.548s
user 2m13.470s
sys 2m3.043s
So in that sense the forkOn itself is not at fault. What I have found is
that -qm is here a life saver:
$ time ./orig 6000000 +RTS -N4 -qm
real 0m2.773s
user 0m5.610s
sys 0m0.123s
Adding -qa on top of -qm doesn't make a big difference. To summarise more runs (in
terms of CPU used, user+sys):
with forkOn:
- -N4: 228s
- -N4 -qa: 110s
- -N4 -qm: 6s
- -N4 -qm -qa: 6s
without forkOn:
- -N4: 253s
- -N4 -qa: 252s
- -N4 -qm: 5s
- -N4 -qm -qa: 5s
(Note that "without forkOn" is a bit slower in terms of wall-clock time, as
the "with forkOn" version distributes the work a bit better, even if it
uses a tiny bit more CPU overall.)
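For anyone who wants to reproduce the user+sys figures from raw `time` output, here is a minimal sketch of the arithmetic. The helper names to_seconds and cpu_seconds are made up for illustration and are not part of any tool; the only assumption is that durations use the "XmY.YYYs" form shown in the transcripts above.

```shell
# to_seconds (hypothetical helper): convert a `time` duration such as
# "1m38.790s" into plain seconds.
to_seconds() {
  awk -v t="$1" 'BEGIN {
    split(t, parts, "m")       # parts[1] = minutes, parts[2] = "38.790s"
    sub(/s$/, "", parts[2])    # drop the trailing "s"
    printf "%.3f\n", parts[1] * 60 + parts[2]
  }'
}

# cpu_seconds (hypothetical helper): total CPU time of one run, user + sys.
cpu_seconds() {
  awk -v u="$(to_seconds "$1")" -v s="$(to_seconds "$2")" \
    'BEGIN { printf "%.3f\n", u + s }'
}

cpu_seconds 1m38.790s 2m7.947s   # the single -N4 run above: prints 226.737
cpu_seconds 0m5.610s 0m0.123s    # the single -N4 -qm run above: prints 5.733
```

These single-run values sit close to the 228s and 6s entries in the summary, which cover more runs.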
So the question is, what does -qm actually do that it affects this
benchmark so much (~50x)? (The docs are not very clear on it.)
And furthermore, could there be a heuristic inside the runtime such
that automatic thread migration is suspended if threads are
"over-migrated" (which is what I suppose happens here)?
thanks for any explanations,
iustin