[Haskell-cafe] Haskell Speed Myth
Thomas M. DuBuisson
thomas.dubuisson at gmail.com
Sun Aug 24 14:03:30 EDT 2008
> Hmm thanks, that's interesting -- I was thinking it was probably caused
> by OS X, but it appears to happen on Linux too. Could you try running
> the old code too, and see if you experience the order of magnitude
> slowdown too?
The original program on my Linux 2.6.26 Core2 Duo:
[tom at myhost Test]$ time ./tr-threaded 1000000
37
real 0m0.635s
user 0m0.530s
sys 0m0.077s
[tom at myhost Test]$ time ./tr-nothreaded 1000000
37
real 0m0.352s
user 0m0.350s
sys 0m0.000s
[tom at myhost Test]$ time ./tr-threaded 1000000 +RTS -N2
37
real 0m13.954s
user 0m4.333s
sys 0m5.736s
--------------------------
Since there was clearly still not enough computation to justify the OS
threads in my last example, I made a test that hashes a 32-byte string
(show . md5 . encode $ val):
[tom at myhost Test]$ time ./threadring-nothreaded 1000000
50
552
real 0m1.408s
user 0m1.323s
sys 0m0.083s
[tom at myhost Test]$ time ./threadring-threaded 1000000
50
552
real 0m1.948s
user 0m1.807s
sys 0m0.143s
[tom at myhost Test]$ time ./threadring-threaded 1000000 +RTS -N2
552
50
real 0m1.663s
user 0m1.427s
sys 0m0.237s
[tom at myhost Test]$
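
For readers without the source, here is a minimal sketch of a thread ring
of this general shape. It is not the program timed above: the ring size of
503, the names (workload, worker, runRing) and the single token are all
assumptions, and a cheap FNV-1a style hash stands in for md5 so that only
base is needed. Having main wait on a "done" MVar is one way to avoid
exiting before a result is reported.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Control.Monad (forM_, replicateM)
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import System.Environment (getArgs)

-- Assumed ring size (the usual thread-ring benchmark value).
ringSize :: Int
ringSize = 503

-- Stand-in for the per-hop work. The post hashes with md5; this
-- FNV-1a style hash over the token's decimal digits is a substitute.
workload :: Int -> Int
workload n = foldl' step 2166136261 (show n)
  where
    step h c = (h `xor` ord c) * 16777619

-- One ring member: take the token, do some work, pass it on decremented.
-- The member that receives 0 reports its id and stops.
worker :: MVar Int -> Int -> MVar Int -> MVar Int -> IO ()
worker done ident inbox outbox = loop
  where
    loop = do
      token <- takeMVar inbox
      if token == 0
        then putMVar done ident
        else do
          _ <- evaluate (workload token)
          putMVar outbox (token - 1)
          loop

-- Build a ring of `size` threads (size >= 1), inject a token worth
-- `hops` (hops >= 0), and block until some member sees it reach 0.
runRing :: Int -> Int -> IO Int
runRing size hops = do
  boxes <- replicateM size newEmptyMVar
  done <- newEmptyMVar
  let outs = drop 1 boxes ++ take 1 boxes -- each member's successor
  forM_ (zip3 [1 ..] boxes outs) $ \(i, inbox, outbox) ->
    forkIO (worker done i inbox outbox)
  mapM_ (`putMVar` hops) (take 1 boxes) -- hand the token to member 1
  takeMVar done

main :: IO ()
main = do
  args <- getArgs
  let hops = case args of
        (a : _) -> read a :: Int
        [] -> 1000000
  winner <- runRing ringSize hops
  print winner
```

Built with -threaded, a sketch like this can be run with +RTS -N2 in the
same way as the binaries above; raising the cost of workload is the knob
that corresponds to the heavier hashing in the later runs.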
---------------------------
Seeing as this still doesn't beat the old RTS, I decided to increase the
per-unit work a little more. This code will hash 10 KB every time the
token is passed / decremented.
[tom at myhost Test]$ time ./threadring-nothreaded 100000
(308,77851ef5e9e781c04850a7df9cc855d2)
real 2m56.453s
user 2m55.399s
sys 0m0.457s
[tom at myhost Test]$ time ./threadring-threaded 100000
(308,77851ef5e9e781c04850a7df9cc855d2)
real 3m6.430s
user 3m5.868s
sys 0m0.460s
[tom at myhost Test]$ time ./threadring-threaded 100000 +RTS -N2
(810,77851ef5e9e781c04850a7df9cc855d2)
(308,77851ef5e9e781c04850a7df9cc855d2)
real 1m55.616s
user 2m47.982s
sys 0m3.586s
* Yes, I notice it's exiting before the output gets printed a couple
of times, oh well.
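
As a rough cross-check, the `time` figures above can be turned into a
speedup and a CPU utilization number. This sketch assumes utilization is
(user + sys) / (real * cores) with 2 cores, and takes the non-threaded run
as the baseline; the helper names are made up for illustration. With the
timings above it comes out at roughly 74% utilization and about a 1.5x
speedup.

```haskell
import Text.Printf (printf)

-- Convert a "XmY.YYYs" reading from `time` into seconds.
toSecs :: Double -> Double -> Double
toSecs m s = 60 * m + s

-- Fraction of the available CPU time actually used (assumed definition).
utilization :: Int -> Double -> Double -> Double -> Double
utilization cores real user sys = (user + sys) / (real * fromIntegral cores)

-- Wall-clock speedup of the parallel run over the baseline run.
speedup :: Double -> Double -> Double
speedup baseline parallel = baseline / parallel

main :: IO ()
main = do
  let real = toSecs 1 55.616 -- threaded, +RTS -N2
      user = toSecs 2 47.982
      sys = toSecs 0 3.586
      base = toSecs 2 56.453 -- non-threaded run
  printf "utilization: %.1f%%\n" (100 * utilization 2 real user sys)
  printf "speedup: %.2fx\n" (speedup base real)
```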
-------------------------
REFLECTION
Yay, the multicore version pays off when the workload is non-trivial.
CPU utilization is still rather low for the -N2 case (70%). I think the
Haskell threads have an affinity for certain OS threads (and thus a
CPU). Perhaps this results in one CPU holding both tokens of work while
the other has none?
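
One way to probe the affinity guess, sketched under assumptions: recent
versions of base export forkOn and threadCapability from
Control.Concurrent, which pin a Haskell thread to a capability and report
which capability a thread is on. Whether the programs above behave this
way is not something this sketch shows; spawnPinned is a made-up helper
that just pins threads round-robin and reports where each one landed.

```haskell
import Control.Concurrent
  (forkOn, getNumCapabilities, myThreadId, threadCapability)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)

-- Fork n threads, pinning thread i to capability (i mod caps), and
-- collect (thread index, capability it ran on, whether it is pinned).
spawnPinned :: Int -> IO [(Int, Int, Bool)]
spawnPinned n = do
  caps <- getNumCapabilities
  outs <- forM [0 .. n - 1] $ \i -> do
    out <- newEmptyMVar
    _ <- forkOn (i `mod` caps) $ do
      (cap, pinned) <- myThreadId >>= threadCapability
      putMVar out (i, cap, pinned)
    return out
  mapM takeMVar outs

main :: IO ()
main = spawnPinned 8 >>= mapM_ print
```

Run with +RTS -N2, alternating ring members between the two capabilities
in this way would at least rule out every member sharing one capability.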