[Haskell-cafe] Haskell Speed Myth

Sun Aug 24 14:03:30 EDT 2008

> Hmm thanks, that's interesting -- I was thinking it was probably caused
> by OS X, but it appears to happen on Linux too.  Could you try running
> the old code as well, and see if you experience the order-of-magnitude
> slowdown too?

The original program on my Linux 2.6.26 Core2 Duo:

[tom at myhost Test]$ time ./tr-threaded 1000000
37

real    0m0.635s
user    0m0.530s
sys     0m0.077s
[tom at myhost Test]$ time ./tr-nothreaded 1000000
37

real    0m0.352s
user    0m0.350s
sys     0m0.000s
[tom at myhost Test]$ time ./tr-threaded 1000000 +RTS -N2
37

real    0m13.954s
user    0m4.333s
sys     0m5.736s

--------------------------

Since my last example obviously still did not do enough computation to
justify the OS threads, I made a test that hashes a 32-byte string
(show . md5 . encode $ val):
[tom at myhost Test]$ time ./threadring-nothreaded 1000000
50
552

real    0m1.408s
user    0m1.323s
sys     0m0.083s
[tom at myhost Test]$ time ./threadring-threaded 1000000
50
552

real    0m1.948s
user    0m1.807s
sys     0m0.143s
[tom at myhost Test]$ time ./threadring-threaded 1000000 +RTS -N2
552
50

real    0m1.663s
user    0m1.427s
sys     0m0.237s
[tom at myhost Test]$ 
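
For readers without the test source, here is a minimal sketch of a token
ring that does some work on every hop.  It is not the program timed
above: the names (fnv1a, worker, runRing) are made up for illustration,
and a small FNV-1a hash stands in for md5 so that the sketch only needs
base.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM_, replicateM)
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word32)

-- Stand-in for (show . md5 . encode): 32-bit FNV-1a, NOT MD5 (assumption).
fnv1a :: String -> Word32
fnv1a = foldl' step 2166136261
  where step h c = (h `xor` fromIntegral (ord c)) * 16777619

type Token = (Int, Word32)   -- (hops remaining, running hash)

-- One ring member: take the token, and either report it (no hops left)
-- or do the per-hop work and pass the decremented token on.
worker :: MVar (Int, Word32) -> MVar Token -> MVar Token -> Int -> IO ()
worker done inbox outbox ident = loop
  where
    loop = do
      (n, h) <- takeMVar inbox
      if n == 0
        then putMVar done (ident, h)
        else do
          let h' = fnv1a (show h)              -- the per-hop work
          h' `seq` putMVar outbox (n - 1, h')  -- force it in this thread
          loop

-- Build a ring of `size` threads (size >= 1), inject one token, and
-- return the id of the thread where it stops plus the final hash.
runRing :: Int -> Int -> Word32 -> IO (Int, Word32)
runRing size hops seed = do
  boxes <- replicateM size newEmptyMVar
  done  <- newEmptyMVar
  -- Thread i reads box i and writes box i+1; the last wraps around.
  forM_ (zip3 [1 ..] boxes (tail boxes ++ [head boxes])) $
    \(i, inbox, outbox) -> forkIO (worker done inbox outbox i)
  putMVar (head boxes) (hops, seed)
  takeMVar done

main :: IO ()
main = runRing 503 1000000 0 >>= print
```

The ring size of 503 is the one used by the usual thread-ring benchmark
and is an assumption here; the post does not state it.  With that size
and 1000000 hops the token stops at thread 37, since 1000000 mod 503 is
36.  To compare runtimes, build once with and once without -threaded
(current GHCs also need -rtsopts before +RTS -N2 is accepted).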

---------------------------

Since this still doesn't beat the old (non-threaded) RTS, I decided to
increase the per-unit work a little more.  This code hashes 10KB every
time the token is passed / decremented.

[tom at myhost Test]$ time ./threadring-nothreaded 100000
(308,77851ef5e9e781c04850a7df9cc855d2)

real    2m56.453s
user    2m55.399s
sys     0m0.457s

[tom at myhost Test]$ time ./threadring-threaded 100000         
(308,77851ef5e9e781c04850a7df9cc855d2)

real    3m6.430s
user    3m5.868s
sys     0m0.460s

[tom at myhost Test]$ time ./threadring-threaded 100000 +RTS -N2
(810,77851ef5e9e781c04850a7df9cc855d2)
(308,77851ef5e9e781c04850a7df9cc855d2)

real    1m55.616s
user    2m47.982s
sys     0m3.586s

* Yes, I noticed that a couple of times it exits before the output gets
printed; oh well.
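
One likely explanation for the missing output (an assumption, since the
source is not shown) is that a GHC program terminates as soon as main
returns, without waiting for forked threads.  A minimal sketch of making
main wait, using a hypothetical helper forkWithDone:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Exception (finally)

-- Fork an action and return an MVar that is filled when the action
-- finishes, whether it returns normally or throws.
forkWithDone :: IO () -> IO (MVar ())
forkWithDone act = do
  done <- newEmptyMVar
  _ <- forkIO (act `finally` putMVar done ())
  return done

main :: IO ()
main = do
  -- Two placeholder jobs standing in for the two token rings.
  dones <- mapM (\i -> forkWithDone (putStrLn ("ring " ++ show i ++ " done")))
                [1, 2 :: Int]
  -- Block until every job has signalled completion before exiting.
  mapM_ takeMVar dones
```

With this pattern both results are printed before the program exits,
although their order can still vary from run to run.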

-------------------------
REFLECTION

Yay, the multicore version pays off when the workload is non-trivial.
CPU utilization is still rather low for the -N2 case (70%).  I think the
Haskell threads have an affinity for certain OS threads (and thus for a
CPU).  Perhaps this results in one CPU holding both tokens of work while
the other has none?
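
As a rough cross-check of the utilization figure, the numbers from the
last -N2 run can be plugged into (user + sys) / (real * cores).  The
helper names below are made up for illustration; the timings are the
ones from the 10KB runs.

```haskell
-- Convert a `time`-style reading of minutes and seconds to seconds.
seconds :: Double -> Double -> Double
seconds m s = 60 * m + s

-- Fraction of the available CPU time that was actually used.
utilization :: Double -> Double -> Double -> Int -> Double
utilization real user sys cores = (user + sys) / (real * fromIntegral cores)

-- Wall-clock speedup of one run over a baseline run.
speedup :: Double -> Double -> Double
speedup baselineReal real = baselineReal / real

main :: IO ()
main = do
  let real = seconds 1 55.616   -- threadring-threaded +RTS -N2
      user = seconds 2 47.982
      sys  = seconds 0 3.586
      base = seconds 2 56.453   -- threadring-nothreaded
  print (utilization real user sys 2)  -- about 0.74
  print (speedup base real)            -- about 1.5
```

By this measure the two cores were busy about 74% of the time, in the
same range as the roughly 70% quoted above, for a wall-clock speedup of
about 1.5x over the non-threaded run.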