No "last core parallel slowdown" on OS X

Simon Marlow marlowsd at gmail.com
Tue Apr 21 04:39:40 EDT 2009


2009/4/20 Dave Bayer <bayer at cpw.math.columbia.edu>:
> I ran some longer trials, and noticed a further pattern I wish I could
> explain:
>
> I'm comparing the enumeration of the roughly 69 billion atomic lattices on
> six atoms, on my four core, 2.4 GHz Q6600 box running OS X, against an eight
> core, 2 x 3.16 GHz Xeon X5460 box at my department running Linux. Note that
> my processor now costs $200 (it's the venerable "Dodge Dart" of quad core
> chips), while the pair of Xeon processors cost $2400. The Haskell code is
> straightforward; it uses bit fields and reverse search, but it doesn't take
> advantage of symmetry, so it must "touch" every lattice to complete the
> enumeration. Its memory footprint is insignificant.
>
> Never mind 7 cores, Linux performs worse before it runs out of cores.
> Comparing 1, 2, 3, 4 cores on each machine, look at "real" and "user" time
> in minutes, and the ratio:
>
> Linux
> 2 x 3.16 GHz Xeon X5460
> cores   1       2       3       4
> real    466.7   250.8   183.7   149.3
> user    466.4   479.0   505.2   528.1
> ratio   1.00    1.91    2.75    3.54
>
> OS X
> 2.4 GHz Q6600
> cores   1       2       3       4
> real    676.9   359.4   246.7   191.4
> user    673.4   673.7   675.9   674.8
> ratio   0.99    1.87    2.74    3.53
>
> These ratios match up like physical constants, or at least invariants of my
> Haskell implementation. However, the user time is constant on OS X, so these
> ratios reflect the actual parallel speedup on OS X. The user time climbs
> steadily on Linux, significantly diluting the parallel speedup on Linux.
> Somehow, whatever is going wrong in the interaction between Haskell and
> Linux is being captured in this increase in user time.

We can't necessarily blame this on Linux: the two machines have
different hardware.  There could be cache effects at play, for
example.
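
For reference, the user/real ratio in your tables is not quite the
same thing as the wall-clock speedup, which is the 1-core real time
divided by the N-core real time.  A small sketch that works both out
from the quoted figures (Python, purely for illustration; the function
and variable names are my own):

```python
# "real" and "user" times in minutes, copied from the quoted tables.
# Index 0 is the 1-core run, index 3 the 4-core run.
REAL = {
    "Linux": [466.7, 250.8, 183.7, 149.3],
    "OS X": [676.9, 359.4, 246.7, 191.4],
}
USER = {
    "Linux": [466.4, 479.0, 505.2, 528.1],
    "OS X": [673.4, 673.7, 675.9, 674.8],
}


def speedups(real_times):
    """Wall-clock speedup of each run relative to the 1-core run."""
    base = real_times[0]
    return [round(base / t, 2) for t in real_times]


def user_growth(user_times):
    """Total CPU time of each run relative to the 1-core run."""
    base = user_times[0]
    return [round(u / base, 2) for u in user_times]


for machine in REAL:
    print(machine, "speedup:    ", speedups(REAL[machine]))
    print(machine, "user growth:", user_growth(USER[machine]))
```

On four cores this gives a wall-clock speedup of about 3.13 on the
Linux box against about 3.54 on the OS X box, with user time growing
by roughly 13% on Linux and staying flat on OS X.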

Maybe you could try the new affinity options (+RTS -qa) and see if
that makes any difference?  That would reduce scheduling effects due
to the OS, although when the number of cores you use is less than the
real number of cores in the machine, the OS is still free to move
threads around.  To get reliable numbers you should really disable
some of the cores at boot-time.
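
A rough sketch of how the with/without -qa comparison might be
scripted (Python again, Unix only, purely for illustration; the
function names are my own and "./lattices" below is a made-up name for
your binary, which needs to be linked with -threaded for -N to be
accepted):

```python
import resource
import subprocess
import time


def rts_command(binary, cores, pin):
    """Build the argv for one run.

    -N<cores> sets how many cores the RTS uses; -qa asks the RTS to
    pin its OS threads to cores.
    """
    flags = ["+RTS", f"-N{cores}"]
    if pin:
        flags.append("-qa")
    flags.append("-RTS")
    return [binary] + flags


def time_run(argv):
    """Run argv once and return (real, user) seconds for the child."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    real = time.perf_counter() - start
    user = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime - before
    return real, user


def compare(binary, max_cores=4):
    """Print real and user time per core count, unpinned then pinned."""
    for cores in range(1, max_cores + 1):
        for pin in (False, True):
            real, user = time_run(rts_command(binary, cores, pin))
            label = "-qa" if pin else "   "
            print(f"-N{cores} {label}  real {real:8.1f}s  user {user:8.1f}s")
```

Calling compare("./lattices") would then run the program eight times,
so it is only practical with a smaller problem size than the full
enumeration.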

Cheers,
  Simon


More information about the Glasgow-haskell-users mailing list