No "last core parallel slowdown" on OS X
dons at galois.com
Tue Apr 21 04:57:39 EDT 2009
> 2009/4/20 Dave Bayer <bayer at cpw.math.columbia.edu>:
> > I ran some longer trials, and noticed a further pattern I wish I could
> > explain:
> > I'm comparing the enumeration of the roughly 69 billion atomic lattices on
> > six atoms, on my four core, 2.4 GHz Q6600 box running OS X, against an eight
> > core, 2 x 3.16 GHz Xeon X5460 box at my department running Linux. Note that
> > my processor now costs $200 (it's the venerable "Dodge Dart" of quad core
> > chips), while the pair of Xeon processors cost $2400. The Haskell code is
> > straightforward; it uses bit fields and reverse search, but it doesn't take
> > advantage of symmetry, so it must "touch" every lattice to complete the
> > enumeration. Its memory footprint is insignificant.
> > Never mind 7 cores, Linux performs worse before it runs out of cores.
> > Comparing 1, 2, 3, 4 cores on each machine, look at "real" and "user" time
> > in minutes, and the ratio:
> > Linux, 2 x 3.16 GHz Xeon X5460
> >   cores          1      2      3      4
> >   real       466.7  250.8  183.7  149.3
> >   user       466.4  479.0  505.2  528.1
> >   user/real   1.00   1.91   2.75   3.54
> >
> > OS X, 2.4 GHz Q6600
> >   cores          1      2      3      4
> >   real       676.9  359.4  246.7  191.4
> >   user       673.4  673.7  675.9  674.8
> >   user/real   0.99   1.87   2.74   3.53
> > These ratios match up like physical constants, or at least invariants of my
> > Haskell implementation. However, the user time is constant on OS X, so these
> > ratios reflect the actual parallel speedup on OS X. The user time climbs
> > steadily on Linux, significantly diluting the parallel speedup on Linux.
> > Somehow, whatever is going wrong in the interaction between Haskell and
> > Linux is being captured in this increase in user time.
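To make the distinction between the user/real ratio and the actual speedup concrete, here is a minimal sketch that recomputes both from the figures quoted above. The function names are my own, and "speedup" here is assumed to mean one-core real time divided by n-core real time.

```python
# Wall-clock ("real") and CPU ("user") times in minutes, copied from the
# tables quoted above, for 1, 2, 3 and 4 cores.
TIMES = {
    "Linux, 2 x 3.16 GHz Xeon X5460": {
        "real": [466.7, 250.8, 183.7, 149.3],
        "user": [466.4, 479.0, 505.2, 528.1],
    },
    "OS X, 2.4 GHz Q6600": {
        "real": [676.9, 359.4, 246.7, 191.4],
        "user": [673.4, 673.7, 675.9, 674.8],
    },
}


def utilisation(real, user):
    """user/real for each run: average number of cores kept busy."""
    return [u / r for r, u in zip(real, user)]


def speedup(real):
    """Wall-clock speedup of each run relative to the one-core run."""
    return [real[0] / r for r in real]


def user_growth(user):
    """Total CPU time of each run relative to the one-core run."""
    return [u / user[0] for u in user]


for machine, t in TIMES.items():
    print(machine)
    print("  cores  user/real  speedup  user growth")
    rows = zip(utilisation(t["real"], t["user"]),
               speedup(t["real"]),
               user_growth(t["user"]))
    for cores, (util, sp, growth) in enumerate(rows, start=1):
        print(f"  {cores:5d}  {util:9.2f}  {sp:7.2f}  {growth:11.2f}")
```

On these numbers the four-core wall-clock speedup comes out at about 3.13 on Linux against about 3.54 on OS X, even though user/real is nearly identical on both. The gap is the roughly 13% growth in user time on Linux between one and four cores, compared with about 0.2% on OS X.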
> We can't necessarily blame this on Linux: the two machines have
> different hardware. There could be cache effects at play, for example.
> Maybe you could try the new affinity options (+RTS -qa) and see if
> that makes any difference? That would reduce scheduling effects due to
> the OS (although when the number of cores you use is
> less than the real number of cores in the machine, the OS is still
> free to move threads around. To get reliable numbers you should
> really disable some of the cores at boot-time).
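For anyone who wants to try this, a rough sketch of a timing harness follows. It assumes a Unix-like system (it uses the `resource` module) and a binary built with -threaded; recent GHC versions also need -rtsopts at link time before they accept these flags. The binary name `./lattices` and the script name are hypothetical.

```python
import resource
import subprocess
import sys
import time


def rts_command(binary, cores, pin=False):
    """Build a command line that runs a GHC -threaded binary on `cores` cores.

    -N<n> sets the number of cores the RTS uses; -qa asks it to pin its OS
    threads to cores using the operating system's affinity facilities.
    """
    flags = [f"-N{cores}"] + (["-qa"] if pin else [])
    return [binary, "+RTS", *flags, "-RTS"]


def measure(command):
    """Run `command` and return its (real, user) times in seconds."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    start = time.perf_counter()
    subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
    real = time.perf_counter() - start
    user = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime - before
    return real, user


def compare(binary, max_cores=4):
    """Print real and user time with and without -qa for 1..max_cores cores."""
    for cores in range(1, max_cores + 1):
        for pin in (False, True):
            real, user = measure(rts_command(binary, cores, pin))
            label = "-qa" if pin else "   "
            print(f"-N{cores} {label}  real {real:8.1f}s  user {user:8.1f}s")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python compare_affinity.py ./lattices
    compare(sys.argv[1])
```

If user time stops climbing with the core count once -qa is on, that would point at thread migration by the OS scheduler rather than at the hardware. As noted above, the numbers are only fully reliable with the unused cores disabled at boot time.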
Little pieces of advice and tidbits are creeping out of Simon's head.
Is it time for a parallel performance wiki, where every question that
becomes an FAQ gets documented live?
Maybe put details on the wiki so we can grow a large FAQ to capture this
More information about the Glasgow-haskell-users