No "last core parallel slowdown" on OS X

Tue Apr 21 04:57:39 EDT 2009

marlowsd:
> 2009/4/20 Dave Bayer <bayer at cpw.math.columbia.edu>:
> > I ran some longer trials, and noticed a further pattern I wish I could
> > explain:
> >
> > I'm comparing the enumeration of the roughly 69 billion atomic lattices on
> > six atoms, on my four core, 2.4 GHz Q6600 box running OS X, against an eight
> > core, 2 x 3.16 GHz Xeon X5460 box at my department running Linux. Note that
> > my processor now costs $200 (it's the venerable "Dodge Dart" of quad core
> > chips), while the pair of Xeon processors cost $2400. The Haskell code is
> > straightforward; it uses bit fields and reverse search, but it doesn't take
> > advantage of symmetry, so it must "touch" every lattice to complete the
> > enumeration. Its memory footprint is insignificant.
> >
> > Never mind 7 cores, Linux performs worse before it runs out of cores.
> > Comparing 1, 2, 3, 4 cores on each machine, look at "real" and "user" time
> > in minutes, and the ratio:
> >
> > Linux
> > 2 x 3.16 GHz Xeon X5460
> > cores       1       2       3       4
> > real        466.7   250.8   183.7   149.3
> > user        466.4   479.0   505.2   528.1
> > user/real   1.00    1.91    2.75    3.54
> >
> > OS X
> > 2.4 GHz Q6600
> > cores       1       2       3       4
> > real        676.9   359.4   246.7   191.4
> > user        673.4   673.7   675.9   674.8
> > user/real   0.99    1.87    2.74    3.53
> >
> > These ratios match up like physical constants, or at least invariants of my
> > Haskell implementation. However, the user time is constant on OS X, so these
> > ratios reflect the actual parallel speedup on OS X. The user time climbs
> > steadily on Linux, significantly diluting the parallel speedup on Linux.
> > Somehow, whatever is going wrong in the interaction between Haskell and
> > Linux is being captured in this increase in user time.
> 
> We can't necessarily blame this on Linux: the two machines have
> different hardware.  There could be cache-effects at play, for
> example.
> 
> Maybe you could try the new affinity options (+RTS -qa) and see if
> that makes any difference?  That would reduce scheduling effects due
> to the OS (although when the number of cores you use is less than the
> real number of cores in the machine, the OS is still free to move
> threads around.  To get reliable numbers you should really disable
> some of the cores at boot-time).
> 

Little pieces of advice and tidbits keep creeping out of Simon's head.

Is it time for a parallel performance wiki, where every question that
becomes an FAQ gets documented live?

    http://haskell.org/haskellwiki/Performance/Parallel

Maybe put details on the wiki so we can grow a large FAQ to capture this
"oral tradition".
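
A first entry could be how to read timings like the ones quoted above:
the user/real ratio only equals the parallel speedup when user time stays
flat, so it helps to compute wall-clock speedup and user-time growth
separately. A rough sketch in Python, with the numbers copied from Dave's
tables; the function and variable names are made up for illustration and
are not part of any tool:

```python
# Timings in minutes, copied from the tables quoted above.
CORES = [1, 2, 3, 4]
TIMINGS = {
    "Linux": {"real": [466.7, 250.8, 183.7, 149.3],
              "user": [466.4, 479.0, 505.2, 528.1]},
    "OS X":  {"real": [676.9, 359.4, 246.7, 191.4],
              "user": [673.4, 673.7, 675.9, 674.8]},
}

def speedups(real):
    """Wall-clock speedup relative to the one-core run."""
    return [real[0] / t for t in real]

def user_growth(user):
    """Total CPU time relative to the one-core run (1.0 means no extra work)."""
    return [u / user[0] for u in user]

def report():
    """Print speedup and user-time growth per core count for each machine."""
    for name, t in TIMINGS.items():
        print(name)
        for n, s, g in zip(CORES, speedups(t["real"]), user_growth(t["user"])):
            print(f"  {n} cores: speedup {s:.2f}, user time x{g:.2f}")

if __name__ == "__main__":
    report()
```

On these numbers the wall-clock speedup at four cores is about 3.13 on
the Linux box against about 3.54 on OS X, while Linux user time grows by
roughly 13% and OS X user time stays essentially flat.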

-- Don