[GHC] #8578: Improvements to SpinLock implementation

Fri Dec 6 13:58:40 UTC 2013

#8578: Improvements to SpinLock implementation
-------------------------------------+------------------------------------
        Reporter:  parcs             |            Owner:  parcs
            Type:  task              |           Status:  patch
        Priority:  normal            |        Milestone:
       Component:  Runtime System    |          Version:  7.7
      Resolution:                    |         Keywords:
Operating System:  Unknown/Multiple  |     Architecture:  Unknown/Multiple
 Type of failure:  None/Unknown      |       Difficulty:  Unknown
       Test Case:                    |       Blocked By:
        Blocking:                    |  Related Tickets:
-------------------------------------+------------------------------------

Comment (by simonmar):

 Here are my results with `-N4` on an Intel Core i7-3770 (4 cores, 8
 threads).  The figures show the change with the patch applied, relative
 to the build without it.

 {{{
 --------------------------------------------------------------------------------
         Program           Size    Allocs   Runtime   Elapsed  TotalMem
 --------------------------------------------------------------------------------
    blackscholes          +0.0%     +0.0%     -1.7%     -2.4%     -0.3%
           coins          +0.0%     -0.0%     +0.4%     +1.0%     -8.6%
            gray          +0.0%     +0.0%    +15.1%    +14.3%     +0.0%
          mandel          +0.0%     +0.0%     +3.3%     +3.3%     -0.8%
         matmult          +0.0%     +8.1%     -2.4%     -2.6%     +0.0%
         minimax          +0.0%     +0.0%     -1.3%     -1.1%     +0.0%
           nbody          +0.0%     -6.0%     -1.9%      0.06     +0.0%
          parfib          +0.0%     +0.1%    +16.2%    +16.2%     +0.0%
         partree          +0.0%     -0.0%     +1.0%     +0.5%     -3.0%
            prsa          +0.0%     -0.1%     +1.1%     +0.9%     +0.0%
          queens          +0.0%     -0.5%     -1.3%     -0.5%     +7.1%
             ray          +0.0%     -0.3%     -0.4%     -0.5%     +0.0%
        sumeuler          +0.0%     +0.0%     +1.0%     +1.0%     +0.0%
       transclos          +0.0%     +0.0%     +1.2%     +1.4%     +0.0%
 --------------------------------------------------------------------------------
             Min          +0.0%     -6.0%     -2.4%     -2.6%     -8.6%
             Max          +0.0%     +8.1%    +16.2%    +16.2%     +7.1%
  Geometric Mean          +0.0%     +0.1%     +2.0%     +2.3%     -0.4%
 }}}

 Not good!  Two programs, gray (+15.1% runtime) and parfib (+16.2%), are
 significantly worse.

 The effect is real; here is the timing info for parfib, before and then
 after:

 {{{
 5.70user 0.00system 0:01.43elapsed 397%CPU (0avgtext+0avgdata 20816maxresident)k
 6.52user 0.00system 0:01.64elapsed 397%CPU (0avgtext+0avgdata 21568maxresident)k
 }}}

 I wonder whether not using a locked instruction in the spinlock might
 cause the loop to spin for longer, because it takes longer for the memory
 write to reach the core that is waiting for it?
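
 To make the hypothesis concrete, here is a minimal C11 sketch of two ways
 a waiter can spin.  It is not the RTS code or the patch under review: the
 names (`spin_lock_t`, `spin_init`, `spin_acquire_locked`,
 `spin_acquire_ttas`, `spin_release`) and the convention that 1 means free
 and 0 means held are assumptions made for illustration.  The first variant
 issues an atomic exchange (a locked instruction on x86) on every
 iteration; the second waits on plain loads and only attempts the exchange
 once the lock looks free.

```c
#include <stdatomic.h>

/* Hypothetical lock type for illustration: 1 = free, 0 = held. */
typedef struct {
    atomic_uint lock;
} spin_lock_t;

static void spin_init(spin_lock_t *l)
{
    atomic_init(&l->lock, 1u);
}

/* Variant A: every iteration performs an atomic exchange, which is a
 * locked read-modify-write on x86. */
static void spin_acquire_locked(spin_lock_t *l)
{
    while (atomic_exchange_explicit(&l->lock, 0u, memory_order_acquire) != 1u) {
        /* lock was held; try again */
    }
}

/* Variant B (test-and-test-and-set): wait on plain loads, and only
 * attempt the exchange once the lock appears to be free. */
static void spin_acquire_ttas(spin_lock_t *l)
{
    for (;;) {
        while (atomic_load_explicit(&l->lock, memory_order_relaxed) == 0u) {
            /* spin without a locked instruction */
        }
        if (atomic_exchange_explicit(&l->lock, 0u, memory_order_acquire) == 1u) {
            return;
        }
    }
}

/* Release with a store; this is the write the waiting core has to observe. */
static void spin_release(spin_lock_t *l)
{
    atomic_store_explicit(&l->lock, 1u, memory_order_release);
}
```

 Which variant notices the releasing store sooner on a given machine is
 exactly the open question; the sketch only shows where the locked
 instruction does or does not appear, and any difference would have to be
 measured.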

 Someone could probably dig into this further with perf.  But the lesson
 here, as usual, is to always benchmark: don't assume that a change will
 work in practice just because it looks good!

--
Ticket URL: <http://ghc.haskell.org/trac/ghc/ticket/8578#comment:7>