[Haskell-cafe] When are MVars better than STM?
cma at bitemyapp.com
Sun Jan 24 06:55:18 UTC 2016
Could you post the code please?
On Sun, Jan 24, 2016 at 12:46 AM, Thomas Koster <tkoster at gmail.com> wrote:
> Hi friends,
> Using Criterion, I have been running benchmarks to measure the
> relative performance of STM and MVars for some simple transactions
> that I expect will be typical in my application. I am using GHC 7.10.2
> and libraries as at Stackage LTS 3.2.
> I have found that STM is faster than MVars in all my benchmarks,
> without exception. This seems to go against accepted wisdom .
> I have not included my source code here to save space, but if you
> suspect that I am using MVars incorrectly, just say so and I will post
> my source code separately.
> I have two questions:
> 1. When are MVars faster than STM? If the answer is "never", then when
> are MVars "better" than STM? (Choose your own definition of "better".)
> 2. When given two capabilities (+RTS -N2), MVars are suddenly an order
> of magnitude slower than with just one capability. Why?
> For those who want details:
> My benchmark forks four Haskell threads. Each thread repeats a
> transaction that increments a shared counter many, many times. These
> transactions must be serialized. The counter is therefore highly
> contended. One version uses an MVar to store the counter in the
> obvious way. The other version uses a TVar instead.
> By the way, simply using "atomic-primops" to increment the counter
> won't do because the increment operation is actually a mock substitute
> for a more complex operation. I use the counter for my benchmarks
> because the real operation needs much more memory and I don't want the
> additional, unpredictable GC cost to affect my measurements.
> Typical measurements are:
> 1 capability, using MVar: 37.30 ms
> 1 capability, using TVar: 24.88 ms
> 2 capabilities, using MVar: 1.564 s
> 2 capabilities, using TVar: 80.09 ms
> 4 capabilities, using MVar: 2.890 s
> 4 capabilities, using TVar: 207.8 ms
> Notice that the MVar version suddenly slows by an order of magnitude
> when run with more than one capability. Why is this so? (This is
> question 2.)
> Despite the absolute time elapsed, I realize that the CPU usage
> characteristics of the two versions are also quite different. I
> realize that the MVar version interlocks the four threads so that only
> one capability is ever busy at a time, irrespective of the number of
> capabilities available, whereas the STM version allows up to four
> capabilities to be busy at once. However, I believe that the
> additional parallel transactions in the STM version would be mostly
> wasted, destined to be retried. Unless I am mistaken, this assumption
> appears to be consistent with the observation that the STM version
> with -N1 is the fastest of all. Despite all this wasted work by the
> thundering herd, the total CPU time (i.e. my power bill) for the STM
> version is still less than for the MVar version, because the MVar
> version so dramatically slow.
> Paradoxically, MVars seem to be the wrong tool for this job. So when
> are MVars faster than STM? (This is question 1.)
> Thomas Koster
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
Currently working on http://haskellbook.com
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Haskell-Cafe