[Haskell-cafe] When are MVars better than STM?

Sun Jan 24 06:46:22 UTC 2016

Hi friends,

Using Criterion, I have been running benchmarks to measure the
relative performance of STM and MVars for some simple transactions
that I expect will be typical in my application. I am using GHC 7.10.2
and libraries as at Stackage LTS 3.2.

I have found that STM is faster than MVars in all my benchmarks,
without exception. This seems to go against accepted wisdom [1][2][3].
I have not included my source code here to save space, but if you
suspect that I am using MVars incorrectly, just say so and I will post
my source code separately.

I have two questions:

1. When are MVars faster than STM? If the answer is "never", then when
are MVars "better" than STM? (Choose your own definition of "better".)

2. When given two capabilities (+RTS -N2), MVars are suddenly an order
of magnitude slower than with just one capability. Why?

For those who want details:

My benchmark forks four Haskell threads. Each thread repeats a
transaction that increments a shared counter many, many times. These
transactions must be serialized. The counter is therefore highly
contended. One version uses an MVar to store the counter in the
obvious way. The other version uses a TVar instead.

By the way, simply using "atomic-primops" to increment the counter
won't do because the increment operation is actually a mock substitute
for a more complex operation. I use the counter for my benchmarks
because the real operation needs much more memory and I don't want the
additional, unpredictable GC cost to affect my measurements.

Typical measurements are:

1 capability, using MVar: 37.30 ms
1 capability, using TVar: 24.88 ms
2 capabilities, using MVar: 1.564 s
2 capabilities, using TVar: 80.09 ms
4 capabilities, using MVar: 2.890 s
4 capabilities, using TVar: 207.8 ms

Notice that the MVar version suddenly slows by an order of magnitude
when run with more than one capability. Why is this so? (This is
question 2.)

Despite the absolute time elapsed, I realize that the CPU usage
characteristics of the two versions are also quite different. I
realize that the MVar version interlocks the four threads so that only
one capability is ever busy at a time, irrespective of the number of
capabilities available, whereas the STM version allows up to four
capabilities to be busy at once. However, I believe that the
additional parallel transactions in the STM version would be mostly
wasted, destined to be retried. Unless I am mistaken, this assumption
appears to be consistent with the observation that the STM version
with -N1 is the fastest of all. Despite all this wasted work by the
thundering herd, the total CPU time (i.e. my power bill) for the STM
version is still less than for the MVar version, because the MVar
version so dramatically slow.

Paradoxically, MVars seem to be the wrong tool for this job. So when
are MVars faster than STM? (This is question 1.)

[1] https://stackoverflow.com/questions/15439966/when-why-use-an-mvar-over-a-tvar
[2] https://www.reddit.com/r/haskell/comments/39ef3y/ioref_vs_mvar_vs_tvar_vs_tmvar/
[3] https://mail.haskell.org/pipermail/haskell-cafe/2014-January/112158.html

Thanks,
Thomas Koster