[Haskell-cafe] IORef vs TVar performance: 6 seconds versus 4 minutes

Mon Dec 29 04:21:39 EST 2008

On Sun, Dec 28, 2008 at 11:08 PM, Evan Laforge <qdunkan at gmail.com> wrote:
> But every STM operation has to modify a transaction log, which seems
> like it should be even more expensive than frobbing a lock bit.  So it
> seems like if the per-operation STM overhead is higher, and blocking
> contention is the same (assuming the lock implementation uses shared
> locks for reads), I don't see how the STM implementation could be
> faster.

Absolutely, although theoretically for lots of repeated operations
your transaction log remains in cache, so it shouldn't be too bad.

When I talked to Simon Peyton-Jones at MSR this fall, he talked about
locks being like "the assembly language of concurrency"; STM is then a
high-level language.  You pay some amount in performance to get
abstraction & the ability to write composable programs that remain
correct, a problem that locking hasn't been able to solve despite
decades of research.

It's always possible to decompose the final program down into one that
could be based on locks.  But this is often at the cost of
maintainability; the code becomes one mess of spaghetti locking in
order to maintain whatever invariants need to be maintained to prevent
deadlock and compose code together.

> I'm sure it's all more complicated than this... but where exactly is
> the STM performance coming from in those papers?  Also, they disclaim
> in the testing section that the STM implementation was immature, but
> now parallel GC and perhaps STM doesn't do so much dynamic allocation
> (?), shouldn't the STM numbers be even better?

The paper probably got its performance from the "optimistic
concurrency" that STM allows; transactions that read many shared parts
of a data structure but only modify a small amount that is likely far
away from other transactions, like in a binary tree, is ideal for STM
performance.

You are correct that multiple-reader locks could get some of this
benefit back.  But those systems have serious drawbacks when a
read-only lock needs to be changed into a read-write lock in the
middle of an operation.  There's also the same deadlock/lock-ordering
problems as in any lock-based solution.  STM cuts this Gordian knot by
forcing all lock-taking actions to be otherwise pure, besides the
mutation to the data structures protected by the locks.  This way, a
failing transaction can always just be killed and restarted if
something goes wrong.  It might be possible to enforce the same sort
of purity on top of a lock-based system.

I don't think the STM runtime has gotten a lot of love since that
paper; given that TVar operations are using a linear search over the
transaction log, I suspect it could go a long way if it had a capable
volunteer.  This is part of the motivation behind trying to port the
concurrency substrate from C into Haskell [1]; getting STM out of the
GHC RTS and into the hands of library writers will put a lot more
eyeballs on the performance problems.

  -- ryan

[1] Li, Peng.  Programmable Concurrency in a Pure and Lazy Language.
http://www.seas.upenn.edu/~lipeng/homepage/