[Haskell-cafe] Looking for an efficient tree in STM

Sat Mar 18 17:46:30 EST 2006

G'day all.

On Wed, Mar 08, 2006 at 01:50:06PM +0200, Einar Karttunen wrote:

> Does anyone have an efficient tree implemented in STM that
> supports concurrent updates in an efficient fashion?

One could easily rewrite this question as:

   Does anyone have an efficient tree that
   supports concurrent updates in an efficient fashion?

The short answer is: If by "tree" you mean "binary search tree", then
the answer is no.  If you ever come up with an efficient concurrent
binary search tree, please collect your PhD.

The longer answer is: If you want to know what data structures can be
made efficiently concurrent, look at the data structures that database
programmers use.

In general, the popular data structures are those which allow for
tunable granularity.  There must be well-partitioned regions of the
data structure which are big enough so that locking them doesn't cost
too much and are small enough so that you get effective concurrency.

As an example, hash tables are popular because it's easily partitioned
(you can lock groups of hash chains independently), and it's highly
tunable (the hash table size and the number of hash chains per lock are
both good knobs).

B-trees are popular for a similar reason.  A node is an obvious unit of
granularity, since different threads can work in different nodes without
interfering.  Not only is the page size tunable, there is also an obvious
way to "group" nodes together such that locks can cover more than one
physical node if you need that.

Covering more than one node with a lock makes slightly more sense with
a B-tree than with a hash table, because B-tree operations can require
restructuring the tree, which requires holding more than one lock.
Usually this is only a small fraction of operations (it depends on the
workload, but 1% is a good rule of thumb), but grouping physical nodes
together reduces the number of locks you need to hold even further.

I've talked about "locks" here, but of course lock contention is the
same thing as thread contention in STM.  It's just that the details are
different, and details are important in concurrent programming.

Cheers,
Andrew Bromage