[Haskell-cafe] STM, IO and b-trees

Mon Aug 20 22:10:33 EDT 2007

On 8/21/07, Ben <midfield at gmail.com> wrote:
> for sake of argument, suppose an enterprising haskell newbie wanted to
> code up concurrent b-trees (really b-link trees) in haskell.  if i am
> understanding STM correctly, it will NOT help in any way with the
> implementation, because of the IO-intensive nature of the algorithms?
> so i will have to resort to the usual games with locks and latches?

I have produced exactly such an implementation in my day-job (so I
can't, at this stage, give you the code, I'm afraid), but I'll happily
give you some tips:

1. Investigate relaxed balance.

BTrees with relaxed balance let you break operations up into much
smaller transactions, which reduces how often transactions have to be
rerun (big transactions are more likely to run into conflicts).

Also, getting all the edge cases right is hard with strict balance,
especially in the presence of deletions. It is VASTLY simpler with
relaxed balance, though there are a few little tricks. If it were too
easy, it wouldn't be any fun (see 3, below). Hint: Although the
on-disk version doesn't need or want parent pointers, you might want
them for your in-memory version of pages.

2. Separate the IO from the BTree-stuff.

Conceptually keep a <code>TVar (Map Address ByteString)</code>. In the
transaction, use this to find pages. If the page is not there, throw
an exception containing the desired address. In a wrapper, catch the
exception, read the page, add it to the map as a separate transaction,
then retry the original transaction. I say "conceptually" because
something like <code>TArray Address (Maybe ByteString)</code> or
similar will yield much better concurrency. In general, you want to
push the TVars down as far as possible: give each page its own TVar
rather than guarding the whole map with one, so that transactions
touching different pages don't conflict with each other.
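
To make the shape of tip 2 concrete, here is a minimal sketch of the
"throw, load, retry" wrapper. It is not the code from my day-job; the
names PageMissing, getPage and withPages are made up for illustration,
Address is assumed to be a plain Int, and readPage stands in for
whatever IO action actually fetches a page from disk. It uses the
simple single-TVar map, not the finer-grained TArray variant.

```haskell
import Control.Concurrent.STM
import Control.Exception (Exception, try)
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as Map

-- Assumption: pages are addressed by a plain Int.
type Address = Int

-- The conceptual page cache: one TVar around the whole map.
type PageCache = TVar (Map.Map Address B.ByteString)

-- Hypothetical exception carrying the address of the page we need.
newtype PageMissing = PageMissing Address deriving Show

instance Exception PageMissing

-- Inside a transaction: return the cached page, or abort the whole
-- transaction by throwing the missing address.
getPage :: PageCache -> Address -> STM B.ByteString
getPage cache addr = do
  pages <- readTVar cache
  case Map.lookup addr pages of
    Just page -> return page
    Nothing   -> throwSTM (PageMissing addr)

-- The wrapper: run the transaction; if it aborts with PageMissing,
-- do the IO outside STM, add the page in a separate transaction,
-- then run the original transaction again from the start.
withPages :: (Address -> IO B.ByteString) -> PageCache -> STM a -> IO a
withPages readPage cache txn = loop
  where
    loop = do
      result <- try (atomically txn)
      case result of
        Right value -> return value
        Left (PageMissing addr) -> do
          page <- readPage addr
          atomically (modifyTVar' cache (Map.insert addr page))
          loop
```

A BTree operation would then be written purely in STM in terms of
getPage and run as, say, withPages readPage cache (getPage cache root).
Because an exception thrown in STM discards the transaction's effects,
the rerun starts from a clean slate, and a transaction needing N
uncached pages simply goes round the loop N times.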

3. Have Fun

STM is very cool, so make sure you enjoy making it all hang together. :-)