[Haskell-cafe] STM, IO and b-trees

Thomas Conway drtomc at gmail.com
Tue Aug 21 02:21:23 EDT 2007


On 8/21/07, Ben <midfield at gmail.com> wrote:
> some basic questions: you have limits on key and value size?  fixed
> page sizes per tree?

I have chosen Yes and Yes respectively, though these are *choices*.
Personally, I think fixed-size pages are fine. Handling unlimited-size
keys is not hard, especially if you expect them to be rare: you use a
bit to flag an immediate key vs a remote key, and keep an overflow
file for the long keys.
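
A minimal sketch of the flagging idea (the StoredKey/Immediate/Remote
names are mine, purely illustrative):

import Data.Word (Word64)
import qualified Data.ByteString as B

-- Illustrative only: one constructor per kind of key.
data StoredKey
    = Immediate B.ByteString   -- short key, stored directly in the page
    | Remote    Word64         -- offset of the full key in the overflow file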

Another idea I've had, though I have not had the opportunity to
implement it, is to use arithmetic coding on the keys. If you carve up
the zero-to-one interval in sorted order, then the arithmetic-coded
bitstrings sort (lexicographically) in the same order as the
uncompressed strings. Off the top of my head, most *English* words
compress in this fashion with an order-0 model to about 2-3 bits per
character. This doesn't remove the key length limit by itself, but it
should raise the effective limit dramatically in practice.
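
To see why the ordering is preserved, here is a toy sketch of the idea
(not an implementation) using a uniform model over 'a'..'z' and a
Rational lower bound instead of emitted bits:

import Data.Char (ord)
import Data.Ratio ((%))

-- Maps a string to the lower bound of its interval in [0,1). The
-- alphabet is carved up in sorted order, so s1 < s2 lexicographically
-- implies encodeKey s1 < encodeKey s2.
encodeKey :: String -> Rational
encodeKey s = fst (foldl step (0, 1) (map sym s ++ [0]))
  where
    sym c = toInteger (ord c - ord 'a') + 1   -- symbol 0 is reserved for end-of-string
    step (lo, width) i = (lo + width * (i % 27), width / 27)

A real coder would drive the subdivision with letter frequencies
(which is where the 2-3 bits per character comes from) and emit the
shortest bitstring that identifies the interval, but the ordering
argument is the same.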

My implementation uses 64K pages, with a limit of 32K - 8 bytes on the
size of an individual key.

> on point 2: it sounds like you're suggesting using STM after all.  not
> sure if i understand what you've said:

something like

type PagePtr t = TVar (Address, Maybe t)   -- Nothing until the page has been loaded into the cache

data Node = Node (PagePtr Node) [(Key, PagePtr Node)]   -- internal: leftmost child, then (separator key, child) pairs
          | Leaf [(Key, Data)]

> work with a page cache.  do the various b-tree algorithms in STM
> transactions on the cached pages.  retry STM transactions when pages
> aren't loaded. on successful STM transaction completion, write out the
> dirty pages.

Yes, except you might want to be clever about flushing dirty pages more lazily.
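
Concretely, the retry-on-miss part might look something like this
(readNode and loadPage are just illustrative names; the types repeat
the sketch above so the fragment stands alone):

import Control.Concurrent.STM

type Address = Int   -- placeholder for an on-disk page address
type Key     = String
type Data    = String

type PagePtr t = TVar (Address, Maybe t)
data Node = Node (PagePtr Node) [(Key, PagePtr Node)] | Leaf [(Key, Data)]

-- Inside a transaction: use the cached page if present, otherwise
-- retry, which blocks until a TVar we read is written.
readNode :: PagePtr Node -> STM Node
readNode ptr = do
    (_, cached) <- readTVar ptr
    maybe retry return cached

-- Outside STM: service a miss by reading the page from disk and
-- filling the TVar, which wakes any transaction blocked in readNode.
loadPage :: (Address -> IO Node) -> PagePtr Node -> IO ()
loadPage readPageFromDisk ptr = do
    (addr, cached) <- readTVarIO ptr
    case cached of
        Just _  -> return ()
        Nothing -> do
            node <- readPageFromDisk addr
            atomically (writeTVar ptr (addr, Just node))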

My implementation isn't crash-proof (i.e. it doesn't support crash
recovery - the environment in which my code operates means that if
something bad happens, I can rebuild from external data).

>  probably use the trick where the STM transaction returns
> an IO action which you then perform.  probably use ordinary page-level
> locks to deal with concurrency on IO -- STM doesn't help.

Maybe. See the SPJ video on STM. His basic point is that STM helps get
rid of the stupid concurrency bugs, leaving just the "more
interesting" ones for you to deal with. :-)

Actually, using relaxed balance and making all the tree updates local
means that my B-tree code supports a very high degree of concurrency.

> as far as implementing the page cache: using a TVar Map would
> essentially be creating a global lock on the entire cache.

Exactly, which is why you want to push the TVars down.

>  so you
> suggest using other implementations.  your suggestion sounds like an
> array with linear search, or probably something more sophisticated.  i
> would imagine a balanced binary tree where the nodes are TVars might
> work nice, though rebalancing would touch a lot of nodes.  (it does
> seem strange haskell doesn't have more concurrent data structures.)

Yes, I've chatted with Andrew Bromage about the need for

Data.Map.Concurrent
Data.Set.Concurrent
etc.

I have a concurrent hash table which works very nicely. Think

class Hashable t where
    hash :: t -> Word64

type HashTable k v = TArray Word64 [(k, v)]   -- one TVar'd bucket per slot, so conflicts stay within a bucket
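
Lookups and inserts each touch a single bucket, so transactions on
unrelated keys don't conflict. A sketch (lookupHT, insertHT and
bucketIndex are illustrative names, not necessarily my real API):

import Control.Concurrent.STM
import Control.Concurrent.STM.TArray (TArray)
import Data.Array.MArray (getBounds, readArray, writeArray)
import Data.Word (Word64)

lookupHT :: (Hashable k, Eq k) => HashTable k v -> k -> STM (Maybe v)
lookupHT table k = do
    i <- bucketIndex table k
    bucket <- readArray table i
    return (lookup k bucket)

insertHT :: (Hashable k, Eq k) => HashTable k v -> k -> v -> STM ()
insertHT table k v = do
    i <- bucketIndex table k
    bucket <- readArray table i
    writeArray table i ((k, v) : filter ((/= k) . fst) bucket)

bucketIndex :: Hashable k => HashTable k v -> k -> STM Word64
bucketIndex table k = do
    (lo, hi) <- getBounds table
    return (lo + hash k `mod` (hi - lo + 1))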

Another alternative that others have suggested is a trie (radix
tree). Something along the lines of:

type Trie v = ....

insert :: [Bit] -> v -> Trie v -> STM ()
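
A concrete (if naive) version of that sketch, with a TVar per field so
an insert only writes along the key's path (all of the names below are
illustrative):

import Control.Concurrent.STM

data Bit = Zero | One

data Trie v = Trie (TVar (Maybe v))          -- value at this node, if any
                   (TVar (Maybe (Trie v)))   -- Zero child
                   (TVar (Maybe (Trie v)))   -- One child

newTrie :: STM (Trie v)
newTrie = do
    val <- newTVar Nothing
    z   <- newTVar Nothing
    o   <- newTVar Nothing
    return (Trie val z o)

insert :: [Bit] -> v -> Trie v -> STM ()
insert []       v (Trie val _ _) = writeTVar val (Just v)
insert (b : bs) v (Trie _ z o)   = do
    let childVar = case b of { Zero -> z; One -> o }
    mchild <- readTVar childVar
    child  <- case mchild of
                Just t  -> return t
                Nothing -> do t <- newTrie
                              writeTVar childVar (Just t)
                              return t
    insert bs v child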

> on point 1: you're saying relaxed balance is better for STM-style
> optimistic transactions?  i'm using
>
> B-Trees with Relaxed Balance (1993)
> Kim S. Larsen, Rolf Fagerberg
> IPPS: 9th International Parallel Processing Symposium

Yes. There's a tech report version which includes the details of
deletion, which IIRC the one you mention does not. citeseer... google....

The reason I believe relaxed balance works particularly well with STM
is that all the operations (insert, delete, rebalance) operate
*locally*. That is, they only modify a single node or a couple of
proximate nodes. In STM terms, this means a given operation only
*writes* a couple of TVars close to where the operation took place.
One of the cool things is that Larsen et al. prove that, as you progress
up the tree, the number of rebalancing operations drops geometrically.

Effectively, this means you very rarely need to get a write-lock on
the root (or top few nodes) of the tree, so randomly dispersed
operations are unlikely to conflict with one another.
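
To make "local" concrete in STM terms (reusing the PagePtr/Node types
from earlier; insertInLeaf is an illustrative name, and a real version
would also mark an overweight leaf for a later rebalancing pass):

import Control.Concurrent.STM
import Data.List (insertBy)
import Data.Ord (comparing)

-- Inserting into a leaf reads and writes exactly one TVar, so inserts
-- into different leaves never conflict, and neither one writes the root.
insertInLeaf :: Key -> Data -> PagePtr Node -> STM ()
insertInLeaf k v ptr = do
    (addr, cached) <- readTVar ptr
    case cached of
        Nothing         -> retry    -- page not in the cache yet
        Just (Leaf kvs) ->
            writeTVar ptr (addr, Just (Leaf (insertBy (comparing fst) (k, v) kvs)))
        Just (Node _ _) -> error "insertInLeaf: expected a leaf"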

Also, separating the insert/delete logic from the rebalancing logic
makes both simpler to code and leaves you with fewer edge cases to
think about.

cheers,
T
-- 
Dr Thomas Conway
drtomc at gmail.com

Silence is the perfectest herald of joy:
I were but little happy, if I could say how much.

