[Haskell-cafe] Information on B-Tree IO implemenations
Scott Weeks
weeksie at twelvestone.com
Mon Nov 7 14:40:50 EST 2005
Thanks Malcolm, that is a great explanation.
> In this code, in fact we
> managed to keep the lookup function pure, because the Binary interface
> allows referentially-transparent I/O (getFAt), which may not be
> possible in other serialisation libraries. This facility however
> depends on an invariant that the B-Tree can only be extended - there
> is no operation to remove nodes or data.
That's perfect. The insertion functions clear up the IO part of the
equation immensely and since I will need to be able to delete nodes and
data it will allow me to implement search/delete myself and cement a
bit of this in my head.
Cheers, that was a great help.
On 07/11/2005, at 11:12 PM, Malcolm Wallace wrote:
> Scott Weeks <weeksie at twelvestone.com> writes:
>
>> For a project that I'm working on I need to implement a B-Tree or
>> similar disk based data structure for storing large amounts of data. I
>> feel like I'm getting it and then I get hung up on disk IO operations
>> and trying to figure out how they should fit into the mix.
>
> Here is a first-cut specification of a B-Tree:
>
> data BTree k d = BTree Int [(k,[d])] [BTree k d]
>
> where k = key and d = stored data. Obviously this does not actually
> store the B-Tree on file, it just gives a flavour of the intended
> structure.
>
> In order to store the data on file, we need to introduce an indirection
> corresponding to a file pointer, on each of the child nodes.
>
> data BTree k d = BTree Int [(k,[d])] [FilePtr (BTree k d)]
>
> Then you need a serialisation routine to write any one node of the
> tree out and read it back in. And your tree-traversal routines
> (lookup, insert, etc) need to be embedded in the IO monad, so they
> can do the necessary file access.
>
> Attached is an implementation of B-Trees (by Colin Runciman), which
> uses nhc98's Binary class
> http://haskell.org/nhc98/libs/Binary.html
> for serialisation. The type BinHandle corresponds to a file handle,
> and Bin corresponds to a file pointer. In this code, in fact we
> managed to keep the lookup function pure, because the Binary interface
> allows referentially-transparent I/O (getFAt), which may not be
> possible in other serialisation libraries. This facility however
> depends on an invariant that the B-Tree can only be extended - there
> is no operation to remove nodes or data. However the insertion
> routine clearly must use I/O to extend the tree. I hope you can see
> the pattern of how this I/O works.
>
> Note also that the B-Tree structure is stored in BPages, separate
> from the data values hanging off the tree, which are in DBlocks.
>
> Hope this helps,
> Regards,
> Malcolm
> <BTree.hs>_______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
More information about the Haskell-Cafe
mailing list