[Haskell-cafe] Very large data structures in the heap
Dr. Olaf Klinke
o.klinke at dkfz-heidelberg.de
Fri Feb 20 16:58:42 UTC 2015
> To an extent, the virtual memory of a modern operating system solves the memory bottleneck.
> I'm curious what you were hoping for when thinking of exploiting laziness. If the search tree is truly huge, you'd probably resort to fancy (or fuzzy or both) algorithms to break up the work.
> -- Kim-Ee
Here is an update: I considered comments made by Kim-Ee and Joachim
Breitner and decided to represent my search tree as an unboxed IOVector
that stores indices in its elements. The code now looks very imperative.
It is a bit simpler, though, than the pure tying-the-knot code that
builds the tree.
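To illustrate the representation, here is a minimal sketch of what "an
unboxed IOVector that stores indices in its elements" could look like.
The three-slots-per-node layout (key, left-child index, right-child
index, with -1 for a missing child) and the writeNode helper are my
assumptions for illustration, not the poster's actual code:

```haskell
import qualified Data.Vector.Unboxed.Mutable as VM

-- Hypothetical layout: node i lives in slots [3*i .. 3*i+2] of the
-- vector, holding its key and the indices of its two children.
-- An index of -1 marks an absent child.
writeNode :: VM.IOVector Int -> Int -> Int -> Int -> Int -> IO ()
writeNode v i key left right = do
  VM.write v (3 * i)     key
  VM.write v (3 * i + 1) left
  VM.write v (3 * i + 2) right
```

Since everything is plain Ints in an unboxed vector, the whole tree is
a single flat heap object, which is what makes the GC and memory
behaviour so different from a pure boxed tree.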
I tried several approaches for lookups:
(i) Reconstruct the search tree (lazily) from the vector each time a query
is made. A relatively small part should actually be constructed and
garbage-collected. This turned out to be not overwhelmingly fast.
(ii) Keep the search tree structure implicit. Instead, descend the tree by
doing reads on the vector and following the stored indices around. This
way no tree nodes are ever materialised in the heap.
(iii) Combine both: After a pre-specified number of lookup/insert cycles,
reconstruct a part of the search tree in the heap and traverse this,
keeping the same tree for subsequent lookups. When reaching a leaf in the
heap, continue hopping around the IOVector.
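As a concrete sketch of approach (ii), a lookup can descend the
implicit tree purely by reads on the vector. This assumes the same
hypothetical three-Ints-per-node layout (key, left index, right index,
-1 for no child) as above; the layout is my assumption, not
necessarily the poster's:

```haskell
import qualified Data.Vector.Unboxed.Mutable as VM

-- Descend from the node at index 'root', comparing keys and hopping
-- to the left or right child index until the key is found or a -1
-- child marker is hit.  No tree is built in the heap.
lookupVec :: VM.IOVector Int -> Int -> Int -> IO Bool
lookupVec v root key = go root
  where
    go (-1) = pure False
    go i = do
      k <- VM.read v (3 * i)
      if key == k
        then pure True
        else go =<< VM.read v (if key < k then 3 * i + 1 else 3 * i + 2)
```

Approach (iii) would run a loop like this one whenever it falls off a
leaf of the partially reconstructed heap tree.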
Since the size of the vector is not known beforehand, inserts use
Data.Vector.Unboxed.Mutable.grow from time to time. As I read here
(https://github.com/haskell/vector/issues/36) and tested on my machine,
growing the IOVector makes a copy of it. So after my IOVector has grown to
considerable size, additional inserts slow down noticeably. However,
when constructing a tree of final size 1.2 million nodes, approach
(iii) already outperforms my pure algorithm. A tree with 18 million nodes
is no problem to construct.
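Since each grow copies the whole vector, growing by a fixed chunk each
time costs quadratic work overall; doubling the capacity instead
amortises the copying to linear. A sketch of such a helper (the name
ensureCapacity is mine, not part of the vector API):

```haskell
import qualified Data.Vector.Unboxed.Mutable as VM

-- Grow the vector (by copying) only when 'needed' exceeds the current
-- length, and then by at least the current length, so that repeated
-- inserts trigger O(log n) copies instead of O(n).  Note that grow
-- leaves the new slots uninitialised, so the caller must track the
-- logical size separately from VM.length.
ensureCapacity :: VM.IOVector Int -> Int -> IO (VM.IOVector Int)
ensureCapacity v needed
  | needed <= n = pure v
  | otherwise   = VM.grow v (max n (needed - n))
  where
    n = VM.length v
```

This does not avoid the copies, but it keeps their amortised cost
constant per insert even for vectors with millions of elements.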