[Haskell-cafe] Compressed Data.Map for more efficient RAM usage?

Johan Tibell johan.tibell at gmail.com
Thu Feb 16 23:27:45 CET 2012


On Thu, Feb 16, 2012 at 1:51 PM, Jeremy Shaw <jeremy at n-heptane.com> wrote:
> Sometimes we  want to store very large collection types in RAM -- such as a
> Data.Map or Data.IxSet.
>
> It seems like we could trade-off some speed for space savings by compressing
> the values in RAM.

Not knowing the actual data you want to store or the usage pattern I
cannot quite say if these suggestions will be of use:

* If the data is used in a read-only fashion (e.g. as a big lookup
table,) consider using an SSTable
(http://en.wikipedia.org/wiki/SSTable). The wiki page doesn't contain
a lot of documentation but I think you can find one implemented in
Cassandra. An SSTable is a sorted table from byte keys and byte
values. It's highly memory efficient as the keys can be prefix encoded
(as they are stored sorted) and whole blocks of the table can be
compressed using e.g. Zippy.

* If you need mutability you could consider using LevelDB.

* If you need a pure data structure consider using a type-specialized
version of the HAMT data structure which I use in the experimental
hamt branch  of unordered-containers
(https://github.com/tibbe/unordered-containers). The HAMT has quite
low overhead as is and if you specialize the key and value types (at
the cost of lots of code duplication) you can decrease the per
key/value overhead with another 4 words per such pair.

-- Johan



More information about the Haskell-Cafe mailing list