[Haskell-cafe] Storing big datasets

Bardur Arantsson spam at scientician.net
Fri May 6 20:41:57 UTC 2016

On 05/06/2016 10:28 PM, Mikhail Volkhov wrote:
> Hi!
> I'm using the ACID package as my main database -- it's simple and...
> ACID (which is cool, of course).

Just for clarification: Do you mean acid-state?

> So now I need to store up to ~30GB of data in a single data structure
> (!) that will be constantly updated (some kind of huge tree set).
> Here's my question -- how do I work with a structure that big?
> 1. It doesn't even fit in RAM

Yes it does. My desktop machine has 32GB RAM.

I'm not trying to brag or anything, and *you* may not have that amount
of memory, but buying more RAM *should* be cheap compared to radically
redesigning an existing system (with all the associated risk, developer
time, etc.).

If this is a one-off or rare occurrence, you could rent an Amazon EC2 M4
instance with 160GB RAM (m4.10xlarge). (Amazon's pricing structure is
incredibly opaque and I didn't do a thorough investigation, so I
apologize if this is wrong.)

> 2. It should be updated atomically and frequently (every 10 seconds up
> to 500 elements out of 10^7).

This is a bit vague. Do you mean that an *average* of 500 (out of 10^7)
will be updated every 10 seconds?
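If it does mean batches of up to ~500 changes every 10 seconds, one common
approach with a persistent structure is to apply each batch as a single pure
transition and swap the result in atomically. Below is a minimal sketch of
that idea. It uses Data.Map.Strict and an IORef rather than acid-state, so it
runs with just base and containers; the `Change` type and the `applyBatch`
and `commitBatch` functions are hypothetical names for illustration.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Hypothetical change type: one entry per element touched in a batch.
data Change k v = Upsert k v | Delete k

-- Apply a whole batch as one pure state transition. Each insert/delete is
-- O(log n) and copies only the path to its key, so the untouched bulk of
-- the map is shared between the old and new versions.
applyBatch :: Ord k => [Change k v] -> Map.Map k v -> Map.Map k v
applyBatch changes m = foldl' step m changes
  where
    step acc (Upsert k v) = Map.insert k v acc
    step acc (Delete k)   = Map.delete k acc

-- Commit a batch atomically: readers of the IORef see either the old map
-- or the fully updated one, never a half-applied batch.
commitBatch :: Ord k => IORef (Map.Map k v) -> [Change k v] -> IO ()
commitBatch ref changes =
  atomicModifyIORef' ref (\m -> (applyBatch changes m, ()))

main :: IO ()
main = do
  -- Small stand-in for the real 10^7-element structure.
  ref <- newIORef (Map.fromList [(k, 0 :: Int) | k <- [1 .. 1000 :: Int]])
  -- One "tick": a batch of 500 changes applied together.
  commitBatch ref ([Upsert k 1 | k <- [1 .. 499]] ++ [Delete 1000])
  m <- readIORef ref
  print (Map.size m, Map.lookup 1 m, Map.lookup 500 m)
```

In acid-state, a function like `applyBatch` would be the body of an `Update`
event, and acid-state would additionally journal that event to disk; this
sketch only covers the in-memory atomicity.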

> 3. What structures should I use? I'd like to store between 10^6 and
> 10^7 simple elements there too, which will amount to gigabytes of data.
> So it seems to me I can't use Data.Set.

Well, if storing things in RAM is out of the question, so is using Data.Set.
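To put rough numbers on it: the Data.Set tree itself costs little next to
the elements. A back-of-the-envelope sketch, assuming a 64-bit GHC (8-byte
words) and the containers node layout
`Bin {-# UNPACK #-} !Size !a !(Set a) !(Set a)`, i.e. 5 words per element,
and ignoring the elements' own heap objects and any GC overhead. The function
names are hypothetical.

```haskell
-- Assumption: 64-bit GHC, so one machine word is 8 bytes.
wordBytes :: Integer
wordBytes = 8

-- Bytes spent on the Set's own tree nodes, excluding the elements:
-- 1 header word + 1 unpacked Size + 3 pointers = 5 words per node.
setSpineBytes :: Integer -> Integer
setSpineBytes n = n * 5 * wordBytes

-- Average payload per element if the whole structure is totalBytes large.
payloadPerElement :: Integer -> Integer -> Integer
payloadPerElement totalBytes n = totalBytes `div` n

main :: IO ()
main = do
  let n = 10 ^ (7 :: Int)          -- upper end of 10^6~10^7 elements
      total = 30 * 10 ^ (9 :: Int) -- ~30GB overall
  putStrLn ("Set spine:           " ++ show (setSpineBytes n) ++ " bytes")
  putStrLn ("Payload per element: " ++ show (payloadPerElement total n) ++ " bytes")
```

That works out to about 0.4GB of tree for 10^7 elements versus roughly 3KB
of data per element, so whether Data.Set is usable comes down to whether the
elements themselves fit in RAM.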
