[Haskell-cafe] Storing big datasets

Joachim Durchholz jo at durchholz.org
Sun May 8 07:41:49 UTC 2016

Am 08.05.2016 um 03:34 schrieb Lana Black:
> On 11:17 Sat 07 May, Joachim Durchholz wrote:
>> Am 07.05.2016 um 02:27 schrieb Lana Black:
>>> Hi Mikhail,
>>> Have you considered external database engines? I suppose you could benefit from using Redis in your case.
>> Wikipedia says that while Redis can use the disk as "virtual memory",
>> that feature is deprecated, so it definitely expects to keep the whole
>> dataset in memory. Which kind of defeats the whole point for Mikhail.
> I believe we're talking about different things here. This page [1] says
> that disk persistence is a standard feature in redis.
> [1] http://redis.io/topics/persistence

I see. Seems like Wikipedia got something wrong there, then.
The claims actually look quite good, although, as usual with claims, I'd 
want to verify them before actually relying on them :-)
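For reference, the two persistence modes that page describes boil down 
to a few redis.conf directives. A minimal sketch (the thresholds here 
are illustrative, not recommendations):

```
# RDB snapshotting: write a point-in-time dump to disk if at
# least 1000 keys changed within the last 60 seconds
save 60 1000

# AOF: additionally log every write command, fsync once per second
appendonly yes
appendfsync everysec
```

With AOF at "everysec" you'd lose at most about a second of writes on a 
crash, which is the trade-off the page advertises.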

> It is certainly possible to get 2k requests per second with PostgreSQL [2]
> on fairly limited hardware if network latency is taken out of the picture.
> Again this heavily depends on the kind of data and other conditions.
> [2] https://gist.github.com/chanks/7585810

Ah, that's about scaling up under concurrent load. Mikhail's use case 
does not have multiple processes per data instance.

It's an interesting data point though. I've been hearing figures of "a 
few hundred transactions per second, max, unless you have dedicated 
hardware", but now I'm wondering how things would work out in a 
single-threaded scenario. It could easily be faster, but it could just 
as easily be slower because the engine cannot parallelize the work.
I have to admit I have no idea how this would turn out.
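One way to find out would be pgbench, which ships with PostgreSQL and 
can drive exactly that scenario: a single client on a single 
connection. A sketch using the stock TPC-B-like workload (the database 
name "mydb" is a placeholder; actual numbers will of course depend on 
the hardware):

```shell
# create and populate the test tables (scale factor 10, ~1M rows)
pgbench -i -s 10 mydb

# run the default workload with a single client and a single
# worker thread for 30 seconds; pgbench reports TPS at the end
pgbench -c 1 -j 1 -T 30 mydb
```

Comparing -c 1 against, say, -c 8 on the same machine would show 
directly how much of the quoted throughput comes from concurrency.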
