[Haskell-cafe] Storing big datasets

Lana Black lanablack at amok.cc
Sun May 8 01:34:08 UTC 2016


On 11:17 Sat 07 May, Joachim Durchholz wrote:
> On 07.05.2016 at 02:27, Lana Black wrote:
> > Hi Mikhail,
> >
> > Have you considered external database engines? I suppose you could benefit from using Redis in your case.
> 
> Wikipedia says that while Redis can use the disk as "virtual memory", 
> that feature is deprecated, so it definitely expects to keep the whole 
> dataset in memory. Which kind of defeats the whole point for Mikhail.

I believe we're talking about different things here. The deprecated
feature is the old virtual-memory mechanism; disk persistence is a
separate, standard feature of Redis, as described in [1]. Furthermore,
my employer uses Redis as a storage backend for raw video, and it works
just fine. But yes, Redis might not be the best choice, depending on
what kind of data Mikhail works with.
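
If the data is binary blobs, the Haskell side of that is
straightforward with the hedis package. A rough, untested sketch
(the key and value are made up, and the connection uses the defaults):

    {-# LANGUAGE OverloadedStrings #-}
    import Database.Redis
    import Control.Monad.IO.Class (liftIO)

    main :: IO ()
    main = do
      -- Connects to localhost:6379; adjust ConnectInfo for a real setup.
      conn <- connect defaultConnectInfo
      runRedis conn $ do
        -- Keys and values are strict ByteStrings, so raw binary data
        -- goes in as-is, with no extra serialisation layer.
        _ <- set "chunk:42" "raw video bytes"
        reply <- get "chunk:42"
        liftIO $ print reply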

> sqlite comes to mind. HDBC claims to have a driver for it, and it's 
> linked into the application so there would be no additional setup required.
> 
> If Mikhail already has another database in his application, the setup 
> cost is already paid for, and he might want to check whether that's fast 
> enough for his purposes.
> I'd expect postgresql or mysql to be too slow, but H2 to work just fine. 
> Of course that's just expectations, so testing would be needed.
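
For what it's worth, the sqlite route is only a few lines with HDBC.
A minimal, untested sketch using the HDBC and HDBC-sqlite3 packages
(the file name and schema are made up):

    import Database.HDBC
    import Database.HDBC.Sqlite3 (connectSqlite3)

    main :: IO ()
    main = do
      conn <- connectSqlite3 "store.db"
      _ <- run conn "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, val BLOB)" []
      -- INSERT OR REPLACE is sqlite's upsert.
      _ <- run conn "INSERT OR REPLACE INTO kv VALUES (?, ?)"
               [toSql ("chunk:42" :: String), toSql ("payload" :: String)]
      commit conn
      rows <- quickQuery' conn "SELECT val FROM kv WHERE key = ?"
                [toSql ("chunk:42" :: String)]
      print rows
      disconnect conn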

It is certainly possible to get 2k requests per second out of
PostgreSQL [2] on fairly limited hardware if network latency is taken
out of the picture. Again, this depends heavily on the kind of data and
other conditions.
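
If anyone wants to check that against their own workload, a crude
timing loop with postgresql-simple would do. An untested sketch (the
connection string and table are made up, and ON CONFLICT needs
PostgreSQL 9.5+):

    {-# LANGUAGE OverloadedStrings #-}
    import Database.PostgreSQL.Simple
    import qualified Data.ByteString.Char8 as BS
    import Control.Monad (forM_)
    import Data.Time.Clock (getCurrentTime, diffUTCTime)

    main :: IO ()
    main = do
      conn <- connectPostgreSQL "host=localhost dbname=test"
      _ <- execute_ conn "CREATE TABLE IF NOT EXISTS kv (key text PRIMARY KEY, val bytea)"
      start <- getCurrentTime
      forM_ [1 :: Int .. 2000] $ \i ->
        -- Binary wraps the payload so it is sent as bytea.
        execute conn "INSERT INTO kv VALUES (?, ?) ON CONFLICT (key) DO UPDATE SET val = EXCLUDED.val"
                (show i, Binary (BS.pack "payload"))
      end <- getCurrentTime
      putStrLn $ "2000 upserts in " ++ show (diffUTCTime end start)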

[1] http://redis.io/topics/persistence
[2] https://gist.github.com/chanks/7585810

