[Haskell-cafe] Re: sizeFM type

Tue Apr 27 23:41:12 EDT 2004

G'day all.

Quoting Tomasz Zielonka <t.zielonka at students.mimuw.edu.pl>:

> It seems I qualify as almost nobody,

Yes, that's why I said "almost".  I am also aware of a very, very small
number of exceptions.  (Disclaimer: I work in the database server area for
a living, so I'm possibly only thinking of the needs of our customers, not
necessarily other people.)

> cause I've had a database with more
> than 2^34 records on a single IA32 machine.

I should qualify that slightly, and add the disclaimer that even if you
are planning to put 2^34 records on a single box, it often pays to
organise this as multiple physical databases with some kind of
clustering layer on top, all on the one machine.

Splitting the problem up increases the effectiveness of each CPU's L2
cache.  Moreover, if your problem is easily partitioned (e.g. there are
a lot of queries with a particularly highly discriminating key), any
O(n log n) operations get effectively reduced to O(n/N log (n/N)).

> However, I wouldn't ever
> think of loading the entire database to RAM.

For very large databases, smart software and smart operating systems mean
that accessing RAM isn't that much less expensive than accessing disk these
days because prediction is so good.  The limiting factor is cache, and we
won't see L2 cache size increasing anywhere near 17Gb any time soon.

Cheers,
Andrew Bromage