ByteStrings and the ram barrier
tomasz.zielonka at gmail.com
Tue May 16 05:10:04 EDT 2006
On Fri, May 12, 2006 at 04:07:47PM +1000, Donald Bruce Stewart wrote:
> The theory is that we'd be able to efficiently process large data, where
> both ByteString and [Char] fails, and open a new range of applications
> that could be handled successfully with Haskell.
My large data files are already divided into reasonably sized chunks
and I think this approach is quite widespread - Google, for instance,
also processes much of its data in chunks.
To process my data with Haskell, I would have to be able to decode it
into records with efficiency close to what I achieve in C++ (say, at
least 80% as fast). So far I have managed to reach about 33% of the C++
speed in reasonably pure Haskell, which is not that bad IMO. However, I
feel that I've hit a barrier now and will have to use raw Ptrs or the
FFI. Maybe I could try pushing it through the usual haskell-cafe
optimisation process ;-)
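To make the "decode it into records" part concrete, here is a minimal sketch of decoding fixed-width binary records from a strict ByteString. The 8-byte layout (two 32-bit big-endian fields) and the names `Record`, `word32BE`, and `decodeRecords` are hypothetical illustrations, not the poster's actual format:

```haskell
module Main where

import qualified Data.ByteString as B
import Data.Word (Word32)
import Data.Bits (shiftL, (.|.))

-- A hypothetical 8-byte record: two 32-bit big-endian fields.
data Record = Record { key :: !Word32, val :: !Word32 }
  deriving (Show, Eq)

-- Read a 32-bit big-endian word starting at byte offset i.
word32BE :: B.ByteString -> Int -> Word32
word32BE bs i =
      fromIntegral (B.index bs i)       `shiftL` 24
  .|. fromIntegral (B.index bs (i + 1)) `shiftL` 16
  .|. fromIntegral (B.index bs (i + 2)) `shiftL` 8
  .|. fromIntegral (B.index bs (i + 3))

-- Decode the buffer as a sequence of 8-byte records;
-- a trailing partial record is ignored.
decodeRecords :: B.ByteString -> [Record]
decodeRecords bs =
  [ Record (word32BE bs i) (word32BE bs (i + 4))
  | i <- [0, 8 .. B.length bs - 8] ]

main :: IO ()
main = print (decodeRecords
                (B.pack [0,0,0,1, 0,0,0,2, 0,0,0,3, 0,0,0,4]))
```

Indexing into one shared buffer like this avoids copying per record, which is the kind of thing that helps close the gap with C++; a real decoder would of course match the actual on-disk layout.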
Anyway, the point is that large data tends to be divided into smaller
chunks not only because it's impossible to load the whole file into
memory, but also to allow random access, to help distribute the
computation over many computers, and so on. So I am not sure Haskell
would gain that much from being able to process terabytes of data in one go.
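The chunk-at-a-time style described above can be sketched as follows: read a handle in fixed-size pieces with strict ByteStrings and fold over them, so memory use is bounded by the chunk size rather than the file size. The 64 KiB chunk size and the helper names (`foldChunks`, `splitChunks`) are illustrative assumptions, not from the original post:

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import qualified Data.ByteString as B
import System.IO (Handle, stdin)

-- Illustrative chunk size; tune to the workload.
chunkSize :: Int
chunkSize = 64 * 1024

-- Strict left fold over the chunks of a handle: reads chunkSize
-- bytes at a time until EOF, accumulating with f.
foldChunks :: (a -> B.ByteString -> a) -> a -> Handle -> IO a
foldChunks f = go
  where
    go !acc h = do
      chunk <- B.hGet h chunkSize
      if B.null chunk
        then return acc
        else go (f acc chunk) h

-- Pure counterpart: split an in-memory buffer into n-byte pieces,
-- useful for exercising the same per-chunk logic without IO.
splitChunks :: Int -> B.ByteString -> [B.ByteString]
splitChunks n bs
  | B.null bs = []
  | otherwise = let (c, rest) = B.splitAt n bs
                in c : splitChunks n rest

-- Example: count the bytes on stdin one chunk at a time.
main :: IO ()
main = do
  n <- foldChunks (\acc c -> acc + B.length c) (0 :: Int) stdin
  print n
```

The same shape also makes distribution easy: each chunk is an independent unit of work that can be handed to another thread or machine.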
On the other hand, this is quite cool, and I am probably wrong, focused
as I am on my own needs.