[Haskell-cafe] Re: Processing of large files

Tue Nov 2 21:00:47 EST 2004

On 2004-11-02, Peter Simons <simons at cryp.to> wrote:
> John Goerzen writes:
>
> >> Read and process the file in blocks:
>
> > I don't think that would really save much memory [...]
>
> Given that the block-oriented approach has constant space
> requirements, I am fairly confident it would save memory.

Perhaps a bit, but not a significant amount.

> > and in fact, would likely just make the code a lot more
> > complex. It seems like a simple wrapper around
> > hGetContents over a file that uses block buffering would
> > suffice.
>
> Either your algorithm can process the input in blocks or it
> cannot. If it can, it doesn't make one bit a difference if
> you do I/O in blocks, because your algorithm processes
> blocks anyway. If your algorithm is *not* capable of

Yes it does.  If you don't set block buffering, GHC will call read()
separately for *every* single character.  (I've straced stuff!)  This is
a huge performance penalty for large files.  It's a lot more efficient
if you set block buffering on your input handle, even if you are using
interact and lines or words to process it.
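A minimal sketch of what this looks like, assuming the standard System.IO API (hSetBuffering, BlockBuffering). The line-and-word counting task and the helper name countLW are hypothetical, chosen only for illustration. The counts are accumulated in a single strict pass so the lazily read input can be garbage-collected as it is consumed. Calling length on the result of lines and then walking it again would keep the whole file in memory.

```haskell
import Data.List (foldl')
import System.IO

-- Hypothetical helper: count lines and words in one strict pass,
-- so consumed input is not retained.
countLW :: String -> (Int, Int)
countLW = foldl' step (0, 0) . lines
  where
    -- foldl' forces the pair to WHNF; the seqs force both counters
    -- so no chain of thunks builds up.
    step (l, w) ln =
      let l' = l + 1
          w' = w + length (words ln)
      in l' `seq` w' `seq` (l', w')

main :: IO ()
main = do
  -- Request block buffering on stdin before interact starts reading.
  -- 'Nothing' lets the implementation pick the block size.
  hSetBuffering stdin (BlockBuffering Nothing)
  interact $ \s ->
    let (l, w) = countLW s
    in show l ++ " " ++ show w ++ "\n"
```

The Haskell report leaves a handle's default buffering mode implementation-dependent. Setting it explicitly, as above, avoids relying on whatever the runtime chooses.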

-- John