[Haskell-cafe] Re: Processing of large files
jgoerzen at complete.org
Tue Nov 2 21:00:47 EST 2004
On 2004-11-02, Peter Simons <simons at cryp.to> wrote:
> John Goerzen writes:
> >> Read and process the file in blocks:
> > I don't think that would really save much memory [...]
> Given that the block-oriented approach has constant space
> requirements, I am fairly confident it would save memory.
Perhaps a bit, but not a significant amount.
> > and in fact, would likely just make the code a lot more
> > complex. It seems like a simple wrapper around
> > hGetContents over a file that uses block buffering would
> > suffice.
> Either your algorithm can process the input in blocks or it
> cannot. If it can, it doesn't make one bit of difference if
> you do I/O in blocks, because your algorithm processes
> blocks anyway. If your algorithm is *not* capable of
Yes, it does make a difference. If you don't set block buffering, GHC
will call read() separately for *every* single character. (I've straced
it!) That is a huge performance penalty for large files. It's a lot more
efficient to set block buffering on your input, even if you are using
interact with lines or words to process it.
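
Here is a minimal sketch of what I mean, not a tuned implementation. The
processLines function is a hypothetical stand-in for whatever per-line
work you actually do. The only real point is the hSetBuffering calls
before interact. Passing Nothing as the block size lets the
implementation pick its own default.

```haskell
import System.IO

-- Hypothetical per-line processing, for illustration only:
-- replace each input line with its length.
processLines :: String -> String
processLines = unlines . map (show . length) . lines

main :: IO ()
main = do
  -- Ask for block buffering on both handles; Nothing means
  -- "use the implementation's default block size".
  hSetBuffering stdin  (BlockBuffering Nothing)
  hSetBuffering stdout (BlockBuffering Nothing)
  -- interact reads stdin lazily, so the input is still consumed
  -- incrementally; the buffering mode only changes how much is
  -- requested from the OS per read.
  interact processLines
```

Running the compiled program under strace with and without the
hSetBuffering lines should show whether the number of read() calls
differs on your system.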