[Haskell-cafe] Re: Processing of large files
simons at cryp.to
Wed Nov 3 07:51:38 EST 2004
John Goerzen writes:
>> Given that the block-oriented approach has constant space
>> requirements, I am fairly confident it would save memory.
> Perhaps a bit, but not a significant amount.
>> > [read/processing blocks] would likely just make the
>> > code a lot more complex. [...]
>> Either your algorithm can process the input in blocks or
>> it cannot. If it can, it doesn't make one bit of
>> difference if you do I/O in blocks, because your
>> algorithm processes blocks anyway.
> Yes it does. If you don't set block buffering, GHC will
> call read() separately for *every* single character.
I referred to the alleged complication of code, not to
whether the handle's 'BufferingMode' influences the
performance or not.
> (I've straced stuff!)
How many read(2) calls does this code need?
import Control.Monad ( when )
import Foreign.Marshal.Array ( allocaArray, peekArray )
import Data.Word ( Word8 )
import System.IO

main :: IO ()
main = do
  h <- openBinaryFile "/etc/profile" ReadMode
  hSetBuffering h NoBuffering
  n <- fmap cast (hFileSize h)
  buf <- allocaArray n $ \ptr -> do
    rc <- hGetBuf h ptr n
    when (rc /= n) (fail "huh?")
    buf' <- peekArray n ptr :: IO [Word8]
    return (map cast buf' :: String)
  putStr buf

cast :: (Enum a, Enum b) => a -> b
cast = toEnum . fromEnum
> It's a lot more efficient if you set block buffering in
> your input, even if you are using interact and lines or
> words to process it.
Of course it is, which is why an I/O-bound algorithm should
process blocks: it's more efficient and uses slightly less
memory, too. Although I have been told it's not a
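As a concrete illustration of the point about 'interact' and block
buffering (my own sketch, not code from this thread; the 4 KiB buffer
size and the 'processLine' stand-in are illustrative choices): enabling
block buffering on stdin lets a line-oriented 'interact' pipeline pull
input in large chunks rather than one read(2) per character.

```haskell
import System.IO ( BufferMode(BlockBuffering), hSetBuffering, stdin )

-- Stand-in per-line transformation; any pure String -> String works.
processLine :: String -> String
processLine = reverse

main :: IO ()
main = do
  -- With block buffering, GHC fills a 4 KiB buffer per read(2)
  -- instead of issuing one system call per character.
  hSetBuffering stdin (BlockBuffering (Just 4096))
  interact (unlines . map processLine . lines)
```

The algorithm itself is still expressed line by line; only the I/O
underneath is done in blocks, which is exactly the separation argued
for above.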