[Haskell] Files and lazyness

Simon Marlow simonmar at microsoft.com
Tue Aug 2 07:43:34 EDT 2005


On 01 August 2005 16:05, Cale Gibbard wrote:

> Your problem is, as you pointed out, that readFile does lazy IO.
> Although the semantics of it can be a bit confusing at times, it is
> useful for applications where you have a large file which is being
> consumed, and you don't want to allocate all of the memory for it
> before doing any processing. Laziness lets you read the file as needed
> -- you may not even need it all, depending on what is being done. This
> is quite helpful when you have something like a couple gigabytes of
> data on disk which needs processing. It can however be confusing at
> first that it may not finish reading the file before the file is
> altered, or, in situations involving handles, before the handle is
> closed.
> 
> You can write a strict IO version of readFile in Haskell as follows:
> 
> import IO
> 
> hGetContents' hdl = do e <- hIsEOF hdl
>                        if e then return []
>                             else do c <- hGetChar hdl
>                                     cs <- hGetContents' hdl
>                                     return (c:cs)
> 
> readFile' fn = do hdl <- openFile fn ReadMode
>                   xs <- hGetContents' hdl
>                   hClose hdl
>                   return xs
> 
> If you use readFile', it will ensure that the entire file is read and
> memory for the string is allocated before continuing. This ought to
> solve your problem.

Note that hGetContents' is likely to be slow, because it does repeated
hGetChar.  Also it is not tail-recursive, so it will run out of stack
for a large file.  It would be better to read the whole file into memory
but lazilly convert it to a String, like so:

import System.IO
import System.IO.Unsafe
import Foreign
import Data.Char

readFile' f = do
  h <- openFile f ReadMode
  s <- hFileSize h
  fp <- mallocForeignPtrBytes (fromIntegral s)
  len <- withForeignPtr fp $ \buf -> hGetBuf h buf (fromIntegral s)
  lazySlurp fp 0 len

lazySlurp :: ForeignPtr Word8 -> Int -> Int -> IO String
lazySlurp fp ix len
  | ix == len = return []
  | otherwise = do
     c <- withForeignPtr fp $ \p -> peekElemOff p ix
     cs <- unsafeInterleaveIO (lazySlurp fp (ix+1) len)
     return (chr (fromIntegral c) : cs)

Actually, I recommend always using this version unless you need to deal
with really large files.  Performance won't be quite as good as
readFile, but I've attached a tweaked version that performs much better.

Cheers,
	Simon
-------------- next part --------------
A non-text attachment was scrubbed...
Name: readfile.hs
Type: application/octet-stream
Size: 956 bytes
Desc: readfile.hs
Url : http://www.haskell.org//pipermail/haskell/attachments/20050802/c4d01bec/readfile.obj


More information about the Haskell mailing list