[Haskell] Files and lazyness
Simon Marlow
simonmar at microsoft.com
Tue Aug 2 07:43:34 EDT 2005
On 01 August 2005 16:05, Cale Gibbard wrote:
> Your problem is, as you pointed out, that readFile does lazy IO.
> Although the semantics of it can be a bit confusing at times, it is
> useful for applications where you have a large file which is being
> consumed, and you don't want to allocate all of the memory for it
> before doing any processing. Laziness lets you read the file as needed
> -- you may not even need it all, depending on what is being done. This
> is quite helpful when you have something like a couple gigabytes of
> data on disk which needs processing. It can however be confusing at
> first that it may not finish reading the file before the file is
> altered, or, in situations involving handles, before the handle is
> closed.
>
> You can write a strict IO version of readFile in Haskell as follows:
>
> import IO
>
> hGetContents' hdl = do e <- hIsEOF hdl
> if e then return []
> else do c <- hGetChar hdl
> cs <- hGetContents' hdl
> return (c:cs)
>
> readFile' fn = do hdl <- openFile fn ReadMode
> xs <- hGetContents' hdl
> hClose hdl
> return xs
>
> If you use readFile', it will ensure that the entire file is read and
> memory for the string is allocated before continuing. This ought to
> solve your problem.
Note that hGetContents' is likely to be slow, because it does repeated
hGetChar. Also it is not tail-recursive, so it will run out of stack
for a large file. It would be better to read the whole file into memory
but lazilly convert it to a String, like so:
import System.IO
import System.IO.Unsafe
import Foreign
import Data.Char
readFile' f = do
h <- openFile f ReadMode
s <- hFileSize h
fp <- mallocForeignPtrBytes (fromIntegral s)
len <- withForeignPtr fp $ \buf -> hGetBuf h buf (fromIntegral s)
lazySlurp fp 0 len
lazySlurp :: ForeignPtr Word8 -> Int -> Int -> IO String
lazySlurp fp ix len
| ix == len = return []
| otherwise = do
c <- withForeignPtr fp $ \p -> peekElemOff p ix
cs <- unsafeInterleaveIO (lazySlurp fp (ix+1) len)
return (chr (fromIntegral c) : cs)
Actually, I recommend always using this version unless you need to deal
with really large files. Performance won't be quite as good as
readFile, but I've attached a tweaked version that performs much better.
Cheers,
Simon
-------------- next part --------------
A non-text attachment was scrubbed...
Name: readfile.hs
Type: application/octet-stream
Size: 956 bytes
Desc: readfile.hs
Url : http://www.haskell.org//pipermail/haskell/attachments/20050802/c4d01bec/readfile.obj
More information about the Haskell
mailing list