[Haskell-beginners] Reading Multiple Files and Iterate Function Application

Tue Oct 12 08:22:08 EDT 2010

On Tuesday 12 October 2010 11:02:47, Lorenzo Isella wrote:
> Thanks Thomas. Yep, I do need some extra reading unfortunately.
> One question: if I were to apply a function to many files (file1,
> file2, ...) using e.g. Python, this would be my pipeline:
> read file1
> do stuff on file 1
>
> read file2
> do stuff on file 2
>
> ......
>
> Now, given Haskell's laziness, can I instead resort to this approach:
>
> read file1, file2... into a single list
>
> map (do-my-stuff) on list
>
> As far as I understand, this should not result in, e.g., huge RAM
> consumption, since files are read and processed only when needed (hence
> one at a time).
> Am I on the right track?

Yes, but there are dangers along that way.
With readFile, the contents are read lazily upon demand, but the file is 
opened immediately for reading. So

contentsList <- mapM readFile fileList

or

allContents <- fmap concat $ mapM readFile fileList

can make you run out of file handles if fileList is long enough.
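One way to sidestep this is to open, consume and close each file before
touching the next one. The sketch below is a minimal illustration, not the
only approach: processFile and processAll are hypothetical names, and it
assumes the per-file work reduces the contents to an Int, so that
Control.Exception.evaluate fully forces the result while the handle from
withFile is still open.

```haskell
import Control.Exception (evaluate)
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- Hypothetical per-file worker: apply f to the lazily read contents and
-- force the Int result before withFile closes the handle. Forcing the
-- result reads as much of the file as f needs (all of it for, e.g.,
-- length . lines).
processFile :: (String -> Int) -> FilePath -> IO Int
processFile f path =
  withFile path ReadMode $ \h -> do
    contents <- hGetContents h
    evaluate (f contents)

-- Process the files one after another; since each withFile call closes
-- its handle on exit, at most one of these handles is open at a time.
processAll :: (String -> Int) -> [FilePath] -> IO [Int]
processAll f = mapM (processFile f)
```

For example, processAll (length . lines) fileList gives the line count of
every file without ever holding more than one handle open.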

Also, the file handles aren't closed until the entire contents of the file
have been read (there are a few situations where the handle is closed
earlier), and they're not guaranteed to be closed immediately when the end
of the file has been reached; they could linger for a GC or two.
That means you can also run out of file handles when you process the files 
sequentially (if you have a bad consumption pattern).
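A typical bad pattern is reading a file with readFile but only looking at
its beginning, e.g. fmap (head . lines) (readFile p): the rest of the file
is never consumed, so that handle stays open until a GC finalizes it. A
minimal sketch of a safer variant, assuming only the first line is needed
(firstLine and firstLines are hypothetical names), reads that line strictly
with hGetLine and lets withFile close the handle right away:

```haskell
import System.IO (IOMode (ReadMode), hGetLine, hIsEOF, withFile)

-- Read just the first line of a file. hGetLine returns a fully read
-- String, and withFile closes the handle as soon as we return, even
-- though the rest of the file is never read.
firstLine :: FilePath -> IO (Maybe String)
firstLine path =
  withFile path ReadMode $ \h -> do
    eof <- hIsEOF h
    if eof then return Nothing else Just <$> hGetLine h

-- Sequential processing with no handles left lingering between files.
firstLines :: [FilePath] -> IO [Maybe String]
firstLines = mapM firstLine
```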

The memory usage depends on your consumption pattern, independent of
whether the String[s] you process come[s] from reading files or from a
non-IO generator.
If you keep references to the beginning of the list, you get a space leak;
if you consume the list sequentially, it runs in small space.
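To illustrate the difference, here is a minimal sketch (charsAndLines is a
hypothetical name) that computes two statistics in a single strict pass.
Nothing else holds on to the head of the string, so the already consumed
prefix can be garbage collected as the fold proceeds.

```haskell
import Data.List (foldl')

-- Count characters and newlines in one traversal of the string.
charsAndLines :: String -> (Int, Int)
charsAndLines = foldl' step (0, 0)
  where
    -- Force both counters at every step so no chain of (+1) thunks builds up.
    step (cs, ls) c =
      cs `seq` ls `seq` (cs + 1, if c == '\n' then ls + 1 else ls)
```

Used as readFile "big.txt" >>= print . charsAndLines (the file name is just
a placeholder), this should run in small space. By contrast, something like
print (length s, length (filter (== '\n') s)) traverses s twice, so the
whole string is typically kept in memory after the first traversal because
the second one still needs it.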

> Cheers
>
> Lorenzo