[Haskell-beginners] Reading Multiple Files and Iterate Function Application
Daniel Fischer
daniel.is.fischer at web.de
Tue Oct 12 08:22:08 EDT 2010
On Tuesday 12 October 2010 11:02:47, Lorenzo Isella wrote:
> Thanks Thomas. Yep, I do need some extra reading unfortunately.
> One question: if I was to apply a function on many files file1,
> file2...using e.g. Python, this would be my pipeline
> read file1
> do stuff on file 1
>
> read file2
> do stuff on file 2
>
> ......
>
> Now, due to the laziness of haskell, can I here resort to this approach
>
> read file1, file2... into a single list
>
> map (do-my-stuff) on list
>
> As far as I understand, this should not result in huge RAM
> consumption, since files are read and processed only when needed (hence
> one at a time).
> Am I on the right track?
Yes, but there are dangers along that way.
With readFile, the contents are read lazily upon demand, but the file is
opened immediately for reading. So
    contentsList <- mapM readFile fileList
or
    allContents <- fmap concat $ mapM readFile fileList
can make you run out of file handles if fileList is long enough.
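One way around that is to read and process each file before the next one is opened. The following is only a minimal sketch: countLines and lineCounts are hypothetical names standing in for whatever you actually do with each file, and it assumes the per-file result is small and consumes the whole file.

```haskell
import Control.Monad (forM)

-- | Hypothetical per-file computation: count the lines of a file.
countLines :: String -> Int
countLines = length . lines

-- | Read and process each file before opening the next one.
lineCounts :: [FilePath] -> IO [Int]
lineCounts paths =
  forM paths $ \path -> do
    contents <- readFile path
    -- '$!' forces the count here, which reads the file to its end, so the
    -- handle is not left waiting for a consumer that runs much later.
    return $! countLines contents
```

Without the `$!`, forM would return a list of unevaluated counts, and every file would stay open until those counts are demanded, which is the same problem as with mapM readFile.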
Also, the file handles aren't closed until the entire contents of the file
have been read (there are a few situations where the handle is closed
earlier), and they're not guaranteed to be closed immediately when the end
of the file has been reached; they could linger for a GC or two.
That means you can also run out of file handles when you process the files
sequentially (if you have a bad consumption pattern).
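If you want the handle closed at a definite point instead of whenever the runtime gets around to it, you can manage it yourself. This is a sketch under the assumption that each file is reduced to a small, fully evaluated result (an Int here); summarizeFile and summarizeFiles are made-up names for illustration.

```haskell
import Control.Exception (evaluate)
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- | Hypothetical helper: open one file, apply a pure summary function to
-- its contents, and force the result before the handle is closed.
summarizeFile :: (String -> Int) -> FilePath -> IO Int
summarizeFile f path =
  withFile path ReadMode $ \h -> do
    contents <- hGetContents h
    -- 'evaluate' forces the Int inside the withFile block, so whatever
    -- input 'f' needs is read while the handle is still open.
    evaluate (f contents)

-- | Process the files in order; withFile closes each handle before the
-- next file is opened, so at most one is open at any time.
summarizeFiles :: (String -> Int) -> [FilePath] -> IO [Int]
summarizeFiles f = mapM (summarizeFile f)
```

For example, summarizeFiles (length . words) fileList gives the word count of each file. Returning the lazy contents themselves out of withFile would not work, because the handle is closed before anything has been read.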
The memory usage depends on your consumption pattern, independent of
whether the String[s] you process come[s] from file readings or from a non-IO
generator.
If you keep references to the beginning of the list, you get a leak; if you
consume the list sequentially, it runs in small space.
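A small pure illustration of the two consumption patterns, with no IO involved. The names meanLeaky and meanStreaming are invented for this sketch, and how much memory each version really uses depends on the compiler and optimisation level.

```haskell
import Data.List (foldl')

-- | Two traversals: 'xs' is still needed by 'length' while 'sum' walks it,
-- so the beginning of the list stays referenced and the list is retained.
meanLeaky :: [Double] -> Double
meanLeaky xs = sum xs / fromIntegral (length xs)

-- | One traversal with a strict accumulator: each list cell can be
-- collected as soon as it has been consumed.
meanStreaming :: [Double] -> Double
meanStreaming xs = total / fromIntegral count
  where
    (total, count) = foldl' step (0, 0 :: Int) xs
    -- foldl' only forces the pair itself, so force both components by hand.
    step (s, n) x =
      let s' = s + x
          n' = n + 1
       in s' `seq` n' `seq` (s', n')
```

Both functions compute the same value; the difference is only in whether the list has to be kept alive between two passes. The same reasoning applies to a String obtained from readFile.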
> Cheers
>
> Lorenzo