[Haskell-beginners] Processing a list of files the Haskell way

Michael Schober Micha-Schober at web.de
Sat Mar 10 12:55:06 CET 2012


Hi everyone,

I'm currently trying to solve a problem in which I have to process a
long list of files. More specifically, I want to compute MD5 checksums
for all of them.

I have code that lists all the files and holds them in the following
data structure:

data DirTree = FileNode FilePath | DirNode FilePath [DirTree]
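
To make the shape of that structure concrete, here is a small sketch. The
sample tree and the helper filePaths are hypothetical names introduced only
for illustration; the sketch also assumes that each FileNode holds a path
that can be opened as-is, which is how the code further down uses it.

```haskell
-- Same type as in the post; the derived instances are added only so the
-- values can be printed and compared.
data DirTree = FileNode FilePath | DirNode FilePath [DirTree]
  deriving (Show, Eq)

-- Hypothetical sample: a directory with one file and one subdirectory.
sampleTree :: DirTree
sampleTree =
  DirNode "project"
    [ FileNode "project/README"
    , DirNode "project/src"
        [ FileNode "project/src/Main.hs" ]
    ]

-- Hypothetical helper: collect the file paths in depth-first order,
-- ignoring the directory names themselves.
filePaths :: DirTree -> [FilePath]
filePaths (FileNode fp)        = [fp]
filePaths (DirNode _ children) = concatMap filePaths children
```

Flattening the tree this way separates the traversal from the per-file
work, so the checksum step can be written against a plain list of paths.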

I tried the following:

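-- Imports assumed from the names used below (md5 and MD5Digest match
-- the pureMD5 package):
import qualified Data.ByteString.Lazy as BL
import Data.Digest.Pure.MD5 (MD5Digest, md5)

-- calculates MD5 sums for all files in a dirtree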
addChecksums :: DirTree -> IO [(DirTree,MD5Digest)]
addChecksums dir = addChecksums' [dir]
   where
     addChecksums' :: [DirTree] -> IO [(DirTree,MD5Digest)]
     addChecksums' [] = return []
     addChecksums' (f@(FileNode fp):re) = do
       bytes <- BL.readFile fp
       rest <- addChecksums' re
       return ((f,md5 bytes):rest)
     addChecksums' ((DirNode fp filelist):re) = do
       efiles <- addChecksums' filelist
       rest <- addChecksums' re
       return $ efiles ++ rest


This works fine, but only for a small number of files. If I try it on a
big directory tree, memory fills up and the program aborts with an
error message telling me that there are too many open files.

So I guess I have to sequentialize the code a little bit more. But at
the same time, I want to keep it as functional as possible, and I don't
want to write C-like code.

What would be the Haskell way to do something like this?
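
One possible direction, sketched under assumptions rather than offered as
the definitive answer: a likely cause of the open-file error is that a lazy
readFile keeps its handle open until the contents are consumed, and the
digests are not demanded until after every file has been opened. Forcing
each digest before moving on to the next file should let each handle be
closed at end of file. To keep the sketch self-contained, the hypothetical
function checksum stands in for md5, and checksumTree is a made-up name;
the same pattern would presumably apply with md5, provided that forcing
the digest reads the whole file.

```haskell
import qualified Data.ByteString.Lazy as BL
import Control.Exception (evaluate)
import Data.Word (Word32)

data DirTree = FileNode FilePath | DirNode FilePath [DirTree]

-- Stand-in for md5 (an assumption, not a real digest): a simple
-- polynomial hash that has to read every byte of its input.
checksum :: BL.ByteString -> Word32
checksum = BL.foldl' step 7
  where
    step acc byte = acc * 31 + fromIntegral byte

-- Walk the tree one file at a time. 'evaluate' forces the checksum, which
-- consumes the lazy contents up to end of file before the next file is
-- opened.
checksumTree :: DirTree -> IO [(FilePath, Word32)]
checksumTree (FileNode fp) = do
  bytes  <- BL.readFile fp
  digest <- evaluate (checksum bytes)
  return [(fp, digest)]
checksumTree (DirNode _ children) =
  concat <$> mapM checksumTree children
```

The traversal stays a plain recursive function over the tree; the only
imperative-looking part is the explicit evaluate that pins down when each
file is read.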

Thanks for all the input,
Michael





