[Haskell-beginners] getRecursiveContents - example from `Real World
Haskell'
Johann Giwer
johanngiwer at web.de
Fri Dec 5 06:14:07 EST 2008
`Real World Haskell' is a great book. I really love it. When I tried an example
from the 9th Chapter, I was a bit disappointed:
*Main> f <- getRecursiveContents "/home/johann/"
Heap exhausted;
Current maximum heap size is 128000000 bytes (122 Mb);
use `+RTS -M<size>' to increase it.
The function lookes like this:
getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents topdir = do
names <- getDirectoryContents topdir
let properNames = filter (`notElem` [".", ".."]) names
paths <- forM properNames $ \name -> do -- 1
let path = topdir </> name
isDirectory <- doesDirectoryExist path
if isDirectory
then getRecursiveContents path
else return [path]
return (concat paths) -- 2
OK, I'm using a small machine and my home directory contains ~30,000 files. But
that couldn't be the real problem. And even if this function is a small example
it should work reliable.
The programming language I know best (and this is meant relative -- I'm only a
`would be programmer') is python. Python has good support for functional
programming, but no builtin tail recursion. So my first idea about the bug in
`getRecursiveContents' went in this direction. Two hours later I had worked out
this solution:
getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents = getRecursiveContents' []
where
getRecursiveContents' l p =
E.handle (\_ -> return (p:l)) $ do -- 3
c <- getDirectoryContents p
let c' = filter (`notElem` [".", ".."]) c
x <- foldM (\l' p' -> getRecursiveContents' l' (p </> p')) l c' -- 4
return (x)
Folding (4) and appending (3) would give less memory usage than mapping (1) and
concatenation (2), I thought. This function worked well for small directory
(for which the original one did, too). But tested with my home directory it
went into an infinite loop. That led me to the actually problem:
`doesDirectoryExist' also accepts symlinks to directories. Another hour later
this was fixed:
getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents = getRecursiveContents' []
where
getRecursiveContents' l p = do
s <- getSymbolicLinkStatus p
if isDirectory s
then
E.handle (\_ -> return (p:l)) $ do
c <- getDirectoryContents p
let c' = filter (`notElem` [".", ".."]) c
x <- foldM (\l' p' -> getRecursiveContents' l' (p </> p')) l c'
return (x)
else
return (p:l)
Finally I fixed the original function (this only took about 30 min :-). The
handle (5) catches errors caused by unreadable directories
getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents topdir = E.handle (\_ ->return [topdir]) $ do -- 5
names <- getDirectoryContents topdir
let properNames = filter (`notElem` [".", ".."]) names
paths <- forM properNames $ \name -> do
let path = topdir </> name
s <- getSymbolicLinkStatus path
if isDirectory s
then getRecursiveContents path
else return [path]
return (concat paths)
The imports for all functions mentioned above are:
import Control.Monad ( forM, filterM, foldM )
import qualified Control.Exception as E
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath ((</>))
import System.Posix (getSymbolicLinkStatus, isDirectory)
Any suggestions about this solution are welcome.
Johann
More information about the Beginners
mailing list