[Haskell-beginners] space leak processing multiple compressed files

Ian Knopke ian.knopke at gmail.com
Tue Sep 4 12:00:48 CEST 2012


Hi everyone,

I have a collection of bzipped files. Each file has a different number
of items per line, with a separator between them. What I want to do is
count the items in each file. I'm trying to read the files lazily but
I seem to be running out of memory. I'm assuming I'm holding onto
resources longer than I need to. Does anyone have any advice on how to
improve this?

Here's the basic program, slightly sanitized:

main = do
    -- get a list of file names
    filelist <- getFileList "testsetdir"

    -- read and decompress each compressed file
    files <- mapM (fmap Z.decompress . B.readFile) filelist

    display $ processEntries files

    putStrLn "finished"

-- processEntries
-- The real processing is defined elsewhere, but basically does some string
-- processing per line, counts the number of resulting elements, and sums
-- them per file.
processEntries :: [B.ByteString] -> Int
processEntries xs = foldl' (\x y -> x + countItems (B.lines y)) 0 xs
-- (countItems is a placeholder name for the helper defined elsewhere)

-- display a numeric result
display :: Int -> IO ()
display = print
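
One variation I could try, as a minimal sketch under assumptions rather than a confirmed fix: fold over the file names so that each file is read, decompressed and fully counted before the next one is opened, instead of building the whole list with mapM first. Here countLineItems, countFileItems, countAll and the ',' separator are hypothetical stand-ins for the real processing, B is assumed to be Data.ByteString.Lazy.Char8, and the decompressor is passed in as a parameter (in the real program it would presumably be Z.decompress).

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Control.Monad (foldM)
import Data.List (foldl')
import System.Environment (getArgs)

-- Hypothetical stand-in for the per-line string processing:
-- count separator-delimited items on one line (',' is an assumed separator).
countLineItems :: B.ByteString -> Int
countLineItems line
  | B.null line = 0
  | otherwise   = fromIntegral (B.count ',' line) + 1

-- Sum the item counts over all lines of one (already decompressed) file.
countFileItems :: B.ByteString -> Int
countFileItems = foldl' (\acc l -> acc + countLineItems l) 0 . B.lines

-- Process one file at a time. The running total is forced with ($!),
-- so each file is fully consumed before the next path is read.
countAll :: (B.ByteString -> B.ByteString) -> [FilePath] -> IO Int
countAll decompress = foldM step 0
  where
    step acc path = do
      contents <- B.readFile path
      return $! acc + countFileItems (decompress contents)

main :: IO ()
main = do
  paths <- getArgs
  -- `id` is a placeholder decompressor for uncompressed test files;
  -- the real program would pass its decompression function instead.
  total <- countAll id paths
  print total
```

Forcing the running total with $! means each file's count is fully evaluated inside the fold, so its lazily read contents are consumed before the next file is touched, and only a single Int is carried from file to file.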


