[Haskell-beginners] space leak processing multiple compressed files
edwards.benj at gmail.com
Tue Sep 4 15:38:52 CEST 2012
You might want to look at conduits if you need deterministic and prompt
finalisation. I would sketch out a solution but I have only my phone.
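
For reference, prompt release is also possible without conduit: fold over the file names and reduce each file to its count before touching the next one, so the list of decompressed contents never exists. What follows is only a minimal sketch of that plain-fold alternative, not a conduit solution. It assumes the files are gzip archives read through the zlib package. countItems here is a hypothetical stand-in (a word count), since the real one was not posted, and the file names come from the command line instead of getFileList.

```haskell
import qualified Codec.Compression.GZip as Z
import Control.Monad (foldM)
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as B
import Data.List (foldl')
import System.Environment (getArgs)

-- Hypothetical stand-in for the thread's countItems (never shown):
-- counts whitespace-separated words over all lines, using a strict fold.
countItems :: [B.ByteString] -> Int
countItems = foldl' (\n line -> n + length (B.words line)) 0

-- Read one compressed file strictly, so its handle is closed as soon as
-- readFile returns, then decompress lazily and reduce to an Int.
-- ($!) forces the count before the action returns, so nothing keeps the
-- file's contents alive afterwards.
countFile :: FilePath -> IO Int
countFile path = do
  compressed <- S.readFile path
  return $! countItems (B.lines (Z.decompress (L.fromStrict compressed)))

-- Visit the files one at a time, keeping only a strict running total.
processFiles :: [FilePath] -> IO Int
processFiles = foldM step 0
  where
    step acc path = do
      n <- countFile path
      return $! acc + n

main :: IO ()
main = do
  paths <- getArgs
  total <- processFiles paths
  print total
  putStrLn "finished"
```

The trade-off is that each compressed file is held in memory in full while it is being counted. The decompressed data is still consumed chunk by chunk.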
On Sep 4, 2012 2:36 PM, "Ian Knopke" <ian.knopke at gmail.com> wrote:
> Hi Lorenzo,
> You're correct. Well spotted! I must have created that doing some copy
> and paste. The program is basically as you suggested it. Here's a
> corrected version:
> main = do
>     -- get a list of file names
>     filelist <- getFileList "testsetdir"
>     -- process each compressed file
>     files <- mapM (\x -> do
>                 thisfile <- B.readFile x
>                 return (Z.decompress thisfile)
>             ) filelist
>     display $ processEntries files
>     putStrLn "finished"
> -- processEntries
> -- processEntries is defined elsewhere, but basically does some string
> -- processing per line, counts the number of resulting elements and
> -- sums them per file
> processEntries :: [B.ByteString] -> Int
> processEntries xs = foldl' (\x y -> x + countItems (B.lines y)) 0 xs
> I'm still running into memory issues though. I think it's the mapM
> loop above and that each file is not being released after reading
> through it. Does that seem reasonable, and is there any way to write
> this better?
> ... and countItems uses foldl'
> On Tue, Sep 4, 2012 at 1:55 PM, Lorenzo Bolla <lbolla at gmail.com> wrote:
> > On Tue, Sep 4, 2012 at 11:00 AM, Ian Knopke <ian.knopke at gmail.com> wrote:
> >> main = do
> >>     -- get a list of file names
> >>     filelist <- getFileList "testsetdir"
> >>     -- process each compressed file
> >>     files <- mapM (\x -> do
> >>                 thisfile <- B.readFile x
> >>                 return (Z.decompress thisfile)
> >>             ) filelist
> >>     display $ processEntries files
> >>     putStrLn "finished"
> >> -- processEntries
> >> -- processEntries is defined elsewhere, but basically does some string
> >> -- processing per line,
> >> -- counts the number of resulting elements and sums them per file
> >> processEntries :: [B.ByteString] -> Int
> >> processEntries xs = foldl' (\x y -> x + processEntries (B.lines y)) 0 xs
> > The problem seems to be your `processEntries` function: it is
> > recursively defined, and as far as I understand, it's never going to
> > end because "y" (inside the lambda function) is always going to be the
> > full list of files (xs).
> > Probably, `processEntries` should be something like:
> > processEntries = foldl' (\acc fileContent -> acc + processFileContent
> >     fileContent) 0
> > processFileContent :: B.ByteString -> Int
> > processFileContent = -- count what you have to, in a file
> > In fact, processEntries could be rewritten without using foldl':
> > processEntries = sum . map processFileContent
> > hth,
> > L.