[Haskell-beginners] space leak processing multiple compressed files

Benjamin Edwards edwards.benj at gmail.com
Tue Sep 4 15:38:52 CEST 2012


You might want to look at conduits if you need deterministic and prompt
finalisation. I would sketch out a solution but I have only my phone.
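
For anyone reading this in the archive, a rough sketch of the conduit
approach might look like the following. It assumes a recent conduit
release (the `Conduit` module from the conduit package, and
`Data.Conduit.Zlib` from conduit-extra), that the files are gzip
compressed, and that counting lines stands in for the real per-line
work. `countLines` is a made-up name, not something from the original
program.

```haskell
module Main (main) where

import Conduit
import Data.Conduit.Zlib (ungzip)
import System.Environment (getArgs)

-- Count the lines of one gzip-compressed file by streaming it.
-- Assumption: counting lines is a placeholder for the real processing.
countLines :: FilePath -> IO Int
countLines path = runConduitRes $
     sourceFile path          -- stream raw chunks from the file
  .| ungzip                   -- decompress chunk by chunk
  .| linesUnboundedAsciiC     -- regroup the chunks into lines
  .| lengthC                  -- count the lines

main :: IO ()
main = do
  paths  <- getArgs           -- file names given on the command line
  counts <- mapM countLines paths
  print (sum counts)
```

Each call to `runConduitRes` opens one file and releases its handle
before returning, so only one file is open at a time and no file has
to be held in memory as a whole (as long as individual lines are not
huge).
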
On Sep 4, 2012 2:36 PM, "Ian Knopke" <ian.knopke at gmail.com> wrote:

> Hi Lorenzo,
>
> You're correct. Well spotted! I must have created that doing some copy
> and paste. The program is basically as you suggested it. Here's a
> corrected version:
>
> main = do
>
>     -- get a list of file names
>     filelist <- getFileList "testsetdir"
>
>     -- process each compressed file
>     files <- mapM (\x -> do
>                             thisfile <- B.readFile x
>                             return (Z.decompress thisfile)
>                     ) filelist
>
>     display $ processEntries files
>
>     putStrLn "finished"
>
> -- processEntries
> -- processEntries is defined elsewhere, but basically does some string
> -- processing per line, counts the number of resulting elements and
> -- sums them per file
> processEntries :: [B.ByteString] -> Int
> processEntries xs = foldl' (\x y -> x + countItems (B.lines y)) 0 xs
>
> I'm still running into memory issues though. I think it's the mapM
> loop above and that each file is not being released after reading
> through it. Does that seem reasonable, and is there any way to write
> this better?
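
One possible way to restructure this, sketched under a few
assumptions: `B` is Data.ByteString.Lazy.Char8, `Z` is
Codec.Compression.GZip from the zlib package, and `countItems` (not
shown in the thread) is replaced here by a simple line count. The idea
is to fold over the file names and finish each file's count before
moving on, instead of building a list of all decompressed contents
first. `addFile` and `processFiles` are hypothetical names.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Control.Monad (foldM)
import qualified Codec.Compression.GZip as Z
import qualified Data.ByteString.Lazy.Char8 as B
import System.Environment (getArgs)

-- Hypothetical stand-in for the countItems mentioned in the thread.
countItems :: [B.ByteString] -> Int
countItems = length

-- Add one file's count to the running total. The bang patterns force
-- the count before the next file is touched, so no unevaluated thunk
-- keeps a reference to this file's contents.
addFile :: Int -> FilePath -> IO Int
addFile !acc path = do
  compressed <- B.readFile path
  let !n = countItems (B.lines (Z.decompress compressed))
  return (acc + n)

-- Process the files one at a time, carrying only the running total.
processFiles :: [FilePath] -> IO Int
processFiles = foldM addFile 0

main :: IO ()
main = do
  paths <- getArgs            -- file names given on the command line
  total <- processFiles paths
  print total
```

Because only an `Int` is carried from one file to the next, nothing
from an earlier file needs to stay reachable once its count is done.
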
>
>
> Ian
>
>
>
> ... and countItems uses foldl'
> On Tue, Sep 4, 2012 at 1:55 PM, Lorenzo Bolla <lbolla at gmail.com> wrote:
> > On Tue, Sep 4, 2012 at 11:00 AM, Ian Knopke <ian.knopke at gmail.com> wrote:
> >> main = do
> >>
> >>     -- get a list of file names
> >>     filelist <- getFileList "testsetdir"
> >>
> >>     -- process each compressed file
> >>     files <- mapM (\x -> do
> >>                             thisfile <- B.readFile x
> >>                             return (Z.decompress thisfile)
> >>                     ) filelist
> >>
> >>
> >>     display $ processEntries files
> >>
> >>
> >>     putStrLn "finished"
> >>
> >> -- processEntries
> >> -- processEntries is defined elsewhere, but basically does some string
> >> -- processing per line,
> >> -- counts the number of resulting elements and sums them per file
> >> processEntries :: [B.ByteString] -> Int
> >> processEntries xs = foldl' (\x y -> x + processEntries (B.lines y)) 0 xs
> >
> > The problem seems to be your `processEntries` function: it is
> > recursively defined, and as far as I understand, it's never going to
> > end, because `B.lines` applied to a single non-empty line returns that
> > same line again, so the recursive call never gets a smaller input.
> >
> > Probably, `processEntries` should be something like:
> >
> > processEntries = foldl' (\acc fileContent -> acc + processFileContent fileContent) 0
> >
> > processFileContent :: B.ByteString -> Int
> > processFileContent = undefined -- placeholder: count what you have to, in a file
> >
> > In fact, processEntries could be rewritten without using foldl':
> > processEntries = sum . map processFileContent
> >
> > hth,
> > L.