[Haskell-cafe] How to deal with huge text file?

Ivan Miljenovic ivan.miljenovic at gmail.com
Mon May 24 22:26:07 EDT 2010


On 25 May 2010 12:20, Magicloud Magiclouds
<magicloud.magiclouds at gmail.com> wrote:
> This is the function. The problem sure seems like something was
> preserved unexpected. But I cannot find out where is the problem.
>
> seperateOutput file =
>  let content = lines file
>      indexOfEachOutput_ = fst $ unzip $ filter (\(i, l) ->
>                                                 " Log for " `isPrefixOf` l
>                                               ) $ zip [0..] content
>      indexOfEachOutput = indexOfEachOutput_ ++ [length content] in
                                                 ^^^^^^^^^^^^^^^^
                                                 Expensive bit
>  map (\(a, b) ->
>         drop a $ take b content
>      ) $ zip indexOfEachOutput $ tail indexOfEachOutput

You're not "streaming" the String: "length content" forces the whole
list of lines, and because content is used again afterwards, all of it
is kept in memory.  (I'm also not sure how GHC optimises that, if at
all; it might even re-evaluate the length each time you use
indexOfEachOutput.)
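
For illustration only, here is a minimal sketch of one way to split the
input without ever asking for the length.  The name
separateOutputStreaming is made up for this example and is not from the
original code; it assumes, like the quoted function, that anything
before the first " Log for " line can be dropped.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical streaming variant: each chunk starts at a header line
-- and runs up to (but not including) the next header line.
separateOutputStreaming :: String -> [[String]]
separateOutputStreaming = go . dropWhile (not . isHeader) . lines
  where
    isHeader l = " Log for " `isPrefixOf` l

    go []       = []
    go (h : ls) = (h : body) : go rest
      where
        -- break stops at the next header, so only the current chunk
        -- has to be traversed to produce it.
        (body, rest) = break isHeader ls
```

Since nothing here needs the end of the input, earlier chunks can
probably be garbage collected once consumed, provided the caller does
not hold on to the whole result.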

The zipping of indexOfEachOutput should be OK without that length at
the end, as it will lazily construct the zipped list (only evaluating up
to two values at a time).  However, you'd be better off using "zipWith
f" rather than "map f . zip".
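
As a rough sketch of the zipWith suggestion (the name sliceBetween is
made up for this example), the slicing step from the quoted code could
be written like this:

```haskell
-- Hypothetical helper: for each consecutive pair of indices (a, b),
-- return the elements of content from position a up to position b.
sliceBetween :: [Int] -> [String] -> [[String]]
sliceBetween idxs content = zipWith slice idxs (drop 1 idxs)
  where
    slice a b = drop a (take b content)
```

This computes the same slices as "map (\(a, b) -> ...) $ zip idxs (tail
idxs)", just without building the intermediate list of pairs.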

-- 
Ivan Lazar Miljenovic
Ivan.Miljenovic at gmail.com
IvanMiljenovic.wordpress.com

