[Haskell-cafe] How to deal with huge text file?
Magicloud Magiclouds
magicloud.magiclouds at gmail.com
Tue May 25 02:12:13 EDT 2010
Yes, this code works with a little hack. Thank you.
On Tue, May 25, 2010 at 11:06 AM, Daniel Fischer
<daniel.is.fischer at web.de> wrote:
> On Tuesday 25 May 2010 04:26:07, Ivan Miljenovic wrote:
>> On 25 May 2010 12:20, Magicloud Magiclouds
>> <magicloud.magiclouds at gmail.com> wrote:
>> > This is the function. The problem sure seems to be that something is
>> > retained unexpectedly, but I cannot find out where the problem is.
>> >
>> > seperateOutput file =
>> >   let content = lines file
>> >       indexOfEachOutput_ = fst $ unzip $ filter (\(i, l) ->
>> >                              " Log for " `isPrefixOf` l ) $ zip [0..] content
>> >       indexOfEachOutput = indexOfEachOutput_ ++ [length content] in
>>
>> ^^^^^^^^^^^^^^^^ Expensive bit (the "length content" above)
>>
>> >   map (\(a, b) ->
>> >          drop a $ take b content
>> >       ) $ zip indexOfEachOutput $ tail indexOfEachOutput
>>
>> You're not "streaming" the String; you're also keeping it around to
>> calculate the length (I'm also not sure how GHC optimises that, if at
>> all; it might even re-evaluate the length each time you use
>> indexOfEachOutput).
>
> Not that it helps, but it evaluates the length only once.
> But it does that at the very end, when dealing with the last log.
>
>>
>> The zipping of indexOfEachOutput should be OK without that length at
>> the end, as it will lazily construct the zipped list (only evaluating up
>> to two values at a time). However, you'd be better off using "zipWith
>> f" rather than "map f . zip".
>
> There'd still be the problem of
>
> drop a $ take b content
>
> which keeps a reference to the whole of content, so nothing can be
> garbage collected before everything's done.
>
> separateOutput file =
>   let contents = lines file
>       split = break ("Log for " `isPrefixOf`)
>       msplit [] = Nothing
>       msplit lns = Just (split lns)
>   in drop 1 $ unfoldr msplit contents
>
> should fix it.
>
>
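
For the archive, here is a minimal sketch of one way the quoted break/unfoldr
version can be adjusted. It is not necessarily the exact hack used. As posted,
break stops at a header line without consuming it, so once the input starts
with a header, msplit returns an empty chunk and the same remaining input
again. The sketch takes the header line first and then breaks on the rest.
The helper name isHeader is made up for illustration, and the marker string
is an assumption (the thread shows both " Log for " and "Log for ").

```haskell
import Data.List (isPrefixOf, unfoldr)

-- Assumption: every log section begins with a line that starts with this
-- marker. Adjust the string (e.g. add a leading space) to match the real file.
isHeader :: String -> Bool
isHeader = ("Log for " `isPrefixOf`)

-- Split the input into one chunk per log, each chunk starting with its
-- header line. Anything before the first header is discarded.
separateOutput :: String -> [[String]]
separateOutput = unfoldr step . dropWhile (not . isHeader) . lines
  where
    step []       = Nothing
    -- Consume the header unconditionally, then take lines up to the next
    -- header, so the remaining input gets shorter on every step.
    step (h : ls) =
      let (body, rest) = break isHeader ls
      in  Just (h : body, rest)
```

Each chunk is produced lazily, so if the chunks are consumed one at a time
(for example with mapM_ over the result of separateOutput applied to the
output of readFile), the lines already processed are no longer referenced
and can be garbage collected.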
--
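Dense bamboo cannot stop the stream from flowing through;
high mountains cannot block the wild clouds from flying past.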