[Haskell-cafe] [newbie] processing large logs
Donald Bruce Stewart
dons at cse.unsw.edu.au
Sat May 13 22:44:33 EDT 2006
martine:
> On 5/14/06, Eugene Crosser <crosser at average.org> wrote:
> >main = printMax . (foldr processLine empty) . lines =<< getContents
> >[snip]
> >The thing kinda works on small data sets, but if you feed it with
> >250,000 lines (1000 distinct), the process size grows to 200 Mb, and on
> >500,000 lines I get "*** Exception: stack overflow" (using runhaskell
> >from ghc 6.2.4).
>
> To elaborate on Udo's point:
> If you look at the definition of foldr you'll see where the stack
> overflow is coming from: foldr recurses all the way down to the end
> of the list, so your stack gets 250k (or attempts 500k) entries deep
> so it can process the last line in the file first, then unwinds.
Also, don't use runhaskell! Compile the code with -O :)
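
To make the foldr point above concrete, here is a minimal sketch of the same pipeline written as a strict left fold, so each line is folded into the accumulator as it is read instead of being stacked up until the end of the input. The original processLine and printMax were snipped, so countKey and countLines below are hypothetical stand-ins (they just count how often the first word of each line occurs), and Data.Map.Strict assumes a reasonably recent containers package:

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for the snipped processLine:
-- bump the counter for the first word of the line, ignore blank lines.
countKey :: Map.Map String Int -> String -> Map.Map String Int
countKey m line = case words line of
  (k:_) -> Map.insertWith (+) k 1 m
  []    -> m

-- foldl' forces the accumulator after every line, and the strict Map
-- forces the counters, so no chain of unevaluated thunks builds up.
countLines :: [String] -> Map.Map String Int
countLines = foldl' countKey Map.empty

main :: IO ()
main = do
  input <- getContents              -- lazy input, consumed line by line
  mapM_ print (Map.toList (countLines (lines input)))
```

Assuming the file is saved as LogCount.hs (a made-up name), something like "ghc -O LogCount.hs" followed by "./LogCount < your.log" should build and run it. The stack depth should then stay flat regardless of the number of lines, and memory should depend mainly on the number of distinct keys.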
-- Don