[Haskell-cafe] [newbie] processing large logs

Donald Bruce Stewart dons at cse.unsw.edu.au
Sat May 13 22:44:33 EDT 2006


martine:
> On 5/14/06, Eugene Crosser <crosser at average.org> wrote:
> >main = printMax . (foldr processLine empty) . lines =<< getContents
> >[snip]
> >The thing kinda works on small data sets, but if you feed it with
> >250,000 lines (1000 distinct), the process size grows to 200 Mb, and on
> >500,000 lines I get "*** Exception: stack overflow" (using runhaskell
> >from ghc 6.2.4).
> 
> To elaborate on Udo's point:
> If you look at the definition of foldr you'll see where the stack
> overflow is coming from:  foldr recurses all the way down to the end
> of the list, so your stack gets 250k (or attempts 500k) entries deep
> so it can process the last line in the file first, then unwinds.

Also, don't use runhaskell! It runs the program through the interpreter,
with no optimisation. Compile the code with ghc -O instead :)
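
For illustration, here is a minimal sketch of the usual fix for the stack
overflow described above: replace the lazy right fold with a strict left
fold. The original processLine, empty and printMax were snipped from the
quoted message, so countLine, countLines and mostFrequent below are
hypothetical stand-ins that simply count how often each distinct line
occurs. The sketch also assumes a containers library recent enough to
provide Data.Map.Strict.

```haskell
module Main (main) where

import Data.List (foldl', maximumBy)
import Data.Ord (comparing)
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for processLine: bump the count for one line.
-- The accumulator comes first, which is the argument order foldl' expects.
countLine :: Map.Map String Int -> String -> Map.Map String Int
countLine counts line = Map.insertWith (+) line 1 counts

-- foldl' forces the accumulator after every line, so the fold consumes
-- the input front to back instead of recursing to the end of the list
-- as foldr does.
countLines :: [String] -> Map.Map String Int
countLines = foldl' countLine Map.empty

-- Hypothetical stand-in for printMax: the most frequent line, if any.
mostFrequent :: Map.Map String Int -> Maybe (String, Int)
mostFrequent counts
  | Map.null counts = Nothing
  | otherwise       = Just (maximumBy (comparing snd) (Map.toList counts))

main :: IO ()
main = print . mostFrequent . countLines . lines =<< getContents
```

Saved under a hypothetical name such as Main.hs, this could be built with
"ghc -O Main.hs" and fed the log on standard input. With 1000 distinct
lines the map stays small however long the input is, since the strict map
does not accumulate unevaluated (+) thunks in its values.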

-- Don

