[Haskell-cafe] [newbie] processing large logs
martine at danga.com
Sat May 13 22:34:13 EDT 2006
On 5/14/06, Eugene Crosser <crosser at average.org> wrote:
> main = printMax . (foldr processLine empty) . lines =<< getContents
> The thing kinda works on small data sets, but if you feed it with
> 250,000 lines (1000 distinct), the process size grows to 200 Mb, and on
> 500,000 lines I get "*** Exception: stack overflow" (using runhaskell
> from ghc 6.2.4).
To elaborate on Udo's point:
If you look at the definition of foldr you'll see where the stack
overflow is coming from: foldr recurses all the way down to the end
of the list. Your stack gets 250k entries deep (or attempts 500k) so
that it can process the last line in the file first, and only then
does it unwind.
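
To make that concrete, here is a rough sketch rather than a drop-in fix. myFoldr is just the textbook list definition of foldr under another name. countLine is a hypothetical stand-in for processLine (I'm guessing it counts lines in a map, since the original definitions of processLine, empty and printMax weren't posted). The sketch also assumes a reasonably recent containers package that provides Data.Map.Strict.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Textbook list definition of foldr, renamed to avoid clashing with the
-- Prelude. When f needs its second argument before it can do any work,
-- the recursive call has to reach the end of the list first, leaving
-- one pending application of f per element.
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z []     = z
myFoldr f z (x:xs) = f x (myFoldr f z xs)

-- Hypothetical stand-in for processLine: count how often each line occurs.
-- Note the argument order: accumulator first, as foldl' expects.
countLine :: M.Map String Int -> String -> M.Map String Int
countLine counts line = M.insertWith (+) line 1 counts

main :: IO ()
main = do
  contents <- getContents
  -- foldl' walks the list front to back and forces the accumulator at
  -- every step, so no chain of pending calls builds up.
  let counts = foldl' countLine M.empty (lines contents)
  print (M.size counts)  -- number of distinct lines
```

With a left fold the lines are consumed in file order and the map never grows beyond the number of distinct keys, so memory use should follow the 1000 distinct entries rather than the 250,000 input lines.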