[Haskell-beginners] stack overflow summing numbers read from a big file

mukesh tiwari mukeshtiwari.iiitm at gmail.com
Sat Mar 23 13:11:10 CET 2013


Hi Axel,
If you check the type of your function:
Prelude> :t (  show . sum . map read . words )
(  show . sum . map read . words ) :: String -> String
It takes a String and returns a String.
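
To make the intermediate types visible, here is a small sketch of the same pipeline with each stage annotated. The helper names `numbers` and `total` are only for illustration, and pinning the element type to Integer is an assumption that matches what GHC's defaulting picks when no annotation is given.

```haskell
-- The original pipeline, split into named stages with explicit types.
sumFile :: String -> String
sumFile = show . total . numbers . words
  where
    -- words has already split the input on whitespace;
    -- read turns each token into a number.
    numbers :: [String] -> [Integer]
    numbers = map read

    -- Add up the parsed numbers.
    total :: [Integer] -> Integer
    total = sum

main :: IO ()
main = putStrLn (sumFile "1\n2\n3\n")  -- prints 6
```

Every stage before `show` works on an ordinary list, so the whole input passes through as a list of Char and then a list of String.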

On Sat, Mar 23, 2013 at 6:12 PM, Axel Wegen <axel.wegen at gmail.com> wrote:

> When I try to run the following code on a 50M file consisting of one
> number per line I receive a `Stack space overflow' error and the program
> seems to consume a lot of memory. Why does that happen?
>

The same chapter of Real World Haskell already mentions this: "A String is
represented as a list of Char values; each element of a list is allocated
individually, and has some book-keeping overhead. These factors affect the
memory consumption and performance of a program that must read or write text
or binary data. On simple benchmarks like this, even programs written in
interpreted languages such as Python can outperform Haskell code that uses
String by an order of magnitude".


> And how can I
> fix the problem without increasing the Stack with -Ksize hoping that it
> will be big enough? Any general advice on processing big files with
> Haskell?
>

Each ByteString type performs better under particular circumstances. For
streaming a large quantity (hundreds of megabytes to terabytes) of data,
the lazy ByteString type is usually best. Its chunk size is tuned to be
friendly to a modern CPU's L1 cache, and a garbage collector can quickly
discard chunks of streamed data that are no longer being used.
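
As a rough illustration of that streaming behaviour, the sketch below counts the newlines on standard input with a lazy ByteString. The name `countLines` is hypothetical; the point is only that the input is consumed chunk by chunk, so chunks that have already been counted can be garbage collected.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Int (Int64)

-- Count newline characters; the lazy ByteString is traversed one chunk
-- at a time, so the whole input need not be in memory at once.
countLines :: BL.ByteString -> Int64
countLines = BL.count '\n'

main :: IO ()
main = do
  contents <- BL.getContents   -- lazy read of stdin
  print (countLines contents)
```

It can be run the same way as sumFile, for example ./countLines < ./numbers (the program name is again just an example).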


>
> -- sumFile.hs
> -- adapted from Real World Haskell's SumFile.hs at the beginning of
> -- Chapter 8
> main = interact sumFile
>   where sumFile = show . sum . map read . words
>
> ghc -o sumFile sumFile.hs
> ./sumFile < ./numbers
>

See if this code works (I haven't tested it on a big file):

import qualified Data.ByteString.Lazy.Char8 as BS
import Data.Maybe ( fromJust )

readI :: BS.ByteString -> Integer
readI = fst . fromJust . BS.readInteger

main = BS.interact sumFile where
     sumFile =  BS.pack . show . sum . map readI . BS.words
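
A possible variation, untested on large inputs as well: one likely contributor to the stack overflow is that a lazy sum can build a long chain of unevaluated additions, notably when compiling without optimisation. The sketch below swaps in foldl' from Data.List so the running total is forced at each step. It also avoids fromJust; treating a token that is not a number as 0 is purely an assumption of this sketch, and `sumNumbers` is an illustrative name.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BS
import Data.List (foldl')

-- Parse one token. A token that is not entirely a number counts as 0
-- (an assumption of this sketch; the original would fail via fromJust).
readI :: BS.ByteString -> Integer
readI s = case BS.readInteger s of
  Just (n, rest) | BS.null rest -> n
  _                             -> 0

-- foldl' forces the accumulator at every step, so no chain of
-- pending additions is left to be evaluated at the end.
sumNumbers :: BS.ByteString -> Integer
sumNumbers = foldl' (+) 0 . map readI . BS.words

main :: IO ()
main = BS.interact (BS.pack . show . sumNumbers)
```

Compiling with ghc -O2 may also help, since the optimiser can often make an ordinary sum strict, but the explicit strict fold does not depend on that.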


-Mukesh
