[Haskell-beginners] Data.Binary.Get for large files

Daniel Fischer daniel.is.fischer at web.de
Thu Apr 29 21:12:50 EDT 2010


On Friday, 30 April 2010, 00:37:59, Philip Scott wrote:
> Hello again folks,
>
> Sorry to keep troubling you - I'm very appreciative of the help you've
> given so far. I've got one more for you that has got me totally stumped.
> I'm writing a program which deals with largish files; the one I am using
> as a test case is not stupidly large, at about 200MB. After three
> evenings, I have finally gotten rid of all the stack overflows, but I am
> unfortunately left with something that is rather unfeasibly slow. I was
> hoping someone with keener skills than mine could take a look; I've
> tried to distill it to the simplest case.
>
> This program just reads in a file, interpreting each value as a double,
> and does a sort of running average on them. The actual function doesn't
> matter too much, I think it is the reading it in that is the problem.

Replace getFloat64le with e.g. getWord64le to confirm that the decoding is 
the bottleneck. The decoding of IEEE754 floating point numbers in 
getFloat64le looks rather complicated. Maybe doing it differently could 
speed it up, maybe not.
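One alternative worth trying (a sketch, not your program: it assumes the binary package and GHC >= 8.2 for GHC.Float.castWord64ToDouble; the sample values are illustrative) is to skip the manual IEEE754 decoding entirely and reinterpret each little-endian Word64 as a Double with a bit cast, folding with a strict accumulator so no thunks pile up:

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.Binary.Get (Get, runGet, getWord64le, isEmpty)
import Data.Binary.Put (runPut, putWord64le)
import GHC.Float (castWord64ToDouble, castDoubleToWord64)

-- Strict running sum over all little-endian 64-bit words in the input,
-- reinterpreting the bits of each word as a Double (no IEEE754 parsing).
sumDoubles :: Get Double
sumDoubles = go 0
  where
    go !acc = do                         -- bang pattern forces the accumulator
      done <- isEmpty
      if done
        then pure acc
        else do
          w <- getWord64le
          go (acc + castWord64ToDouble w)

main :: IO ()
main = do
  -- Build a small sample input in memory instead of reading a file.
  let bytes = runPut (mapM_ (putWord64le . castDoubleToWord64) [1.5, 2.5, 4.0])
  print (runGet sumDoubles bytes)        -- prints 8.0
```

The same shape works for your running average: carry the count alongside the sum in the strict accumulator.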

> This takes about three minutes to run on my (fairly modern) laptop. The
> equivalent C program takes about 5 seconds.

Are you sure that it's really equivalent?

>
> I'm sure I am doing something daft, but I can't for the life of me see
> what. Any hints about how to get the profiler to show me useful stuff
> would be much appreciated!
>
> All the best,
>
> Philip
>
> PS: If, instead of computing a single value, I try to build a list of
> the values, the program ends up using over 2GB of memory to read a 200MB
> file. Any ideas on that one?

Hm, a 200MB file => ~25 million Doubles; a list of those needs at least 
400MB (each element costs a cons cell plus a boxed Double). Still a long 
way from 2GB. I suspect you construct a list of unevaluated thunks, not 
of Doubles.
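One way to avoid both the thunks and the per-element boxing (a sketch, assuming the vector package in addition to binary and GHC >= 8.2; getDoubles and the sample data are illustrative, not from the original program) is to decode straight into an unboxed vector, which stores each Double in just 8 bytes:

```haskell
import Data.Binary.Get (Get, runGet, getWord64le)
import Data.Binary.Put (runPut, putWord64le)
import qualified Data.ByteString.Lazy as BL
import qualified Data.Vector.Unboxed as VU
import GHC.Float (castWord64ToDouble, castDoubleToWord64)

-- Decode n little-endian 64-bit words into an unboxed vector of Doubles.
-- Every element is forced on insertion, so no thunks are retained.
getDoubles :: Int -> Get (VU.Vector Double)
getDoubles n = VU.replicateM n (fmap castWord64ToDouble getWord64le)

main :: IO ()
main = do
  -- Build a small sample input in memory; for a real file, n would be
  -- the file length divided by 8.
  let xs    = [1.0, 2.0, 3.0] :: [Double]
      bytes = runPut (mapM_ (putWord64le . castDoubleToWord64) xs)
      n     = fromIntegral (BL.length bytes) `div` 8
  print (VU.toList (runGet (getDoubles n) bytes))   -- prints [1.0,2.0,3.0]
```

An unboxed vector of 25 million Doubles is about 200MB, matching the file size, instead of the gigabytes a lazy list of thunks can consume.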


