[Haskell-beginners] Data.Binary.Get for large files

MAN elviotoccalino at gmail.com
Thu Apr 29 20:46:08 EDT 2010


I can't find the error in your code (assuming there is an error), so I
checked the code you didn't write, i.e. the Data.Binary.IEEE754 functions
you call, and the only thing that set off an alarm was...

getFloat64le :: Get Double
getFloat64le = getFloat (ByteCount 8) $ splitBytes . reverse

splitBytes :: [Word8] -> RawFloat

...that every 8-byte chunk read in the Get monad is turned into a list and
reversed just to decode one little-endian float (and you are reading over
26 million floats). I really don't know whether this hurts performance; I
assume the C equivalent would be reading each array in reverse order.
I am more than willing to believe this is not the cause of such a
performance loss, but I can't find another reason.
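
For comparison, here is a minimal sketch of decoding a little-endian
double without going through a reversed [Word8]. It assumes a newer base
(4.10 or later), whose GHC.Float exports castWord64ToDouble; that function
postdates this thread. The names getDoubleLE, sumDoubles and encodeDoubles
are made up for illustration, and I have not measured whether this closes
the gap with the C program.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (Get, getWord64le, isEmpty, runGet)
import Data.Binary.Put (putWord64le, runPut)
import GHC.Float (castDoubleToWord64, castWord64ToDouble)

-- Hypothetical helper: read 8 bytes as a little-endian Word64 and
-- reinterpret the bits as an IEEE 754 double (no intermediate list).
getDoubleLE :: Get Double
getDoubleLE = castWord64ToDouble <$> getWord64le

-- Hypothetical helper: strictly sum every double until input is exhausted.
sumDoubles :: Get Double
sumDoubles = go 0
  where
    go acc = do
      e <- isEmpty
      if e
        then return acc
        else do
          t <- getDoubleLE
          go $! acc + t

-- Hypothetical helper: build little-endian test input from a list.
encodeDoubles :: [Double] -> BL.ByteString
encodeDoubles = runPut . mapM_ (putWord64le . castDoubleToWord64)
```

In GHCi, runGet sumDoubles (encodeDoubles [1, 2, 3.5]) evaluates to 6.5.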


PS1: "(e == True)" == "e"

PS2:
I know it's not important, but I can't help it: that is not an average
you're computing...
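
To make PS2 concrete, here is a small sketch comparing the update used in
myGetter with an arithmetic mean. The names halving and mean are made up
for illustration, and the list input stands in for the values read from
the file.

```haskell
import Data.List (foldl')

-- The update from myGetter: each step halves (new value + accumulator),
-- so earlier values are weighted less and less.
halving :: [Double] -> Double
halving = foldl' (\acc t -> (t + acc) / 2) 0

-- Arithmetic mean: keep a running sum and a count, both forced each step.
-- Note: for an empty list this yields NaN (0 / 0).
mean :: [Double] -> Double
mean xs = s / fromIntegral n
  where
    (s, n) = foldl' step (0, 0 :: Int) xs
    step (a, c) t =
      let a' = a + t
          c' = c + 1
      in a' `seq` c' `seq` (a', c')
```

For [1, 2, 3], halving gives 2.125 while mean gives 2.0.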

On Thu, 29-04-2010 at 23:37 +0100, Philip Scott wrote:
> Hello again folks, 
> 
> Sorry to keep troubling you - I'm very appreciative of the help you've
> given so far. I've got one more for you that has got me totally
> stumped. I'm writing a program which deals with largish files; the one
> I am using as a test case is not stupidly large at about 200 MB. After
> three evenings, I have finally gotten rid of all the stack overflows,
> but I am unfortunately left with something that is rather unfeasibly
> slow. I was hoping someone with some keener skills than I could take a
> look. I've tried to distill it to the simplest case.
> 
> This program just reads in a file, interpreting each value as a
> double, and does a sort of running average on them. The actual
> function doesn't matter too much, I think it is the reading it in that
> is the problem. Here's the code: 
> 
> import Control.Exception 
> import qualified Data.ByteString.Lazy as BL 
> import Data.Binary.Get 
> import System.IO 
> import Data.Binary.IEEE754 
> 
> myGetter acc = do 
>     e <- isEmpty 
>     if e == True 
>         then 
>             return acc 
>         else do 
>             t <- getFloat64le 
>             myGetter $! ((t+acc)/2) 
> 
> myReader file = do 
>     h <- openBinaryFile file ReadMode 
>     bs <- BL.hGetContents h 
>     return $ runGet (myGetter 0)  bs 
> 
> main = do 
>     d <- myReader "data.bin" 
>     evaluate d 
> 
> This takes about three minutes to run on my (fairly modern) laptop.
> The equivalent C program takes about 5 seconds.
> 
> I'm sure I am doing something daft, but I can't for the life of me see
> what. Any hints about how to get the profiler to show me useful stuff
> would be much appreciated! 
> 
> All the best, 
> 
> Philip 
> 
> PS: If, instead of computing a single value, I try to build a list of
> the values, the program ends up using over 2 GB of memory to read a
> 200 MB file. Any ideas on that one?
> 
> 