[Haskell-beginners] Data.Binary.Get for large files
Daniel Fischer
daniel.is.fischer at web.de
Fri Apr 30 18:10:48 EDT 2010
Am Freitag 30 April 2010 23:06:07 schrieb Philip Scott:
> Hi Daniel
>
> > Replace getFloat64le with e.g. getWord64le to confirm.
> > The reading of IEEE754 floating point numbers seems rather
> > complicated. Maybe doing it differently could speed it up, maybe not.
>
> That speeds things up by a factor of about 100 :)
Yes, I too.
>
> I think there must be some efficiency to be extracted from there
> somewhere.. Either the IEEE module
Look at the code. It does a lot of hard work. That is probably necessary to
ensure correctness, but it's sloow.
If you feel like playing with fire,
{-# LANGUAGE BangPatterns, MagicHash #-}
import qualified Data.ByteString.Lazy as BL
import Data.Binary
import Data.Binary.Get
import GHC.Prim
import Data.Word
getFloat64le :: Get Double
getFloat64le = fmap unsafeCoerce# getWord64le
myGetter !acc = do
e <- isEmpty
if e then
return acc
else do
!t <- getFloat64le
myGetter ((t+acc)/2)
may work on your system (no warranties, you know what 'unsafe' means, don't
you?).
> or the Data.Binary.Get.
Considering that it's quick enough getting Word64, you won't get much
improvement there.
>
> Is it possible to get the profiler to look deeper than the top level
> module?
Lots of {-# SCC "foo" #-} pragmas. Or create a local copy of the module and
import that, then -prof -auto-all should give more info.
> With all the options I could find, it only ever tells me about
> things in the file I am dealing with..Hm, 200MB file => ~25 million
> Doubles, such a list needs at least 400MB.
>
> > Still a long way to 2GB. I suspect you construct a list of thunks, not
> > Doubles.
>
> I think you are almost certainly right. Is there an easy way to see
> if/how/where this is happening?
Read the core, profile with all -hx flags and look at the profiles, show
the code to more experienced Haskellers.
>
> Thanks once again,
>
> Philip
More information about the Beginners
mailing list