[Haskell-cafe] question about Data.Binary and Double instance

Tue Apr 17 22:34:58 EDT 2007

On Tue, 2007-04-17 at 10:32 -0700, David Roundy wrote:
> Hi all,
> 
> I'm wondering what exactly inspired the decode/encodeFloat implementation
> for Data.Binary? It seems to me like it'd be much better to use a standard
> format like IEEE, which would also be much more efficient, since as far as
> I know, on every implementation a Double and a CDouble are identical.
> 
> Are there any suggestions how I could use Data.Binary to actually read a
> binary file full of Doubles? Should I just use the Array interface, and
> forget laziness and hopes of handling different-endian machines? Or is
> there some way to reasonably do this using Data.Binary?

Hi David,

We'd like to use IEEE format as the default Data.Binary serialisation
format for Haskell's Float and Double types. The only thing that makes
this tricky is doing it portably and efficiently.

We can't actually guarantee that we have any IEEE format types
available. The isIEEE method of the RealFloat class will tell you if a
particular type is indeed IEEE, but what do we do if
isIEEE (undefined :: CDouble) = False?

Perhaps we just don't care about ARM or other arches where GHC runs that
do not use IEEE formats, I don't know. If that were the case we'd say
something like:

instance Binary Double where
  put d = assert (isIEEE (undefined :: Double)) $ do
            write (poke d)
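
A minimal sketch of the "assume IEEE and just poke" approach, using only
the FFI modules from base. The names doubleBits, bitsDouble and toBytesBE
are made up for illustration and are not part of Data.Binary; the sketch
assumes the native Double is an 8-byte IEEE 754 value and that integers
and doubles share the same byte order in memory.

```haskell
import Control.Exception (assert)
import Data.Bits (shiftR)
import Data.Word (Word64, Word8)
import Foreign.Marshal.Utils (with)
import Foreign.Ptr (castPtr)
import Foreign.Storable (peek)

-- | Reinterpret the in-memory bytes of a Double as a Word64.
-- Assumption: the native Double is an 8-byte IEEE 754 binary64.
doubleBits :: Double -> IO Word64
doubleBits d = assert (isIEEE d) $ with d $ \p -> peek (castPtr p)

-- | The inverse: reinterpret a Word64 bit pattern as a Double.
bitsDouble :: Word64 -> IO Double
bitsDouble w = with w $ \p -> peek (castPtr p)

-- | Split the bit pattern into bytes, most significant first, so the
-- external format does not depend on the host byte order.
toBytesBE :: Word64 -> [Word8]
toBytesBE w = [fromIntegral (w `shiftR` s) | s <- [56, 48 .. 0]]
```

Going through a Word64 rather than writing the raw bytes directly is one
way to keep the external byte order fixed across different-endian
machines.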

If we do care about ARM and the like then we need some way to translate
from the native Double encoding to an IEEE double external format. I
don't know how to do that. I also worry we'll end up with lots of
#ifdefs.
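
One possible way to do the translation without looking at the native
memory layout is to go through decodeFloat and rebuild the IEEE 754
binary64 bit pattern from the value. This is only a sketch: ieeeBits is
a made-up name, and it assumes floatRadix is 2 and floatDigits is 53 for
Double. It maps every NaN to a single quiet NaN and truncates rather
than rounds if the native format has a wider exponent range.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word64)

-- | Build the IEEE 754 binary64 bit pattern from the value of a Double.
-- Assumption: floatRadix == 2 and floatDigits == 53 for Double.
ieeeBits :: Double -> Word64
ieeeBits d
  | isNaN d        = 0x7FF8000000000000            -- one canonical quiet NaN
  | isInfinite d   = sign .|. infinity
  | d == 0         = sign                          -- +0.0 or -0.0
  | biased >= 2047 = sign .|. infinity             -- too large for binary64
  | biased <= 0    = sign .|. (mant `shiftR` (1 - biased))  -- subnormal
  | otherwise      = sign .|. (fromIntegral biased `shiftL` 52)
                          .|. (mant .&. 0x000FFFFFFFFFFFFF)
  where
    sign :: Word64
    sign | d < 0 || isNegativeZero d = 1 `shiftL` 63
         | otherwise                 = 0

    infinity :: Word64
    infinity = 0x7FF0000000000000

    -- abs d == m * 2^e with 2^52 <= m < 2^53 for any non-zero finite d
    (m, e) = decodeFloat (abs d)

    mant :: Word64
    mant = fromIntegral m

    -- exponent of the leading mantissa bit (e + 52) plus the bias (1023)
    biased :: Int
    biased = e + 1075
```

The resulting Word64 can then be written out in a fixed byte order, so
no #ifdef on the architecture is needed, at the cost of doing arithmetic
instead of a plain memory copy.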

The other problem with doing this efficiently is that we have to worry
about alignment for that poke d operation. If we don't know the
alignment we have to poke into an aligned side buffer and copy over.
Similar issues apply to reading.
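
The side-buffer idea can be sketched with the FFI modules from base.
The function names here are hypothetical and this is not how
Data.Binary is implemented; it only shows poking into an aligned
temporary and copying the bytes to a destination whose alignment is
unknown.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (alloca, allocaBytes)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Ptr (Ptr, castPtr, plusPtr)
import Foreign.Storable (peek, poke, sizeOf)

-- | Write a Double to a destination of unknown alignment by poking it
-- into an aligned temporary and copying the bytes over.
pokeUnaligned :: Ptr Word8 -> Double -> IO ()
pokeUnaligned dst d =
  alloca $ \tmp -> do              -- tmp is suitably aligned for Double
    poke tmp d
    copyBytes dst (castPtr tmp) (sizeOf d)

-- | Read a Double from a source of unknown alignment, again via an
-- aligned temporary.
peekUnaligned :: Ptr Word8 -> IO Double
peekUnaligned src =
  alloca $ \tmp -> do
    copyBytes (castPtr tmp) src (sizeOf (undefined :: Double))
    peek tmp

-- | Round-trip a Double through a buffer at an arbitrary byte offset.
roundTripAtOffset :: Int -> Double -> IO Double
roundTripAtOffset off d =
  allocaBytes (off + sizeOf d) $ \buf -> do
    let p = buf `plusPtr` off
    pokeUnaligned p d
    peekUnaligned p
```

For example, roundTripAtOffset 3 1.5 writes and reads the value at an
odd offset. The extra copy is the cost of not knowing the alignment up
front.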

I'm currently exploring more design ideas for Data.Binary, including how
to deal with alignment. Eliminating unnecessary bounds checks and using
aligned memory operations also significantly improves performance. I can
get up to ~750Mb/s serialisation out of a peak memory bandwidth of
~1750Mb/s, though even a plain Haskell word-writing loop only reaches
~850Mb/s.

Duncan