getting a Binary module into the standard libs

Simon Marlow simonmar@microsoft.com
Mon, 11 Nov 2002 17:57:02 -0000


> This doesn't seem like an awful lot of work, and if it would help getting
> Binary into the hier libs, I'd be more than happy to do it.  Given Eray's
> comments, it might turn out to be faster.

Great!

> Simon: I have a version of the GHC Binary module, but I know that I've
> mucked around with it quite a bit.  I've also got the one off of cvs in
> ghc/compiler/utils/Binary.hs, which would probably be a better
> (read: safer) starting place.  Basically we want to add two functions:
>
>   putBit :: BinHandle -> Bool -> IO ()
>   getBit :: BinHandle -> IO Bool

I'd do it this way:

  putBits :: BinHandle -> Int{-size-} -> Int{-value-} -> IO ()

and similarly for getBits.  It will be easiest if the size is not
allowed to go over 8, because otherwise you have to deal with endianness;
in any case we already have put for Int16, Int32 etc., written in terms
of putWord8.
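
As a rough illustration of the putBits/getBits idea, here is a minimal
sketch.  It does not use the real BinHandle: BitWriter, BitReader,
newBitWriter, newBitReader and flushBits are hypothetical stand-ins built on
IORefs, the result type of getBits (IO Int) is my assumption, and writing
bits most-significant-first is just one possible convention.  The writer
keeps a partially filled Word8 plus a count of the bits already in it.

```haskell
import Control.Monad (foldM, when)
import Data.Bits (shiftL, testBit, (.|.))
import Data.IORef
import Data.Word (Word8)

-- Hypothetical stand-in for BinHandle (not the GHC type): completed bytes
-- (newest first), a partially filled byte, and the number of bits in it.
data BitWriter = BitWriter
  { bwBytes :: IORef [Word8]
  , bwAcc   :: IORef Word8
  , bwUsed  :: IORef Int
  }

newBitWriter :: IO BitWriter
newBitWriter = BitWriter <$> newIORef [] <*> newIORef 0 <*> newIORef 0

-- Write the low `size` bits of `value`, most significant bit first.
-- The size is capped at 8, so no byte-order question arises.
putBits :: BitWriter -> Int -> Int -> IO ()
putBits bw size value
  | size < 0 || size > 8 = ioError (userError "putBits: size must be 0..8")
  | otherwise = mapM_ putOne [size - 1, size - 2 .. 0]
  where
    putOne i = do
      acc  <- readIORef (bwAcc bw)
      used <- readIORef (bwUsed bw)
      let b    = if testBit value i then 1 else 0
          acc' = (acc `shiftL` 1) .|. b
      if used == 7
        then do -- the byte is now complete: emit it and start a fresh one
          modifyIORef (bwBytes bw) (acc' :)
          writeIORef (bwAcc bw) 0
          writeIORef (bwUsed bw) 0
        else do
          writeIORef (bwAcc bw) acc'
          writeIORef (bwUsed bw) (used + 1)

-- Pad the final partial byte with zero bits and return everything written.
flushBits :: BitWriter -> IO [Word8]
flushBits bw = do
  used <- readIORef (bwUsed bw)
  when (used /= 0) (putBits bw (8 - used) 0)
  reverse <$> readIORef (bwBytes bw)

-- Hypothetical reader: remaining bytes plus the bit offset in the first one.
data BitReader = BitReader { brBytes :: IORef [Word8], brPos :: IORef Int }

newBitReader :: [Word8] -> IO BitReader
newBitReader bs = BitReader <$> newIORef bs <*> newIORef 0

-- Read `size` bits (0..8) in the same most-significant-first order.
getBits :: BitReader -> Int -> IO Int
getBits br size
  | size < 0 || size > 8 = ioError (userError "getBits: size must be 0..8")
  | otherwise = foldM step 0 [1 .. size]
  where
    step acc _ = do
      bs  <- readIORef (brBytes br)
      pos <- readIORef (brPos br)
      case bs of
        [] -> ioError (userError "getBits: end of input")
        (w : rest) -> do
          let b = if testBit w (7 - pos) then 1 else 0
          if pos == 7
            then writeIORef (brBytes br) rest >> writeIORef (brPos br) 0
            else writeIORef (brPos br) (pos + 1)
          pure ((acc `shiftL` 1) .|. b)
```

A real BinMem version would write into the array directly instead of
consing onto a list; the list is only there to keep the sketch short.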

Currently the binary format is endian-independent for the basic integral
types.  If you use Int, then a binary file written on one machine is
usable only on a machine of the same word size, but if you want a
truly portable binary file you can restrict yourself to the explicitly
sized integral types.  I think this is a nice property to keep.
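
To make the endian-independence point concrete, here is a small sketch of
encoding a fixed-size value one byte at a time.  The names encodeWord32 and
decodeWord32 are hypothetical, and the most-significant-byte-first order is
an assumption chosen for the example; the point is only that the byte order
in the file is fixed by the code rather than by the host machine.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word32, Word8)

-- Split a Word32 into four bytes, most significant first.  fromIntegral
-- keeps only the low 8 bits of each shifted value.
encodeWord32 :: Word32 -> [Word8]
encodeWord32 w = [ fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0] ]

-- Rebuild the Word32 from the first four bytes and return the rest of the
-- input; Nothing if fewer than four bytes are available.
decodeWord32 :: [Word8] -> Maybe (Word32, [Word8])
decodeWord32 (a : b : c : d : rest) =
  Just (foldl (\acc x -> (acc `shiftL` 8) .|. fromIntegral x) 0 [a, b, c, d], rest)
decodeWord32 _ = Nothing
```

Because the shifts operate on the value rather than on its in-memory
layout, the same bytes come out on a big-endian and a little-endian host.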

> It seems that in order to accomplish this, the BinMem constructor needs to
> be augmented with two fields, one Word8 which contains bits which have
> been "put" but haven't yet been written to the array and another Word8
> which stores the current bit position we are at in this Word8.  Then, the
> work comes down mostly to bit-twiddling in the putWord8 and putBit
> functions (putBit being the simpler of the two).  It seems the BinIO
> constructor would require basically the identical thing, which means
> perhaps this stuff should be added to the BinHandleState variable.

BinMem and BinIO differ quite a bit here: for BinMem you can write
straight into the array, whereas for BinIO we need a cache - a single
byte at the least, but ideally more.  BinMem is the most important case
to optimise (for us in GHC anyhow), since BinIO is already significantly
slower due to the overhead of the Handle interface.

There should really be a closeBin function too; it's quite simple to
add.
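
For the BinIO side, a sketch of what a write cache with a closing function
might look like.  None of these names come from the Binary module:
CachedOut, newCachedOut, putByte, flushOut, closeOut, writeAll and the
4096-byte cacheLimit are all hypothetical, and the sketch uses the
bytestring package for the actual write.  It shows why something like
closeBin is needed once a cache exists: bytes still pending must be flushed
before the Handle is closed.

```haskell
import Control.Monad (when)
import qualified Data.ByteString as B
import Data.IORef
import Data.Word (Word8)
import System.IO (Handle, IOMode (WriteMode), hClose, openBinaryFile)

-- Hypothetical cached writer: pending bytes are held newest-first until
-- the cache fills up or the writer is closed.
data CachedOut = CachedOut
  { coHandle  :: Handle
  , coPending :: IORef [Word8]
  , coCount   :: IORef Int
  }

-- Assumed cache size; any positive value works.
cacheLimit :: Int
cacheLimit = 4096

newCachedOut :: Handle -> IO CachedOut
newCachedOut h = CachedOut h <$> newIORef [] <*> newIORef 0

-- Add one byte to the cache, flushing when the cache is full.
putByte :: CachedOut -> Word8 -> IO ()
putByte co b = do
  modifyIORef' (coPending co) (b :)
  modifyIORef' (coCount co) (+ 1)
  n <- readIORef (coCount co)
  when (n >= cacheLimit) (flushOut co)

-- Write all pending bytes to the Handle in the order they were put.
flushOut :: CachedOut -> IO ()
flushOut co = do
  pending <- readIORef (coPending co)
  B.hPut (coHandle co) (B.pack (reverse pending))
  writeIORef (coPending co) []
  writeIORef (coCount co) 0

-- The closeBin analogue: flush whatever is cached, then close the Handle.
closeOut :: CachedOut -> IO ()
closeOut co = flushOut co >> hClose (coHandle co)

-- Usage: write a list of bytes to a file through the cache.
writeAll :: FilePath -> [Word8] -> IO ()
writeAll path bytes = do
  h  <- openBinaryFile path WriteMode
  co <- newCachedOut h
  mapM_ (putByte co) bytes
  closeOut co
```

Without the final flush in closeOut, any output shorter than the cache
size would never reach the file.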

Cheers,
	Simon