Persistent (as in on-disk) data

Simon Marlow simonmar@microsoft.com
Thu, 6 Mar 2003 16:26:51 -0000


> > > I'd like to be able to dump data structures to disk, and later load
> > > them.
> >
> > A Binary library was discussed recently on the libraries list.
> > I think the outstanding issues are ...
> >
> >   (a) is the API for GHC's Binary library acceptable, or do we need
> >       the extra bells and whistles that the NHC version has?
>
> In particular, the NHC version is platform-independent with regard
> to endian-ness issues, whereas I believe the GHC version is not?

No, the GHC version *is* endian-independent.  But it's not word-size
independent, so moving a binary file from a machine with a 32-bit Int to
a machine with a 64-bit Int won't work (unless you avoid Int and stick
to fixed-size types and Integer).
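To make the distinction concrete: a format is endian-independent when the encoder chooses the byte order itself rather than copying the host's memory layout, and word-size-independent when every field has a width fixed by the format rather than by the platform. A minimal sketch, assuming a plain list of bytes as the output; the names putWord32be, getWord32be, putInt32be and getInt32be are hypothetical and not part of GHC's Binary library:

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Int (Int32)
import Data.Word (Word8, Word32)

-- Hypothetical encoder: a Word32 becomes exactly four bytes, most
-- significant first (big-endian). The order is fixed by this code, not
-- by the host CPU, so the output is the same on any machine.
putWord32be :: Word32 -> [Word8]
putWord32be w = [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]]

-- Hypothetical decoder: read four big-endian bytes, returning the value
-- and the remaining input, or Nothing if fewer than four bytes remain.
getWord32be :: [Word8] -> Maybe (Word32, [Word8])
getWord32be (a : b : c : d : rest) = Just (foldl step 0 [a, b, c, d], rest)
  where
    step acc byte = (acc `shiftL` 8) .|. fromIntegral byte
getWord32be _ = Nothing

-- A signed 32-bit value round-trips through its Word32 bit pattern;
-- fromIntegral wraps modulo 2^32 in both directions.
putInt32be :: Int32 -> [Word8]
putInt32be = putWord32be . fromIntegral

getInt32be :: [Word8] -> Maybe (Int32, [Word8])
getInt32be bs = fmap (\(w, rest) -> (fromIntegral w, rest)) (getWord32be bs)
```

An Int has no such fixed width, so a file that stores Ints in the platform's native size cannot simply be read back where Int is a different size; writing fixed-width types such as Int32 or Int64, or an Integer, avoids that.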

[snip]
> >   (c) how do we derive instances of Binary?
>=20
> The nhc98 compiler already accepts a `deriving Binary' clause.
> DrIFT likewise already supports {-! derives : Binary !-}.  I imagine
> it wouldn't be too difficult to use GHC's support for Generics to code
> up a simple deriving mechanism as well.
>
> > IMHO: something is better than nothing, so I'd be in favour of just
> > plugging in the Binary library from GHC, and marking it
> > "experimental".
>
> Something I have never got round to asking before is what were the
> perceived defects in the nhc98 Binary library that encouraged (a) Sven
> to rewrite it for GHC, and (b) Simon to rewrite it again?

From my point of view, speed was the main driving factor.

If Sven's library had worked, I guess I'd probably have started from
that.  I looked at it and didn't immediately understand it, so I started
by throwing out everything from the API that I didn't need and then
wrote an implementation of that, using bytes rather than bits on the
assumption that the extra simplicity would yield better performance.
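The bytes-versus-bits choice is easiest to see in a small sketch. This is pure Haskell with hypothetical names, not code from either library: a byte-oriented writer just appends whole bytes, while a bit-oriented writer must carry a partially filled byte (and a count of its used bits) between every pair of writes and pad it at the end.

```haskell
import Data.Bits (shiftL, testBit, (.|.))
import Data.Word (Word8)

-- Hypothetical byte-oriented writer: every write is byte-aligned,
-- so emitting a byte is a single cons (bytes kept in reverse order).
newtype ByteOut = ByteOut [Word8]

putByte :: Word8 -> ByteOut -> ByteOut
putByte b (ByteOut bs) = ByteOut (b : bs)

byteResult :: ByteOut -> [Word8]
byteResult (ByteOut bs) = reverse bs

-- Hypothetical bit-oriented writer state:
--   completed bytes (reversed), the partial byte, and its used bit count (0..7).
data BitOut = BitOut [Word8] Word8 Int

emptyBits :: BitOut
emptyBits = BitOut [] 0 0

-- Append one bit; when the partial byte fills up, move it to the output.
putBit :: Bool -> BitOut -> BitOut
putBit bit (BitOut ds p n)
  | n' == 8   = BitOut (p' : ds) 0 0
  | otherwise = BitOut ds p' n'
  where
    p' = (p `shiftL` 1) .|. (if bit then 1 else 0)
    n' = n + 1

-- Append the low k bits of a byte, most significant bit first.
putBits :: Int -> Word8 -> BitOut -> BitOut
putBits k w out = foldl (flip putBit) out [testBit w i | i <- [k - 1, k - 2 .. 0]]

-- Flush: pad any partial byte with zero bits on the right.
bitResult :: BitOut -> [Word8]
bitResult (BitOut ds p n)
  | n == 0    = reverse ds
  | otherwise = reverse ((p `shiftL` (8 - n)) : ds)
```

The bit writer's per-bit branching and end-of-stream flush are the extra bookkeeping a byte-aligned format avoids; how much that costs in practice depends on the implementation, so this illustrates the complexity rather than measuring it.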

Most of the changes I made to the API were for performance reasons.  For
example, using put_ instead of put enables tail calls to kick in more often
(and makes Binary instances slightly easier to write).  I needed some
laziness, but didn't need the full generality of getF and I was
suspicious of its efficiency, so I wrote some more fine-grained
lazy-reading operations in the IO monad instead.
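A sketch of why put_ helps with tail calls, assuming a simple in-memory handle. BinHandle, Bin, newHandle, tellBin, putByte and contents are hypothetical stand-ins, and the class is modelled loosely on the put_/put split described in the message rather than copied from GHC's Binary module (a get method is omitted). Because put_ returns (), an instance can write its last field with a call in tail position; a put that must return the value's starting offset always has a `return p` left to run after the final write.

```haskell
import Data.IORef
import Data.Word (Word8)

-- Hypothetical in-memory handle: bytes written so far (reversed) and
-- the current write offset.
data BinHandle = BinHandle (IORef [Word8]) (IORef Int)

-- Hypothetical typed offset, recording what kind of value starts there.
newtype Bin a = Bin Int deriving (Eq, Show)

newHandle :: IO BinHandle
newHandle = BinHandle <$> newIORef [] <*> newIORef 0

tellBin :: BinHandle -> IO (Bin a)
tellBin (BinHandle _ off) = Bin <$> readIORef off

putByte :: BinHandle -> Word8 -> IO ()
putByte (BinHandle buf off) b = do
  modifyIORef' buf (b :)
  modifyIORef' off (+ 1)

contents :: BinHandle -> IO [Word8]
contents (BinHandle buf _) = reverse <$> readIORef buf

class Binary a where
  -- Write a value, returning nothing, so an instance's final write
  -- can be a tail call.
  put_ :: BinHandle -> a -> IO ()
  -- Write a value and return its starting offset. The default is
  -- defined via put_, so instances only need to supply put_.
  put :: BinHandle -> a -> IO (Bin a)
  put bh x = do
    p <- tellBin bh
    put_ bh x
    return p

instance Binary Word8 where
  put_ = putByte

instance Binary Bool where
  put_ bh b = putByte bh (if b then 1 else 0)

instance (Binary a, Binary b) => Binary (a, b) where
  put_ bh (a, b) = do
    put_ bh a
    put_ bh b          -- last action, so it is in tail position

instance Binary a => Binary [a] where
  put_ bh []       = putByte bh 0   -- end-of-list tag
  put_ bh (x : xs) = do
    putByte bh 1                     -- "another element follows" tag
    put_ bh x
    put_ bh xs                       -- recursive call in tail position
```

For example, calling put on a fresh handle with `(True, 7 :: Word8)` and then `[1, 2 :: Word8]` returns offsets `Bin 0` and `Bin 2`, and the handle then holds the bytes 1, 7, 1, 1, 1, 2, 0 (a tag byte before each list element and one at the end of the list).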

It would be nice if whatever Binary library we adopted for the
hierarchical libraries could be used by GHC, but it's not essential.
Since our goals are firmly focussed on performance, that might not fit
with the prevailing demand from other users, so I'm agnostic about which
Binary implementation gets used.  It's more important to get *something*
that people can use.

Cheers,
	Simon