a dream of databases

Alastair Reid alastair@reid-consulting-uk.ltd.uk
Fri, 13 Jun 2003 14:47:09 +0100


On Friday 13 June 2003 3:02 am, John Meacham wrote:
> so, I have been wanting to implement serialize to database functionality
> for haskell in a certain way which may or may not be possible..
>
> what would be nice is if I could dump an entire complex haskell data
> structure (perhaps cyclic, but not infinite) to a hash-table database [..]

This comes up every now and then.  The lack of an existing library to do 
anything like this is because there's a bunch of tricky issues to deal with.

It seems that to get anywhere with this, you have to decide not to deal with 
some of the following:

1) Writing unevaluated thunks out
    (your 'not infinite' comment above suggests you have already
   dropped this).

2) Writing out datastructures which contain functions:

    data T = App (T  -> T) T | Const Int

3) Writing out partial applications like 'Prelude.foldr (:) Nil'.

[2 and 3 lead to problems establishing function identity and 
versioning problems. 
For example, when you read that back in, will you assume that it means
the version of Prelude.foldr provided by your compiler or will you need to 
read the function (in source-, intermediate- or machine- code)?
Partial applications involving locally defined functions are especially tricky 
since we don't have a natural 'name' we can use to identify them.]

4) Versioning issues caused by change of compiler options or change of
    compiler version, or change of compiler.
    e.g., if I compile the code with -O2 and then try to read it into
    a program compiled with -O

5) Versioning issues caused by change in source code.
   e.g., if I write it out, add a new constructor to a datatype and recompile.
   e.g., if I write it out of one program and read it into another.

[4 and 5 together amount to asking 'How do you decide if two types are the 
same?  Haskell provides an answer for two types in a single program or 
translation unit but things get hazy when you have two programs, etc.]

6) Making sharing too observable.

   When you talk about writing cyclic datastructures out, you are
   talking about a  property (i.e., sharing) of Haskell datastructures
   which varies between compilers
   and might even vary with level of optimization.

   I think what you mean is that you would like the property that a
   (fully evaluated) data structure which consumes finite heap space
   should require finite disk space.  Indeed, we'd like the space to be
   at most a constant factor bigger.

  This amounts to the same thing but doesn't require us to delve into 
  black holes like defining when sharing happens in Haskell implementations.

There's probably a few more details...

--
Alastair Reid