a dream of databases
Alastair Reid
alastair@reid-consulting-uk.ltd.uk
Fri, 13 Jun 2003 14:47:09 +0100
On Friday 13 June 2003 3:02 am, John Meacham wrote:
> so, I have been wanting to implement serialize to database functionality
> for haskell in a certain way which may or may not be possible..
>
> what would be nice is if I could dump an entire complex haskell data
> structure (perhaps cyclic, but not infinite) to a hash-table database [..]
This comes up every now and then. The lack of an existing library to do
anything like this is because there's a bunch of tricky issues to deal with.
It seems that to get anywhere with this, you have to decide not to deal with
some of the following:
1) Writing unevaluated thunks out
(your 'not infinite' comment above suggests you have already
dropped this).
2) Writing out datastructures which contain functions:
data T = App (T -> T) T | Const Int
3) Writing out partial applications like 'Prelude.foldr (:) Nil'.
[2 and 3 lead to problems establishing function identity and
versioning problems.
For example, when you read that back in, will you assume that it means
the version of Prelude.foldr provided by your compiler or will you need to
read the function (in source-, intermediate- or machine- code)?
Partial applications involving locally defined functions are especially tricky
since we don't have a natural 'name' we can use to identify them.]
4) Versioning issues caused by change of compiler options or change of
compiler version, or change of compiler.
e.g., if I compile the code with -O2 and then try to read it into
a program compiled with -O
5) Versioning issues caused by change in source code.
e.g., if I write it out, add a new constructor to a datatype and recompile.
e.g., if I write it out of one program and read it into another.
[4 and 5 together amount to asking 'How do you decide if two types are the
same? Haskell provides an answer for two types in a single program or
translation unit but things get hazy when you have two programs, etc.]
6) Making sharing too observable.
When you talk about writing cyclic datastructures out, you are
talking about a property (i.e., sharing) of Haskell datastructures
which varies between compilers
and might even vary with level of optimization.
I think what you mean is that you would like the property that a
(fully evaluated) data structure which consumes finite heap space
should require finite disk space. Indeed, we'd like the space to be
at most a constant factor bigger.
This amounts to the same thing but doesn't require us to delve into
black holes like defining when sharing happens in Haskell implementations.
There's probably a few more details...
--
Alastair Reid