[Haskell-cafe] Haskell as data manipulation language?

Sat Dec 4 00:50:29 EST 2004

Hi,

I am probably not the first to think about this, yet I was unable to 
find any signs of activities in this direction, so maybe I will be able 
to know something from this list.

First of all, reading multiple messages about global and mutable 
variables desired in Haskell, I was thinking: isn't the root of the 
problem in the fact that there are no variables in Haskell at all?

Indeed (and my experience of digging in Hugs sources confirms) Haskell 
program (function) is just a description of some memory structure which 
being completed by the bindings of its arguments, is further transformed 
by the evaluator into some other data structure, etc.

Now imagine a data storage with structure similar to one of a Haskell 
program. I. e. a storage having metadata-objects like data constructor 
functions,  data types which are parents to data constructors (like in 
data X = Foo Int | Bar Float), classes, instances, modules, etc. Also 
data objects themselves, i. e. precompiled and serialized functions 
along with their type information. Data objects do not differ from 
functions i. e. they are stored the same way.

Now quoting Graham Klyne

Subject was: [HAskell-Cafe] Mutable and persistent values (was: Top 
Level TWI's again) posted 2004-11-23 05:11:25 PST:

(I tried to post reply via Google Groups, but it did not make it back to 
the list):

 > It is my view that use a functional language effectively, it is 
necessary
 > to think differently about its structure than one would for a 
conventional
 > imperative program.  Global values become parameters and return
 > values;  mutable values become input values and new (separate) return
 > values.  If you're used to writing imperative programs that do not use
 > global state, but rather use parameters to pass values, the difference
 > isn't always quite as great as might be imagined.
 >
 > For complex values, this sounds horribly inefficient, but because values
 > are pure (not mutable), very high levels of sharing can (sometimes, with
 > good design!) be extensively shared.  Thus, a return value may well 
consist
 > mostly of a copy of the input value, with differences, rather than 
actually
 > being a complete new copy.  (Chris Okasaki's book shows well how to 
design
 > to obtain these benefits.)
 >
 > Also, because Haskell is lazy, complex structures passed between 
functions
 > don't necessarily ever exist in their entirety.  Sometimes, I think of a
 > data structure as describing a computational traversal through some 
data,
 > rathert than something that actually exists.  (This leads to very 
elegant
 > expression of things like Prolog search-and-backtrack algorithms.)

So, if there is a data object (say, a list) stored with version number 
N, and another value is cons'ed to the list, then version N+1 will be 
stored as an application of cons to the value added, and the version N.

Thus, via versioning, some kind of mutability may be achieved (and of 
course these check-in and check-out must belong to IO monad).

These stored functions must be visible to the compiler, and when 
compiling some function f which calls g (compiled earlier and stored as 
version N), the code of this version of f (say, M) is built to call 
version N of g further on. So, all the type checks are done during 
compilation, and are not necessary during runtime, and yet the functions 
are compiled and stored separately (trying to address the issue with too 
large monolithic programs).

Newer version of f (say M+1) may be compiled against newer (at that 
time) version of g (N+1). So different versions of the same function may 
demonstrate different behaviors, but the "history of development" is 
preserved.

Put together, this gives some conceptual image of a "programmable data 
storage" with queries being calls to stored functions (with externally 
supplied agruments). All good things of Haskell like static type safety 
are preserved.

I am thinking of implementation of something like this using Hugs 
runtime as the basis. I realize, this may be very time consuming, so I 
am wondering, maybe someone already is developing something like this?

Or someone already tried, and found this inefficient or in some other 
way not feasible to implement?

Dimitry Golubovsky
Middletown, CT