Marshalling functions was: Transmitting Haskell values

Wed Oct 29 15:23:06 EST 2003

Alastair Reid wrote:
>> Is [marshaling functions] something absolutely impossible in 
>> Haskell and by what reason? Just because of strong typing (forgive
>> my stupidity ;)? Or are there some deeper theoretical limitations?
>> 
> 
> The big theoretical issue is whether it would provide an Eq or Show 
> instance for -> by the backdoor.  Careful API design could avoid the
> worst of this.

What's the problem with a Show instance for ->? (Or, put another way:
what makes the current Show instance in the Prelude harmless compared to
a fuller implementation?)

I'm not sure whether having an Eq instance for -> would be so bad.
General equality satisfies two constraints:
1. For all functions (including field access functions and whatnot),
"equal" inputs will return "equal" outputs.
2. True generality is the finest partitioning that satisfies (1).

(1) is a recursion (and a massive one), (2) specifies we want the least
fixed point of the recursion. Also, it's (2) that makes equality
undecidable.

(1) is essential and cannot be changed.

Would changing (2) be dangerous? What are the dangers?
For example, we could define -> equality to be source code equality,
modulo a well-defined set of semantics-preserving transformations (such
as translation to core language and renaming local variables - nothing
fancy, I'd consider f x = x to be different from f = \x x).

> A semi-theoretical issue is to do with sharing of values, libraries 
> and types. A simple function like '\ x -> x' is fairly easy to write
> out because it doesn't refer to any other objects but what about:
> 
> \ x -> head [ y | (x',y) <- table, x == x' ]
> 
> This refers to an object 'table' that will have to be accessible from
>  the receiver.  Choices are to access the object over the network, to
>  copy the object or to move the object and access it remotely from 
> the sender.

The default semantics for marshalling such an object would be to copy
'table' along with the function.
Essentially it's the responsibility of the programmer to make sure that
he isn't inadvertently exporting the entire state of the computation.
(Actually sometimes that's exactly what's wanted: dump out the entire
state of the computation, say, as a checkpoint to safeguard against
potential crashes.)

> Remote access requires a network connection and suddenly adding two 
> numbers can raise a TCP/IP exception.

Right, but, in principle, adding two numbers can generate a stack
overflow or an out-of-memory error. There's nothing new here, except for
failure probabilities.

> Copying has consequences for mutable datatypes (a useful but 
> non-standard extension of Haskell), for foreign datatypes (Ptr, etc.)
> and for interfaces to foreign functions.

Indeed. These cannot be marshalled in any useful sense. (Which makes me
more than a bit suspicious about the Fd instances of Show and Read, BTW.)

> Copying also has consequences for space and time usage: each copy 
> uses more space

That can be controlled by the programmer (provided that sharing is 
preserved over marshalling).

> and laziness is lost.

Not at all. Simply marshall the thunks instead of the evaluated values.
In other words, instances of marshalling functions must be non-strict.

Actually I didn't find strictness annotations on the Show instances in
the library docs. Is this an omission, intentionally left
implementation-dependent, or are they indeed lazy?

> If an object is copied:
> 
> 1) Is it evaluated first?

It shouldn't. Haskell is a lazy language, and forcing evaluation would
make any objects that contain an infinite list deep down in their bowels
potentially non-marshallable.

> 2) Can it contain mutable objects? 3) Can it contain Foreign 
> pointers?

Not by my book, at least not by default.
If would probably be possible to create instances that handle specific
cases, e.g. if the Foreign pointer is just an implementation vehicle for
an FtpFile object, then there might be specific marshalling code that
marshals just the URL and rebuilds the connection on the receiving side.
This also exemplifies the standard problem: Most foreign functions can
fail, even unpredictably, so it's probably better to send just the
specifications for building an equivalent FFI object across the network
(which is exactly what the IO monad is good for: it contains action
specifications, not the actions themselves).

> 4) Can it contain functions?

Of course - this is what got this thread started in the first place :-)

> 5) Is sharing preserved within the object? e.g., does let x=1; 
> y=(x,x,x) in (y,y,y) get sent as 3 objects or as 13 objects?

Semantically, this is irrelevant. However, solutions to the next point
will usually preserve sharing, so this is less of an issue.

> 6) Are cyclic objects ok? e.g., can 'let x=1:x in x' be sent?

They should be.
Actually, Show and Read should be able to handle this.
... unfortunately, a quick test in GHCi revealed that "show" goes into
an endless loop when fed with the above expression. Obviously, I'm
expecting too much here.
Maybe some of my Mozart experience is showing through here - the
Mozart/Oz Inspector tool will show circular values with no problems, any
references that form a cycle are printed as a marker.
I.e. the above would look something like
   A=1 : A
when printed, 'let x = 1 : 2 : x' would print as
   A=1 : 2 : A
etc.

> If it wasn't for mutable objects, foreign types and foreign 
> functions, copying objects would be a straightforward but tedious 
> issue.

Agreed.

> Some of these issues can be avoided if objects are evaluated first 
> because then the type can be used to determine whether mutable 
> objects or foreign types are reachable.

Sometimes the evaluated value will be shorter than its unevaluated
version, sometimes it's the reverse.
Let the programmer decide what makes more sense. Since the programmer
can always force evaluation as needed but cannot "unevaluate" something
once it was executed, the system shouldn't force evaluation.
Particularly since evaluation may not always terminate (transmitting a
value doesn't mean it will be evaluated on the other side after all).

> \ x -> lookup x table
> 
> As well as referring to 'table', this also refers to a standard 
> library function.  We can either copy 'lookup' over or we can use the
>  function of the same name on the receiver.

We can send a hash code of the function and have the receiver request
the full function if it finds that the versions doesn't match. Actually,
hash codes would be a better identification than qualified function names.
Primitive functions would need name equality - there's no good way to
compare them in any other useful way.

> Copying the function over can get expensive because lots of Haskell 
> libraries depend on other Haskell libraries and especially if we end
>  up copying it over multiple times.

A function need never be transferred more than once. (Keep them in a
hash table with weak references, so that they can be garbage collected
if they aren't in use anymore.)

> Using the function with the same name runs the risk that the name is
>  the same but the function itself is different.

Indeed.

> Avoiding the overhead of sharing functions more than once is probably
>  easy enough.  Create a signature for each function by hashing all 
> the code in the function and in every function and object it depends 
> on. Don't transmit functions which the receiver already has.  If 
> storing the function to a file, have the store operation omit 
> functions listed in a table of signatures.  e.g., you might provide 
> signatures for all the standard libraries in ghc 6.02 on windows98. 
> (Since most GHC operations are the same on all platforms and in all 
> recent releases, this ought to match many functions in ghc 5.04.)
> 
> foo :: T -> T
> 
> If the receiving program contains a type 'T' and the sending program 
> contains a type 'T', we might have to check that they are the same 
> type so we have to choose an appropriate equality test for types.  Is
>  structural equality enough or too strong or do we also want to
> insist that any data hiding property the original program had is
> preserved?

Types can be compared just like functions. I believe that the issues are
just the same as with equality, but I may be wrong here.

> A small corner of this question concerns portability of type 
> representations between architectures with different word sizes and 
> different endianness.

Representation is not a serious problem (though getting it under control 
can be some work).
What's more of a problem are implementation-dependent types, such as 
that integer type that's guaranteed to have at least such-and-so many 
binary digits. The shift functions assume a one's-complement 
representation of integers, which is the case on most modern processors 
but doesn't hold once things are ported to an MVS mainframe. (Oh, and 
Unicode function names are going to be very problematic if you want to 
compile them on a 7-bit ASCII or EBCDIC machine, but the only 
consequence that I can see for marshalling is that functions should 
never be identified by name during marshalling, they should be 
identified via hash code.)

Regards,
Jo