Marshalling functions (was: Transmitting Haskell values)

Wed Oct 29 12:15:31 EST 2003

> Is [marshaling functions] something absolutely impossible in
> Haskell and by what reason? Just because of strong typing (forgive my
> stupidity ;)? Or are there some deeper theoretical limitations?

The big theoretical issue is whether it would provide an Eq or Show instance 
for -> by the backdoor.  Careful API design could avoid the worst of this.
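
To make the worry concrete, here is a minimal sketch.  Real Haskell
functions cannot be inspected, so it models a function as a small syntax
tree; the names Expr, denote and marshal are made up for this example and
are not part of any library.  Two trees that denote the same function get
different marshalled forms, so a pure marshaller would let a program tell
them apart.

```haskell
-- Hypothetical model: a "function" is a syntax tree over one variable.
data Expr = Var | Lit Int | Add Expr Expr | Mul Expr Expr
  deriving (Eq, Show)

-- The function that an expression denotes.
denote :: Expr -> Int -> Int
denote Var       x = x
denote (Lit n)   _ = n
denote (Add a b) x = denote a x + denote b x
denote (Mul a b) x = denote a x * denote b x

-- A pure marshaller has to write out *some* representation.
marshal :: Expr -> String
marshal = show

double1, double2 :: Expr
double1 = Add Var Var        -- \ x -> x + x
double2 = Mul (Lit 2) Var    -- \ x -> 2 * x

-- The two agree on every input we try ...
sameBehaviour :: Bool
sameBehaviour = all (\x -> denote double1 x == denote double2 x) [-100 .. 100]

-- ... yet their marshalled forms differ, which is Eq by the backdoor.
distinguishable :: Bool
distinguishable = marshal double1 /= marshal double2
```

One way an API might limit the damage is to make the marshalling
operation an IO action, so that observing the representation is not a
pure operation.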

A semi-theoretical issue is to do with sharing of values, libraries and types.
A simple function like '\ x -> x' is fairly easy to write out because it
doesn't refer to any other objects, but what about:

\ x -> head [ y | (x',y) <- table, x == x' ]

  This refers to an object 'table' that will have to be
  accessible from the receiver.  Choices are to access the
  object over the network, to copy the object or to move the
  object and access it remotely from the sender.

  Remote access requires a network connection and suddenly
  adding two numbers can raise a TCP/IP exception.

  Copying has consequences for mutable datatypes (a useful but
  non-standard extension of Haskell), for foreign datatypes
  (Ptr, etc.) and for interfaces to foreign functions.

  Copying also has consequences for space and time usage: each
  copy uses more space and laziness is lost.

  If an object is copied:

  1) Is it evaluated first?
  2) Can it contain mutable objects?
  3) Can it contain Foreign pointers?
  4) Can it contain functions?
  5) Is sharing preserved within the object?
     e.g., does let x=1; y=(x,x,x) in (y,y,y) get sent as 3 objects
     (sharing preserved) or as 13 objects (every reference copied)?
  6) Are cyclic objects ok?
     e.g., can 'let x=1:x in x' be sent?
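
  Questions 5 and 6 can be sketched with a toy heap.  The Heap type and
  the object numbering below are assumptions made for the example; a real
  implementation would walk the runtime's own heap representation.

```haskell
-- Hypothetical heap model: each object has an identity and a list of
-- the objects it points to.
type ObjId = Int
type Heap  = [(ObjId, [ObjId])]

-- let x=1; y=(x,x,x) in (y,y,y)
--   object 0 is the outer tuple, 1 is y, 2 is x.
example :: Heap
example = [(0, [1, 1, 1]), (1, [2, 2, 2]), (2, [])]

-- let x=1:x in x
--   object 0 is the cons cell (pointing at 1 and at itself), 1 is the literal.
cyclic :: Heap
cyclic = [(0, [1, 0]), (1, [])]

children :: Heap -> ObjId -> [ObjId]
children heap o = maybe [] id (lookup o heap)

-- Objects written if every reference is copied afresh (sharing lost).
-- This does not terminate on a cyclic heap.
sizeAsTree :: Heap -> ObjId -> Int
sizeAsTree heap o = 1 + sum (map (sizeAsTree heap) (children heap o))

-- Objects written if each object is sent once (sharing preserved).
-- The list of visited objects also makes this safe on cycles.
sizeAsGraph :: Heap -> ObjId -> Int
sizeAsGraph heap root = length (go [] [root])
  where
    go seen []       = seen
    go seen (o : rest)
      | o `elem` seen = go seen rest
      | otherwise     = go (o : seen) (children heap o ++ rest)
```

  Here sizeAsTree example 0 is 13 and sizeAsGraph example 0 is 3, and
  only the graph traversal can handle the cyclic heap.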

  If it weren't for mutable objects, foreign types and foreign
  functions, copying objects would be straightforward, if tedious.
  Some of these issues can be avoided if objects are evaluated first,
  because then the type can be used to determine whether mutable
  objects or foreign types are reachable.
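
  One way to let the type decide is a type class with instances only for
  plain data.  This is a sketch under assumptions: the class name Sendable
  and its toy string encoding are invented here, and a real design would
  want a proper binary format and a decoder.

```haskell
-- Hypothetical class: only types with an instance may be copied.
class Sendable a where
  encode :: a -> String

instance Sendable Int where
  encode n = "I" ++ show n

instance Sendable Bool where
  encode b = if b then "T" else "F"

instance Sendable a => Sendable [a] where
  encode xs = "L" ++ show (length xs) ++ ":" ++ concatMap encode xs

instance (Sendable a, Sendable b) => Sendable (a, b) where
  encode (a, b) = "P" ++ encode a ++ encode b

-- There are deliberately no instances for IORef a, Ptr a or (a -> b),
-- so an attempt such as
--   encode (not :: Bool -> Bool)
-- is rejected by the type checker rather than failing at run time.

table :: [(Int, Bool)]
table = [(1, True), (2, False)]

-- Producing the whole string forces the whole value, so the object is
-- fully evaluated before it is sent.
encodedTable :: String
encodedTable = encode table
```

  The price is the one noted above: laziness is lost, and an infinite
  list can never be encoded this way.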

\ x -> lookup x table

  As well as referring to 'table', this also refers to a standard
  library function.  We can either copy 'lookup' over or we can 
  use the function of the same name on the receiver.

  Copying the function over can get expensive, because lots of
  Haskell libraries depend on other Haskell libraries, and it gets
  worse if we end up copying the same function over multiple times.

  Using the function with the same name runs the risk that the name
  is the same but the function itself is different.

  Foreign interfaces are a problem here as well.  Is the 'fopen' function
  on Linux the same as the 'fopen' function on FreeBSD or Windows?

  Avoiding the overhead of sending a function more than once is
  probably easy enough.  Create a signature for each function by
  hashing all the code in the function and in every function and 
  object it depends on.  Don't transmit functions which the receiver
  already has.  If storing the function to a file, have the store
  operation omit functions listed in a table of signatures.  e.g., you
  might provide signatures for all the standard libraries in ghc 6.02
  on windows98.  (Since most GHC operations are the same on all
  platforms and in all recent releases, this ought to match many
  functions in ghc 5.04.)
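
  The signature scheme can be sketched as follows.  Everything here is an
  assumption for illustration: the Program type stands in for whatever
  the compiler knows about code and dependencies, the dependency graph is
  taken to be acyclic, and the hand-written FNV-1a hash is a stand-in for
  a proper cryptographic hash.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl', nub)
import Data.Word (Word64)

type Name = String

-- Hypothetical program database: each name has some code text and the
-- names it depends on.  Assumed acyclic to keep the sketch short.
type Program = [(Name, (String, [Name]))]

-- FNV-1a over the characters of a string.
fnv1a :: String -> Word64
fnv1a = foldl' step 14695981039346656037
  where step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- Signature = hash of the function's own code together with the
-- signatures of everything it depends on.
signature :: Program -> Name -> Word64
signature prog n = case lookup n prog of
  Nothing           -> fnv1a ("<missing>" ++ n)
  Just (code, deps) -> fnv1a (code ++ concatMap (show . signature prog) deps)

-- Everything reachable from a name, including the name itself.
closure :: Program -> Name -> [Name]
closure prog n = n : concatMap (closure prog) deps
  where deps = maybe [] snd (lookup n prog)

-- The names whose signatures the receiver does not already hold.
toSend :: Program -> [Word64] -> Name -> [Name]
toSend prog known root =
  nub [ n | n <- closure prog root, signature prog n `notElem` known ]

sender :: Program
sender =
  [ ("f",      ("\\ x -> lookup x table", ["lookup", "table"]))
  , ("lookup", ("<library code>",         []))
  , ("table",  ("[(1,\"a\")]",             []))
  ]

-- The receiver already has an identical 'lookup'.
receiverKnows :: [Word64]
receiverKnows = [signature sender "lookup"]
```

  With these definitions, toSend sender receiverKnows "f" leaves out
  'lookup' and keeps 'f' and 'table'.  Because a signature covers the
  dependencies too, a receiver whose 'lookup' has different code would
  not match, which also guards against the same-name-different-function
  risk mentioned earlier.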

foo :: T -> T

  If the receiving program contains a type 'T' and the sending
  program contains a type 'T', we might have to check that they
  are the same type, so we have to choose an appropriate equality
  test for types.  Is structural equality enough, or too strong,
  or do we also want to insist that any data hiding property the
  original program had is preserved?
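
  A small sketch of two candidate equality tests, to show that they can
  disagree.  The TypeDesc type is invented for this example and ignores
  recursive and parameterised types.

```haskell
-- Hypothetical description of a type, as it might be sent with a value.
data TypeDesc
  = TInt
  | TDouble
  | TData String [[TypeDesc]]  -- type name, then the fields of each constructor
  deriving (Eq, Show)

-- Names and shapes must both match (this is just the derived Eq).
sameNominal :: TypeDesc -> TypeDesc -> Bool
sameNominal = (==)

-- Structural only: ignore type names and compare shapes.
sameStructure :: TypeDesc -> TypeDesc -> Bool
sameStructure (TData _ cs) (TData _ ds) =
  length cs == length ds && and (zipWith sameFields cs ds)
  where
    sameFields fs gs =
      length fs == length gs && and (zipWith sameStructure fs gs)
sameStructure a b = a == b

-- data Celsius = Celsius Double;  data Fahrenheit = Fahrenheit Double
celsius, fahrenheit :: TypeDesc
celsius    = TData "Celsius"    [[TDouble]]
fahrenheit = TData "Fahrenheit" [[TDouble]]
```

  The structural test accepts celsius and fahrenheit as the same type
  while the nominal test rejects them, so a purely structural check
  would let a value cross an abstraction boundary the sender intended
  to keep.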

  A small corner of this question concerns portability of type
  representations between architectures with different word sizes
  and different endianness.
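
  For integers, one common answer is to fix a width and byte order on
  the wire regardless of the host.  A minimal sketch, assuming 64 bits
  and most-significant-byte-first; the function names are made up here:

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Int (Int64)
import Data.Word (Word8, Word64)

-- Always write 8 bytes, most significant first, whatever the host's
-- native word size and byte order are.
putInt64be :: Int64 -> [Word8]
putInt64be n = [ fromIntegral (w `shiftR` s) | s <- [56, 48 .. 0] ]
  where w = fromIntegral n :: Word64

-- Read the first 8 bytes back, most significant first.
getInt64be :: [Word8] -> Int64
getInt64be bs = fromIntegral (foldl step (0 :: Word64) (take 8 bs))
  where step acc b = (acc `shiftL` 8) .|. fromIntegral b
```

  A sender with a 32-bit Int would widen before writing, and a receiver
  with a 32-bit Int would have to decide what to do when the value does
  not fit.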

  Good answers probably exist for the type sharing problem 
  (it's a well-studied area).  The main problem will be picking
  among the different choices.

--
Alastair Reid   www.haskell-consulting.com