[Haskell-cafe] Serialization of (a -> b) and IO a

Alexander Solla ajs at 2piix.com
Thu Nov 11 17:59:39 EST 2010

On Nov 11, 2010, at 1:42 PM, Stephen Tetley wrote:

> [*] In case anyone looks up MacGuffin on Wikipedia, I don't think the
> description there is strictly accurate. A MacGuffin doesn't drive the
> plot so much as throw the viewer of the scent.

I think Hitchcock might disagree with you.

In any case, serializing functions is as easy as you want it to be.   
But, there is a big caveat:  You are basically embedding a compiler  
into your run-time.  It can be pretty minimal, relying only on facts  
known about recursively enumerable functions:

class Serialize a where
       serialize 	:: a -> ByteString
       unSerialize :: ByteString -> Maybe a  -- Parsers can fail

instance (Serialize a) => Serialize [a] where ...
instance (Serialize a, Serialize b) => Serialize (a, b) where ...

-- We can conclude that a and b must be enumerable from the  
requirement that
-- f is recursively enumerable:
instance (Serialize a, Enum a, Serialize b, Enum b) => Serialize (a ->  
b) where
	serialize f = serialize $ ( zip         [minBound..maxBound]
		                        (fmap f [minBound..maxBound]) )
-- A map instance could be better:  we trade some serialization time  
for more
-- deserialization time.  
instance (Serialize a, Serialize b) => Serialize (Map a b) where ...

instance (Serialize a, Serialize b) => Serialize (a -> b) where
	serialize f = serialize . fromList $ ( zip         [minBound..maxBound]
		                                   (fmap f [minBound..maxBound]) )
	deserialize map = \x -> lookup x (bytestring_decode_map map)
			            where bytestring_decode_map = ...

There are potentially big problems with this approach:
(i)   Many data types are not instances of Enum (though the really  
should be.  Deriving enumerations is not that hard.  I am willing to  
take one for the team to get GHC to figure out how to derive Enum on  
arbitrary types, but I'm not sure where to start.  Little help?)
(ii)  Time and memory.  We're not encoding a series of instructions  
for computing pure functions, but we're precomputing the results and  
saving them all for later.  This is at least O(size of the domain) *  
O(of the function on each element).  Not big in theoretical terms, but  
something like that could easily cause a factor of [1, \infty)   
slowdown in real code.
(iii)  Serialized objects must be pure.  In particular, you can't  
serialize general IO actions.  I see this as a plus.  It is still easy  
to write an algebra of serializable tokens and a non-serializable  
interpreter to generate IO actions from the tokens.  We do this kind  
of thing all the time -- we just don't serialize the tokens usually.

I think (ii) is the biggest problem.  And it is a big one.  We  
basically need something like template haskell for runtime systems in  
order to do quasi-quoting and compilation at run-time so we can avoid  
reifying the domain and its image under f.

The only thing that can serialize an (IO a) is the Haskell runtime,  
and it does it by running the action (and so putting its sub-steps in  
a series). 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.haskell.org/pipermail/haskell-cafe/attachments/20101111/a9e99183/attachment.html

More information about the Haskell-Cafe mailing list