# [Haskell-cafe] Serialization of (a -> b) and IO a

Alexander Solla ajs at 2piix.com
Thu Nov 11 17:59:39 EST 2010

On Nov 11, 2010, at 1:42 PM, Stephen Tetley wrote:

> [*] In case anyone looks up MacGuffin on Wikipedia, I don't think the
> description there is strictly accurate. A MacGuffin doesn't drive the
> plot so much as throw the viewer of the scent.

I think Hitchcock might disagree with you.

In any case, serializing functions is as easy as you want it to be.
But, there is a big caveat:  You are basically embedding a compiler
into your run-time.  It can be pretty minimal, relying only on facts

class Serialize a where
serialize 	:: a -> ByteString
unSerialize :: ByteString -> Maybe a  -- Parsers can fail

instance (Serialize a) => Serialize [a] where ...
instance (Serialize a, Serialize b) => Serialize (a, b) where ...

-- We can conclude that a and b must be enumerable from the
requirement that
-- f is recursively enumerable:
instance (Serialize a, Enum a, Serialize b, Enum b) => Serialize (a ->
b) where
serialize f = serialize $( zip [minBound..maxBound] (fmap f [minBound..maxBound]) ) -- A map instance could be better: we trade some serialization time for more -- deserialization time. instance (Serialize a, Serialize b) => Serialize (Map a b) where ... instance (Serialize a, Serialize b) => Serialize (a -> b) where serialize f = serialize . fromList$ ( zip         [minBound..maxBound]
(fmap f [minBound..maxBound]) )
deserialize map = \x -> lookup x (bytestring_decode_map map)
where bytestring_decode_map = ...

There are potentially big problems with this approach:
(i)   Many data types are not instances of Enum (though the really
should be.  Deriving enumerations is not that hard.  I am willing to
take one for the team to get GHC to figure out how to derive Enum on
arbitrary types, but I'm not sure where to start.  Little help?)
(ii)  Time and memory.  We're not encoding a series of instructions
for computing pure functions, but we're precomputing the results and
saving them all for later.  This is at least O(size of the domain) *
O(of the function on each element).  Not big in theoretical terms, but
something like that could easily cause a factor of [1, \infty)
slowdown in real code.
(iii)  Serialized objects must be pure.  In particular, you can't
serialize general IO actions.  I see this as a plus.  It is still easy
to write an algebra of serializable tokens and a non-serializable
interpreter to generate IO actions from the tokens.  We do this kind
of thing all the time -- we just don't serialize the tokens usually.

I think (ii) is the biggest problem.  And it is a big one.  We
basically need something like template haskell for runtime systems in
order to do quasi-quoting and compilation at run-time so we can avoid
reifying the domain and its image under f.

The only thing that can serialize an (IO a) is the Haskell runtime,
and it does it by running the action (and so putting its sub-steps in
a series).
-------------- next part --------------
An HTML attachment was scrubbed...