[Haskell-cafe] Serialization of (a -> b) and IO a

Thu Nov 11 09:36:38 EST 2010

On Thursday 11 November 2010 6:22:06 am Sjoerd Visscher wrote:
> What I'm wondering is if it would actually break things if serialize would
> not produce the same String for these functions.

Yes. It would break the (usual) mathematical underpinnings of Haskell.

> The reasoning above is used regularly to shoot down some really useful
> functionality. So what would go wrong if we chose to take the practical
> path, and leave aside the theoretical issues?

You would lose many uses of equational reasoning in your programs. Have you 
every substituted 'x * 2' for the expression 'x + x' in one of your programs, 
or vice versa? You can no longer do that, because someone may be serializing 
the function you're writing, checking how it's implemented, and relying it.

You'd lose the whole notion of 'the category of haskell types and functions' 
goodbye, too. Does f . id = f? Not if the former serializes as "f . id". We 
already have a weak case of this, since (\x -> undefined x) can be 
distinguished from undefined using seq, but that can be hand-waved away by not 
worrying about bottoms so much. That isn't going to work for serialize. There 
does exist practical Haskell stuff from category theory inspired people.

As a digression... When you get into things like dependent type theory, 
equality is actually incorporated into the language in a much more direct way. 
And there are various sorts of equality you can add to your type theory. Two 
common choices are:

  intensional equality: two values are provably equal if they evaluate to the
  same normal form

  extensional equality: this incorporates non-computational rules, like the
  point-wise equality of functions.

Now, in a type theory where equality is intensional, I cannot prove:

  (forall x. f x = g x) -> f = g

However, both these equalities (and others in between and on either side) are 
*compatible*, in that I cannot disprove the above theorem in an intensional 
theory.

What seq and serialize do is break from extensional equality, and allow us to 
disprove the above (perhaps not for seq within a hypothetical theory, since 
the invalidating case involves non-termination, but certainly for serialize). 
And that's a big deal, because extensional equality is handy for the above 
reasoning about programs.

> Also, the above two functions might not be exactly denotationally equal if
> the type is (Float -> Float), or the speed or memory use might be
> different. Could it not be that requiring them to be equal could just as
> well break things?

I would not be at all surprised if they are unequal for Float -> Float, 
considering how ugly floating point types are. That doesn't justify breaking 
equality for every other type.

Speed and memory use are typically disregarded for equality of (expressions 
denoting) functions. Merge sort and bubble sort are two algorithms for 
computing the same function, even though they vary wildly in these two 
characteristics. If you want a calculus for precisely reasoning about resource 
usage, then you would not necessarily be able to consider these equal.

But, Haskell is simply not such a calculus, and I don't think, "what if it 
were," is a good argument for actively breaking extensional equality in such a 
dramatic way (since the breakage would have nothing to do with precise 
reasoning about resource usage).

Gábor Lehel wrote:
> It is true that (a) per Luke Palmer, if we could serialize equal
> functions to equal representations then we could we could decide
> whether two pure functions were equal, which (if not done in the IO
> monad) would(?) break purity; and (b) per Dan Doel, if we wanted to
> implement our serialization in a way so that equal functions get equal
> representations, we couldn't do it, because it's an impossible
> problem.
>
> But these sort of cancel each other out, because (a) it's an
> impossible problem, and (b) we don't want to do it.

They do not cancel each other out. Rather, it goes like this:

a) Serializing functions gives you a decision procedure for function equality.
b) It is impossible to decide extensional equality of functions.

Together, these mean that any serialization procedure you define is *wrong*, 
in that it violates the semantics of pure Haskell expressions (the resulting 
decision procedure will answer 'False' for some pair of extensionally equal 
expressions f and g).

The way IO can get around this is that when you have:

  serialize (\x -> x + x :: Integer) :: IO String
  serialize (\x -> 2 * x :: Integer) :: IO String

you can say that the two IO actions are equal in the following sense:

  1) There is an equivalence class of expressions denoting the function in
     question. This equivalence class contains (\x -> x + x) and
     (\x -> 2 * x)
  2) The IO action produced by serialize selects a string representation
     for one of these expressions at random*.
    *) You probably won't ever observe "\x -> x + x" being the output of
       (serialize (\x -> 2 * x)), but that's just a coincidence.

This explanation is not available if serialize has type
(Integer -> Integer) -> String, however. Those have to be mathematical 
functions (of a sort) that select a unique representation for each equivalence 
class, or else we have to get a completely different (and much less nice) 
denotational semantics for Haskell.

----

Perhaps I missed it, but: has anyone explained why:

  (a -> b) -> IO String

is an unacceptable type? Why does it need to be just String? Folks often ask, 
"what benefit is purity?" Well, what is the benefit of making this function 
impure? GHC has a lot of normally impure functionality incorporated in a way 
that is mathematically sound (though not ideal). What are the things that have 
been shot down? Or does IO, ST, etc. qualify as getting shot down?

-- Dan