[Haskell-cafe] Non-deterministic function/expression types in Haskell?

Olaf Klinke olf at aatal-apotheke.de
Mon Jan 15 21:18:10 UTC 2018


Benjamin, 

I am a domain theorist, not an expert on writing compilers/interpreters, so I cannot comment on your proposals about call-by-value and interpreter state. What I wanted to say with my catchphrase is that you have to do the same amount of work either way. In the first case, where you have a DSL, the work (= the random monad) is visible to the programmer, whereas in the second case only the author of the compiler gets to see the monad. The denotational semantics can be the same in both cases. 

If you go for a DSL (which is specifically what you want to avoid), then a Kleisli category is how it is done. Look at Kleisli in Control.Arrow. For non-probabilistic non-determinism, take the category (Kleisli []). Every morphism a -> b then really is a Haskell function a -> [b], but the Arrow notation hides this. Then choose has the rather trivial implementation

choose :: Kleisli [] (a,a) a
choose = Kleisli (\(x,y) -> [x,y])
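
For example, a quick check in GHCi (runKleisli is the field accessor from Control.Arrow):

runKleisli choose (0,1)   -- [0,1], i.e. both alternatives are possible outcomes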

You could abstract this into a type class:

class (Arrow a) => Nondet a where
  choose :: a (x,x) x
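
For example, the Kleisli [] morphism above gives an instance (this merely restates the definition, as an illustration):

instance Nondet (Kleisli []) where
  choose = Kleisli (\(x,y) -> [x,y])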

Let's try this: 

import Prelude hiding ((.),id)
import Control.Category
import Control.Monad
data KleisliDN m x y = D (x -> y) | N (x -> m y)
instance Monad m => Category (KleisliDN m) where
  id = D id
  (D g) . (D f) = D (g . f)
  (N g) . (N f) = N (g <=< f)
  (D g) . (N f) = N (liftM g . f)
  (N g) . (D f) = N (g . f)

data DN m x = Det x | Nondet (m x)

Observe that KleisliDN m () x is isomorphic to DN m x, so here you have explicit tagging with D and N at the value level. 
If you want the compiler to reject certain combinations of D and N, then write a typeclass 

class (Category c, Category d) => Subcategory c d where
  liftMorphism :: c a b -> d a b

instance Monad m => Subcategory (->) (Kleisli m) where
  liftMorphism f = Kleisli (return.f)

and use the category (->) for D and Kleisli m for N. 
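
A small usage sketch (assuming the class and instance above are in scope, together with the Kleisli choose from the top of this message and Control.Category's (.)):

example :: Kleisli [] (Int, Int) Int
example = liftMorphism (* 2) . choose   -- deterministic doubling after a nondeterministic choice

-- runKleisli example (3,5) == [6,10]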

> Am 15.01.2018 um 19:17 schrieb Benjamin Redelings <benjamin.redelings at gmail.com>:
> 
> But this problem is exactly what I am proposing a solution for!  The idea is that 'normal' would have type 'double -> double -> double[N]', and therefore merging the expressions would be prohibited by the rule that says we cannot merge two identical expressions of type a[N] (see my original e-mail).
> 
> Interestingly, this could maybe be used to handle cases like "let {x = unsafePerformIO $ readchar file; y = unsafePerformIO $ readchar file} in E".  We could define unsafePerformIO as having type
> 
> unsafePerformIO: IO a -> a[N].
> 
> This would NOT solve the problem that the code could perform the readchar's in either order, but it WOULD avoid merging x & y.  I guess one question is: does GHC avoid this merger already?  And, if so, does it avoid this merger by refusing to merge variables?  If GHC refuses to merge variables with identical ASTs that call unsafePerformIO then I would assert that it is ALREADY using the type system I am proposing.
> 
>>> sample:: (()->a[N]) -> a[N]
>> Observe that, disregarding non-terminating computations, any type t is isomorphic to () -> t. Knowing this, your statement seems to imply that a non-deterministic computation is identified with what you can sample from it.
> Hmm... I'm not sure this is right. I agree that type t[D] is isomorphic with () -> t[D], because it is easy enough to come up with an isomorphism f where (f (f_inverse value)) = value, and also (f_inverse (f (f_inverse value))) = value.
> 
>     f::forall level.(()->t[level]) -> t[level]
>     f dist = dist ()
> 
>     f_inverse::forall level.t[level] -> () -> t[level]
>     f_inverse value = \() -> value
> 
> But the whole point of having t[N] is that ()->t[N] should not be isomorphic to t[N].  Thus, if we evaluate let {x = f (f_inverse E); y = f (f_inverse E) in F} where E is non-deterministic, then I think x and y can be different.  
With my code above, the types are isomorphic: 

toKleisli :: DN m x -> KleisliDN m () x
toKleisli (Det x) = D (const x)
toKleisli (Nondet mx) = N (const mx)
fromKleisli :: KleisliDN m () x -> DN m x
fromKleisli (D f) = Det (f ())
fromKleisli (N f) = Nondet (f ())

A function of type () -> t[N] takes the unit value and returns a process that randomly generates a t. Since there is only one unit value, such a function describes exactly one process. 
I've seen functions of type () -> t in OCaml where they are used to bring lazy evaluation into an otherwise strict language. 
> I think this means that f is not an isomorphism when applied to non-deterministic expression, so that you should say "disregarding non-terminating or non-deterministic" computations.
> 
> There are some complications here, in that I am quantifying over [N,D] levels in the definitions of f and f_inverse, so that they take their [N,D] levels from their arguments.  I am treating variables as non-values, since they stand for entire expressions that might be substituted for them.  My hope is that this allows placing the [N,D] levels on the input and result types instead of on the arrow, but there could be problems with this.  It is counter-intuitive, since normal 0 1 () could yield the value 2, and 2 itself is not random, but was obtained randomly.  But I think that the system works if implemented, though it might not match the standard interpretation of types?
I don't get all of what you are saying here. But it reminds me of monadic bind. When you say:

do
  x <- normal 0 1
  if x == 2 then return True else return False

then the resulting Boolean is still in the monad, and the type of the overall expression tells you that the 2, i.e. the sampled value, was obtained randomly. 

> 
>> choose :: I -> a -> a -> a
>> 
>> where I is the the type of real numbers 0 <= x <= 1. 'choose' makes a weighted probabilistic choice between its second and third argument.
> I'm not completely sure of the role of the real number here.  Are we saying that 'choose 0.6 0 1' would (for example) return 0 with probability 0.6 and 1 with probability 0.4?  
Exactly. And that is all you need for probabilistic choice, in the following categorical sense. The probabilistic powerdomain of a domain D is the free convex cone over D. A convex cone is a structure that supports the operation choose and multiplication by real numbers from the unit interval [0,1]. (Here we allow total probabilities less than 1, otherwise we don't have a domain. The true probability distributions sit at the top of this domain. The free convex cone over the two-element set {x,y} is a triangle, with "certainly x" at the top left, "certainly y" at the top right, and the constant zero measure at the bottom corner.)
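
To make the two convex-cone operations concrete, here is a minimal discrete sketch of my own (sub-probability weights are allowed, as in the remark above; the names P and scale are made up for illustration):

newtype P a = P [(a, Double)] deriving Show

instance Functor P where
  fmap f (P xs) = P [ (f x, w) | (x, w) <- xs ]

instance Applicative P where
  pure x = P [(x, 1)]
  P fs <*> P xs = P [ (f x, v * w) | (f, v) <- fs, (x, w) <- xs ]

instance Monad P where
  return = pure
  P xs >>= k = P [ (y, w * v) | (x, w) <- xs, let P ys = k x, (y, v) <- ys ]

-- the two convex cone operations:
choose :: Double -> a -> a -> P a        -- weighted binary choice
choose p x y = P [(x, p), (y, 1 - p)]

scale :: Double -> P a -> P a            -- multiplication by a scalar in [0,1]
scale r (P xs) = P [ (x, r * w) | (x, w) <- xs ]

Here choose 0.6 0 1 is P [(0,0.6),(1,0.4)], and >>= weights each continuation by the probability of the value it came from, i.e. it is integration against a discrete (sub-)probability measure.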

> It seems that you could maybe define a function that takes 2 random numbers:
> 
> choose :: I -> I -> a -> a -> a
> 
> where
>   (a) choose u p 0 1 would return 0 if u<p and 1 otherwise.
But how is choose probabilistic, then? It's just an if-statement in disguise! 
>   (b) the number u is *implicitly* supplied to the function.  So, the user writes "choose 0.6 0 1".
>   (c) the type of main would be Main: I -> IO ().  This makes the program deterministic if we know the value of the real number, but probabilistic if we supply a uniform 0 1 random number.
That sounds like a PRNG to me. If you know the seed, you know the outcome. 
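
In code (my own one-liner, merely restating your (a), with Double standing in for I): once the seed u is passed explicitly, choose is an ordinary deterministic function, which is exactly the PRNG situation:

choose :: Double -> Double -> a -> a -> a
choose u p x y = if u < p then x else y   -- deterministic once u is known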
> 
>> However, in order to describe what programs in (lambda calculus + choose) actually _mean_, you need two things:
>> (1) Define what the compiled program should do at runtime (operational semantics).
>> (2) Define what the program means, mathematically (denotational semantics).
> Hmm.... yeah I think the denotational semantics could be quite hard.  I think that Chung Chieh Shan is working on some aspect of this, especially measures on R^n.
That was my point: With monads, the denotational semantics is easy. The operational semantics is hard. 
> 
>> For (2), the mathematical meaning of ordinary Haskell programs are given via the following mapping. Every Haskell type t is associated with a domain D [DOM] and each Haskell function of type t -> t' is associated with a mathematical function f: D -> D' between the associated domains. Whenever non-determinism is involved, e.g. a probabilistic computation on type t, then instead of D one uses P(D), a suitable "powerdomain".
> Hmm... I'm guessing (having not looked at the papers below yet, except kind of at the Hakaru paper) that the powerdomain for Double is the measurable sets (aka Borel sets) for the Reals...
No, the probabilistic powerspace of D is the set of all measures on the Borel sigma-algebra of D, ordered pointwise. That is, f <= g if f(A) <= g(A) for all measurable sets A. The beautiful thing about this is that the monadic bind is precisely integration against a measure. 
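
Concretely (this is the standard formula for that monad, added here for reference): if m is a measure on D and k maps each point of D to a measure on D', then

  (m >>= k)(A) = ∫ k(x)(A) dm(x)

and return x is the point mass (Dirac measure) at x. The discrete P sketch above computes exactly this integral as a finite sum.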

>>  There are various P for various sorts of non-determinism (see the work of Gordon Plotkin), and each of them is a monad on the category of domains. A major problem is whether the P for probabilistic choice works well with e.g. function types. That is why many papers restrict themselves to first-order functions. Another notoriously hard problem is to combine different sorts of non-determinism. It is like combining monads in Haskell: Some monads have monad transformers associated with them, but some don't.
> Interesting.  I'll take a look at the papers you mention.  I have wondered where some of the restrictions come from.
It's whack-a-mole of the categorical kind. 
There are various subcategories of domains. For example, all Haskell types map to domains that have the additional property "bifinite". One can show that the probabilistic powerdomain of any domain is a domain, but the probabilistic powerdomain of a bifinite domain is never bifinite. Hence there can be no Haskell type that faithfully represents all the random computations of a given type. 
It is known that the probabilistic powerdomain of a "continuous" domain is again continuous. Sadly, the continuous domains don't have function spaces that are continuous. Thus you get a semantics for a language with choice, but where functions are not first-class citizens. 

Olaf

