[Haskell-cafe] Non-deterministic function/expression types in Haskell?

Mon Jan 15 18:17:46 UTC 2018

Hi Olaf,

     Thanks!  This was helpful.  I'll engage below:

On 01/13/2018 04:45 PM, Olaf Klinke wrote:
> Long story short: You can stay inside Haskell's type system or extend it, but you end up with monads either way. I'll explain why.
I might see what you mean about ending up with monads either way. 
Specifically, I think its important to separate random distributions 
from their random samples, which might correspond (in a Random-like 
monad) to the difference between an action and performing the action.  
Is this something like what you are saying?  I think there may still be 
some extra value to the type system I mentioned in performing CSE 
(common sub-expression elimination) safely.

>
>> (a) why are monads a perfect solution to my problem?
> Two reasons. First: Probability distributions do form a monad [Giry,CS,Kock]. It's a mathematical fact. So why not exploit it? It gives you powerful combinators and powerful abstractions.
They certainly do.  I've written a haskell interpreter, and my 
definitions for a (very hacky) Random monad are here, with two different 
interpreters for it (sample and sample') below:

https://github.com/bredelings/BAli-Phy/blob/9c96374013453fe382e609cca357a2c0f3f154b2/modules/Distributions.hs#L14

I'm currently exploiting the monadic structure.  For example, given a 
distribution dist, I basically use 'sequence (replicate n dist)' to 
sample n i.i.d. values from it.  So, to some extent I could see the fact 
that Monads provide a 'sequence' function as a benefit.  In the 
framework I proposed we would have to write 'map sample (replicate n 
dist)', which does not seem very burdensome though.

The fact that you have to run these distributions inside of an 
interpreter also makes some things easy that would be difficult to do 
otherwise, since the interpreter (a) can carry around modifiable state 
and (b) creates a call stack, like in call-by-value languages.  In 
contrast if you say something like "let {x = normal 0 1 ()} in x*x", 
then it doesn't really have a call chain, since the thunk for x can get 
forced from multiple different contexts that were not responsible for 
the allocation of the thunk on the heap. Instead it seems that each heap 
location has a let-allocation chain, but without any intepreter state 
threaded through the chain.  Does that make sense?  Is there any 
literature on the let-allocation chain?  It seems like this would come 
up during debugging.

Despite all this, it still seems to me that there might be reasons not 
to "exploit" the monadic structure of probability distributions, at 
least not in the traditional fashion.  See below.
> Second: Because the requirements on the type modificators [N] and [D] are describing many features monads have. First observe that you don't really need [D] because any deterministic computation can be embedded into the non-deterministic computations by making the set of possibilities a singleton. That is precisely what 'return' does in the context of monads. (Think of return for the list monad.)
>> any function that takes a non-deterministic input must have a non-deterministic output.
> That is common with monads, too: There is no generic function that can extract values out of monads.
Hmm.... one reason I'm hesitant to "exploit" the monadic structure of 
Random is that I don't want to have the [D] values outside the monad and 
[N] values inside it.  The need to use a function like "return" to lift 
(if that is the right terminology) non-monadic values into the monad, 
and the need to run [N] terms in an interpreter to un-lift values of the 
monad seems to be more of an obstruction than a benefit.  It means that 
you can't do things that you could do with normal haskell functions.

For example, with a normal haskell function you could write `let {x = f 
0 x} in x`.  However, with a monadic object you can't write `do {x <- f 
0 x; return x}`.

Hmm.... I guess you could maybe do `let {x = interpret $ do {y <- f 0 x; 
return y}} in x` though, where 'interpret' is an interpreter for the 
monad.  Hmm.... I don't know.  This seems weird.  It certainly seems 
more verbose than `let {x = sample $ f 0 x} in x`.

OK!  So, let's say that 'unsafePerformIO' is an interpreter for the IO 
monad.  In my non-monadic framework, I am suggesting that we define:

sample:: (() -> a[N]) -> a[N]
sample dist = dist ()

In the monadic framework, we define something like

random_iterpreter = Random a -> IO a
sample :: Random a -> a
sample dist  = (unsafePerformIO . random_iterpreter) dist

Then, I think we get equivalent *expressiveness* without modifying the 
HM type system.  Furthermore, I think that (unsafePerformIO . 
random_interpreter) can be completely safe if we imagine that we can 
generate truly random variables somehow.  However, I think that the 
extension to the HM type system still is useful in solving some problems 
that are created by using unsafePerformIO with CSE (see below).

>
>> we should merge variables with identical values if the types are non-deterministic.
> That is problematic on several levels. First, is "identical values" decidable by the compiler? Second, equality might depend on the implementation, which might change.

I'm assuming we only consider expressions equivalent if they have the 
same syntax tree.  (So that should avoid problems with overloading ==).  
This leaves the second problem, you mention:

> For example, (normal 0 1) could contain a random number generator with an intrinsic state.
If we have "let {x=normal 0 1;y=normal 0 1} in E" then the two normal 0 
1 actions could be executed in either order.  However, for random 
samples I think that is not a problem in some situations, since the 
distributions would be the same.  However, regardless of whether we care 
about ordering, it is definitely wrong to merge x and y to get "let 
{x=normal 0 1} in E[y := x]".  I think this is the problem you are 
talking about.

But this problem is exactly what I am proposing a solution for!  The 
idea is that 'normal' would have type 'double -> double -> double[N]', 
and therefore merging the expressions would be prohibited by the rule 
that says we cannot merge two identical expressions of type a[N] (see my 
original e-mail).

Interestingly, this could maybe used to handle cases like "let {x = 
unsafePerformIO $ readchar file; y = unsafePerformIO $ readchar file} in 
E".  We could define unsafePerformIO as having type

unsafePerformIO: IO a -> a[N].

This would NOT solve the problem that the code could perform the 
readchar's in either order, but it WOULD avoid merging x & y.  I guess 
one question is: does GHC avoid this merger already?  And, if so, does 
it avoid this merger by refusing to merge variables?  If GHC refuses to 
merge variables with identical ASTs that call unsafePerformIO then I 
would assert that it is ALREADY using the type system I am proposing.

>> sample:: (()->a[N]) -> a[N]
> Observe that, disregarding non-terminating computations, any type t is isomorphic to () -> t. Knowing this, your statement seems to imply that a non-deterministic computation is identified with what you can sample from it.
Hmm... I'm not sure this is right. I agree that type t[D] is isomorphic 
with () -> t[D], because it is easy enough to come up with an 
isomorphism f where (f (f_inverse value)) = value, and also (f_inverse 
(f (f_inverse value))) = value.

     f::forall level.(()->t[level]) -> t[level]
     f dist = dist ()

     f::forall level.t[level] -> () -> t[level]
     f_inverse value = \() -> value

But the whole point of having t[N] is that ()->t[N] should not be 
isomorphic to t[N].  Thus, if we evaluate let {x = f (f_inverse E); y = 
f (f_inverse E) in F} where E is non-deterministic, then I think x and y 
can be different.  I think this means that f is not an isomorphism when 
applied to non-deterministic expression, so that you should say 
"disregarding non-terminating or non-deterministic" computations.

There are some complications here, in that I am quantifying over [N,D] 
levels in the definitions of f and f_inverse, so that they take their 
[N,D] levels from their arguments.  I am treating variables as 
non-values, since they stand for entire expressions that might be 
substituted for them.  My hope is that this allows placing the [N,D] 
levels on the input and result types instead of on the arrow, but there 
could be problems with this.  It is counter-intuitive, since normal 0 1 
() could yield the value 2, and 2 itself is not random, but was obtained 
randomly.  But I think that the system works if implemented, though it 
might not match the standard interpretation of types?

>
>> iid:: Int -> (() -> a[N]) -> [a[N]]
> The function Control.Monad.sequence plays this role for monads, with a slightly different type signature.
> Control.Monad.sequence . Data.List.repeat :: Monad m => m a -> m [a]
> Let the name not deceive you: It is not about sequential computations, rather about ordering the random variables sequentially.
Hmm... I don't completely understand this.  Are you saying that E1 >>= 
(\x -> E2) does not require that E1 is performed before E2?  That seems 
possible only if E2 does not use x.  But maybe I'm missing something.

>
>> (b) can we extend the HM type system to support non-determinism directly?
> Yes, you can. But then you have a different language. There many publications describing that [Draheim,Lago,Borgström,RP]. In a nutshell, start with the simply typed lambda calculus and add an operation
Hmm.... I intentionally attempt to define "non-deterministic" 
expressions instead of "probabilistic" expressions precisely to avoid 
this problem :-)  Thus, my version of 'choose'  simply states that 
'choose 1 2' can return either 1 or 2. It does not say anything about 
the probability of returning either choice, only that both are "valid" 
reductions.  Therefore, my version of choose would have type:

choose: a -> a -> a[N].

> choose :: I -> a -> a -> a
>
> where I is the the type of real numbers 0 <= x <= 1. 'choose' makes a weighted probabilistic choice between its second and third argument.
I'm not completely sure of the role of the real number here.  Are we 
saying that 'choose 0.6 0 1' would (for example) return 0 with 
probability 0.6 and 1 with probability 0.4?  It seems that you could 
maybe define a function that takes 2 random numbers:

choose :: I -> I -> a -> a -> a

where
   (a) choose u p 0 1 would return 0 if u<p and 1 otherwise.
   (b) the number u is *implicitly* supplied to the function.  So, the 
user writes "choose 0.6 0 1".
   (c) the type of main would be Main: I -> IO ().  This makes the 
program deterministic if we know the value of the real number, but 
probabilistic if we supply a uniform 0 1 random number.

> However, in order to describe what programs in (lambda calculus + choose) actually _mean_, you need two things:
> (1) Define what the compiled program should do at runtime (operational semantics).
> (2) Define what the program means, mathematically (denotational semantics).
Hmm.... yeah I think the denotational semantics could be quite hard.  I 
think that Chung Chieh Shan is working on some aspect of this this, 
especially measures on R^n.
> For (1), the prevailing approach seems to be to equate a probabilistic term with its behaviour under sampling. Markov Chains are popular [MCMC]. I read it's non-trivial, however, to find a Markov chain that behaves according to a mathematically defined probability distribution. The probability that you know more about that than I do is 1.
Ha ha :-)  Yes, designing MCMC approaches to sample from a distribution 
is hard.  Obviously for some expressions like sampling from a normal you 
don't need MCMC.  I would imagine that one approach to this is simply 
only allow primitives that you can get by transforming a Uniform[0,1] 
random variable, or perhaps fairly simple rejection sampling.  Then 
maybe MCMC-type things would be implemented as probabilistic programs 
(e.g. not primitive operations).

> For (2), the mathematical meaning of ordinary Haskell programs are given via the following mapping. Every Haskell type t is associated with a domain D [DOM] and each Haskell function of type t -> t' is associated with a mathematical function f: D -> D' between the associated domains. Whenever non-determinism is involved, e.g. a probabilistic computation on type t, then instead of D one uses P(D), a suitable "powerdomain".
Hmm... I'm guessing (having not looked at the papers below yet except 
kind of the Haraku paper) that the powerdomain for Double is the 
measurable sets (aka Borel sets) for the Reals...
>   There are various P for various sorts of non-determinism (see the work of Gordon Plotkin), and each of them is a monad on the category of domains. A major problem is whether the P for probabilisitc choice works well with e.g. function types. That is why many papers restrict themselves to first-order functions. Another notoriously hard problem is to combine different sorts of non-determinism. It is like combining monads in Haskell: Some monads have monad transformers associated with them, but some don't.
Interesting.  I'll take a look at the papers you mention.  I have 
wondered where some of the restrictions come from.

> Finally, you might want to play with non-determinism other than the probabilistic one. For example, there is the infinite-search package on hackage, which provides a monad of plain non-deterministic choice beyond finite lists. It is even possible to define a Haskell probability monad in the same spirit. I can provide some code if you wish.
Thanks so much for explaining things to me in detail, and for the 
helpful links!

-BenRI

> Regards,
> Olaf
>
> References
> [Giry] http://dx.doi.org/10.1007/BFb0092872
> [CS] http://dx.doi.org/10.1007/s10485-013-9324-9
> [Kock] http://www.tac.mta.ca/tac/volumes/26/4/26-04.pdf
> [Draheim] http://www.springer.com/de/book/9783642551970
> [Lago] https://arxiv.org/abs/1104.0195
> [Borgström] https://arxiv.org/abs/1512.08990
> [RP] http://www.cs.tufts.edu/~nr/pubs/pmonad.pdf
> [MCMC] http://okmij.org/ftp/kakuritu/Hakaru10/index.html
> [DOM] http://www.worldscientific.com/worldscibooks/10.1142/6284