Benjamin Redelings benjamin.redelings at gmail.com
Mon Jan 15 18:17:46 UTC 2018

```Hi Olaf,

Thanks!  This was helpful.  I'll engage below:

On 01/13/2018 04:45 PM, Olaf Klinke wrote:
> Long story short: You can stay inside Haskell's type system or extend it, but you end up with monads either way. I'll explain why.
I might see what you mean about ending up with monads either way.
Specifically, I think its important to separate random distributions
from their random samples, which might correspond (in a Random-like
monad) to the difference between an action and performing the action.
Is this something like what you are saying?  I think there may still be
some extra value to the type system I mentioned in performing CSE
(common sub-expression elimination) safely.

>
>> (a) why are monads a perfect solution to my problem?
> Two reasons. First: Probability distributions do form a monad [Giry,CS,Kock]. It's a mathematical fact. So why not exploit it? It gives you powerful combinators and powerful abstractions.
They certainly do.  I've written a haskell interpreter, and my
definitions for a (very hacky) Random monad are here, with two different
interpreters for it (sample and sample') below:

https://github.com/bredelings/BAli-Phy/blob/9c96374013453fe382e609cca357a2c0f3f154b2/modules/Distributions.hs#L14

I'm currently exploiting the monadic structure.  For example, given a
distribution dist, I basically use 'sequence (replicate n dist)' to
sample n i.i.d. values from it.  So, to some extent I could see the fact
that Monads provide a 'sequence' function as a benefit.  In the
framework I proposed we would have to write 'map sample (replicate n
dist)', which does not seem very burdensome though.

The fact that you have to run these distributions inside of an
interpreter also makes some things easy that would be difficult to do
otherwise, since the interpreter (a) can carry around modifiable state
and (b) creates a call stack, like in call-by-value languages.  In
contrast if you say something like "let {x = normal 0 1 ()} in x*x",
then it doesn't really have a call chain, since the thunk for x can get
forced from multiple different contexts that were not responsible for
the allocation of the thunk on the heap. Instead it seems that each heap
location has a let-allocation chain, but without any intepreter state
threaded through the chain.  Does that make sense?  Is there any
literature on the let-allocation chain?  It seems like this would come
up during debugging.

Despite all this, it still seems to me that there might be reasons not
to "exploit" the monadic structure of probability distributions, at
least not in the traditional fashion.  See below.
> Second: Because the requirements on the type modificators [N] and [D] are describing many features monads have. First observe that you don't really need [D] because any deterministic computation can be embedded into the non-deterministic computations by making the set of possibilities a singleton. That is precisely what 'return' does in the context of monads. (Think of return for the list monad.)
>> any function that takes a non-deterministic input must have a non-deterministic output.
> That is common with monads, too: There is no generic function that can extract values out of monads.
Hmm.... one reason I'm hesitant to "exploit" the monadic structure of
Random is that I don't want to have the [D] values outside the monad and
[N] values inside it.  The need to use a function like "return" to lift
and the need to run [N] terms in an interpreter to un-lift values of the
monad seems to be more of an obstruction than a benefit.  It means that
you can't do things that you could do with normal haskell functions.

For example, with a normal haskell function you could write `let {x = f
0 x} in x`.  However, with a monadic object you can't write `do {x <- f
0 x; return x}`.

Hmm.... I guess you could maybe do `let {x = interpret \$ do {y <- f 0 x;
return y}} in x` though, where 'interpret' is an interpreter for the
monad.  Hmm.... I don't know.  This seems weird.  It certainly seems
more verbose than `let {x = sample \$ f 0 x} in x`.

OK!  So, let's say that 'unsafePerformIO' is an interpreter for the IO

sample:: (() -> a[N]) -> a[N]
sample dist = dist ()

In the monadic framework, we define something like

random_iterpreter = Random a -> IO a
sample :: Random a -> a
sample dist  = (unsafePerformIO . random_iterpreter) dist

Then, I think we get equivalent *expressiveness* without modifying the
HM type system.  Furthermore, I think that (unsafePerformIO .
random_interpreter) can be completely safe if we imagine that we can
generate truly random variables somehow.  However, I think that the
extension to the HM type system still is useful in solving some problems
that are created by using unsafePerformIO with CSE (see below).

>
>> we should merge variables with identical values if the types are non-deterministic.
> That is problematic on several levels. First, is "identical values" decidable by the compiler? Second, equality might depend on the implementation, which might change.

I'm assuming we only consider expressions equivalent if they have the
This leaves the second problem, you mention:

> For example, (normal 0 1) could contain a random number generator with an intrinsic state.
If we have "let {x=normal 0 1;y=normal 0 1} in E" then the two normal 0
1 actions could be executed in either order.  However, for random
samples I think that is not a problem in some situations, since the
distributions would be the same.  However, regardless of whether we care
about ordering, it is definitely wrong to merge x and y to get "let
{x=normal 0 1} in E[y := x]".  I think this is the problem you are

But this problem is exactly what I am proposing a solution for!  The
idea is that 'normal' would have type 'double -> double -> double[N]',
and therefore merging the expressions would be prohibited by the rule
that says we cannot merge two identical expressions of type a[N] (see my
original e-mail).

Interestingly, this could maybe used to handle cases like "let {x =
E".  We could define unsafePerformIO as having type

unsafePerformIO: IO a -> a[N].

This would NOT solve the problem that the code could perform the
readchar's in either order, but it WOULD avoid merging x & y.  I guess
one question is: does GHC avoid this merger already?  And, if so, does
it avoid this merger by refusing to merge variables?  If GHC refuses to
merge variables with identical ASTs that call unsafePerformIO then I
would assert that it is ALREADY using the type system I am proposing.

>> sample:: (()->a[N]) -> a[N]
> Observe that, disregarding non-terminating computations, any type t is isomorphic to () -> t. Knowing this, your statement seems to imply that a non-deterministic computation is identified with what you can sample from it.
Hmm... I'm not sure this is right. I agree that type t[D] is isomorphic
with () -> t[D], because it is easy enough to come up with an
isomorphism f where (f (f_inverse value)) = value, and also (f_inverse
(f (f_inverse value))) = value.

f::forall level.(()->t[level]) -> t[level]
f dist = dist ()

f::forall level.t[level] -> () -> t[level]
f_inverse value = \() -> value

But the whole point of having t[N] is that ()->t[N] should not be
isomorphic to t[N].  Thus, if we evaluate let {x = f (f_inverse E); y =
f (f_inverse E) in F} where E is non-deterministic, then I think x and y
can be different.  I think this means that f is not an isomorphism when
applied to non-deterministic expression, so that you should say
"disregarding non-terminating or non-deterministic" computations.

There are some complications here, in that I am quantifying over [N,D]
levels in the definitions of f and f_inverse, so that they take their
[N,D] levels from their arguments.  I am treating variables as
non-values, since they stand for entire expressions that might be
substituted for them.  My hope is that this allows placing the [N,D]
levels on the input and result types instead of on the arrow, but there
could be problems with this.  It is counter-intuitive, since normal 0 1
() could yield the value 2, and 2 itself is not random, but was obtained
randomly.  But I think that the system works if implemented, though it
might not match the standard interpretation of types?

>
>> iid:: Int -> (() -> a[N]) -> [a[N]]
> The function Control.Monad.sequence plays this role for monads, with a slightly different type signature.
> Control.Monad.sequence . Data.List.repeat :: Monad m => m a -> m [a]
> Let the name not deceive you: It is not about sequential computations, rather about ordering the random variables sequentially.
Hmm... I don't completely understand this.  Are you saying that E1 >>=
(\x -> E2) does not require that E1 is performed before E2?  That seems
possible only if E2 does not use x.  But maybe I'm missing something.

>
>> (b) can we extend the HM type system to support non-determinism directly?
> Yes, you can. But then you have a different language. There many publications describing that [Draheim,Lago,Borgström,RP]. In a nutshell, start with the simply typed lambda calculus and add an operation
Hmm.... I intentionally attempt to define "non-deterministic"
expressions instead of "probabilistic" expressions precisely to avoid
this problem :-)  Thus, my version of 'choose'  simply states that
'choose 1 2' can return either 1 or 2. It does not say anything about
the probability of returning either choice, only that both are "valid"
reductions.  Therefore, my version of choose would have type:

choose: a -> a -> a[N].

> choose :: I -> a -> a -> a
>
> where I is the the type of real numbers 0 <= x <= 1. 'choose' makes a weighted probabilistic choice between its second and third argument.
I'm not completely sure of the role of the real number here.  Are we
saying that 'choose 0.6 0 1' would (for example) return 0 with
probability 0.6 and 1 with probability 0.4?  It seems that you could
maybe define a function that takes 2 random numbers:

choose :: I -> I -> a -> a -> a

where
(a) choose u p 0 1 would return 0 if u<p and 1 otherwise.
(b) the number u is *implicitly* supplied to the function.  So, the
user writes "choose 0.6 0 1".
(c) the type of main would be Main: I -> IO ().  This makes the
program deterministic if we know the value of the real number, but
probabilistic if we supply a uniform 0 1 random number.

> However, in order to describe what programs in (lambda calculus + choose) actually _mean_, you need two things:
> (1) Define what the compiled program should do at runtime (operational semantics).
> (2) Define what the program means, mathematically (denotational semantics).
Hmm.... yeah I think the denotational semantics could be quite hard.  I
think that Chung Chieh Shan is working on some aspect of this this,
especially measures on R^n.
> For (1), the prevailing approach seems to be to equate a probabilistic term with its behaviour under sampling. Markov Chains are popular [MCMC]. I read it's non-trivial, however, to find a Markov chain that behaves according to a mathematically defined probability distribution. The probability that you know more about that than I do is 1.
Ha ha :-)  Yes, designing MCMC approaches to sample from a distribution
is hard.  Obviously for some expressions like sampling from a normal you
don't need MCMC.  I would imagine that one approach to this is simply
only allow primitives that you can get by transforming a Uniform[0,1]
random variable, or perhaps fairly simple rejection sampling.  Then
maybe MCMC-type things would be implemented as probabilistic programs
(e.g. not primitive operations).

> For (2), the mathematical meaning of ordinary Haskell programs are given via the following mapping. Every Haskell type t is associated with a domain D [DOM] and each Haskell function of type t -> t' is associated with a mathematical function f: D -> D' between the associated domains. Whenever non-determinism is involved, e.g. a probabilistic computation on type t, then instead of D one uses P(D), a suitable "powerdomain".
Hmm... I'm guessing (having not looked at the papers below yet except
kind of the Haraku paper) that the powerdomain for Double is the
measurable sets (aka Borel sets) for the Reals...
>   There are various P for various sorts of non-determinism (see the work of Gordon Plotkin), and each of them is a monad on the category of domains. A major problem is whether the P for probabilisitc choice works well with e.g. function types. That is why many papers restrict themselves to first-order functions. Another notoriously hard problem is to combine different sorts of non-determinism. It is like combining monads in Haskell: Some monads have monad transformers associated with them, but some don't.
Interesting.  I'll take a look at the papers you mention.  I have
wondered where some of the restrictions come from.

> Finally, you might want to play with non-determinism other than the probabilistic one. For example, there is the infinite-search package on hackage, which provides a monad of plain non-deterministic choice beyond finite lists. It is even possible to define a Haskell probability monad in the same spirit. I can provide some code if you wish.
Thanks so much for explaining things to me in detail, and for the

-BenRI

> Regards,
> Olaf
>
> References
> [Giry] http://dx.doi.org/10.1007/BFb0092872
> [CS] http://dx.doi.org/10.1007/s10485-013-9324-9
> [Kock] http://www.tac.mta.ca/tac/volumes/26/4/26-04.pdf
> [Draheim] http://www.springer.com/de/book/9783642551970
> [Lago] https://arxiv.org/abs/1104.0195
> [Borgström] https://arxiv.org/abs/1512.08990