[Haskell-cafe] Help on syntactic sugar for combining lazy & strict monads?

Sat Jul 31 08:22:20 UTC 2021

On Thu, 29 Jul 2021, Benjamin Redelings wrote:

> The idea of changing observation to look like `Observation a -> Dist a -> 
> Dist a` is interesting, but I am not sure if this works in practice.  
> Generally you cannot actually produce an exact sample from a distribution 
> plus an observation.  MCMC, for example, produces collections of samples that 
> you can average against, and the error decreases as the number of samples 
> increases.  But you can't generate a single point that is a sample from the 
> posterior.

Isn't that a problem of the concrete implementation of the probability 
monad (MCMC)? Certainly not something that I, as the modeller, would 
like to be concerned with. Other implementations might not have this 
limitation. 
Are we talking about the same thing, the mathematical conditional 
probability? This has the type I described, so it should not be 
an unusual design choice. Moreover, it is a partial function. What happens 
if I try to condition on an impossible observation in Bali-phy?

>
> Maybe it would be possible to change use separate types for distributions 
> from which you cannot directly sample?  Something like `Observation a -> 
> SampleableDist -> NonsampleableDist a`.

I haven't seen types of much else in your system, so I can not provide 
meaningful insights here. My concern would be that parts of models become 
tainted as Nonsampleable too easily, a trapdoor operation. But this is 
just guesswork, I don't grasp sampling well enough, apparently. Here is my 
attempt, please correct me if this is wrong.

Sampling is a monad morphism from a "true" monad of probability 
distributions (e.g. the Giry monad) to a state monad transforming a 
source of randomness, in such a way that any infinite stream of samples 
has the property that for any measurable set U, the probability measure of 
U equals the limit of the ratio of samples inside and outside of U, as the 
prefix of the infinite stream grows to infinity. 
Conditioning on an observation O should translate to the state monad in a 
way so that the sample-producer is now forbidden to output anything 
outside O. 
Hence conditioning on an impossible observation must produce a state 
transformer that never outputs anything: the bottom function.

>
> I will think about whether this would solve the problem with laziness...
>
> -BenRI
>
> On 7/29/21 11:35 PM, Benjamin Redelings wrote:
>> Hi Olaf,
>> 
>> I think you need to look at two things:
>> 
>> 1. The Giry monad, and how it deals with continuous spaces.
>> 
>> 2. The paper "Practical Probabilistic Programming with Monads" - 
>> https://doi.org/10.1145/2804302.2804317
>> 
>> Also, observing 2.0 from a continuous distribution is not nonsensical.
>> 
>> -BenRI
>> 
>> On 7/21/21 11:15 PM, Olaf Klinke wrote:
>>>> However, a lazy interpreter causes problems when trying to introduce
>>>> *observation* statements (aka conditioning statements) into the monad
>>>> [3].  For example,
>>>> 
>>>> run_lazy $ do
>>>>    x <- normal 0 1
>>>>    y <- normal x 1
>>>>    z <- normal y 1
>>>>    2.0 `observe_from` normal z 1
>>>>    return y
>>>> 
>>>> In the above code fragment, y will be forced because it is returned, and
>>>> y will force x.  However, the "observe_from" statement will never be
>>>> forced, because it does not produce a result that is demanded.
>>> 
>>> I'm very confused. If the observe_from statement is never demanded, then 
>>> what semantics should it have? What is the type of observe_from? It seems 
>>> it is
>>> a -> m a -> m ()
>>> for whatever monad m you are using. But conditioning usually is a function
>>> Observation a -> Dist a -> Dist a
>>> so you must use the result of the conditioning somehow. And isn't the 
>>> principle of Monte Carlo to approximate the posterior by sampling from it? 
>>> I tend to agree with your suggestion that observations and sampling can 
>>> not be mixed (in the same do-notation) but the latter have to be collected 
>>> in a prior, then conditioned by an observation.
>>> 
>>> What is the semantic connection between your sample and obsersvation 
>>> monad? What is the connection between both and the semantic probability 
>>> distributions? I claim that once you have typed everything, it becomes 
>>> clear where the problem is.
>>> 
>>> Olaf
>>> 
>>> P.S. It has always bugged me that probabilists use elements and events 
>>> interchangingly, while this can only be done on discrete spaces. So above 
>>> I would rather like to write
>>> (2.0==) `observe_from` (normal 0 1)
>>> which still is a non-sensical statement if (normal 0 1) is a continuous 
>>> distribution where each point set has probability zero.
>