# [Haskell-cafe] Help on syntactic sugar for combining lazy & strict monads?

Benjamin Redelings benjamin.redelings at gmail.com
Fri Jul 30 06:55:15 UTC 2021

The idea of changing observation to look like `Observation a -> Dist a
-> Dist a` is interesting, but I am not sure it works in practice.
Generally you cannot produce an exact sample from a distribution plus
an observation.  MCMC, for example, produces collections of samples
that you can average over, and the error decreases as the number of
samples increases.  But you cannot generate a single point that is an
exact sample from the posterior.
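To make that concrete, here is a purely illustrative sketch (every name below is made up, nothing here is from an actual MCMC library): the only thing you can do with a collection of MCMC draws is average a function over them, and that average approximates the posterior expectation.

```haskell
-- Illustrative only: MCMC gives a list of (correlated) posterior draws,
-- not a single exact posterior sample.
monteCarloMean :: [Double] -> Double
monteCarloMean xs = sum xs / fromIntegral (length xs)

-- Averaging f over the draws approximates E[f(X)] under the posterior;
-- the Monte Carlo error shrinks as the number of draws grows.
estimate :: (Double -> Double) -> [Double] -> Double
estimate f draws = monteCarloMean (map f draws)
```

So the "result" of conditioning is an expectation operator, not a value of type `a`.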

Maybe it would be possible to use separate types to distinguish
distributions from which you cannot directly sample?  Something like
`Observation a -> SampleableDist a -> NonsampleableDist a`.
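A minimal type-level sketch of that idea (all names are hypothetical and the bodies are placeholders; only the types matter here):

```haskell
-- Hypothetical sketch: a distribution you can sample from exactly,
-- versus an unnormalised posterior you can only evaluate pointwise.
newtype SampleableDist a    = SampleableDist (IO a)          -- exact sampler
newtype NonsampleableDist a = NonsampleableDist (a -> Double) -- unnormalised density

newtype Observation a = Observation a

-- Conditioning consumes a sampleable prior and yields a posterior that
-- can no longer be sampled from directly, only approximated (e.g. by MCMC).
observe :: Observation a -> SampleableDist a -> NonsampleableDist a
observe (Observation _) (SampleableDist _) =
  NonsampleableDist (const 1)  -- placeholder density; the point is the type
```

The type system would then prevent `observe` results from being fed back into code that expects to draw exact samples.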

I will think about whether this would solve the problem with laziness...

-BenRI

On 7/29/21 11:35 PM, Benjamin Redelings wrote:
> Hi Olaf,
>
> I think you need to look at two things:
>
> 1. The Giry monad, and how it deals with continuous spaces.
>
> 2. The paper "Practical Probabilistic Programming with Monads" -
> https://doi.org/10.1145/2804302.2804317
>
> Also, observing 2.0 from a continuous distribution is not nonsensical.
>
> -BenRI
>
> On 7/21/21 11:15 PM, Olaf Klinke wrote:
>>> However, a lazy interpreter causes problems when trying to introduce
>>> *observation* statements (aka conditioning statements) into the monad
>>> [3].  For example,
>>>
>>> run_lazy $ do
>>>    x <- normal 0 1
>>>    y <- normal x 1
>>>    z <- normal y 1
>>>    2.0 `observe_from` normal z 1
>>>    return y
>>>
>>> In the above code fragment, y will be forced because it is returned,
>>> and
>>> y will force x.  However, the "observe_from" statement will never be
>>> forced, because it does not produce a result that is demanded.
>>
>> I'm very confused. If the observe_from statement is never demanded,
>> then what semantics should it have? What is the type of observe_from?
>> It seems it is
>> a -> m a -> m ()
>> for whatever monad m you are using. But conditioning usually is a
>> function
>> Observation a -> Dist a -> Dist a
>> so you must use the result of the conditioning somehow. And isn't the
>> principle of Monte Carlo to approximate the posterior by sampling
>> from it? I tend to agree with your suggestion that observations and
>> sampling cannot be mixed (in the same do-notation), but that the
>> latter first have to be collected into a prior, which is then
>> conditioned on an observation.
>>
>> What is the semantic connection between your sample and observation
>> monad? What is the connection between both and the semantic
>> probability distributions? I claim that once you have typed
>> everything, it becomes clear where the problem is.
>>
>> Olaf
>>
>> P.S. It has always bugged me that probabilists use elements and
>> events interchangeably, while this can only be done on discrete
>> spaces. So above I would rather write
>> (2.0==) `observe_from` (normal 0 1)
>> which is still a nonsensical statement if (normal 0 1) is a
>> continuous distribution where each point set has probability zero.