[Haskell-cafe] Help on syntactic sugar for combining lazy & strict monads?

Benjamin Redelings benjamin.redelings at gmail.com
Fri Jul 30 06:55:15 UTC 2021


The idea of changing observation to have the type
`Observation a -> Dist a -> Dist a` is interesting, but I am not sure
it works in practice.  Generally you cannot produce an exact sample from
the posterior, i.e. from a distribution conditioned on an observation.
MCMC, for example, produces a collection of samples that you can average
over, and the error of such averages decreases as the number of samples
increases.  But you can't generate a single point that is an exact
sample from the posterior.
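
To illustrate the MCMC point, here is a minimal sketch (not code from this
thread) of random-walk Metropolis for the model quoted further down
(x ~ normal 0 1, y ~ normal x 1, z ~ normal y 1, observing 2.0 from
normal z 1).  The names `splitMix`, `logPost` and `meanY`, the step size
0.7, and the hand-rolled SplitMix64 generator are assumptions made only to
keep it dependency-free.  For this Gaussian model the exact posterior of y
works out to Normal(1, 1), so the average of y should come out near 1, yet
no individual state of the chain is an independent exact posterior draw.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Data.Bits (shiftR, xor)
import Data.Word (Word64)

-- SplitMix64 step (an assumed generator, chosen to avoid package
-- dependencies): returns a pseudo-random Word64 and the next seed.
splitMix :: Word64 -> (Word64, Word64)
splitMix s = (z3, s')
  where
    s' = s + 0x9e3779b97f4a7c15
    z1 = (s' `xor` (s' `shiftR` 30)) * 0xbf58476d1ce4e5b9
    z2 = (z1 `xor` (z1 `shiftR` 27)) * 0x94d049bb133111eb
    z3 = z2 `xor` (z2 `shiftR` 31)

-- Uniform draw in the open interval (0, 1), built from the top 53 bits.
uniform :: Word64 -> (Double, Word64)
uniform s = ((fromIntegral (w `shiftR` 11) + 0.5) / 9007199254740992, s')
  where
    (w, s') = splitMix s

-- Standard normal draw via the Box-Muller transform.
stdNormal :: Word64 -> (Double, Word64)
stdNormal s0 = (sqrt (-2 * log u1) * cos (2 * pi * u2), s2)
  where
    (u1, s1) = uniform s0
    (u2, s2) = uniform s1

-- Log-density of Normal(mu, sigma) at v.
logNormal :: Double -> Double -> Double -> Double
logNormal mu sigma v =
  -0.5 * ((v - mu) / sigma) ^ (2 :: Int) - log sigma - 0.5 * log (2 * pi)

-- Unnormalised log posterior of (x, y, z) for the quoted model,
-- with the observation 2.0 drawn from normal z 1.
logPost :: Double -> Double -> Double -> Double
logPost x y z =
  logNormal 0 1 x + logNormal x 1 y + logNormal y 1 z + logNormal z 1 2.0

-- Random-walk Metropolis over (x, y, z); returns the average of y over
-- the n iterations kept after the first burnIn iterations.
meanY :: Int -> Int -> Word64 -> Double
meanY burnIn n = go 0 0 0 0 (logPost 0 0 0) 0
  where
    step = 0.7 :: Double
    go :: Int -> Double -> Double -> Double -> Double -> Double -> Word64 -> Double
    go !i !x !y !z !lp !acc !s
      | i >= burnIn + n = acc / fromIntegral n
      | otherwise =
          let (dx, s1) = stdNormal s
              (dy, s2) = stdNormal s1
              (dz, s3) = stdNormal s2
              (u, s4) = uniform s3
              (x', y', z') = (x + step * dx, y + step * dy, z + step * dz)
              lp' = logPost x' y' z'
          in if log u < lp' - lp
               then go (i + 1) x' y' z' lp' (record y') s4
               else go (i + 1) x y z lp (record y) s4
      where
        -- Only accumulate y once the burn-in period is over.
        record v = if i >= burnIn then acc + v else acc

main :: IO ()
main = print (meanY 1000 200000 42)
```

The result is an average over many correlated states whose error shrinks as
the chain gets longer, which is exactly why there is no single point to hand
back as "the" posterior sample.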

Maybe it would be possible to use separate types for distributions
from which you cannot directly sample?  Something like
`Observation a -> SampleableDist a -> NonsampleableDist a`.
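
A minimal sketch of how such a split could look, with every name
hypothetical: a `SampleableDist` can produce exact draws, while `observe`
returns a `NonsampleableDist` that only supports `expectation`, with no
operation that returns a single point.  The conditioning scheme here is
likelihood weighting (self-normalised importance sampling with the prior as
proposal), swapped in for MCMC only because it is short; `Observation a` is
assumed to be a likelihood function `a -> Double`.

```haskell
module Main (main) where

import Data.Bits (shiftR, xor)
import Data.List (unfoldr)
import Data.Word (Word64)

-- SplitMix64 step (an assumed, dependency-free generator).
splitMix :: Word64 -> (Word64, Word64)
splitMix s = (z3, s')
  where
    s' = s + 0x9e3779b97f4a7c15
    z1 = (s' `xor` (s' `shiftR` 30)) * 0xbf58476d1ce4e5b9
    z2 = (z1 `xor` (z1 `shiftR` 27)) * 0x94d049bb133111eb
    z3 = z2 `xor` (z2 `shiftR` 31)

-- Infinite stream of uniforms in (0, 1) generated from a seed.
uniforms :: Word64 -> [Double]
uniforms = map toUnit . unfoldr (Just . splitMix)
  where
    toUnit w = (fromIntegral (w `shiftR` 11) + 0.5) / 9007199254740992

-- Hypothetical: a distribution we can draw exact samples from; a draw
-- consumes uniforms from the front of the stream.
newtype SampleableDist a = SampleableDist ([Double] -> (a, [Double]))

-- Hypothetical: an observation, represented as a likelihood function.
newtype Observation a = Observation (a -> Double)

-- Hypothetical: a posterior known only through weighted particles. There is
-- deliberately no operation that returns a single exact sample.
newtype NonsampleableDist a = NonsampleableDist ([Double] -> Int -> [(a, Double)])

-- Likelihood weighting: draw from the prior, weight each draw by the likelihood.
observe :: Observation a -> SampleableDist a -> NonsampleableDist a
observe (Observation lik) (SampleableDist draw) = NonsampleableDist particles
  where
    particles us n = take n (go us)
    go us = let (x, us') = draw us in (x, lik x) : go us'

-- Self-normalised estimate of E[f] under the posterior from n weighted particles.
expectation :: (a -> Double) -> NonsampleableDist a -> Word64 -> Int -> Double
expectation f (NonsampleableDist particles) seed n = num / den
  where
    ps = particles (uniforms seed) n
    num = sum [w * f x | (x, w) <- ps]
    den = sum [w | (_, w) <- ps]

-- Uniform(0, 1) prior.
uniform01 :: SampleableDist Double
uniform01 = SampleableDist next
  where
    next (u : us) = (u, us)
    next [] = error "uniform stream exhausted"

-- Example: coin bias p with a Uniform(0, 1) prior, observing 7 heads in 10 flips.
posterior :: NonsampleableDist Double
posterior = observe (Observation (\p -> p ^ (7 :: Int) * (1 - p) ^ (3 :: Int))) uniform01

main :: IO ()
main = print (expectation id posterior 2021 100000)
```

In the example the exact posterior is Beta(8, 4), whose mean is 2/3, so
`main` should print an estimate close to 0.667.  The type of `observe`
makes it impossible to ask the posterior for one exact draw, only for
averages.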

I will think about whether this would solve the problem with laziness...

-BenRI

On 7/29/21 11:35 PM, Benjamin Redelings wrote:
> Hi Olaf,
>
> I think you need to look at two things:
>
> 1. The Giry monad, and how it deals with continuous spaces.
>
> 2. The paper "Practical Probabilistic Programming with Monads" - 
> https://doi.org/10.1145/2804302.2804317
>
> Also, observing 2.0 from a continuous distribution is not nonsensical.
>
> -BenRI
>
> On 7/21/21 11:15 PM, Olaf Klinke wrote:
>>> However, a lazy interpreter causes problems when trying to introduce
>>> *observation* statements (aka conditioning statements) into the monad
>>> [3].  For example,
>>>
>>> run_lazy $ do
>>>    x <- normal 0 1
>>>    y <- normal x 1
>>>    z <- normal y 1
>>>    2.0 `observe_from` normal z 1
>>>    return y
>>>
>>> In the above code fragment, y will be forced because it is returned,
>>> and y will force x.  However, the "observe_from" statement will never be
>>> forced, because it does not produce a result that is demanded.
>>
>> I'm very confused. If the observe_from statement is never demanded,
>> then what semantics should it have? What is the type of observe_from?
>> It seems it is
>> a -> m a -> m ()
>> for whatever monad m you are using. But conditioning usually is a function
>> Observation a -> Dist a -> Dist a
>> so you must use the result of the conditioning somehow. And isn't the
>> principle of Monte Carlo to approximate the posterior by sampling
>> from it? I tend to agree with your suggestion that observations and
>> sampling cannot be mixed (in the same do-notation), but that the sampling
>> statements have to be collected in a prior, which is then conditioned
>> on an observation.
>>
>> What is the semantic connection between your sample and observation
>> monads? What is the connection between both of them and the semantic
>> probability distributions? I claim that once you have typed
>> everything, it becomes clear where the problem is.
>>
>> Olaf
>>
>> P.S. It has always bugged me that probabilists use elements and
>> events interchangeably, while this can only be done on discrete
>> spaces. So above I would rather write
>> (2.0==) `observe_from` (normal 0 1)
>> which is still a nonsensical statement if (normal 0 1) is a
>> continuous distribution where every single-point set has probability zero.
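
The distinction in the P.S. can be made concrete with densities: for
X ~ normal 0 1 the probability of the event |X - 2| < eps shrinks to zero
with eps, but divided by the interval width 2 eps it tends to the positive
density at 2.0, which is the quantity a density-based observation of 2.0
uses.  A small numeric sketch using Simpson's rule; the names `normalPdf`,
`simpson` and `probNear` are illustrative only.

```haskell
module Main (main) where

-- Density of Normal(mu, sigma) at v.
normalPdf :: Double -> Double -> Double -> Double
normalPdf mu sigma v =
  exp (-0.5 * ((v - mu) / sigma) ^ (2 :: Int)) / (sigma * sqrt (2 * pi))

-- Composite Simpson's rule on [a, b] with m (even) subintervals.
simpson :: (Double -> Double) -> Double -> Double -> Int -> Double
simpson f a b m = h / 3 * sum [coef i * f (a + fromIntegral i * h) | i <- [0 .. m]]
  where
    h = (b - a) / fromIntegral m
    coef i
      | i == 0 || i == m = 1
      | odd i = 4
      | otherwise = 2

-- P(|X - c| < eps) for X ~ Normal(0, 1), computed numerically.
probNear :: Double -> Double -> Double
probNear c eps = simpson (normalPdf 0 1) (c - eps) (c + eps) 100

main :: IO ()
main = mapM_ report [0.1, 0.01, 0.001]
  where
    report eps =
      putStrLn (show eps ++ ": P = " ++ show p ++ ", P/(2 eps) = " ++ show (p / (2 * eps)))
      where
        p = probNear 2.0 eps
```

The printed probabilities shrink roughly in proportion to eps, while
P/(2 eps) approaches the standard normal density at 2, about 0.054.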

