[Haskell-cafe] Lazy Probabilistic Programming System in Haskell

Olaf Klinke olf at aatal-apotheke.de
Wed Jul 21 21:59:30 UTC 2021


On Tue, 20 Jul 2021, Benjamin Redelings wrote:
>
> On 7/17/21 12:29 PM, Olaf Klinke wrote:
>>>> 
>>>> On 7/16/21 8:27 AM, Olaf Klinke wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> My program BAli-Phy implements probabilistic programming with models 
>>>>> written as Haskell programs.
>>>>> 
>>>>> http://www.bali-phy.org/models.php
>>>> 
>>>> Dear Benjamin,
>>>> last time you announced BAli-Phy I pestered you with questions about
>>>> semantics. In the meantime there was a discussion [1] on this list
>>>> regarding desirable properties of probabilistic languages and monads in
>>>> general. A desirable property of any probabilistic language is that
>>>> when you define a distribution but map a constant function over it,
>>>> then this has the same computational cost as returning the constant
>>>> directly. Can you say anything about that?
>>>> Cheers,
>>>> Olaf
>>>> 
>>>> [1]
>>>> https://mail.haskell.org/pipermail/haskell-cafe/2020-November/132905.html
>>>
>>> On Fri, 16 Jul 2021, Benjamin Redelings wrote:
>>> 
>>> 
>>> Hi Olaf,
>>> 
>>> Are you asking if
>>> 
>>>     run $ (const y) <$> normal 0 1
>>> has the same cost as
>>> 
>>>     run $ return y
>>> 
>>> for some interpreter `run`?
>>> 
>>> Yes, the cost is the same.  In do-notation, we would have
>>> 
>>>     run $ do
>>>         x <- normal 0 1
>>>         return $ (const y x)
>>> 
>>> Since `const y` never forces `x`, no time is spent evaluating `run $ 
>>> normal 0 1`.  That is basically
>>> what I mean by saying that the language is lazy.
>>> 
>>> -BenRI
>>> 
>> 
>> Awesome! That is something you cannot have with (random-number-)state-
>> based implementations, as far as I know, because
>> x <- normal 0 1
>> at least splits the random number generator. Hence running the above ten
>> thousand times, even without evaluating the x, has a non-negligible
>> cost. So how did you implement the laziness you described above? Do you
>> have thunks just like in Haskell?
>> 
>> Last time I remarked that the online documentation contains no proper 
>> definition of the model language. Some examples with explanations of 
>> individual lines are not enough, IMHO. That appears not to have changed 
>> since the last release. So why don't you include a list of keywords and 
>> built-in functions and their meaning/semantics?
>> 
>> Regards,
>> Olaf
>
> Hi Olaf,
>
> 1. My VM is based on Sestoft (1997) "Deriving a lazy abstract machine".
> However, when a closure is changeable, we do not overwrite the closure with
> its result.  Instead we allocate a new heap location for the result and
> record a pointer from the closure to its result.  This allows us to erase
> results when modifiable variables change, while retaining reduction steps
> that do not depend on the modifiable variables.
>
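
If I understand the closure/result arrangement correctly, the heap looks
roughly like this (my own guess in plain Haskell, certainly not your
actual VM; all names are invented):

    import qualified Data.IntMap.Strict as IM

    type Addr = Int

    data Cell
      = Closure { code :: String, result :: Maybe Addr }
        -- unevaluated code plus a pointer to its result, if any
      | Value { payload :: Double }

    type Heap = IM.IntMap Cell

    -- Invalidating a modifiable: drop the result pointer but keep
    -- the closure, so reduction steps that did not depend on the
    -- modifiable survive.
    invalidate :: Addr -> Heap -> Heap
    invalidate = IM.adjust clear
      where
        clear (Closure c _) = Closure c Nothing
        clear v             = v
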
> The VM does have thunks.  This means that there is an O(1) cost for "x <- 
> normal 0 1" in the program fragment above.  So, it is more precise to say 
> that
>
>     run $ (const y) <$> normal 0 1
>
> has the same cost as
>
>     run $ let x = (run $ normal 0 1) in return y
>
> I think for my purposes, an O(1) cost is fine.  But if you want NO cost at 
> all, then I think this would require code optimization to eliminate the thunk 
> allocated for x.
>
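
In plain GHC terms I picture that O(1) thunk cost like this (a sketch,
of course, not your VM):

    -- 'x' is allocated as a thunk but never forced, so the expensive
    -- computation is never run; only the thunk allocation is paid for.
    constCost :: Int -> Int
    constCost y = let x = veryExpensive in const y x
      where
        veryExpensive = sum [1 .. 10 ^ (9 :: Int)]  -- never evaluated

I agree that eliminating even that allocation would need compile-time
optimization.
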
> 2. In response to the question about splitting the random number generator, I 
> have a few thoughts.
>
> 2a. Splitting the random number generator should lead to an O(1) cost per
> unforced random sample.  I think an O(1) cost is fine, unless the overhead
> is very high.
>
> 2b. In my code, the VM gets random numbers from a runtime library API that
> delivers true random numbers.  This could be implemented by a hardware
> instruction, for example.  In practice it is delivered by a pseudorandom
> number generator with its own internal state.  However, for my purposes, I
> think both are fine.

Yes, O(1) cost is probably okay, as long as the constant is low.
Not being an expert on sampling, I don't know how expensive generator 
splits really are. If one has very deeply nested models, that might 
eventually lead to problems.
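
To make sure I understand the cost model, here is what I imagine a
split-based lazy sampler to look like (invented names, and surely not
your implementation):

    import System.Random (StdGen, mkStdGen, randomR, split)

    newtype Sample a = Sample (StdGen -> a)

    instance Functor Sample where
      fmap f (Sample k) = Sample (f . k)  -- no split, no sampling

    instance Applicative Sample where
      pure x = Sample (const x)
      Sample kf <*> Sample kx = Sample $ \g ->
        let (g1, g2) = split g            -- one O(1) split per node
        in  kf g1 (kx g2)

    instance Monad Sample where
      Sample k >>= f = Sample $ \g ->
        let (g1, g2)  = split g
            Sample k' = f (k g1)          -- 'k g1' stays a thunk
        in  k' g2

    uniform :: Sample Double
    uniform = Sample (fst . randomR (0, 1))

    runSample :: Sample a -> a
    runSample (Sample k) = k (mkStdGen 42)

Here runSample ((const y) <$> uniform) never touches the generator,
because fmap does not split; but each bind still allocates a split
thunk, which is exactly the O(1) cost per unforced sample you describe.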

>
> 3. I would be happy to add more documentation, if it is actually helpful in 
> figuring out the language.  When learning HTML, I did not read the formal 
> specification, but modified existing examples. 

My first port of call, to stay with this analogy, would be the HTML tag
reference list.

> Are the examples actually 
> hard to follow?  I am afraid that if I try and write "formal" specifications, 
> then there will always be someone who declares them improper and not formal 
> enough. 

At least there should be an explanation of what you, as the author, think
the operational semantics are. Whether this is formalized or proven
correct is secondary.

> But if the documentation is simply bad (which is probably true), I 
> would be happy to improve it.  Does that make sense?

Examples should in any case be shown early and often. And to be fair, 
the documentation is probably good for the intended audience, 
bioinformaticians. Since my interest is more theoretical, I was missing 
some bits. But don't let that worry you too much. During my time in 
bioinformatics I definitely spent too much time worrying about the 
theoretical foundations. I once tried to tell a statistician about monads
and got back a bewildered stare.

>
>> So why don't you include a list of keywords and built-in functions and 
>> their meaning/semantics? 
> This seems reasonable.  I will try to do that.
>
> BTW, part of the confusion might stem from the fact that the language has two 
> monads: a strict monad for observations, and a lazy monad for random 
> sampling.  Perhaps you would have insights on my question in my previous 
> e-mail about combining lazy and strict monads?
>
> take care,
>
>
> -BenRI
>
>

I do remember that I was confused by your typing. The model language
has no typing information attached. What would an example model look like
with type annotations? How do your two monads compare with traditional
monadic-probabilistic code, where there is only one monad of random
values?
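
To sharpen that last question: by traditional monadic-probabilistic code
I mean a single monad that threads both the generator and a log-weight,
something like the following (all names hypothetical):

    import System.Random (StdGen, mkStdGen, randomR, split)

    -- one monad for sampling *and* conditioning
    newtype Prob a = Prob (StdGen -> (a, Double))  -- (value, log-weight)

    instance Functor Prob where
      fmap f (Prob k) = Prob $ \g ->
        let (x, w) = k g in (f x, w)

    instance Applicative Prob where
      pure x = Prob $ \_ -> (x, 0)
      Prob kf <*> Prob kx = Prob $ \g ->
        let (g1, g2) = split g
            (f, w1)  = kf g1
            (x, w2)  = kx g2
        in  (f x, w1 + w2)

    instance Monad Prob where
      Prob k >>= f = Prob $ \g ->
        let (g1, g2) = split g
            (x, w1)  = k g1
            Prob k'  = f x
            (y, w2)  = k' g2
        in  (y, w1 + w2)

    uniform :: Prob Double
    uniform = Prob $ \g -> (fst (randomR (0, 1) g), 0)

    -- conditioning: observations only adjust the log-weight
    score :: Double -> Prob ()
    score logw = Prob $ \_ -> ((), logw)

    runProb :: Prob a -> (a, Double)
    runProb (Prob k) = k (mkStdGen 42)

Since in any realistic model the log-weight depends on the sampled
values, asking for the weight forces the samples, so this one-monad
style cannot stay lazy the way your sampling monad apparently is.
Is that why you separate the two monads?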

Olaf

