[Haskell-cafe] Lazy Probabilistic Programming System in Haskell
benjamin.redelings at gmail.com
Tue Jul 20 18:57:25 UTC 2021
On 7/17/21 12:29 PM, Olaf Klinke wrote:
>>> On 7/16/21 8:27 AM, Olaf Klinke wrote:
>>>> My program BAli-Phy implements probabilistic programming with
>>>> models written as Haskell programs.
>>> Dear Benjamin,
>>> last time you announced BAli-Phy I pestered you with questions about
>>> semantics. In the meantime there was a discussion  on this list
>>> regarding desirable properties of probabilistic languages and monads in
>>> general. A desirable property of any probabilistic language is that
>>> when you define a distribution but map a constant function over it,
>>> then this has the same computational cost as returning the constant
>>> directly. Can you say anything about that?
>> On Fri, 16 Jul 2021, Benjamin Redelings wrote:
>> Hi Olaf,
>> Are you asking if
>> run $ (const y) <$> normal 0 1
>> has the same cost as
>> run $ return y
>> for some interpreter `run`?
>> Yes, the cost is the same. In do-notation, we would have
>> run $ do
>> x <- normal 0 1
>> return $ (const y x)
>> Since `const y` never forces `x`, no time is spent evaluating `run $
>> normal 0 1`. That is basically
>> what I mean by saying that the language is lazy.
> Awesome! That is something you can not have with (random number-)state
> based implementations, as far as I know, because
> x <- normal 0 1
> at least splits the random number generator. Hence running the above
> ten thousand times even without evaluating the x does have a
> non-neglegible cost. So how did you implement the lazyness you
> described above? Do you have thunks just like in Haskell?
> Last time I remarked that the online documentation contains no proper
> definition of the model language. Some examples with explanations of
> individual lines are not enough, IMHO. That appears not to have
> changed since the last release. So why don't you include a list of
> keywords and built-in functions and their meaning/semantics?
1. My VM is based on Sestoft (1997) "Deriving a lazy abstract machine".
However, when a closure is changeable, we do not overwrite the closure
with its results. Instead we store a allocate a new heap location for
the result, and record a pointer from the closure to its result. This
allows us to erase results when modifiable variables change, while
retaining reduction steps that do not depend on the modifiable variables.
The VM does have thunks. This means that there is an O(1) cost for "x
<- normal 0 1" in the program fragment above. So, it is more precise to
run $ (const y) <$> normal 0 1
has the same cost as
run $ let x = (run $ normal 0 1) in return y
I think for my purposes, an O(1) cost is fine. But if you want NO cost
at all, then I think this would require code optimization to eliminate
the thunk allocated for x.
2. In response to the question about splitting the random number
generator, I have a few thoughts.
2a. Splitting the random number generator, should lead to an O(1) cost
per unforced random sample. I think an O(1) cost is fine, unless the
overhead is very high.
2b. In my code, the VM gets random numbers from a runtime library api
that delivers true random numbers. This could be implemented by a
hardware instruction for example. In practice it is delivered by a
pseudorandom number generate with its own internal state. However, for
my purposes, I think both are fine.
3. I would be happy to add more documentation, if it is actually helpful
in figuring out the language. When learning HTML, I did not read the
formal specification, but modified existing examples. Are the examples
actually hard to follow? I am afraid that if I try and write "formal"
specifications, then there will always be someone who declares them
improper and not formal enough. But if the documentation is simply bad
(which is probably true), I would be happy to improve it. Does that
> So why don't you include a list of keywords and built-in functions and
> their meaning/semantics?
This seems reasonable. I will try to do that.
BTW, part of the confusion might stem from the fact that the language
has two monads: a strict monad for observations, and a lazy monad for
random sampling. Perhaps you would have insights on my question in my
previous e-mail about combining lazy and strict monads?
More information about the Haskell-Cafe