[Haskell-cafe] representations of probability distributions

Mon Jul 3 21:14:23 UTC 2017

Dear cafe, 

following up the announcement of bali-phy and the discussion that ensued, I'd like ask the cafe for an overview of representations of probability distributions as data types. It seems that all packages mentioned to far use some flavour of Markov chain. Why? What else is there? Can someone recommend a survey publication/book/talk? 

Explicitly I am interested in:
- the data type, as in Haskell's type system
- how an element of the data type relates to the distribution it describes
- how the monad instance works (if there is one)
- how to compute conditionals (if computable)
- how to compute integrals (if computable)
- what base types are supported (finite types, reals, products ...)

Below is an incomplete, and possibly oversimplified (or even wrong), list of my own.
Additions and corrections welcome.
-- Olaf

1. Markov chains
- uses a random number generator
- type is something like 
  Markov a = a -> IO a
- repeated execution produces an infinite trace xs :: [a]
  where for each predicate p :: a -> Bool, the limit of
  (length (filter p (take n xs))) % n
  as n tends to infinity approaches the probability of p 
  under the distribution modeled. 
- monad instance? 
- conditionals?
- integrals?
- supported base types a in Markov a?

2. Riesz functionals: possibly the most functional representation
- representation as linear functionals 
  Riesz a = (a -> R) -> R
  for a suitable real number type R
- embed predicates in the type (a -> R) via characteristic functions
  characteristic p = \a -> if p a then 1 else 0
  Then for a functional phi :: Riesz a and p :: a -> Bool, 
  phi (characteristic p) is the probability of p.
- Monad instance is the same as for the continuation monad
- conditionals?
- integration is trivial: evaluation
- arbitrary base types
- but how would you even declare a uniform distribution in this type? 

3. Density functions
- represent a distribution on the reals as density function w.r.t. the Lebesgue measure
- limited to reals (and products thereof)
- data type possibly holds only parameters of parametric families (gaussian, beta, gamma, ...)
- computing conditionals only for matching pairs of prior and sampling function
  (conjugate priors), analytically
- integration analytically
- no monad instance because no arbitrary base type

4. Finite distributions
- type Dist a = [(a,R)] for suitable real number type R
- optimizations use search trees instead of flat lists
- basically composition of list and writer monads, 
  where R is regarded as monoid under multiplication
- represents a distribution of finite support. 
  If phi :: Dist a, then
  probability p = sum $ map snd $ filter (p.fst) phi
- conditioning possible if base type is of Eq class and R supports division
- gets unwieldy without Ord instance of base type to keep the list small

5. Valuations
- type Val a = (a -> Bool) -> R
  for suitable real number type R
- models mathematical definition closely
- monad bind is integration: requires exact reals R due to limit process
- conditioning possible if R supports division