wren ng thornton wren at freegeek.org
Tue Aug 24 00:29:23 EDT 2010

```Eugene Kirpichov wrote:
> Do there exist other nontrivial higher-order algorithms and datastructures?
> Is the field of higher-order algorithms indeed as unexplored as it seems?

Many algorithms in natural language processing can be captured by
higher-order algorithms parameterized by the choice of semiring (or
module space).

For example, consider the inference problem for hidden Markov models
(which are often used for things like determining the part of speech
tags for some sentence in natural language). To figure out the total
probability that the HMM is in some state at some time, you use the
Forward algorithm.[1] To figure out the probability of the most likely
state sequence that has a specific state at some time, you use the
Viterbi algorithm. To figure out not only the probability of the most
likely state sequence but also what that tag sequence actually is, you
can modify Viterbi to store back pointers.

All of these are the same algorithm, just with different (augmented)
semirings. In order to prevent underflow for very small probabilities,
we usually run these algorithms with probabilities in the log-domain.
Those variants are also the same algorithm, just taking the image of the
semiring under the logarithm functor:

Forward     : FW ([0,1], +, 0, *, 1)

Log Forward : FW ([-Inf,0], <+>, -Inf, +, 0)
where
-- Ignoring infinities...
x <+> y | x >= y    = x + log (1 + exp (y-x))
| otherwise = y + log (1 + exp (x-y))

Viterbi     : FW ([0,1], max, 0, *, 1)

Log Viterbi : FW ([-Inf,0], max, -Inf, +, 0)

ViterbiBP Q : FW (Maybe([0,1],Maybe Q), argmax, Nothing, <*>,
Just(1,Nothing))
where
-- Q = the type of the states in your HMM
mx <*> my = do
(px,x) <- mx
(py,y) <- my
return (px*py, y `mappend` x)

Log (ViterbiBP Q)
: FW ( Maybe([-Inf,0],Maybe Q)
, argmax, Nothing
, <+>, Just(0,Nothing))
where
mx <+> my = do
(px,x) <- mx
(py,y) <- my
return (px+py, y `mappend` x)

Using augmented semirings we can simplify the backpointer version
significantly in order to incorporate the optimizations usually
encountered in practice. That is, the Maybes are required to make it a
semiring, but we can optimize both of them away in practice, yielding an
augmented semiring over (Prob,Q) or (Log Prob, Q).

We get the same sort of thing for variants of the Backward algorithm
used in the Forward--Backward algorithm. Of course, there's nothing
special about HMMs here. We can extend the Forward--Backward algorithm
to operate over tree structures instead of just list structures. That
version is called the Inside--Outside algorithm. And semirings show up
all over the place in other algorithms too.

Of course, in hindsight this makes perfect sense: the powerset of the
free semiring over S is the set of all (automata theoretic) languages
over S. So semirings capture languages exactly; in the same way that
commutative monoids capture multisets, and monoids capture sequences.
This insight also extends to cover things like weighted-logic
programming languages, since we can use any semiring we like, not just
the Boolean probability semiring. Automata theoretic languages are
everywhere.

[1] Or you combine the Forward and Backward algorithms, depending on
what exactly you want. Same goes for the others.

--
Live well,
~wren
```