[Haskell-cafe] Higher-order algorithms

Tue Aug 24 00:29:23 EDT 2010

Eugene Kirpichov wrote:
> Do there exist other nontrivial higher-order algorithms and datastructures?
> Is the field of higher-order algorithms indeed as unexplored as it seems?

Many algorithms in natural language processing can be captured by 
higher-order algorithms parameterized by the choice of semiring (or 
module space).

For example, consider the inference problem for hidden Markov models 
(which are often used for things like determining the part of speech 
tags for some sentence in natural language). To figure out the total 
probability that the HMM is in some state at some time, you use the 
Forward algorithm.[1] To figure out the probability of the most likely 
state sequence that has a specific state at some time, you use the 
Viterbi algorithm. To figure out not only the probability of the most 
likely state sequence but also what that tag sequence actually is, you 
can modify Viterbi to store back pointers.

All of these are the same algorithm, just with different (augmented) 
semirings. In order to prevent underflow for very small probabilities, 
we usually run these algorithms with probabilities in the log-domain. 
Those variants are also the same algorithm, just taking the image of the 
semiring under the logarithm functor:

Forward     : FW ([0,1], +, 0, *, 1)

Log Forward : FW ([-Inf,0], <+>, -Inf, +, 0)
     where
     -- Ignoring infinities...
     x <+> y | x >= y    = x + log (1 + exp (y-x))
             | otherwise = y + log (1 + exp (x-y))

Viterbi     : FW ([0,1], max, 0, *, 1)

Log Viterbi : FW ([-Inf,0], max, -Inf, +, 0)

ViterbiBP Q : FW (Maybe([0,1],Maybe Q), argmax, Nothing, <*>, 
Just(1,Nothing))
     where
     -- Q = the type of the states in your HMM
     mx <*> my = do
         (px,x) <- mx
         (py,y) <- my
         return (px*py, y `mappend` x)

Log (ViterbiBP Q)
     : FW ( Maybe([-Inf,0],Maybe Q)
          , argmax, Nothing
          , <+>, Just(0,Nothing))
     where
     mx <+> my = do
         (px,x) <- mx
         (py,y) <- my
         return (px+py, y `mappend` x)

Using augmented semirings we can simplify the backpointer version 
significantly in order to incorporate the optimizations usually 
encountered in practice. That is, the Maybes are required to make it a 
semiring, but we can optimize both of them away in practice, yielding an 
augmented semiring over (Prob,Q) or (Log Prob, Q).

We get the same sort of thing for variants of the Backward algorithm 
used in the Forward--Backward algorithm. Of course, there's nothing 
special about HMMs here. We can extend the Forward--Backward algorithm 
to operate over tree structures instead of just list structures. That 
version is called the Inside--Outside algorithm. And semirings show up 
all over the place in other algorithms too.

Of course, in hindsight this makes perfect sense: the powerset of the 
free semiring over S is the set of all (automata theoretic) languages 
over S. So semirings capture languages exactly; in the same way that 
commutative monoids capture multisets, and monoids capture sequences. 
This insight also extends to cover things like weighted-logic 
programming languages, since we can use any semiring we like, not just 
the Boolean probability semiring. Automata theoretic languages are 
everywhere.

[1] Or you combine the Forward and Backward algorithms, depending on 
what exactly you want. Same goes for the others.

-- 
Live well,
~wren