[Haskell-cafe] Thoughts about redesigning "Num" type class

Tue Sep 15 12:27:11 UTC 2015

On Sat, Sep 12, 2015 at 3:15 PM, Jerzy Karczmarczuk
<jerzy.karczmarczuk at unicaen.fr> wrote:
> On the other hand, I would like to see where, in what kind of computations,
> the Rig type class would be useful

For a simple example, consider solving a sequence tagging problem with
a hidden Markov model. That is, an HMM is a pair of (1) a "transition
table" which gives the probability of moving from one state to
another, and (2) an "emission/generation/observation table" which
gives the probability of making an observation given that we're in a
particular state. The sequence tagging problem says, for some
particular HMM, if we're given a sequence of observations we want to
learn something about the sequence of states which would've generated
those observations. Of course the question is what exactly is this
"something" we want to learn? Things we could want to learn:

* what is the total (over all state sequences) probability of
generating the observed sequence?
* what is the probability of the most likely state sequence given the
observed sequence?
* what is the most likely state sequence (itself) given the observed sequence?
* what are the top N most likely states sequences...
...

We can implement a single algorithm that answers all of these
questions, the only difference is which semiring the algorithm uses
when doing stuff. For example, to get the total probability we use the
<+,*> semiring. To get the probability of the single most-likely
sequence we use the <max,+> semiring. To get the most-likely sequence
itself we can use the semiring obtained by modifying the <max,+>
semiring to keep track of backpointers. And so on.

More importantly, it's not just that we *can* come up with a single
unified algorithm, it's that this is the natural algorithm we'd come
up with for solving any of these problems individually. All these
problems can be solved by the same algorithm because they all have the
same structure as a dynamic program; the only difference is what the
values are that we store in the cells of the DP table and how we
combine those values.

Of course, there's nothing important about the fact that we're dealing
with HMMs nor probabilities. We can do the same sort of thing for any
notion of a first-order transition system; e.g., parsing with CFGs.

> (and how to squeeze infinity into it).

Whether your semiring has infinity or not doesn't matter (unless
infinity just so happens to be serving the role of the zero or unit).
So, no squeezing in to be done.

(If you're interested in *-semirings, then having infinities might
matter; but for that, see the blog post I mentioned previously.)

-- 
Live well,
~wren