[Haskell-cafe] vector-space and standard API for vectors
wren ng thornton
wren at freegeek.org
Sat Nov 6 04:06:49 EDT 2010
On 11/5/10 7:54 AM, Alexey Khudyakov wrote:
>
>> We already know that there are noncommutative modules/vectorspaces of
>> interest (e.g., modules over quaternions and modules over graph paths),
>> why not support them from the beginning? It seems like you're going out
>> of your way to exclude things that would be trivial to include. This is
>> exactly why this is my standard complaint against the various proposals
>> out there for new numeric hierarchies. People who are used to only using
>> R^n think the proposals are just fine, but none of the proposals capture
>> the structures I work with daily. Which means the new proposals are no
>> better than the Prelude for me.
>
> Could you tell what data structures do you use? It's difficult to think
> about them without concrete examples.
Data structures? That varies a lot depending on the task: Data.Map,
Data.Set, Data.IntMap, Data.IntSet, Data.Trie, Data.ByteString...
A lot of my concrete examples of semirings and modules come from natural
language processing tasks. One example I already mentioned is the
semiring of a collection of paths over a graph (so something like
Data.Set (Data.Seq Arc)). Path collections show up, for example, when
dealing with Markov chains and HMMs where the goal is to maximize or sum
the weights over all paths.
To make it clearer, a Markov chain is a probabilistic version of a
finite state automaton, so you have some set of nodes, and the arcs for
transitioning from one node to another have probabilistic weights on
them. An HMM is an extension of a Markov chain into a probabilistic
version of a Moore machine, so in addition to the probabilistic
transitions from state to state, we also have for each state a
probability distribution over emitted symbols.
An interesting problem for HMMs is this: given some observed sequence of
emitted symbols, reconstruct the most likely path of states which would
cause the symbol sequence to be emitted. A nearly identical problem is:
given some observed sequence of symbols, determine the total probability
of all state sequences which could have generated it.
In addition to the perspective of HMMs as probabilistic Moore machines,
there are two other perspectives which are helpful. One is the graphical
model perspective where we have something that looks a bit like this (in
fixed-width font):
Q0 -> Q1 -> Q2 -> ... -> Qn
| | |
v v v
S1 S2 Sn
Each Sk and Qk are random variables. The random variables Qk represent
being in some particular state q at time k, and the choice of which
state is drawn from a probability distribution based on the state
Q(k-1). The random variables Sk represent emitting some particular
symbol s at time k, and the choice of which symbol is drawn from a
distribution based on the state Qk.
The third perspective, which is the most helpful one for solving our two
problems, is if we take this graphical model and unfold it into a
trellis graph (ignoring the Sk variables for now). Each node in the
trellis represents an assignment of particular values to each of the
random variables. So if Q1 could take on values qA, qB, and qC then we'd
have three nodes for each of Q1=qA, Q1=qB, and Q1=qC. The arcs in the
trellis are weighted with the probability of transitioning from one node
to the next; so an arc Q1=q1 -> Q2=q2 has weight Pr(Q2=q2 | Q1=q1). A
path through the trellis represents a variable assignment, which is to
say a sequence of states in the Markov chain; and the weight of the path
is the probability of the Markov chain taking that path.
There is a general algorithm for solving the two problems I mentioned,
and ultimately they're the same algorithm except with different
semirings. Note that a collection of paths between two points on a graph
forms a semiring[1] where sum is the union of path collections and
product is the extension of paths[2], so the answers we want can be
gotten by semiring homomorphisms from the collection of paths to some
other domain. To get the probability of all state sequences which could
give rise to a given symbol sequence we can use the probability
semiring[3] ---which we can simplify to the metric space [0..1] with (+)
and (*), since the algorithm ensures that all events are disjoint. This
version is called the "forward algorithm". To get the probability the
most likely state sequence we can use the semiring [0..1] with max and
(+), which is called the "Viterbi algorithm". In practice we tend to use
the log version of these semirings in order to prevent underflow.
There's also a variant of the Viterbi algorithm which stores
backpointers to the most likely previous state, which makes it easier to
recover the most likely state sequence instead of just the
(log)probability of the sequence. The Viterbi algorithm with back
pointers is also a semiring: Maybe(Prob, Maybe State) with Nothing as
zero, Just(1,Nothing) as one, argmax as sum: mx<+>my = do { (px,_) <- mx
; (py,_) <- my ; if px > py then mx else my }, and product: mx<*>my = do
{ (px,x) <- mx ; (py,y) <- my ; Just (px*py, y `mplus` x)}.
The collection-of-paths semiring is noncommutative since extending the
end of a path is different than extending the beginning. The Viterbi
with backpointers semiring is noncommutative since we'll get different
backpointers depending on the order of arguments to the product. If we
extend our probabilities to use quantum probability theory then all of
these probabilistic semirings become noncommutative because of order
effects on quantum probabilities.
These HMM problems can also be thought of from the perspective of a
grammar, where HMMs happen to be restricted to produce linear derivation
"trees". Looking at other grammars like probabilistic CFGs allows us to
extend the idea of generating sequences to the idea of generating trees.
The forward-backward algorithm for sequences (of which the forward
algorithm is half) generalizes to the inside-outside algorithm for
trees. Ultimately, chart parsing algorithms like CKY are doing something
very similar to the Viterbi algorithm in order to determine the most
likely parse tree which would give rise to some observed sentence. So
all these semirings have tree-based analogues as well. This use of
semirings is just like the use of monoids in finger trees[4], except
that we generalize the idea from a monoid on one tree into a semiring on
a packed forest of trees. Naturally there are many such semirings which
are noncommutative since adding a left-child to a tree is different than
adding a right-child.
[1] Or actually a 2-semiring, much as a groupoid (2-group) is a
generalization of groups and a category (2-monoid) is a generalization
of a monoid.
[2] Note that the collection of paths semiring also has something like a
module structure. The "scalars" are single arcs in the underlying graph.
I only say that it's something like a module/vector space, because
whether the family of arcs supports its own semiring/ring/field
operations depends on the specific graph in question. If we assume
certain closure properties on the family of arcs, similar to the
transitive closure properties of composition in categories, then it is
indeed a module. However, we do not have those closure properties for
the specific example in question.
[3] That is the event space E with operations <+> and <*> with the empty
event as zero and the certainty event as one. Colloquially <+> is the
union of events and <*> is the intersection. For x,y in E their
probabilities are defined by Pr(x<+>y) = Pr(x) + Pr(y) - Pr(x,y) and
Pr(x<*>y) = Pr(x,y) where Pr(x,y) = Pr(x)*Pr(y) iff x and y are independent.
[4] http://apfelmus.nfshost.com/articles/monoid-fingertree.html
--
Live well,
~wren
More information about the Haskell-Cafe
mailing list