[Haskell-cafe] How would you hack it?

Thu Jun 5 01:46:45 EDT 2008

On Wed, 4 Jun 2008, John Melesky wrote:

> So you use those occurrence statistics to pick a feasible next word
> (let's choose "system", since it's the highest probability here -- in
> practice you'd probably choose one randomly based on a weighted
> likelihood). Then you look for all the word pairs which start with
> "system", and choose the next word in the same fashion. Repeat for as
> long as you want.

"Markov chain" means, that you have a sequence of random experiments,
where the outcome of each experiment depends exclusively on a fixed number
(the level) of experiments immediately before the current one.
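
To make the first-level case concrete, here is a minimal sketch of the
procedure quoted above: count word pairs, then repeatedly pick a successor
of the current word with probability proportional to its count. All names
(Table, pairCounts, successors, pickWeighted, generate) are made up for
illustration. The sketch assumes the caller supplies the randomness as a
list of Ints (for instance drawn from some random number generator, not
shown here), so it needs nothing beyond base.

```haskell
import Data.List (group, sort)

-- Occurrence count for each adjacent word pair (first-level data).
-- The name and representation are assumptions made for this sketch.
type Table = [((String, String), Int)]

-- Count how often each pair of neighbouring words occurs in the corpus.
pairCounts :: [String] -> Table
pairCounts ws =
  [ (p, length g) | g@(p : _) <- group (sort (zip ws (drop 1 ws))) ]

-- All words seen directly after w, together with their counts.
successors :: Table -> String -> [(String, Int)]
successors table w = [ (next, n) | ((prev, next), n) <- table, prev == w ]

-- Choose a candidate with probability proportional to its count,
-- using r (reduced modulo the total count) as the random draw.
pickWeighted :: Int -> [(String, Int)] -> Maybe String
pickWeighted _ [] = Nothing
pickWeighted r cands = go (r `mod` total) cands
  where
    total = sum (map snd cands)
    go _ [] = Nothing
    go k ((w, n) : rest)
      | k < n     = Just w
      | otherwise = go (k - n) rest

-- Walk the chain from a start word, consuming one random draw per step.
-- Stops when the draws run out or the current word has no successor.
generate :: Table -> [Int] -> String -> [String]
generate _ [] w = [w]
generate table (r : rs) w =
  w : case pickWeighted r (successors table w) of
        Nothing   -> []
        Just next -> generate table rs next
```

For example, generate (pairCounts (words "a b a b a c")) [0,0,2] "a"
yields ["a","b","a","c"]. Passing the draws in explicitly keeps the
functions pure; always picking the most frequent successor instead would
make the walk deterministic and so no longer a random experiment.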

> Those word-pair statistics, when you have them for all the words in
> your vocabulary, comprise the first-level Markov data for your corpus.
>
> When you extend it to word triplets, it's second-level Markov data
> (and it will generate more reasonable fake text). You can build higher
> and higher Markov levels if you'd like.

If the level is too high, almost every word sequence of that length occurs
only once in the corpus, so you will just reproduce the training text.
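
A rough way to see this effect is to count, for a given level, how many
distinct next words follow each context of that many words. The sketch
below is only an illustration with made-up names (observations, choices,
maxBranching), and it rescans the corpus for every context, so it is
suitable for small inputs only. Once the maximum number of choices drops
to 1, every context has a single continuation and the chain can only
replay the corpus.

```haskell
import Data.List (nub, tails)

-- Every (context, next word) observation for a chain of the given level,
-- where the context is the preceding `level` words.
observations :: Int -> [String] -> [([String], String)]
observations level ws =
  [ (ctx, next)
  | t <- tails ws
  , let (ctx, rest) = splitAt level t
  , length ctx == level
  , next : _ <- [rest]
  ]

-- Distinct words observed directly after the given context.
choices :: Int -> [String] -> [String] -> [String]
choices level ws ctx =
  nub [ next | (c, next) <- observations level ws, c == ctx ]

-- Largest number of distinct continuations over all contexts
-- (0 if the corpus is too short to contain any observation).
maxBranching :: Int -> [String] -> Int
maxBranching level ws =
  maximum (0 : [ length (choices level ws c) | (c, _) <- observations level ws ])
```

On the toy corpus words "a b a b a c", maxBranching gives 2 for levels 1
to 3 and 1 for level 4, at which point the only possible output is the
corpus itself.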