[Haskell-cafe] Toy compression algorithms [was: A very edgy language]

Bulat Ziganshin bulat.ziganshin at gmail.com
Thu Jul 12 06:33:33 EDT 2007


Hello ajb,

Thursday, July 12, 2007, 12:16:22 PM, you wrote:
>> well, ppm differs from simple order-0 coder in that it uses previous
>> symbols to calculate probability. lzw differs from order-0 coder with
>> flat static probabilities in that it encodes a whole word each time.

> LZW doesn't have the concept of a "word" at all.

in this case why you proposes to encode lzw words with a huffman
codes? :)

> Both PPM and LZW build up contexts as they go by starting with one
> context for each symbol in the input alphabet, and by extending
> contexts with single symbols as they are found.  (Conceptually; the
> algorithms are slightly different, but they do the same job.)

> The difference has to do, almost completely, with coding.  LZW uses a
> degenerate model to decide what the next input symbol is.  It assumes
> that all "seen" contexts are equally likely, and hence uses a binary
> code to encode them.  (A "binary" code represents the numbers 0 to k-1
> using k codewords each of which is (log k) bits in length.)

oops. ppm build tree of contexts and use context to encode one char.
lzw builds dictionary of words and encode word in empty context.
encoding in empty context is equal to order-0 coder while "prediction by
partial matching" by definition is order-N, N>0 encoding. so lzw can't
be considered as degenerate case of ppm although programming techniques
(building tree of words/contexts) are pretty close


>> it will be more fair to say that lzw is order-0 coder with static flat
>> probabilities, you agree?

> That's the _coding_ phase, sure.

you are right. so lzw is just dictionary-based transformation which
replaces some words with special codes while ppm replaces chars with
their probabilities. i hope you will agree that ppm with flat codes
will be totally useless


>> you said about algorithm that efficiently encodes english words using
>> huffman codes. is such algorithm really exists?

> Ask Google about "word-based huffman".

about which particular algorithm you said? Moffat?


-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin at gmail.com



More information about the Haskell-Cafe mailing list