[Haskell-cafe] Toy compression algorithms [was: A very edgy language]

Fri Jul 13 08:09:28 EDT 2007

G'day.

Quoting Bulat Ziganshin <bulat.ziganshin at gmail.com>:

> sorry, but you really misunderstand how PPM works and why it's so
> efficient.

I understand PPM.  I really do.

I think what you don't understand is why LZW does as well as it does
considering how simple it is.  It's because it's a poor but cheap
approximation to what PPM does.

Part of the misunderstanding is that you're looking at what PPM does
algorithmically, whereas I'm talking about what it's equivalent to
mathematically.  It turns out that pretty much all known text compression
algorithms are "equivalent" in the sense that they give the same
compression ration on the same text to within a (possibly large)
constant factor.

BZip, for example, gives a compression ratio that's almost always
within 5% or so of the best PPM variants given an input text.  How
could that be, since they're so different?

The answer is that they're actually _not_ that different, mathematically
speaking.

Cheers,
Andrew Bromage