[Haskell-cafe] Toy compression algorithms [was: A very edgy language]

Tue Jul 10 23:55:22 EDT 2007

G'day all.

Andrew Coppin wrote:

> > Actually, LZW works surprisingly well for such a trivial little
> > algorithm... When you compare it to the complexity of an arithmetic
> > coder driven by a high-order adaptive PPM model (theoretically the best
> > general-purpose algorithm that exists), it's amazing that anything so
> > simple as LZW should work at all!

Not really.  LZW is basically PPM with a static (and flat!) frequency
prediction model.  The contexts are build in a very similar way.

Quoting Bulat Ziganshin <bulat.ziganshin at gmail.com>:

> the devil in details. just imagine size of huffman table with 64k
> entries :)

If you're allowed to pick your code values, you can do this extremely
compactly.  Some compression schemes optimised for English separates
words from the intermediate space and assigns a Huffman code to each
word.  The tables aren't that big (though the space to hold the words
might be).

Cheers,
Andrew Bromage