[REBOL] Re: Entropy [was Re: Compression]
From: holger:rebol at: 18-Apr-2001 14:43
On Wed, Apr 18, 2001 at 03:52:34PM -0500, Joel Neely wrote:
> The key point is that an optimization hack that improves compression
> for a specific situations usually loses for others. English, to use
> your example, is highly redundant, and we can achieve the impressive
> rates you cite only by taking advantage of that specific redundancy.
> If we try to apply those same tricks to German, REBOL source code,
> or MP3 files, they will simply fail to help (at best) and can
> significantly degrade the compression (at worst).
Not really. PPM, e.g., is essentially Huffman with adaptive
context-sensitive probabilities, i.e. the prediction of the
next input symbol is not just based on the overall frequency but
also on some "state" determined by previously seen characters.
The main drawback of PPM is that dynamically maintaining probabilities
of multi-character sequences has a much higher overhead (memory and CPU)
than the simple couting needed for Huffman. That's why PPM is not usually
used in compression tools for the consumer market.
PPM (Partial Pattern Matching) is a generic model though that can be
applied to a very large class of input files, so I would not call it
an optimization hack. It would be a "hack" if the way the state and
probabilities are maintained had some hidden model-specific
assumptions, but that is not the case, at least not for PPM, DMC and BWT.
The only assumption made is that *some* local context exists, and that
is true for German, REBOL source code and a lot of other types of files.