Mailing List Archive: 49091 messages
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

[REBOL] Re: Entropy [was Re: Compression]

From: holger:rebol at: 18-Apr-2001 14:43

On Wed, Apr 18, 2001 at 03:52:34PM -0500, Joel Neely wrote:
> The key point is that an optimization hack that improves compression > for a specific situations usually loses for others. English, to use > your example, is highly redundant, and we can achieve the impressive > rates you cite only by taking advantage of that specific redundancy. > If we try to apply those same tricks to German, REBOL source code, > or MP3 files, they will simply fail to help (at best) and can > significantly degrade the compression (at worst).
Not really. PPM, e.g., is essentially Huffman with adaptive context-sensitive probabilities, i.e. the prediction of the next input symbol is not just based on the overall frequency but also on some "state" determined by previously seen characters. The main drawback of PPM is that dynamically maintaining probabilities of multi-character sequences has a much higher overhead (memory and CPU) than the simple couting needed for Huffman. That's why PPM is not usually used in compression tools for the consumer market. PPM (Partial Pattern Matching) is a generic model though that can be applied to a very large class of input files, so I would not call it an optimization hack. It would be a "hack" if the way the state and probabilities are maintained had some hidden model-specific assumptions, but that is not the case, at least not for PPM, DMC and BWT. The only assumption made is that *some* local context exists, and that is true for German, REBOL source code and a lot of other types of files. -- Holger Kruse [holger--rebol--com]