Mailing List Archive: 49091 messages

[REBOL] Re: Entropy [was Re: Compression]

From: joel:neely:fedex at: 18-Apr-2001 17:22

Hi, Holger,

So as not to commit post-mortem equine assault, I'll be brief (for me ;-)

Holger Kruse wrote:
> On Wed, Apr 18, 2001 at 03:52:34PM -0500, Joel Neely wrote:
> > The key point is that an optimization hack that improves compression
> > for specific situations usually loses for others...
>
> Not really. PPM, e.g., is essentially Huffman with adaptive
> context-sensitive probabilities, i.e. the prediction of the
> next input symbol is not just based on the overall frequency but
> also on some "state" determined by previously seen characters.
That's fine, and there are lots of adaptive algorithms which "learn" how to handle each source stream... (but see below)
> The main drawback of PPM is that dynamically maintaining probabilities
> of multi-character sequences has a much higher overhead (memory and CPU)
> than the simple counting needed for Huffman. That's why PPM is not
> usually used in compression tools for the consumer market.
My experience with adaptive algorithms is that there's another drawback, if compression ratio is the primary measure of success. The knowledge of the source model has to exist somewhere.

Special-case algorithms put that knowledge in the algorithm, which minimizes the compressed data for those messages that fit their model, but makes them lose for many messages that don't.

Adaptive algorithms start off knowing nothing about each specific message, but learn as they go. This means that the knowledge about the model (which accumulates during the processing of the message) is essentially embedded in the compressed data stream itself. Therefore the compressed data stream is larger than it would be if that model were extracted (e.g. into the algorithm or into out-of-band data), but the ability to find whatever patterns are present protects the adaptive algorithm from spectacular failure on pathological input.
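The cost of baking the model into the algorithm can be illustrated with a small sketch (Python here purely for illustration; the sample strings, function names, and the fixed fallback probability are mine, not from the thread). Coding a message with code lengths taken from a mismatched fixed model costs far more bits per character than coding a message the model was built for:

```python
from collections import Counter
from math import log2

def dist(text):
    """Empirical character distribution of a string."""
    counts = Counter(text)
    n = len(text)
    return {c: k / n for c, k in counts.items()}

def cross_entropy(text, model, fallback=1e-6):
    # Average bits/char when coding `text` with ideal code lengths
    # -log2 q(c) taken from `model`; characters the model has never
    # seen get a tiny fallback probability (hence very long codes).
    return -sum(log2(model.get(c, fallback)) for c in text) / len(text)

english = "the quick brown fox jumps over the lazy dog " * 10
digits  = "3141592653589793238462643383279502884197" * 10

model = dist(english)                  # "special-case" knowledge, baked in
good = cross_entropy(english, model)   # input fits the model: few bits/char
bad  = cross_entropy(digits, model)    # input doesn't: many bits/char
print(f"matching input: {good:.3f} bits/char")
print(f"mismatched input: {bad:.3f} bits/char")
```

The fixed model wins handsomely on the class it was built for and loses spectacularly elsewhere; an adaptive coder would instead pay a smaller, roughly uniform "learning tax" on both.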
> PPM (Prediction by Partial Matching) is a generic model though that can
> be applied to a very large class of input files, so I would not call it
> an optimization hack. It would be a "hack" if the way the state and
> probabilities are maintained had some hidden model-specific
> assumptions, but that is not the case, at least not for PPM, DMC and BWT.
> The only assumption made is that *some* local context exists, and that
> is true for German, REBOL source code and a lot of other types of files.
If we agree (and I think we do) that PPM is not a special-case optimization hack, then I really don't understand how it serves as a counter-example to my original statement, which WAS about special-case tweaks. I still believe that statement holds up: the more one optimizes an algorithm for specific cases, the worse it becomes at dealing with other cases.

All I'm saying (and I think we're in agreement) is that there are no free lunches. Greater flexibility (adaptive algorithms) comes at the price of less compression for specific classes of messages. Greater specialization for a specific class of messages produces better compression ratios for that class but worse performance otherwise. Classic computing-science tradeoff.

Just for hack value, I modified my entropy calculator to figure entropy for a 1-character context model, instead of the single-character frequency model in the code I posted earlier. A more sophisticated model clearly wins, as shown in the statistics below. Entropy is in bits per character; savings are percentages (relative to 8 bits per character).

Program Context  Source File 1 Statistics  Source File 2 Statistics
Version  width   Length  Entropy  Savings  Length  Entropy  Savings
------- -------  ------  -------  -------  ------  -------  -------
   1    0 chars    2180  4.78717   40.160    3258  4.73303   40.837
   2    1 char     2180  2.94814   63.148    3258  2.82492   64.688

Again illustrating that the more sophisticated the model, the better the compression, but the algorithm has to work harder and the increase in model representation has to go somewhere (in the algorithm or embedded in the data stream).

Thanks again for some very thought-stimulating discussion! It never fails to make me think when I get into a conversation with you guys!

-jn-
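For anyone who wants to reproduce the two kinds of figures in the table, here is a minimal sketch of both entropy measures (Python for illustration; the original calculator was REBOL, and the sample text and savings-vs-8-bits convention here are my own assumptions, not the poster's source files):

```python
from collections import Counter
from math import log2

def order0_entropy(text):
    # Version 1, context width 0: H = -sum p(c) * log2 p(c)
    # over single-character frequencies.
    counts = Counter(text)
    n = len(text)
    return -sum((k / n) * log2(k / n) for k in counts.values())

def order1_entropy(text):
    # Version 2, context width 1: conditional entropy of the next
    # character given the previous one, i.e. for each 1-character
    # context take the entropy of its successor distribution,
    # weighted by how often that context occurs.
    contexts = Counter()
    pairs = Counter()
    for prev, cur in zip(text, text[1:]):
        contexts[prev] += 1
        pairs[(prev, cur)] += 1
    n = len(text) - 1
    h = 0.0
    for (prev, cur), k in pairs.items():
        p_pair = k / n                 # P(prev, cur)
        p_cond = k / contexts[prev]    # P(cur | prev)
        h -= p_pair * log2(p_cond)
    return h

text = "the quick brown fox jumps over the lazy dog " * 20
h0 = order0_entropy(text)
h1 = order1_entropy(text)
print(f"0 chars: {h0:.5f} bits/char, savings {100 * (1 - h0 / 8):.3f}%")
print(f"1 char:  {h1:.5f} bits/char, savings {100 * (1 - h1 / 8):.3f}%")
```

As in the table, the 1-character context model never does worse than the frequency-only model (conditioning on context cannot increase empirical entropy), which is exactly the "more sophisticated model wins, at the cost of more work" point.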