[REBOL] Re: Compression
From: joel:neely:fedex at: 17-Apr-2001 17:12
Ryan Cole wrote:
> Joel Neely wrote:
> > ... Although the sequence "3/7" is a valid encoding for...
> > 0.42857142857142857142857142857142857142857142857...
> > ... I challenge you to find an equally compact encoding for...
> > 0.42857142857142857142857142857142857142857412857...
> Not exactly equally compact, but how about this:
> 23 / 7 S 6
> 23 divided by 7 skipping the first 6 digits.
> I know it does not work, but I could see the expression on your face
> from here. lol, lol, lol, lol.
> I am suprised you didnt here my laughs. Wew! I will be getting a good
> chuckle for awhile, thanks! Of course if I knew how to do such things
> Joel, you would have heared about it on the 6 o'clock news. ;^)
Well, sorry to disappoint on all three counts:
1) The correct answer is
2) Your visual imagination is better than my auditory perception. ;-)
3) I won't be on PBS, as the producers of Nova were not sufficiently
impressed with the production of the above answer. :-(
The real point of this little exercise, however, may have gotten lost
in the humor. Consider the following two messages:
Dear sir, we find that we are so overwhelmed with your ability
to perform arithmetic with numbers of more than two digits that
we would be delighted to accept your gracious invitation, and will
plan to visit your residence for an interview at your earliest
convenience. To be brief, yes.
Dear sir, madam, or whatever the case may be: Our attorneys will
be contacting you to serve a court order requiring that you cease
and desist any further attempts to harrass our receptionist or
other employees with your constant requests that we devote air
time to the fact that you can do arithmetic. To be blunt, no.
If those are the only two messages possible, I can compress each of
them to a single bit! The same would be true if each message were
composed in triplicate, with the second copy in Anglo-Saxon and the
third in Mandarin (which, for the sake of convenience, we'll assume
to be represented phonetically in ASCII).
The number of bits has nothing to do with the "length" of the message,
but with the total number of messages that are possible, and with
their relative probabilities.
Therefore, if someone claims to be able to condense the content of
3600 CDs onto a single floppy disk, I'd respond that it's entirely
possible IF THERE ARE ONLY A RELATIVELY SMALL NUMBER OF POSSIBLE
CDs. For example, if there are only 4096 possible CDs, one can
code each one by a 12-bit value, regardless of how much data may be
on each CD. Sending a 3600 CD message would then require only 675
bytes. (Of course, decompressing that 675 bytes would require that
the recipient already have access to the content of a full set of
OTOH, if a CD can contain 600Mb of arbitrary data, then a collection
of 3600 CDs equates to approximately 2.16 terabytes.
Any claim to be able to compress THAT collection of data down to
1.44Mb, if any CD is equally likely to appear, simply violates the
laws of physics.
It ain't gonna happen.
I mean no discouragement to Paul in his research into data compression
techniques. It's always possible to find special-case algorithms for
special-case data, and some of them have interesting applications,
such as JPEG compression of images and MP3 compression of music for
However, improved performance via a special-case approach always has
the cost of narrowing the range of its applicability and/or having
the effect you mentioned in an earlier post -- actually making data
outside its "target zone" grow significantly. What's interesting is
that there's actually a kind of conservation law here; the better it
gets inside the target zone, the worse it gets outside.
You can't win, you can't break even, and you can't get out of