New LFReD compression system REVEALED.
[1/1] from: tbrownell::shaw::ca at: 8-Mar-2002 17:45
Alan Parman wrote...
> However, I will remain politely skeptical until I see it in action.
Your wish is my command... find the hacked script at the bottom of this
post.
Example:
Original string @ 552 bits: "Ok, here's another test with some unusual words
like tiger and goat"
LFReD Compression @ 304 bits
The LFReD Compressed string can then be compressed using any lossless
compression method. Using Huffman and LFReD Compression brings this string
down to approx 78 bits.
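
The bit counts above are just characters times eight, which can be checked at the REBOL console. The following is a minimal sketch, not part of the posted script. It assumes 8-bit characters and counts the surrounding double quotes as part of the string. REBOL's built-in compress (a deflate-style compressor, not plain Huffman) stands in for the second lossless pass, and it is applied to the original string only because the LFReD output for this example is not shown in the post.

```rebol
REBOL [title "Bit-count check (illustrative sketch)"]

; The example string, including its surrounding double quotes
original: {"Ok, here's another test with some unusual words like tiger and goat"}

; 8 bits per character
print 8 * length? original

; LFReD size (304 bits, from the post) as a fraction of the original size
print 304 / 552

; Assumption: compress stands in for the Huffman step mentioned in the post
packed: compress original
print 8 * length? packed
```

On a string this short, a general-purpose compressor's own overhead can outweigh any saving, so the last figure is only meaningful on longer text.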
This system currently covers 1,939 of the most common words I could find.
There's room for 65,000+ more, which will be added to the code.
Notes:
- This demo and method are still buggy and hardly tested.
- It's not quite lossless: parsing chokes on commas, quotes etc., so I've
just removed them from the original. Punctuation needs some work.
- The 1,939 words include most of the shorter one-, two- and three-letter
words, which don't compress as well as longer ones. Once the additional
65,000 words are added, the compression ratio will increase dramatically.
- Also, specific dictionaries can be added, such as chemistry terms, medical
terms etc., which would yield even higher results when compressing those
types of documents, e.g. compressing "Streptococcus pneumoniae" into as
little as 8 bits.
- The interface here could use some work; it's designed only as a simple demo.
- I've let this part out into the world as I have another system that,
frankly, blows this one away.
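
The dictionary sizes in the notes above follow from the width of the code: an 8-bit code can name 256 entries and a 16-bit code 65,536. A rough sketch of that arithmetic, and of the "Streptococcus pneumoniae" case, is below. The names codes-for and medical, and the one-character codes "A" and "B", are made up for illustration and do not come from encodes.txt. The sketch also assumes the decoder already knows which specialist dictionary is in use, and it counts only the code itself.

```rebol
REBOL [title "Code-width arithmetic (illustrative sketch)"]

; Number of distinct codes available at a given code width in bits
codes-for: func [bits [integer!]] [to-integer 2 ** bits]

print codes-for 8     ; one-byte codes
print codes-for 16    ; two-byte codes

; Hypothetical specialist dictionary in the flat code/term layout
medical: ["A" "Streptococcus pneumoniae" "B" "Staphylococcus aureus"]

term: "Streptococcus pneumoniae"
code: first back find medical term    ; the code stored just before the term

print [8 * length? term "bits ->" 8 * length? code "bits"]
```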
What I do need is some way of dealing with the other, non-traditional ASCII
codes. Any ideas?
Enjoy
TBrownell
<---start--->
REBOL [title "LFReD Compression"]

; Dictionary: a flat list of alternating code / word entries
encodes: read http://www.lfred.com/Lcomp/encodes.txt
encodes: parse encodes none

view layout [
    across
    vh2 "LFReD Compression Utility"
    return
    vh3 "---- Compression ----"
    return
    text white "Path to uncompressed file (without %)"
    return
    text white "%" uncompd: field
    return
    text white "Path and name for compressed file..."
    return
    text white "%" compd: field
    return
    button "Compress" [
        val: read to-file uncompd/text
        ; Pad punctuation with spaces so it parses as separate tokens.
        ; Double quotes become apostrophes and commas are dropped.
        replace/all val {"} { ' }
        replace/all val {.} { . }
        replace/all val {?} { ? }
        replace/all val {!} { ! }
        replace/all val {,} {}
        replace/all val {-} { - }
        pval: parse val none
        r: copy {}    ; words not found in the dictionary, stored as-is
        c: copy {}    ; two-character codes; "**" marks a word held in r
        foreach val pval [
            x: 0
            foreach [code word] encodes [
                if val = word [append c code x: 1]
            ]
            if x = 0 [append c "**" append r join val " "]
        ]
        ; File layout: "," raw words "X," codes
        insert c rejoin ["," r "X,"]
        write to-file compd/text c
    ]
    return
    vh3 "---- Decompression ----"
    return
    text white "Path to file for decompression..."
    return
    text white "%" decompath: field
    return
    t3: vtext 500x300 white 0.0.80
    return
    button "Decompress" [
        ; Use a separate word so the decompath field is not overwritten
        decomfile: to-file decompath/text
        decom: read decomfile
        parse decom [thru "," copy b to "," thru "," copy c to end]
        if not empty? b [b: parse b none]
        n: copy {}
        ; Walk the codes two characters at a time
        foreach [x y] c [
            zz: join x y
            either find zz "*" [
                ; Raw word: take the next one from the header
                append n rejoin [first b " "]
                b: next b
            ][
                foreach [code word] encodes [
                    if find/case code zz [append n rejoin [word " "]]
                ]
            ]
        ]
        ; Undo the punctuation padding added during compression
        replace/all n { ' } {'}
        replace/all n { . } {. }
        replace/all n { ? } {? }
        replace/all n { ! } {! }
        replace/all n { - } {-}
        t3/text: n
        show t3
    ]
]
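
For anyone who wants to try the scheme without the GUI or the download, the core of the script above can be sketched as two plain functions. This is only an illustration of the file layout the script writes (a comma, the unknown words, "X", a comma, then the two-character codes), not the posted code. The names dict, encode-words and decode-words are made up here, the three-entry dictionary is a stand-in for encodes.txt, and the sketch assumes input without commas or quotes and with at least one word.

```rebol
REBOL [title "LFReD round-trip sketch (illustrative)"]

; Hypothetical stand-in for encodes.txt: flat list of code / word entries
dict: ["aa" "test" "ab" "with" "ac" "some"]

; Replace each known word with its two-character code; unknown words are
; marked "**" in the code stream and kept verbatim in the header.
encode-words: func [text [string!] /local codes raw hit] [
    codes: copy ""
    raw: copy ""
    foreach w parse text none [
        hit: none
        foreach [code word] dict [if w = word [hit: code]]
        either hit [append codes hit] [
            append codes "**"
            append raw join w " "
        ]
    ]
    rejoin ["," raw "X," codes]
]

; Reverse the process: read the header words, then walk the codes in pairs.
decode-words: func [packed [string!] /local raw codes out zz] [
    parse/all packed ["," copy raw to "," "," copy codes to end]
    raw: parse raw none
    out: copy ""
    foreach [x y] codes [
        zz: join x y
        either zz = "**" [
            append out join first raw " "
            raw: next raw
        ][
            foreach [code word] dict [
                if code == zz [append out join word " "]
            ]
        ]
    ]
    trim/tail out
]

packed: encode-words "Ok test with goat"
print packed
print decode-words packed
```

Unlike the posted script, the decoder here compares codes with a strict, case-sensitive equality test rather than find/case, so a code can only match a whole dictionary entry.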
----- Original Message -----
From: "alan parman" <[reboler--programmer--net]>
To: <[rebol-list--rebol--com]>
Sent: Tuesday, March 05, 2002 7:20 AM
Subject: [REBOL] Re: Rebol and a new compression system.
> Terry,
> Sounds great! Would have many uses where text is the medium (html, REBOL
> scripts, e-mail, etc).
> However, I will remain politely skeptical until I see it in action.
> While I don't have much experience with the mechanics of compression, I
> have read some about it.
> And from that I am heartened by your description of giving better
> compression for a specific type of file ("This works only for communication
> using standard english words..."). _A_Lot_ of work has been done in this
> area, and _many_ claims of a better system have been proven false. But, many
> compression schemes are tuned for _any_ type of file (standard zip, PKZIP
> WinZip etc), so claims for a better system for a specific type of file are
> plausible.
> I am openly skeptical about your (implied) claim that you can greatly
> compress an already compressed file ("If a better compression system than
> winzip is used, then the 213 would be even smaller."). While not
> impossible, I would like to see if it is a general phenomenon for any
> English text file.