New LFReD compression system REVEALED.

[1/1] from: tbrownell::shaw::ca at: 8-Mar-2002 17:45

Alan Parman wrote...

> However, I will remain politely skeptical until I see it in action.

Your wish is my command... you'll find the hacked script at the bottom of this post.

Example:

Original string @ 552 bits:
"Ok, here's another test with some unusual words like tiger and goat"

LFReD Compression @ 304 bits

The LFReD-compressed string can then be compressed further using any lossless compression method. Using Huffman on top of LFReD Compression brings this string down to approx. 78 bits.

This system currently covers 1,939 of the most common words I could find. There's room for an additional 65,000+ that will be added to the code.

Notes:

- This demo and method are still buggy and hardly tested.
- It's not quite lossless, because parsing chokes on commas, quotes etc., so I've just removed them from the original. Punctuation needs some work.
- The 1,939 words represent most of the shorter one-, two- and three-letter words, which don't compress as well as longer ones. Once the additional 65,000 words are added, the compression ratio will increase dramatically.
- Also, specific dictionaries can be added, such as chemistry terms, medical terms etc., which would yield even better results when compressing those types of documents, e.g. compressing "Streptococcus pneumoniae" into as little as 8 bits.
- The interface here could use some work; it's designed as a simple demo.
- I've let this part out into the world as I have another system that, frankly, blows this one away.

What I do need is some way of dealing with the other, non-traditional ASCII codes. Any ideas?

Enjoy
TBrownell

<---start--->
REBOL [title "LFReD Compression"]

; Dictionary file, parsed into a flat block of code/word pairs
encodes: read http://www.lfred.com/Lcomp/encodes.txt
encodes: parse encodes none

view layout [
    across
    vh2 "LFReD Compression Utility" return
    vh3 "---- Compression ----" return
    Text white "Path to uncompressed file (without %)" return
    text white "%" uncompd: field return
    Text white "Path and name for compressed file..." return
    text white "%" compd: field return
    button "Compress" [
        val: read to-file uncompd/text
        ; Pad punctuation with spaces so it parses as separate tokens; commas are dropped
        replace/all val {"} { ' }
        replace/all val {.} { . }
        replace/all val {?} { ? }
        replace/all val {!} { ! }
        replace/all val {,} {}
        replace/all val {-} { - }
        pval: parse val none
        r: copy {}    ; words not found in the dictionary, stored as-is
        c: copy {}    ; the stream of codes
        foreach val pval [
            x: 0
            foreach [code word] encodes [
                if val = word [append c code x: 1]
            ]
            ; Unknown word: emit the "**" placeholder and keep the word itself
            if x = 0 [append c "**" append r join val " "]
        ]
        ; Output layout: "," unknown-words "X," codes
        insert c rejoin ["," r "X,"]
        write to-file compd/text c
    ]
    return
    VH3 "---- Decompression ----" return
    Text white "Path to file for decompression..." return
    text white "%" DecomPath: field return
    t3: vtext 500x300 white 0.0.80 return
    button "Decompress" [
        decompath: to-file decompath/text
        decom: read decompath
        parse decom [thru "," copy b to "," thru "," copy c to end]
        if not empty? b [b: parse b none]
        n: copy {}
        ; Codes are read back two characters at a time
        foreach [x y] c [
            zz: join x y
            either find zz "*" [
                append n rejoin [first b " "]
                b: next b
            ][
                foreach [code word] encodes [
                    if find/case code zz [append n rejoin [word " "]]
                ]
            ]
        ]
        ; Undo the punctuation padding added during compression
        replace/all n { ' } {'}
        replace/all n { . } {. }
        replace/all n { ? } {? }
        replace/all n { ! } {! }
        replace/all n { - } {-}
        t3/text: n
        show t3
    ]
]

----- Original Message -----
From: "alan parman" <[reboler--programmer--net]>
To: <[rebol-list--rebol--com]>
Sent: Tuesday, March 05, 2002 7:20 AM
Subject: [REBOL] Re: Rebol and a new compression system.

> Terry,
> Sounds great! Would have many uses where text is the medium (html, REBOL scripts, e-mail, etc).

> However, I will remain politely skeptical until I see it in action.

> While I don't have much experience with the mechanics of compression, I have read some about it.

> And from that I am heartened by your description of giving better compression for a specific type of file ("This works only for communication using standard english words..."). _A_Lot_ of work has been done in this area, and _many_ claims of a better system have been proven false. But, many compression schemes are tuned for _any_ type of file (standard zip, PKZIP, WinZip etc), so claims for a better system for a specific type of file are plausible.

> I am openly skeptical about your (implied) claim that you can greatly compress an already compressed file ("If a better compression system than winzip is used, then the 213 would be even smaller."). While not impossible, I would like to see if it is a general phenomenon for any English text file.
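
For readers who want to try the word-substitution idea from the script without the GUI or the network fetch, here is a minimal console sketch. It is only an illustration under stated assumptions: the eight-entry dictionary and its two-character codes are invented here (the real encodes.txt is not reproduced in this post), and `encode-text` is a hypothetical helper name. It follows the same output layout as the Compress button: a comma, the unknown words, "X,", then the code stream with "**" standing in for each unknown word.

```rebol
REBOL [title "Dictionary coding sketch"]

; Hypothetical dictionary in the flat [code word code word ...] layout
; that the posted script iterates over. These codes are made up for
; illustration and are not taken from encodes.txt.
encodes: [
    "aa" "ok"    "ab" "here"  "ac" "is"    "ad" "another"
    "ae" "test"  "af" "with"  "ag" "some"  "ah" "and"
]

encode-text: func [
    "Swap known words for 2-character codes; carry unknown words verbatim."
    text [string!]
    /local codes unknown found
][
    codes: copy ""
    unknown: copy ""
    ; parse with a none rule splits the string on whitespace
    foreach token parse text none [
        found: false
        foreach [code word] encodes [
            if token = word [append codes code found: true break]
        ]
        if not found [
            append codes "**"                ; placeholder in the code stream
            append unknown join token " "    ; the word itself goes up front
        ]
    ]
    rejoin ["," unknown "X," codes]
]

sample: "ok here is another test with some tiger and goat"
packed: encode-text sample
print packed
print [8 * length? sample "bits before," 8 * length? packed "bits after"]
```

With this toy dictionary, "tiger" and "goat" are the only words without a code, so they are carried uncompressed in the header. That is the overhead the notes above refer to: every word missing from the dictionary costs its full length plus a two-character placeholder, so the ratio depends heavily on dictionary coverage.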