Mailing List Archive: 49091 messages

Cleaning up swear words

 [1/6] from: btiffin:rogers at: 7-Jul-2007 3:32


Gentle listers,

I've got some code to replace swears with comic book #$% style strings. Before I post it to rebol.org I thought maybe a few pros could check it over and kick me around a little bit. I'm still pretty new to parse, and this is my first ever charset. So any 'word' experts or parse gurus got any hints or advice?

Happy lucky 070707 by the way...

Cheers,
Brian

REBOL [
    Title: "Momify bad words"
    Author: "Brian Tiffin"
    Date: 06-Jul-2007
    File: %momify.r
    Purpose: "Translate bad words to cartoon speak"
    Version: 0.9.0
    Comment: {Usage: do %momify.r clean "..."}
]

;; protect the global name space
;; do it an object or use global: do %momify
context [
    ;; It's a fairly incomplete list of bad words. 244 entries.
    ;; found at http://www.mattfacer.com/swear-filter/
    ;; If you want to add to the list:
    ;;   evaluate unhidelist, make your edits, evaluate hidelist,
    ;;   cut'n' paste badwords.bin to badwords
    ;;   then delete badwords.bin and badwords.raw
    ;;   or for more protection use shred -zu on at least badwords.raw file
    unhidelist: does [
        if any [not exists? %badwords.raw confirm "Overwrite badwords? "] [
            write/lines %badwords.raw mold to string! first badwords
            foreach word next badwords [
                write/lines/append %badwords.raw mold to string! word
            ]
        ]
        print "cover your eyes and edit %badwords.raw"
    ]

    ;; hidelist creates a list of words in binary so if REBOL
    ;; errors out no one will be exposed to the bad words.
    ;; After executing hidelist, you'll need to cut'n'paste include the
    ;; badwords.bin in the badwords definition
    hidelist: has [base words] [
        base: system/options/binary-base
        system/options/binary-base: 64
        words: load %badwords.raw
        while [not tail? words] [
            change words to binary! first words
            words: next words
        ]
        save %badwords.bin compress mold head words
        system/options/binary-base: base
        print "Now cut'n'paste %badwords.bin to badwords: in source code"
    ]

    badwords: load decompress 64#{
eJx9l9tuqzgUhu/nKUaaFwDTVOWiFztqgaQNUqNdCB7NBYatuAmmKM2JjObd9zIB
r2VSzUUkf7/Py+tA/r6/++vfOJq5/O3x8b8/O0rX5/IHkmyQsp/vbt9KJ1KgvqIr
ZKt5W7B3pAvtiy/Yek25m7F9VaxpP6+TTc6S1lJ2OXsmY/LQ/7JXLZk8itDs+hE4
GVsPtPlxyuqFoWeP9lkjVfAlwrnkp0eqRAuLYpmxL1QuZRi0HK2hEll4yyZLz00Z
bY263dF9KmvX81Gw5QSpZM0xQ6sovxHsLAtyKv9IbgQk2PxCyf+wR4NyM+NmXD2v
wJYOVZINXy2loYOot+vRC9WJU4auS+5zmW5yXEOTF49fmgUeGcNkI5SxKatOGb4u
OxGPzNi5+V/Ct2ITl/b59EyaWvdSpvGOr2ZUZRxe991SZJWlb5biukLFn7fKaK1v
bn1ViTcbZS+NssOWWCUHsI69xq7w9AuQNe76lpfsTetAbq8pPDej03iVFGHV+w1/
CoiFeFiNybIDKF88fECCmH82dLbm+vVtpGuVs+pgE6ezDuBjk0JVHxBRrVGPhfEz
HoEVTMzztJG5l3xlq2XVKx9TdxYu1nk7rWbR8DJ8s9RePJAKamzBGRy02lXxHUP2
PKBKiTA4YbwMqu1XWv21IjMdzINcwduT+doTzG07sqyiEmpZReOQQ0zTPiDsq615
t4RvZ8X2lax37xTzbj3x0WqgePFnvlruhbeEqJiPLaJHeSWMeLMVEs0662PLmruz
TrizzrODSjBapYRXNXSHrQX6BAsmIvRJFod3l2UU956Uh8kFPQ2qjywUIeLXuV7F
5K5cQUwxX3HsV9VFYL+VBXM2IVm/I6gleKZe2VKib3NVoHKw/cRWdCRdFfEUNy9s
yF2aytDQz6lamN00iRPSm1uGSQv1zsH7ifT5qm7WY2U0TlfJpH1N57I0dhxUOrtX
YPbLejxbq4vTd2qe+ofv1cIeH92e7GHP1YMhn1oAiFgg9R3wg9aOEFsdcjhkbVCH
mLpSoLgVA4OqI4Os16ljGu250XnJrN6R6VPBfY63U1XNL3FLaflJCFfpKLD6gotF
95RwP/8DW7PQ8jVQ4PTEgztF6Yq3tRWwi1HqZJ8pqN2mkoPiFIxWp+Jp2grjH0UY
eJSSQ75aWBSP8k8RShe/4YoQbmRZGJR9YSqjbqEfdcQ41Jmp6ffpaaJ5Q9aO5kfa
l1zw2/hKW0rGLlD39pA9PCSR0jxbtJPjq5r1Xl9AREP+ov0sbgULHELeUBd7cjEP
QU2VLydKmEGBzpSkpLsAsZLkv15pxK2iClQqiEbyBQ9KQ/enGfwbCpfUwp3ybtGQ
I3uaUOLueC7NoZ0SLa0bnV26v2vRRGL+LLypzMwtyqfAgV8Vv1lKWFXoS3C2Cv1c
06816YNMVm2y9DRS8H8OkGNXc60UFuFpO4L9yYl6xfb+QR2tE12/ayzFg3+CVGHB
gZwlKujuUSlHZFm+U6LhXxCQK2rSB0Rfpdw8O9yby4GCulBDLitVJUnf5ccJY06f
EP13dF5NaAVNJh71/8vM5A9N+AUA3xfwr9Gyl1b0f78//vkN4gHQ8mQPAAA
    }

    ;; Generate a parse rule
    rules: copy []
    terms: exclude charset [#"^(00)" - #"^(ff)"]
        charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]
    comicbook: "!-#$%^&*.~*&^%$#-!.~!-#$%^&*.~*&^%$#-!"
    foreach word badwords [
        insert tail rules compose [
            mark: (word) [terms | end] (to paren! compose [
                change mark copy/part random comicbook (length? word)
            ]) |
        ]
    ]
    insert tail rules [skip]

    ;;
    ;; Get rid of badwords
    ;;
    momify: func [
        "Replace bad words with comic book text"
        instr [string!] "String to clean - Modified"
    ][
        parse/all instr [some rules]
        instr
    ]

    ;; uncomment to expose momify as clean,
    ;; or comment to hide all the words and then the usage is
    ;;   a: do %momify.r a/momify "..."
    set 'clean :momify
]

 [2/6] from: btiffin:rogers at: 7-Jul-2007 3:38


On Saturday 07 July 2007 03:32, Brian Tiffin wrote:
> Gentle listers,
> I've got some code to replace swears with comic book #$% style strings.
<<quoted lines omitted: 5>>
> Cheers,
> Brian
For instance :) ... Just noticed a bug. The code cleans brass but leaves associate alone. Need a parse trick to advance to word boundaries and not just skip through the text.

Cheers again.
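The behaviour reported here can be reproduced with a much smaller rule. The following is only a sketch, assuming REBOL 2 parse semantics: target is a harmless stand-in word, and alnum, by-char and by-word are names made up for illustration, not part of the posted script. The first rule has the same shape as the posted one (try the word, otherwise skip a single character); the second replaces the single-character skip with a skip over the whole run of alphanumerics, so a match can only begin where a word begins.

```rebol
REBOL [Title: "Word-boundary sketch"]

;; Hypothetical stand-in word; the real list lives in badwords.
target: "cat"

;; Assumed definitions, named here only for the sketch.
alnum: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]
terms: complement alnum

;; Same shape as the posted rule: try the word, otherwise skip ONE character.
;; Skipping single characters lets a match begin in the middle of a word.
by-char: [
    some [
        mark: target [terms | end] (print ["  hit at" index? mark])
        | skip
    ]
]

;; Alternative: when the word does not match, consume the whole run of
;; alphanumerics (or the run of terminators), so the word is only tried
;; at the start of a word.
by-word: [
    some [
        mark: target [terms | end] (print ["  hit at" index? mark])
        | some alnum
        | some terms
    ]
]

foreach text ["bobcat" "catalog" "a cat"] [
    print ["by-char:" mold text]
    parse/all text by-char
    print ["by-word:" mold text]
    parse/all text by-word
]
```

With by-char the stand-in is found at the tail of "bobcat" but not at the head of "catalog", which mirrors the brass/associate report. With by-word it should only be found in "a cat".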

 [3/6] from: Tom:Conlin:gm:ail at: 7-Jul-2007 1:05


using parse instead of parse/all will get word boundaries

Brian Tiffin wrote:

 [4/6] from: santilli:gabriele:gma:il at: 7-Jul-2007 10:16


2007/7/7, Brian Tiffin <btiffin-rogers.com>:
> terms: exclude charset [#"^(00)" - #"^(ff)"]
>     charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]

terms: complement charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]

Haven't looked at the rest yet. :)

HTH,
    Gabriele.
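A quick way to see what the complemented charset contains is to test a few characters against it. This is a small sketch, assuming REBOL 2, where find on a bitset reports membership; the name alnum and the sample characters are chosen only for illustration.

```rebol
REBOL [Title: "Complemented charset sketch"]

;; alnum is a hypothetical helper name; terms follows the suggestion above.
alnum: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]
terms: complement alnum

;; FIND on a bitset tests membership, so each sample character is
;; reported as either a terminator or not.
foreach ch [#"a" #"Z" #"5" #" " #"!" #"^/"] [
    print [
        mold ch
        either find terms ch ["is a terminator"] ["is not a terminator"]
    ]
]
```

Letters and digits should be reported as non-terminators and everything else as terminators, which is the set the exclude form was building by subtraction.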

 [5/6] from: btiffin:rogers at: 7-Jul-2007 4:50


Thanks much. Whether or not this is actually correct for the problem at hand, I learned a new thing. :) Tomc sent me the same shortcut.

Cheers,

On Saturday 07 July 2007 04:16, Gabriele Santilli wrote:

 [6/6] from: btiffin:rogers at: 8-Jul-2007 14:09


Hello again,

Hope everyone got 'lucky' for the big 07.

Anyway, thanks to Tom the code has changed so that it won't clean brass anymore. Now a new quandary: what to do with two bad words with no spacing. Is there a fairly fast way of checking for badword [terms | end | ... any other words], or should it just be left to clean brass (perhaps removing that badword as potentially not so bad, since it is a common subword)?

Hmm...this gets a little six of one, half dozen of the other...these word games. Who invented English? Did they not know anything about computers?

Cheers,
Brian

On Saturday 07 July 2007 04:05, Tom wrote:
> using parse instead of parse/all will get word boundaries
> > For instance :) ... Just noticed a bug. The code cleans brass but
> > leaves associate alone. Need a parse trick to advance to word boundaries
> > and not just skip through the text.
> >
> > Cheers again.
REBOL [
    Title: "Momify bad words"
    Author: "Brian Tiffin"
    Date: 06-Jul-2007
    File: %momify.r
    Purpose: "Translate bad words to cartoon speak"
    Version: 0.9.0
    Comment: {Usage: do %momify.r clean "..."}
]

;; protect the global name space
;; do it an object or use global: do %momify
context [
    ;; It's a fairly incomplete list of bad words. 244 entries.
    ;; found at http://www.mattfacer.com/swear-filter/
    ;; If you want to add to the list:
    ;;   evaluate unhidelist, make your edits, evaluate hidelist,
    ;;   cut'n' paste badwords.bin to badwords definition
    ;;   then delete badwords.bin and badwords.raw
    ;;   or for more protection use shred -zu on at least badwords.raw files
    unhidelist: does [
        if any [not exists? %badwords.raw confirm "Overwrite badwords? "] [
            write/lines %badwords.raw mold to string! first badwords
            foreach word next badwords [
                write/lines/append %badwords.raw mold to string! word
            ]
        ]
        print "cover your eyes and edit %badwords.raw"
    ]

    ;; hidelist creates a list of words in binary so if REBOL
    ;; errors out no one will be exposed to the bad words.
    ;; Again executing hidelist, you'll need to cut'n'paste include the
    ;; badwords.bin in the badwords definition
    hidelist: has [base words] [
        base: system/options/binary-base
        system/options/binary-base: 64
        words: load %badwords.raw
        while [not tail? words] [
            change words to binary! first words
            words: next words
        ]
        save %badwords.bin compress mold head words
        system/options/binary-base: base
        print "Now cut'n'paste %badwords.bin to badwords: in source code"
    ]

    badwords: load decompress 64#{
eJx9l9tuqzgUhu/nKUaaFwDTVOWiFztqgaQNUqNdCB7NBYatuAmmKM2JjObd9zIB
r2VSzUUkf7/Py+tA/r6/++vfOJq5/O3x8b8/O0rX5/IHkmyQsp/vbt9KJ1KgvqIr
ZKt5W7B3pAvtiy/Yek25m7F9VaxpP6+TTc6S1lJ2OXsmY/LQ/7JXLZk8itDs+hE4
GVsPtPlxyuqFoWeP9lkjVfAlwrnkp0eqRAuLYpmxL1QuZRi0HK2hEll4yyZLz00Z
bY263dF9KmvX81Gw5QSpZM0xQ6sovxHsLAtyKv9IbgQk2PxCyf+wR4NyM+NmXD2v
wJYOVZINXy2loYOot+vRC9WJU4auS+5zmW5yXEOTF49fmgUeGcNkI5SxKatOGb4u
OxGPzNi5+V/Ct2ITl/b59EyaWvdSpvGOr2ZUZRxe991SZJWlb5biukLFn7fKaK1v
bn1ViTcbZS+NssOWWCUHsI69xq7w9AuQNe76lpfsTetAbq8pPDej03iVFGHV+w1/
CoiFeFiNybIDKF88fECCmH82dLbm+vVtpGuVs+pgE6ezDuBjk0JVHxBRrVGPhfEz
HoEVTMzztJG5l3xlq2XVKx9TdxYu1nk7rWbR8DJ8s9RePJAKamzBGRy02lXxHUP2
PKBKiTA4YbwMqu1XWv21IjMdzINcwduT+doTzG07sqyiEmpZReOQQ0zTPiDsq615
t4RvZ8X2lax37xTzbj3x0WqgePFnvlruhbeEqJiPLaJHeSWMeLMVEs0662PLmruz
TrizzrODSjBapYRXNXSHrQX6BAsmIvRJFod3l2UU956Uh8kFPQ2qjywUIeLXuV7F
5K5cQUwxX3HsV9VFYL+VBXM2IVm/I6gleKZe2VKib3NVoHKw/cRWdCRdFfEUNy9s
yF2aytDQz6lamN00iRPSm1uGSQv1zsH7ifT5qm7WY2U0TlfJpH1N57I0dhxUOrtX
YPbLejxbq4vTd2qe+ofv1cIeH92e7GHP1YMhn1oAiFgg9R3wg9aOEFsdcjhkbVCH
mLpSoLgVA4OqI4Os16ljGu250XnJrN6R6VPBfY63U1XNL3FLaflJCFfpKLD6gotF
95RwP/8DW7PQ8jVQ4PTEgztF6Yq3tRWwi1HqZJ8pqN2mkoPiFIxWp+Jp2grjH0UY
eJSSQ75aWBSP8k8RShe/4YoQbmRZGJR9YSqjbqEfdcQ41Jmp6ffpaaJ5Q9aO5kfa
l1zw2/hKW0rGLlD39pA9PCSR0jxbtJPjq5r1Xl9AREP+ov0sbgULHELeUBd7cjEP
QU2VLydKmEGBzpSkpLsAsZLkv15pxK2iClQqiEbyBQ9KQ/enGfwbCpfUwp3ybtGQ
I3uaUOLueC7NoZ0SLa0bnV26v2vRRGL+LLypzMwtyqfAgV8Vv1lKWFXoS3C2Cv1c
06816YNMVm2y9DRS8H8OkGNXc60UFuFpO4L9yYl6xfb+QR2tE12/ayzFg3+CVGHB
gZwlKujuUSlHZFm+U6LhXxCQK2rSB0Rfpdw8O9yby4GCulBDLitVJUnf5ccJY06f
EP13dF5NaAVNJh71/8vM5A9N+AUA3xfwr9Gyl1b0f78//vkN4gHQ8mQPAAA
    }

    ;; Replacement characters
    comicbook: "!-#$%^&*.~*&^%$#-!.~!-#$%^&*.~*&^%$#-!"

    ;; ensure words are shorter to longer
    sort/compare badwords func [a b] [sign? subtract length? a length? b]

    ;; non terminators
    nont: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]
    ;; terminators
    terms: complement nont

    ;; big block of rules
    rules: copy []
    reprules: copy []

    ;; build a replacement rule for each badword
    foreach word badwords [
        insert tail reprules compose/deep [
            mark: [
                (word) [terms | end] (to paren! compose [
                    change mark copy/part random comicbook (length? word)
                ]) :mark] |
        ]
    ]

    ;; then last alternate is to advance past non badword
    ;; then a rule to skip any terminators
    insert tail reprules [some nont any terms]

    ;; Generate the parse rule
    insert tail rules compose/only [any terms some (reprules)]

    ;;
    ;; Get rid of badwords
    ;;
    momify: func [
        "Replace bad words with comic book text"
        instr [string!] "String to clean - Modified"
    ][
        parse/all instr [some [rules]]
        instr
    ]

    ;; uncomment to expose momify as clean,
    ;; or comment to hide all the words and then the usage is
    ;;   a: do %momify.r a/momify "..."
    ;; that is the only way to get at unhidelist and hidelist
    set 'clean :momify
]

Notes
  • Quoted lines have been omitted from some messages.
    View the message alone to see the lines that have been omitted