Cleaning up swear words
[1/6] from: btiffin:rogers at: 7-Jul-2007 3:32
Gentle listers,
I've got some code to replace swears with comic book #$% style strings.
Before I post it to rebol.org I thought maybe a few pros could check it over
and kick me around a little bit. I'm still pretty new to parse, and this is
my first ever charset. So any 'word' experts or parse gurus got any hints or
advice?
Happy lucky 070707 by the way...
Cheers,
Brian
REBOL [
    Title: "Momify bad words"
    Author: "Brian Tiffin"
    Date: 06-Jul-2007
    File: %momify.r
    Purpose: "Translate bad words to cartoon speak"
    Version: 0.9.0
    Comment: {Usage: do %momify.r clean "..."}
]
;; protect the global name space
;; do it in an object or use global: do %momify
context [
    ;; It's a fairly incomplete list of bad words. 244 entries.
    ;; found at http://www.mattfacer.com/swear-filter/
    ;; If you want to add to the list:
    ;; evaluate unhidelist, make your edits, evaluate hidelist,
    ;; cut'n'paste badwords.bin to the badwords definition
    ;; then delete badwords.bin and badwords.raw
    ;; or for more protection use shred -zu on at least the badwords.raw file
    unhidelist: does [
        if any [not exists? %badwords.raw confirm "Overwrite badwords? "] [
            write/lines %badwords.raw mold to string! first badwords
            foreach word next badwords [
                write/lines/append %badwords.raw mold to string! word
            ]
        ]
        print "cover your eyes and edit %badwords.raw"
    ]
    ;; hidelist creates a list of words in binary so if REBOL
    ;; errors out no one will be exposed to the bad words.
    ;; After executing hidelist, you'll need to cut'n'paste the contents
    ;; of badwords.bin into the badwords definition
    hidelist: has [base words] [
        base: system/options/binary-base
        system/options/binary-base: 64
        words: load %badwords.raw
        while [not tail? words] [
            change words to binary! first words
            words: next words
        ]
        save %badwords.bin compress mold head words
        system/options/binary-base: base
        print "Now cut'n'paste %badwords.bin to badwords: in source code"
    ]
    badwords: load decompress 64#{
eJx9l9tuqzgUhu/nKUaaFwDTVOWiFztqgaQNUqNdCB7NBYatuAmmKM2JjObd9zIB
r2VSzUUkf7/Py+tA/r6/++vfOJq5/O3x8b8/O0rX5/IHkmyQsp/vbt9KJ1KgvqIr
ZKt5W7B3pAvtiy/Yek25m7F9VaxpP6+TTc6S1lJ2OXsmY/LQ/7JXLZk8itDs+hE4
GVsPtPlxyuqFoWeP9lkjVfAlwrnkp0eqRAuLYpmxL1QuZRi0HK2hEll4yyZLz00Z
bY263dF9KmvX81Gw5QSpZM0xQ6sovxHsLAtyKv9IbgQk2PxCyf+wR4NyM+NmXD2v
wJYOVZINXy2loYOot+vRC9WJU4auS+5zmW5yXEOTF49fmgUeGcNkI5SxKatOGb4u
OxGPzNi5+V/Ct2ITl/b59EyaWvdSpvGOr2ZUZRxe991SZJWlb5biukLFn7fKaK1v
bn1ViTcbZS+NssOWWCUHsI69xq7w9AuQNe76lpfsTetAbq8pPDej03iVFGHV+w1/
CoiFeFiNybIDKF88fECCmH82dLbm+vVtpGuVs+pgE6ezDuBjk0JVHxBRrVGPhfEz
HoEVTMzztJG5l3xlq2XVKx9TdxYu1nk7rWbR8DJ8s9RePJAKamzBGRy02lXxHUP2
PKBKiTA4YbwMqu1XWv21IjMdzINcwduT+doTzG07sqyiEmpZReOQQ0zTPiDsq615
t4RvZ8X2lax37xTzbj3x0WqgePFnvlruhbeEqJiPLaJHeSWMeLMVEs0662PLmruz
TrizzrODSjBapYRXNXSHrQX6BAsmIvRJFod3l2UU956Uh8kFPQ2qjywUIeLXuV7F
5K5cQUwxX3HsV9VFYL+VBXM2IVm/I6gleKZe2VKib3NVoHKw/cRWdCRdFfEUNy9s
yF2aytDQz6lamN00iRPSm1uGSQv1zsH7ifT5qm7WY2U0TlfJpH1N57I0dhxUOrtX
YPbLejxbq4vTd2qe+ofv1cIeH92e7GHP1YMhn1oAiFgg9R3wg9aOEFsdcjhkbVCH
mLpSoLgVA4OqI4Os16ljGu250XnJrN6R6VPBfY63U1XNL3FLaflJCFfpKLD6gotF
95RwP/8DW7PQ8jVQ4PTEgztF6Yq3tRWwi1HqZJ8pqN2mkoPiFIxWp+Jp2grjH0UY
eJSSQ75aWBSP8k8RShe/4YoQbmRZGJR9YSqjbqEfdcQ41Jmp6ffpaaJ5Q9aO5kfa
l1zw2/hKW0rGLlD39pA9PCSR0jxbtJPjq5r1Xl9AREP+ov0sbgULHELeUBd7cjEP
QU2VLydKmEGBzpSkpLsAsZLkv15pxK2iClQqiEbyBQ9KQ/enGfwbCpfUwp3ybtGQ
I3uaUOLueC7NoZ0SLa0bnV26v2vRRGL+LLypzMwtyqfAgV8Vv1lKWFXoS3C2Cv1c
06816YNMVm2y9DRS8H8OkGNXc60UFuFpO4L9yYl6xfb+QR2tE12/ayzFg3+CVGHB
gZwlKujuUSlHZFm+U6LhXxCQK2rSB0Rfpdw8O9yby4GCulBDLitVJUnf5ccJY06f
EP13dF5NaAVNJh71/8vM5A9N+AUA3xfwr9Gyl1b0f78//vkN4gHQ8mQPAAA}
    ;; Generate a parse rule
    rules: copy []
    terms: exclude charset [#"^(00)" - #"^(ff)"]
        charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]
    comicbook: "!-#$%^&*.~*&^%$#-!.~!-#$%^&*.~*&^%$#-!"
    foreach word badwords [
        insert tail rules compose [
            mark: (word) [terms | end] (to paren! compose [
                change mark copy/part random comicbook (length? word)
            ]) |
        ]
    ]
    insert tail rules [skip]
    ;;
    ;; Get rid of badwords
    ;;
    momify: func [
        "Replace bad words with comic book text"
        instr [string!] "String to clean - Modified"
    ][
        parse/all instr [some rules]
        instr
    ]
    ;; uncomment to expose momify as clean,
    ;; or comment to hide all the words and then the usage is
    ;; a: do %momify.r a/momify "..."
    set 'clean :momify
]
[2/6] from: btiffin:rogers at: 7-Jul-2007 3:38
On Saturday 07 July 2007 03:32, Brian Tiffin wrote:
> Gentle listers,
> I've got some code to replace swears with comic book #$% style strings.
<<quoted lines omitted: 5>>
> Cheers,
> Brian
For instance :) ... Just noticed a bug. The code cleans brass but leaves
associate alone. Need a parse trick to advance to word boundaries and not
just skip through the text.
Cheers again.
[3/6] from: Tom:Conlin:g:mail at: 7-Jul-2007 1:05
using parse instead of parse/all will get word boundaries
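A minimal sketch of the difference Tom is pointing at, assuming REBOL 2 (where the /all refinement exists). Without /all, string parsing skips whitespace between rule elements; with /all, every character, spaces included, has to be matched by the rules. The sample strings are made up for illustration.

```rebol
REBOL []

;; Without /all, whitespace between rule elements is skipped,
;; so the space between the two words never has to be matched.
print parse "bad word" ["bad" "word"]          ; true

;; With /all, the space is an ordinary character and nothing matches it.
print parse/all "bad word" ["bad" "word"]      ; false

;; With /all, the rule has to account for the space itself.
print parse/all "bad word" ["bad" " " "word"]  ; true
```

Note that skipping whitespace is not the same as requiring it: without /all the first rule also accepts "badword", so a word-boundary check may still be needed on top of this.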
Brian Tiffin wrote:
[4/6] from: santilli:gabriele:gmai:l at: 7-Jul-2007 10:16
2007/7/7, Brian Tiffin <btiffin-rogers.com>:
> terms: exclude charset [#"^(00)" - #"^(ff)"]
> charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]
terms: complement charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]
Haven't looked at the rest yet. :)
HTH,
Gabriele.
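A small sketch comparing the two ways of building the terminator set, assuming REBOL 2 bitsets. The names alnum, terms-a and terms-b are only for illustration and do not appear in the script.

```rebol
REBOL []

alnum: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]

;; Original form: every 8-bit character, minus the alphanumerics
terms-a: exclude charset [#"^(00)" - #"^(ff)"] alnum

;; Shortcut: invert the alphanumeric set directly
terms-b: complement alnum

;; Both sets should describe the same characters
print equal? terms-a terms-b

;; A space counts as a terminator, a letter does not
print found? find terms-b #" "
print found? find terms-b #"a"
```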
[5/6] from: btiffin:rogers at: 7-Jul-2007 4:50
Thanks much. Whether or not this is actually correct for the problem at hand,
I learned a new thing. :) Tomc sent me the same shortcut.
Cheers,
On Saturday 07 July 2007 04:16, Gabriele Santilli wrote:
[6/6] from: btiffin:rogers at: 8-Jul-2007 14:09
Hello again,
Hope everyone got 'lucky' for the big 07.
Anyway, thanks to Tom the code has changed so that it won't clean brass
anymore. Now a new quandary: what to do with two bad words with no spacing?
Is there a fairly fast way of checking for badword [terms | end | ... any
other words], or should it just be left to clean brass (perhaps removing that
badword as potentially not so bad, since it is a common subword)?
Hmm... this gets a little six of one, half a dozen of the other... these word
games. Who invented English? Did they not know anything about computers?
Cheers,
Brian
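One possible way to approach the "two bad words with no spacing" question, offered only as a sketch under stated assumptions: treat a run of one or more listed words, back to back, as a hit when the whole run ends at a terminator or at the end of input. The word list ("foo", "bar"), the names any-word, run, scan and hits, and the sample text are all hypothetical stand-ins; the sketch only counts runs and does not replace anything. It assumes REBOL 2.

```rebol
REBOL []

;; Hypothetical stand-in list; the real script decodes its list from binary
words: ["foo" "bar"]

alnum: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]
terms: complement alnum

;; Build the alternation ["foo" | "bar"] from the list
any-word: copy []
foreach w words [append any-word compose [(w) |]]
remove back tail any-word

;; A run is one or more listed words back to back, ending at a boundary
run: [some any-word [terms | end]]

;; At each word start try RUN; otherwise step over the whole word
hits: 0
scan: [any terms any [[run (hits: hits + 1) | some alnum] any terms]]

parse/all "foobar food bar" scan
print hits   ; "foobar" and "bar" qualify, "food" does not
```

Because any-word is tried in list order, a list where one entry is a prefix of another would need the longer entry first; that ordering is not handled here.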
On Saturday 07 July 2007 04:05, Tom wrote:
> using parse instead of parse/all will get word boundaries
> > For instance :) ... Just noticed a bug. The code cleans brass but
> > leaves associate alone. Need a parse trick to advance to word boundaries
> > and not just skip through the text.
> >
> > Cheers again.
REBOL [
    Title: "Momify bad words"
    Author: "Brian Tiffin"
    Date: 06-Jul-2007
    File: %momify.r
    Purpose: "Translate bad words to cartoon speak"
    Version: 0.9.0
    Comment: {Usage: do %momify.r clean "..."}
]
;; protect the global name space
;; do it in an object or use global: do %momify
context [
    ;; It's a fairly incomplete list of bad words. 244 entries.
    ;; found at http://www.mattfacer.com/swear-filter/
    ;; If you want to add to the list:
    ;; evaluate unhidelist, make your edits, evaluate hidelist,
    ;; cut'n'paste badwords.bin to the badwords definition
    ;; then delete badwords.bin and badwords.raw
    ;; or for more protection use shred -zu on at least the badwords.raw file
    unhidelist: does [
        if any [not exists? %badwords.raw confirm "Overwrite badwords? "] [
            write/lines %badwords.raw mold to string! first badwords
            foreach word next badwords [
                write/lines/append %badwords.raw mold to string! word
            ]
        ]
        print "cover your eyes and edit %badwords.raw"
    ]
    ;; hidelist creates a list of words in binary so if REBOL
    ;; errors out no one will be exposed to the bad words.
    ;; After executing hidelist, you'll need to cut'n'paste the contents
    ;; of badwords.bin into the badwords definition
    hidelist: has [base words] [
        base: system/options/binary-base
        system/options/binary-base: 64
        words: load %badwords.raw
        while [not tail? words] [
            change words to binary! first words
            words: next words
        ]
        save %badwords.bin compress mold head words
        system/options/binary-base: base
        print "Now cut'n'paste %badwords.bin to badwords: in source code"
    ]
    badwords: load decompress 64#{
eJx9l9tuqzgUhu/nKUaaFwDTVOWiFztqgaQNUqNdCB7NBYatuAmmKM2JjObd9zIB
r2VSzUUkf7/Py+tA/r6/++vfOJq5/O3x8b8/O0rX5/IHkmyQsp/vbt9KJ1KgvqIr
ZKt5W7B3pAvtiy/Yek25m7F9VaxpP6+TTc6S1lJ2OXsmY/LQ/7JXLZk8itDs+hE4
GVsPtPlxyuqFoWeP9lkjVfAlwrnkp0eqRAuLYpmxL1QuZRi0HK2hEll4yyZLz00Z
bY263dF9KmvX81Gw5QSpZM0xQ6sovxHsLAtyKv9IbgQk2PxCyf+wR4NyM+NmXD2v
wJYOVZINXy2loYOot+vRC9WJU4auS+5zmW5yXEOTF49fmgUeGcNkI5SxKatOGb4u
OxGPzNi5+V/Ct2ITl/b59EyaWvdSpvGOr2ZUZRxe991SZJWlb5biukLFn7fKaK1v
bn1ViTcbZS+NssOWWCUHsI69xq7w9AuQNe76lpfsTetAbq8pPDej03iVFGHV+w1/
CoiFeFiNybIDKF88fECCmH82dLbm+vVtpGuVs+pgE6ezDuBjk0JVHxBRrVGPhfEz
HoEVTMzztJG5l3xlq2XVKx9TdxYu1nk7rWbR8DJ8s9RePJAKamzBGRy02lXxHUP2
PKBKiTA4YbwMqu1XWv21IjMdzINcwduT+doTzG07sqyiEmpZReOQQ0zTPiDsq615
t4RvZ8X2lax37xTzbj3x0WqgePFnvlruhbeEqJiPLaJHeSWMeLMVEs0662PLmruz
TrizzrODSjBapYRXNXSHrQX6BAsmIvRJFod3l2UU956Uh8kFPQ2qjywUIeLXuV7F
5K5cQUwxX3HsV9VFYL+VBXM2IVm/I6gleKZe2VKib3NVoHKw/cRWdCRdFfEUNy9s
yF2aytDQz6lamN00iRPSm1uGSQv1zsH7ifT5qm7WY2U0TlfJpH1N57I0dhxUOrtX
YPbLejxbq4vTd2qe+ofv1cIeH92e7GHP1YMhn1oAiFgg9R3wg9aOEFsdcjhkbVCH
mLpSoLgVA4OqI4Os16ljGu250XnJrN6R6VPBfY63U1XNL3FLaflJCFfpKLD6gotF
95RwP/8DW7PQ8jVQ4PTEgztF6Yq3tRWwi1HqZJ8pqN2mkoPiFIxWp+Jp2grjH0UY
eJSSQ75aWBSP8k8RShe/4YoQbmRZGJR9YSqjbqEfdcQ41Jmp6ffpaaJ5Q9aO5kfa
l1zw2/hKW0rGLlD39pA9PCSR0jxbtJPjq5r1Xl9AREP+ov0sbgULHELeUBd7cjEP
QU2VLydKmEGBzpSkpLsAsZLkv15pxK2iClQqiEbyBQ9KQ/enGfwbCpfUwp3ybtGQ
I3uaUOLueC7NoZ0SLa0bnV26v2vRRGL+LLypzMwtyqfAgV8Vv1lKWFXoS3C2Cv1c
06816YNMVm2y9DRS8H8OkGNXc60UFuFpO4L9yYl6xfb+QR2tE12/ayzFg3+CVGHB
gZwlKujuUSlHZFm+U6LhXxCQK2rSB0Rfpdw8O9yby4GCulBDLitVJUnf5ccJY06f
EP13dF5NaAVNJh71/8vM5A9N+AUA3xfwr9Gyl1b0f78//vkN4gHQ8mQPAAA}
    ;; Replacement characters
    comicbook: "!-#$%^&*.~*&^%$#-!.~!-#$%^&*.~*&^%$#-!"
    ;; ensure words are shorter to longer
    sort/compare badwords func [a b] [sign? subtract length? a length? b]
    ;; non terminators
    nont: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]
    ;; terminators
    terms: complement nont
    ;; big block of rules
    rules: copy []
    reprules: copy []
    ;; build a replacement rule for each badword
    foreach word badwords [
        insert tail reprules compose/deep [
            mark: [ (word) [terms | end] (to paren! compose [
                change mark copy/part random comicbook (length? word)
            ]) :mark] |
        ]
    ]
    ;; then last alternate is to advance past non badword
    ;; then a rule to skip any terminators
    insert tail reprules [some nont any terms]
    ;; Generate the parse rule
    insert tail rules compose/only [any terms some (reprules)]
    ;;
    ;; Get rid of badwords
    ;;
    momify: func [
        "Replace bad words with comic book text"
        instr [string!] "String to clean - Modified"
    ][
        parse/all instr [some [rules]]
        instr
    ]
    ;; uncomment to expose momify as clean,
    ;; or comment to hide all the words and then the usage is
    ;; a: do %momify.r a/momify "..."
    ;; that is the only way to get at unhidelist and hidelist
    set 'clean :momify
]
Notes
- Quoted lines have been omitted from some messages.
  View the message alone to see the lines that have been omitted.