[REBOL] Re: Cleaning up swear words
From: btiffin:rogers at: 8-Jul-2007 14:09
Hello again,
Hope everyone got 'lucky' for the big 07.
Anyway, thanks to Tom, the code has changed so that it no longer cleans brass.
Now a new quandary: what to do with two bad words run together with no spacing?
Is there a fairly fast way of checking for badword [terms | end | ... any
other bad words], or should it just be left to clean brass (perhaps removing
that badword from the list, as it is potentially not so bad and is a common
subword)?
Hmm... these word games get a little six of one, half a dozen of the other.
Who invented English? Did they not know anything about computers?
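One possible shape for that check, as a rough sketch only and not the code in momify.r: let the rule that follows a word recurse into the word list, so a word counts when a terminator, the end of input, or another listed word comes next. The names words, chain, after-word and hit? are made up for illustration, and "darn" and "heck" are harmless stand-ins for the real list.

```rebol
REBOL [Title: "Adjacent-word boundary sketch"]

;; harmless stand-ins for the real word list (assumption)
words: ["darn" "heck"]

;; same character classes as momify.r
nont:  charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]
terms: complement nont

;; chain: one alternative per word, each followed by after-word,
;; e.g. ["darn" after-word | "heck" after-word]
chain: copy []
foreach w words [append chain compose [(w) after-word |]]
remove back tail chain    ; drop the trailing |

;; after a word: a terminator, the end of input, or another listed word
after-word: [terms | end | chain]

;; hit?: true when str starts with one or more listed words that end cleanly
hit?: func [str [string!]] [
    parse/all str [chain to end]
]

print hit? "darn it"     ; true
print hit? "darnheck"    ; true, two words with no spacing
print hit? "darnit"      ; false, "it" is not a listed word
```

The cost is one extra pass over the word list at each word ending, so it stays cheap for a list of a few hundred entries. It does not settle the brass question: a short listed word followed by another listed word would still be caught inside an innocent longer word.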
Cheers,
Brian
On Saturday 07 July 2007 04:05, Tom wrote:
> using parse instead of parse/all will get word boundaries
> > For instance :) ... Just noticed a bug. The code cleans brass but
> > leaves associate alone. Need a parse trick to advance to word boundaries
> > and not just skip through the text.
> >
> > Cheers again.
REBOL [
Title: "Momify bad words"
Author: "Brian Tiffin"
Date: 06-Jul-2007
File: %momify.r
Purpose: "Translate bad words to cartoon speak"
Version: 0.9.0
Comment: {Usage: do %momify.r clean "..."}
]
;; protect the global name space
;; do it in an object or use global: do %momify.r
context [
    ;; It's a fairly incomplete list of bad words. 244 entries.
    ;; found at http://www.mattfacer.com/swear-filter/
    ;; If you want to add to the list:
    ;;   evaluate unhidelist, make your edits, evaluate hidelist,
    ;;   cut'n'paste badwords.bin to the badwords definition,
    ;;   then delete badwords.bin and badwords.raw
    ;;   or, for more protection, use shred -zu on at least the badwords.raw file
    unhidelist: does [
        if any [not exists? %badwords.raw confirm "Overwrite badwords? "] [
            write/lines %badwords.raw mold to string! first badwords
            foreach word next badwords [
                write/lines/append %badwords.raw mold to string! word
            ]
        ]
        print "cover your eyes and edit %badwords.raw"
    ]
    ;; hidelist creates a list of words in binary so if REBOL
    ;; errors out no one will be exposed to the bad words.
    ;; Again, after executing hidelist you'll need to cut'n'paste the
    ;; contents of badwords.bin into the badwords definition
    hidelist: has [base words] [
        ;; remember the current binary base, then mold binaries as base 64
        base: system/options/binary-base
        system/options/binary-base: 64
        words: load %badwords.raw
        while [not tail? words] [
            change words to binary! first words
            words: next words
        ]
        save %badwords.bin compress mold head words
        ;; restore the previous binary base
        system/options/binary-base: base
        print "Now cut'n'paste %badwords.bin to badwords: in source code"
    ]
    badwords: load decompress 64#{
eJx9l9tuqzgUhu/nKUaaFwDTVOWiFztqgaQNUqNdCB7NBYatuAmmKM2JjObd9zIB
r2VSzUUkf7/Py+tA/r6/++vfOJq5/O3x8b8/O0rX5/IHkmyQsp/vbt9KJ1KgvqIr
ZKt5W7B3pAvtiy/Yek25m7F9VaxpP6+TTc6S1lJ2OXsmY/LQ/7JXLZk8itDs+hE4
GVsPtPlxyuqFoWeP9lkjVfAlwrnkp0eqRAuLYpmxL1QuZRi0HK2hEll4yyZLz00Z
bY263dF9KmvX81Gw5QSpZM0xQ6sovxHsLAtyKv9IbgQk2PxCyf+wR4NyM+NmXD2v
wJYOVZINXy2loYOot+vRC9WJU4auS+5zmW5yXEOTF49fmgUeGcNkI5SxKatOGb4u
OxGPzNi5+V/Ct2ITl/b59EyaWvdSpvGOr2ZUZRxe991SZJWlb5biukLFn7fKaK1v
bn1ViTcbZS+NssOWWCUHsI69xq7w9AuQNe76lpfsTetAbq8pPDej03iVFGHV+w1/
CoiFeFiNybIDKF88fECCmH82dLbm+vVtpGuVs+pgE6ezDuBjk0JVHxBRrVGPhfEz
HoEVTMzztJG5l3xlq2XVKx9TdxYu1nk7rWbR8DJ8s9RePJAKamzBGRy02lXxHUP2
PKBKiTA4YbwMqu1XWv21IjMdzINcwduT+doTzG07sqyiEmpZReOQQ0zTPiDsq615
t4RvZ8X2lax37xTzbj3x0WqgePFnvlruhbeEqJiPLaJHeSWMeLMVEs0662PLmruz
TrizzrODSjBapYRXNXSHrQX6BAsmIvRJFod3l2UU956Uh8kFPQ2qjywUIeLXuV7F
5K5cQUwxX3HsV9VFYL+VBXM2IVm/I6gleKZe2VKib3NVoHKw/cRWdCRdFfEUNy9s
yF2aytDQz6lamN00iRPSm1uGSQv1zsH7ifT5qm7WY2U0TlfJpH1N57I0dhxUOrtX
YPbLejxbq4vTd2qe+ofv1cIeH92e7GHP1YMhn1oAiFgg9R3wg9aOEFsdcjhkbVCH
mLpSoLgVA4OqI4Os16ljGu250XnJrN6R6VPBfY63U1XNL3FLaflJCFfpKLD6gotF
95RwP/8DW7PQ8jVQ4PTEgztF6Yq3tRWwi1HqZJ8pqN2mkoPiFIxWp+Jp2grjH0UY
eJSSQ75aWBSP8k8RShe/4YoQbmRZGJR9YSqjbqEfdcQ41Jmp6ffpaaJ5Q9aO5kfa
l1zw2/hKW0rGLlD39pA9PCSR0jxbtJPjq5r1Xl9AREP+ov0sbgULHELeUBd7cjEP
QU2VLydKmEGBzpSkpLsAsZLkv15pxK2iClQqiEbyBQ9KQ/enGfwbCpfUwp3ybtGQ
I3uaUOLueC7NoZ0SLa0bnV26v2vRRGL+LLypzMwtyqfAgV8Vv1lKWFXoS3C2Cv1c
06816YNMVm2y9DRS8H8OkGNXc60UFuFpO4L9yYl6xfb+QR2tE12/ayzFg3+CVGHB
gZwlKujuUSlHZFm+U6LhXxCQK2rSB0Rfpdw8O9yby4GCulBDLitVJUnf5ccJY06f
EP13dF5NaAVNJh71/8vM5A9N+AUA3xfwr9Gyl1b0f78//vkN4gHQ8mQPAAA}
    ;; Replacement characters
    comicbook: "!-#$%^&*.~*&^%$#-!.~!-#$%^&*.~*&^%$#-!"
    ;; ensure words are shorter to longer
    sort/compare badwords func [a b] [sign? subtract length? a length? b]
    ;; non terminators
    nont: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9"]
    ;; terminators
    terms: complement nont
    ;; big block of rules
    rules: copy []
    reprules: copy []
    ;; build a replacement rule for each badword: match the word only when
    ;; a terminator or the end of input follows, overwrite it with shuffled
    ;; comicbook characters, then reset the input position to mark
    foreach word badwords [
        insert tail reprules compose/deep [
            mark: [ (word) [terms | end] (to paren! compose [
                change mark copy/part random comicbook (length? word)
            ]) :mark] |
        ]
    ]
    ;; the last alternative advances past a non-badword,
    ;; then skips any terminators
    insert tail reprules [some nont any terms]
    ;; Generate the parse rule
    insert tail rules compose/only [any terms some (reprules)]
    ;;
    ;; Get rid of badwords
    ;;
    momify: func [
        "Replace bad words with comic book text"
        instr [string!] "String to clean - Modified"
    ][
        parse/all instr [some [rules]]
        instr
    ]
    ;; leave the next line in to expose momify as clean,
    ;; or comment it out to hide all the words; the usage is then
    ;;     a: do %momify.r a/momify "..."
    ;; which is the only way to get at unhidelist and hidelist
    set 'clean :momify
]
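
A short usage sketch, assuming the script above is saved as %momify.r in the current directory and the set 'clean line is left in. The input text and the names original and cleaned are placeholders. Because momify changes its argument in place, the sketch passes a copy so the original string survives.

```rebol
REBOL [Title: "momify usage sketch"]

;; assumes the script above is saved as %momify.r in the current directory
do %momify.r

;; clean modifies its argument in place, so hand it a copy
;; when the original text must be kept
original: "text typed by a user"    ; placeholder input
cleaned: clean copy original

print original
print cleaned
```

The replacement characters come from a shuffled comicbook string, so the cleaned text will likely differ from run to run.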