[REBOL] Re: strip tags
From: hijim:pronet at: 4-Nov-2001 20:32
Thanks to all who gave easy ways to strip HTML tags. The code below seems to work fine with my own HTML files. I can load the web page source with
my-file: read to-url http-field/text   ; fetch the page at the URL typed into http-field
my-area/text: my-file                  ; put the raw source into the text area
Then I can remove the tags and extra spaces and newlines with
replace/all my-area/text "<a href" "*** " ; retain links
replace/all my-area/text "</" "<"         ; closing tags become opening tags, so the rules below catch both
replace/all my-area/text "<p>" "^/"       ; paragraph -> newline
replace/all my-area/text "<h1>" "^/^/"    ; headings -> blank line
replace/all my-area/text "<h2>" "^/^/"
replace/all my-area/text "<h3>" "^/^/"
replace/all my-area/text "<h4>" "^/^/"
replace/all my-area/text "<li>" "* "      ; list item -> bullet
replace/all my-area/text "<hr>" "^/----------------------------------^/"   ; rule -> dashed line
parse my-area/text   ; delete everything from each "<" through the next ">"
[any [to "<" begin: thru ">" ending: (remove/part begin ending) :begin]]
loop 5 [replace/all my-area/text "  " " "]   ; collapse runs of spaces (each pass halves them)
replace/all my-area/text " " " "
loop 20 [replace/all my-area/text "^/^/^/" "^/^/"]   ; squeeze long runs of newlines down to one blank line
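The same replacements can be wrapped in a function so they work on any string rather than only on my-area/text. This is a minimal sketch, not tested against every page: the name strip-html is hypothetical, the input is assumed to be a string! already read into memory (e.g. with read as above), and the four heading replacements are folded into one foreach loop. Because each "  " -> " " pass halves a run of spaces, five passes collapse runs of up to 32 spaces.

```rebol
REBOL [Title: "strip-html sketch"]

; Hypothetical helper: applies the replacements above to a copy of the input
strip-html: func [
    "Return a rough plain-text copy of an HTML string"
    html [string!]
    /local text begin ending
][
    text: copy html
    replace/all text "<a href" "*** "        ; keep link targets visible
    replace/all text "</" "<"                ; treat closing tags like opening tags
    replace/all text "<p>" "^/"              ; paragraph -> newline
    foreach heading ["<h1>" "<h2>" "<h3>" "<h4>"] [
        replace/all text heading "^/^/"      ; heading -> blank line
    ]
    replace/all text "<li>" "* "             ; list item -> bullet
    replace/all text "<hr>" "^/----------------------------------^/"
    ; remove every remaining tag: from "<" through the next ">"
    parse text [any [to "<" begin: thru ">" ending: (remove/part begin ending) :begin]]
    loop 5 [replace/all text "  " " "]       ; collapse runs of spaces
    loop 20 [replace/all text "^/^/^/" "^/^/"] ; at most one blank line in a row
    text
]

print strip-html {<h1>Title</h1><p>Some <b>bold</b>  text.</p>}
```

Working on a copy leaves the original source untouched, so it can still be shown or saved alongside the stripped text.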
Jim
Mike wrote: