[REBOL] Re: using parse to remove tags from html string
From: mgh520:yah:oo at: 19-Sep-2001 14:36
I picked this up again the other day, determined to solve it. It turns out I was really
close and debugging this has been helpful for understanding parsing, so I thought
I'd post my results. So here's a pretty simple and concise way to remove all tags from
a string (such as removing all html tags):
parse test [any [to "<" begin: thru ">" ending: (remove/part begin ending) :begin]]
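For anyone who wants to try it at the console, here is a minimal sketch. The sample string is made up for illustration (it is not the data I was originally working with), and the copy is there so the literal string in the script is not modified in place.

```rebol
; sample input, invented for illustration
test: copy "<b><Name>Bob</b> was here"

; same rule as above: find a tag, mark both ends, remove it, back up to the mark
parse test [any [to "<" begin: thru ">" ending: (remove/part begin ending) :begin]]

print test   ; prints: Bob was here
```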
The part I was missing before was the :begin at the end. What this does is still a bit
hazy. Here is a quote from an example in the /core users guide (search for
to find it):
Notice the :mark word used above. It sets the input to a new position. The insert function
returns the new position just past the insert of the current time. The word
mark is used to set the input to that position.
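So as I understand it, :begin in my example sets parse's input position back to the
position stored in begin. After the remove, the rest of the string has shifted left, but
parse is still sitting at the old index of ending, which now points further along in the
text. Resetting to begin means the next pass through the any loop starts searching right
where the removed tag used to be, so it finds the next '<'. Without :begin, it will not
see that '<' and will end up leaving the name tag, <Name>, in the string.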
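A quick way to compare the two behaviours side by side is sketched below. The word names with-reset and no-reset and the sample string are my own, chosen only for illustration. Going by the explanation above I would expect the second string to keep a tag, but I have only reasoned this through for REBOL/Core 2.x, so probe the results rather than taking my word for it.

```rebol
; two copies of the same invented sample string
with-reset: copy "<b><Name>Bob</b> was here"
no-reset:   copy "<b><Name>Bob</b> was here"

; rule with :begin, input position is moved back after each remove
parse with-reset [any [to "<" begin: thru ">" ending: (remove/part begin ending) :begin]]

; same rule without :begin, input position stays at the old index of ending
parse no-reset [any [to "<" begin: thru ">" ending: (remove/part begin ending)]]

probe with-reset
probe no-reset
```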
Here's a good way to display what is actually happening with this code:
parse test [any [
    to "<" begin: thru ">" ending:
    (print ["begin..." begin] print ["ending..." ending] remove/part begin ending)
    :begin
]]
This will show you what begin and ending are set to on each pass through the string.
The play by play:
1. to "<" -- moves to the first instance of the '<' character in the string
2. begin: -- sets begin to the series starting at that position
3. thru ">" -- moves to the first character *after* the next '>' character
4. ending: -- sets ending to the series starting at that position
5. (remove/part begin ending) -- removes everything from the position in 'begin up to,
but not including, the position in 'ending. Since begin points into the same series
as 'test, the remove modifies test itself
6. :begin -- sets the parse input position back to begin, which is now where the text
that followed the removed tag starts (I attempted to explain this above. If anyone
can better explain it, I'd be grateful).
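The steps above can also be walked through by hand, outside of parse, which made the position business clearer to me. This is only a sketch: it uses find and find/tail as stand-ins for to and thru, and the sample string is invented.

```rebol
test: copy "<b>Bob</b>"

begin:  find test "<"          ; like  to "<" begin:
ending: find/tail begin ">"    ; like  thru ">" ending:  (position just after ">")

remove/part begin ending       ; step 5: begin shares the series with test

print test     ; prints: Bob</b>
print begin    ; prints: Bob</b>
print ending   ; prints: </b>
```

After the remove, ending still holds its old index, so it has jumped ahead past "Bob". Anything that slides into that gap is never scanned unless the position is moved back to begin, which is what step 6 does.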
And that's it! I know it's not much, but I was so excited when I finally got this to
work! Thanks for everyone's help; this is a great list.