[REBOL] Re: using parse to remove tags from html string

From: mgh520::yahoo at: 19-Sep-2001 14:36

I picked this up again the other day, determined to solve it. It turns out I was really close, and debugging it has helped me understand parsing, so I thought I'd post my results.

Here's a fairly simple and concise way to remove all tags from a string (such as removing all HTML tags):

    parse test [any [to "<" begin: thru ">" ending: (remove/part begin ending) :begin]]

The part I was missing before was the :begin at the end. What it does is still a bit hazy to me. Here is a quote from an example in the REBOL/Core User's Guide (search for :mark to find it):

    "Notice the :mark word used above. It sets the input to a new position. The insert function returns the new position just past the insert of the current time. The word mark is used to set the input to that position."

So as I understand it, :begin resets the parse input to the position stored in begin. After remove/part, the text that followed the tag slides back to that position, but without the reset parse would carry on from ending's old index, skipping as many characters as the tag was long. Without :begin, it skips over the '<' of the next tag and ends up leaving the name tag, <Name>, in the string.

Here's a good way to display what is actually happening with this code:

    parse test [
        any [
            to "<" begin: thru ">" ending:
            (print ["begin..." begin] print ["ending..." ending] remove/part begin ending)
            :begin
        ]
    ]

This shows what begin and ending are set to on each pass through the string.

The play-by-play:

1. to "<" -- moves to the next '<' character in the string
2. begin: -- sets begin to the series at that position
3. thru ">" -- moves to the first character *after* the next '>' character
4. ending: -- sets ending to the series at that position
5. (remove/part begin ending) -- removes everything from begin up to (but not including) ending. Since begin points into the actual test series, the remove modifies test itself.
6. :begin -- moves the parse input position back to begin (I attempted to explain this above; if anyone can explain it better, I'd be grateful).

And that's it!
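Here is a small runnable sketch of the same rule, assuming REBOL/Core 2.x (the version current when this was posted); it has not been checked on later versions. The names strip-tags and strip-tags-no-reset are hypothetical helpers and the sample string is made up. Both helpers copy their input so the caller's string is left alone, and both use parse/all so spaces are treated as ordinary characters. The second helper simply drops :begin, to show the skipping that happens without the reset.

```rebol
REBOL [
    Title: "Tag stripping sketch"
    Note: "Assumes REBOL/Core 2.x; not checked on later versions"
]

; strip-tags is a hypothetical helper name. It copies its argument so the
; caller's string is left alone, then applies the rule from the post.
strip-tags: func [html [string!] /local text begin ending] [
    text: copy html
    parse/all text [
        any [
            to "<" begin:               ; stop at the next "<" and remember it
            thru ">" ending:            ; step past the matching ">"
            (remove/part begin ending)  ; cut the tag out of TEXT
            :begin                      ; move the parse input back to BEGIN
        ]
    ]
    text
]

; The same rule without :begin, kept only to show what goes wrong.
strip-tags-no-reset: func [html [string!] /local text begin ending] [
    text: copy html
    parse/all text [
        any [
            to "<" begin: thru ">" ending:
            (remove/part begin ending)
            ; no :begin here, so parsing resumes at ENDING's old index,
            ; which now lies past text that slid left into the gap
        ]
    ]
    text
]

sample: "A <b>B</b> <i>C</i>."
print strip-tags sample            ; A B C.
print strip-tags-no-reset sample   ; A B</b> C</i>.
```

The value returned by parse is ignored on purpose: it only reports whether the rules consumed the whole string, and here the trailing "." is never matched, so parse returns false even though the copy has been cleaned. In the second call, once <b> is removed the parse position lands partway into </b>, so the next to "<" jumps straight to <i>; the same thing happens to </i>, and both closing tags survive.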
I know it's not much, but I was so excited when I finally got this to work! Thanks for everyone's help; this is a great list.

mike