Mailing List Archive: 49091 messages

[REBOL] Re: parsing html : is this correct ?

From: rotenca:telvia:it at: 7-Jun-2002 2:35

Hi Anton,
> Well done, you have discovered a bug in 'parse,
> I think. (It could also be 'remove ?).
>
> html: {<script ------------------></script><script>I should be removed</script>}
> html2: {<script -----------x-------></script><script>I should be removed</script>}
>
> html rule: [
>     any [
>         (print "~~~ any block ~~~")
>         to "<script" mark1: (?? mark1)
>         thru "/script>" mark2: (
>             ?? mark2
>             remove/part mark1 mark2
>             ?? mark1
>         )
>         :mark1
>         (?? mark1)
>     ] to end
> ]
>
> parse/all html rule
> prin "^/"
> parse/all html2 rule
> prin "^/"
>
> ?? html
> ?? html2
>
> halt
The problem is caused by the interaction of remove with parse, but it is not a bug. At every match, parse remembers the position it has reached; in your example this position is exactly mark2. When you remove at least 1 + length? mark2 chars, starting from mark1, you put mark2 (and parse's internal position index) beyond the end of the string, as happens in this simulation:

    mark1: "123"
    mark2: next mark1
    remove/part mark1 1 + (length? mark2)
    mark2
    == ** Script Error: Out of range or past end

When parse resumes after your paren, it checks its internal position index, sees that it is beyond the end of the string, and stops, so your :mark1 command is never executed.

You must make sure that the position of parse is never beyond the end of the string. You can do something like this to fix the problem:

    html rule: [
        any [
            (print "~~~ any block ~~~")
            to "<script" mark1: (?? mark1)
            thru "/script>" mark2:
            :mark1 ;go back to mark1 before removing chars
            (
                ?? mark2
                remove/part mark1 mark2
                ?? mark1
            )
            (?? mark1)
        ] to end
    ]

In this version we move parse's internal position index to mark1 before removing chars, not after. That way we can be sure not to invalidate the internal position index of parse.

---
Ciao
Romano
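The index arithmetic described above can be checked outside REBOL with a small Python model. It is only a sketch under stated assumptions: parse's internal position is modelled as a 0-based integer offset into a mutable list of characters, and the helpers `strands_position` and `strip_scripts` are hypothetical names, not part of REBOL. The two test strings are copied from Anton's message.

```python
"""Hypothetical Python model of the REBOL parse/remove interaction.

Assumption: parse's internal position is treated as a plain 0-based offset
into a mutable character list. This is a simulation, not REBOL's own code.
"""

# Test strings copied from Anton's message.
HTML = "<script ------------------></script><script>I should be removed</script>"
HTML2 = "<script -----------x-------></script><script>I should be removed</script>"

CLOSE = "/script>"


def strands_position(source: str) -> bool:
    """True if removing the first script block leaves mark2 past the tail.

    Mirrors the rule in the reply: removing more than length? mark2 chars
    (i.e. at least 1 + length? mark2) starting at mark1 strands mark2.
    """
    mark1 = source.find("<script")
    mark2 = source.find(CLOSE, mark1) + len(CLOSE)
    removed = mark2 - mark1
    return removed > len(source) - mark2  # len(source) - mark2 == length? mark2


def strip_scripts(source: str, rewind_first: bool) -> str:
    """Remove <script ...>...</script> blocks with a cursor-driven loop.

    rewind_first=False models the original rule (remove, then :mark1).
    rewind_first=True models the fixed rule (:mark1, then remove).
    """
    text = list(source)
    pos = 0  # models parse's internal position index
    while True:
        current = "".join(text)
        mark1 = current.find("<script", pos)     # to "<script" mark1:
        if mark1 == -1:
            break
        close_at = current.find(CLOSE, mark1)    # thru "/script>"
        if close_at == -1:
            break
        mark2 = close_at + len(CLOSE)            # mark2:
        if rewind_first:
            pos = mark1                          # :mark1 first...
            del text[mark1:mark2]                # ...then remove/part
        else:
            pos = mark2                          # parse sits at mark2
            del text[mark1:mark2]                # remove/part mark1 mark2
            if pos > len(text):                  # past the tail: parse stops,
                break                            # so :mark1 never runs
            pos = mark1                          # :mark1
    return "".join(text)


if __name__ == "__main__":
    for name, s in (("html", HTML), ("html2", HTML2)):
        print(name, strands_position(s),
              repr(strip_scripts(s, rewind_first=False)),
              repr(strip_scripts(s, rewind_first=True)))
```

Under this model, html is a boundary case (mark2 lands exactly on the tail, which is still a valid position), html2 strands the position, and the original rule leaves html2's second script in place while the rewind-first version strips both strings. The model does not try to reproduce every detail of parse, such as how `any` behaves when a rule does not advance the input.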