Mailing List Archive: 49091 messages

replacing urls in html using parse function...

 [1/4] from: jean::holzammer::faedv-n::bayern::de at: 2-Sep-2002 9:52


Hi,

I am trying to replace all href and src assignments in an HTML document with my own, using the parse function. Here are my attempts (step by step) so far. Look at the second-to-last function call: it seems the parser is not at the right position after replacing the first URL. I expected it to replace all URLs at once (because of the any keyword before the rule).

I know there are lots of parsing experts on this list. Probably just some minor changes would be necessary, so have a look.

Thanks in advance.

Jean

    a: {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}

    parse/all a [any [ [[thru {href="}] | [thru {src="}]] copy text to {"} (print text)]]

    parse/all a [any [ [[thru {href="}] | [thru {src="}]] position1: to {"} position2: (change/part position1 "neu" position2 print position2)]]

    parse/all a [any [ [[thru {href="}] | [thru {src="}]] position1: to {"} position2: (change/part position1 "neu" position2 print position1 print position2 print "")]]

>> a: {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
== {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
>> parse/all a [any [ [[thru {href="}] | [thru {src="}]] copy text to {"} (print text)]]
http://www.ann.lu
http://bla.org
== false

>> a: {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
== {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
>> parse/all a [any [ [[thru {href="}] | [thru {src="}]] position1: to {"} position2: (change/part position1 "neu" position2 print position2)]]
http://bla.org" fdfdf
== false

>> a: {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
== {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
>> parse/all a [any [ [[thru {href="}] | [thru {src="}]] position1: to {"} position2: (change/part position1 "neu" position2 print position1 print position2 print "")]]
neu" dfdfdf src="http://bla.org" fdfdf
http://bla.org" fdfdf
== false
>> a
== {assas href="neu" dfdfdf src="http://bla.org" fdfdf}

>> parse/all a [any [ [[thru {href="}] | [thru {src="}]] position1: to {"} position2: (change/part position1 "neu" position2 print position1 print position2 print "")]]
neu" dfdfdf src="http://bla.org" fdfdf
dfdfdf srchttp://bla.org" fdfdf
neu" fdfdf
** Script Error: Out of range or past end
** Where: halt-view
** Near: print position2 print ""
>> a
== {assas href="neu" dfdfdf src="neu" fdfdf}

 [2/4] from: al:bri:xtra at: 2-Sep-2002 22:22


Here's what Jean wrote (with a lot more spacing to make it easier to see what's going on):

    [
    Rebol []

    a: {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}

    print "Parse 1"
    parse/all a [
        any [
            [
                [thru {href="}]
                | [thru {src="}]
            ]
            copy text to {"} (
                print text
            )
        ]
    ]
    probe a

    print "Parse 2"
    parse/all a [
        any [
            [
                [thru {href="}]
                | [thru {src="}]
            ]
            position1: to {"} position2: (
                change/part position1 "neu" position2
                print position2
            )
        ]
    ]
    probe a

    print "Parse 3"
    parse/all a [
        any [
            [
                [thru {href="}]
                | [thru {src="}]
            ]
            position1: to {"} position2: (
                change/part position1 "neu" position2
                print position1
                print position2
                print ""
            )
        ]
    ]
    probe a

    halt
    ]

Jean wrote:
> It seems the parser is not at the right position after replacing the first url.

That's sort of right. Have a look at the console printout:

    Parse 1
    http://www.ann.lu
    http://bla.org
    {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
    ; Here's the second one coming up:
    Parse 2
    http://bla.org" fdfdf
    {assas href="neu" dfdfdf src="http://bla.org" fdfdf}

Note that 'position2 seems to have printed out _too_ little information. At first glance, it should have printed:

    " dfdfdf src="http://bla.org" fdfdf

Note that there are 14 characters missing. Let's look at that 'change again:

    change/part position1 "neu" position2

The difference in length between "neu" and "http://www.ann.lu" is 14 characters. Clearly, when the 'change takes place, the 14 characters:

    p://www.ann.lu

are removed from the string value referred to by 'a. 'position2 still holds the same index into the string, so the place it refers to is effectively moved 14 characters further along.

    Parse 3
    neu" dfdfdf src="http://bla.org" fdfdf
    dfdfdf srchttp://bla.org" fdfdf
    neu" fdfdf
    ** Script Error: Out of range or past end
    ** Where: do-boot
    ** Near: print position2 print ""

Now when the value "http://bla.org" is replaced with "neu" in the third parse rule, the string is made another 11 characters shorter, so 'position2 falls off the end of the string, and you get the error message:

    ** Script Error: Out of range or past end

To avoid this problem, readjust the position of 'position2. One can get the new position as the result of the 'change function:

    >> help change
    USAGE:
        CHANGE series value /part range /only /dup count

    DESCRIPTION:
        Changes a value in a series and returns the series after the change.

So your parse action (inside the paren!) will look something like:

    position2: change/part position1 "neu" position2

and the parse rule (outside the paren!) will look something like:

        ]
        position1: to {"} position2: (
            ; blah blah blah...
        )
        :position2 ; Here's where the position is reset.

I hope that helps!

Andrew Martin
ICQ: 26227169
http://valley.150m.com/
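
For reference, the two fragments from Andrew's message can be assembled into one complete rule. This is only a sketch of how they might fit together, not code posted in the thread; it reuses Jean's test string and the word names position1 and position2 from the messages above, and it assumes a REBOL 2 style parse in which a get-word such as :position2 sets the input position.

```rebol
Rebol []

a: {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}

parse/all a [
    any [
        [thru {href="} | thru {src="}]
        position1: to {"} position2:
        (
            ; change returns the series just past the replacement text,
            ; so position2 is re-captured after the string has shrunk
            position2: change/part position1 "neu" position2
        )
        :position2   ; continue parsing from the corrected position
    ]
]

; show the string after parsing so both attributes can be checked
probe a
```

One thing to watch for: the alternatives are tried in order, and thru scans forward, so an src attribute that appears before the next href would be skipped by this rule. With Jean's test string that case does not arise.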

 [3/4] from: gscottjones:mchsi at: 2-Sep-2002 5:58


From: "Holzammer, Jean"
<snip>
> a: {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
<<quoted lines omitted: 6>>
> position2 print "")]]
<snip>

Hi, Jean,

Andrew's response came in just as I was finishing my hack. As he points out, there are hazards in changing the string (or block) which is being parsed: in short (my words), it throws off the index. Before I understood exactly what is going on with this type of (logic) error, I got into the habit of creating new strings (or blocks) out of the old ones. It somehow lacks elegance, but it is easier for me to see what I am doing. So here is a different way to accomplish the same thing:

    a: {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
    new-html: copy ""
    parse/all a [
        any [
            copy blob [[thru {href="}] | [thru {src="}]]
            (
                new-html: join new-html blob
                new-html: join new-html "neu"
            )
            to {"}
        ]
        copy blob to end (new-html: join new-html blob)
        (print new-html)
    ]

--Scott Jones
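
Scott's rebuild-into-a-new-string approach could also be wrapped in a small function so the replacement text is a parameter. The sketch below is an illustration only: the names replace-urls, new-url and out are hypothetical and do not appear in the thread, and it assumes well-formed input in which every href=" or src=" has a closing quote.

```rebol
Rebol []

; hypothetical helper, not from the thread: builds a new string
; instead of modifying the one being parsed
replace-urls: func [html [string!] new-url [string!] /local out blob] [
    out: copy ""
    parse/all html [
        any [
            ; copy everything up to and including the opening quote
            copy blob [thru {href="} | thru {src="}]
            (append out blob  append out new-url)
            ; skip the old URL; the closing quote is copied later
            to {"}
        ]
        ; copy whatever follows the last replaced attribute
        copy blob to end
        (if blob [append out blob])   ; guard in case nothing was copied
    ]
    out
]

a: {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
print replace-urls a "neu"
```

Because the input string is never modified, the parser's position stays valid throughout, which is the point of Scott's version; the cost is one extra copy of the document.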

 [4/4] from: jean:holzammer:faedv-n:bayern at: 5-Sep-2002 9:33


Hi Andrew, hi Scott,

Your suggestions work for me. Thanks, especially for explaining not only the how-to but also the why!

Jean

Notes
  • Quoted lines have been omitted from some messages.