replacing urls in html using parse function...
[1/4] from: jean::holzammer::faedv-n::bayern::de at: 2-Sep-2002 9:52
Hi,
I am trying to replace all href and src assignments in an HTML document with
my own, using the parse function.
Here are my attempts (step by step) so far. Look at the second-to-last
function call. It seems the parser is not at the right position after
replacing the first URL. I expected it to replace all URLs at once (because
of the any keyword before the rule).
I know there are lots of parsing experts on this list. Probably just some
minor changes are necessary.
So have a look. Thanks in advance.
Jean
a: {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
parse/all a [any [ [[thru {href="}] | [thru {src="}]] copy text to {"}
(print text)]]
parse/all a [any [ [[thru {href="}] | [thru {src="}]] position1: to {"}
position2: (change/part position1 "neu" position2 print position2)]]
parse/all a [any [ [[thru {href="}] | [thru {src="}]] position1: to {"}
position2: (change/part position1 "neu" position2 print position1 print
position2 print "")]]
>> a: {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
== {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
>> parse/all a [any [ [[thru {href="}] | [thru {src="}]] copy text to {"}
(print text)]]
http://www.ann.lu
http://bla.org
== false
>>
>>
>> a: {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
== {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
>> parse/all a [any [ [[thru {href="}] | [thru {src="}]] position1: to {"}
position2: (change/part position1 "neu" position2 print position2)]]
http://bla.org" fdfdf
== false
>>
>>
>> a: {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
== {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
>> parse/all a [any [ [[thru {href="}] | [thru {src="}]] position1: to {"}
position2: (change/part position1 "neu" position2 print position1 print
position2
print "")]]
neu" dfdfdf src="http://bla.org" fdfdf
http://bla.org" fdfdf
== false
>>
>> a
== {assas href="neu" dfdfdf src="http://bla.org" fdfdf}
>>
>>
>> parse/all a [any [ [[thru {href="}] | [thru {src="}]] position1: to {"}
position2: (change/part position1 "neu" position2 print position1 print
position2 print "")]]
neu" dfdfdf src="http://bla.org" fdfdf
" dfdfdf src="http://bla.org" fdfdf
neu" fdfdf
** Script Error: Out of range or past end
** Where: halt-view
** Near: print position2 print ""
>> a
== {assas href="neu" dfdfdf src="neu" fdfdf}
[2/4] from: al:bri:xtra at: 2-Sep-2002 22:22
Here's what Jean wrote (with a lot more spacing to make it easier to see
what's going on):
[
Rebol []
a: {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
print "Parse 1"
parse/all a [
    any [
        [
            [
                thru {href="}
            ]
            | [
                thru {src="}
            ]
        ] copy text to {"} (
            print text
        )
    ]
]
probe a
print "Parse 2"
parse/all a [
    any [
        [
            [
                thru {href="}
            ]
            | [
                thru {src="}
            ]
        ] position1: to {"} position2: (
            change/part position1 "neu" position2
            print position2
        )
    ]
]
probe a
print "Parse 3"
parse/all a [
    any [
        [
            [
                thru {href="}
            ]
            | [
                thru {src="}
            ]
        ] position1: to {"} position2: (
            change/part position1 "neu" position2
            print position1
            print position2
            print ""
        )
    ]
]
probe a
halt
]
Jean wrote:
> It seems the parser is not at the right position after replacing the first
> url.
That's sort of right. Have a look at the console printout:
Parse 1
http://www.ann.lu
http://bla.org
{assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
; Here's the second one coming up:
Parse 2
http://bla.org" fdfdf
{assas href="neu" dfdfdf src="http://bla.org" fdfdf}
Note that 'position2 seems to have printed out _too_ little information. At
first glance, it should have printed:
" dfdfdf src="http://bla.org" fdfdf
Note that there are 14 characters missing. Let's look at that 'change again:
change/part position1 "neu" position2
The difference in length between "neu" and "http://www.ann.lu" is 14
characters. Clearly, when the 'change takes place, the 14 characters:
p://www.ann.lu
are removed from the string value referred to by 'a. 'position2 keeps its
old index, so the place it refers to is now 14 characters further along in
the text.
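The same effect can be reproduced outside of parse. What follows is only a minimal sketch, assuming REBOL 2 series semantics: it uses find in place of the parse rule to locate the same two positions, and the words old-index and new-position are introduced here purely for illustration.

```rebol
REBOL []

a: {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}

; Locate the same two positions that the parse rule marks.
position1: find/tail a {href="}     ; just after href="
position2: find position1 {"}       ; at the closing quote of the URL

old-index: index? position2         ; remember where position2 points

; change/part returns the position just after the inserted text.
new-position: change/part position1 "neu" position2

; position2 is only an index into the string, and that index did not move,
; although the text underneath it became 14 characters shorter.
print old-index = index? position2  ; true
print position2                     ; http://bla.org" fdfdf
print new-position                  ; " dfdfdf src="http://bla.org" fdfdf
```

The value returned by change/part is the position one actually wants to continue from, which is what the fix further down relies on.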
Parse 3
neu" dfdfdf src="http://bla.org" fdfdf
" dfdfdf src="http://bla.org" fdfdf
neu" fdfdf
** Script Error: Out of range or past end
** Where: do-boot
** Near: print position2
print ""
Now when the value "http://bla.org" is replaced with "neu" in the third
parse rule, the string is made another 11 characters shorter, so 'position2
falls off the end of the string, and you get the error message:
** Script Error: Out of range or past end
To avoid this problem, readjust the position of 'position2. One can get the
new position as a result of the 'change function:
>> help change
USAGE:
CHANGE series value /part range /only /dup count
DESCRIPTION:
Changes a value in a series and returns the series after the change.
So your parse action (inside the paren!) will look something like:
position2: change/part position1 "neu" position2
and the parse rule (outside the paren!) will look something like:
] position1: to {"} position2: (
; blah blah blah...
) :position2 ; Here's where the position is reset.
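Putting the action and the rule fragment above together, a complete version might look like the sketch below. It assumes REBOL 2 parse semantics; the flattened alternation and the trailing to end (added so that parse itself returns true) are not part of the original fragments.

```rebol
REBOL []

a: {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}

parse/all a [
    any [
        [thru {href="} | thru {src="}]
        position1: to {"} position2: (
            ; Replace the URL and remember where the changed text ends.
            position2: change/part position1 "neu" position2
        ) :position2    ; continue parsing from the corrected position
    ]
    to end              ; assumption: consume the rest so parse succeeds
]

probe a                 ; {assas href="neu" dfdfdf src="neu" fdfdf}
```

Because the parser is moved to the position returned by change/part, the approach should not depend on whether the replacement is shorter or longer than the original URL.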
I hope that helps!
Andrew Martin
ICQ: 26227169 http://valley.150m.com/
[3/4] from: gscottjones:mchsi at: 2-Sep-2002 5:58
From: "Holzammer, Jean"
<snip>
> a: {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
<<quoted lines omitted: 6>>
> position2 print "")]]
<snip>
Hi, Jean,
Andrew's response came in just as I was finishing my hack. As he points
out, there are hazards in changing the string (or block) which is being
parsed: in short (my words), it throws off the index. Before I understood
exactly what is going on with this type of (logic) error, I got into
the habit of creating new strings (or blocks) out of the old ones. It
somehow lacks elegance, but it is easier for me to see what I am doing. So
here is a different way to accomplish the same thing:
a: {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
new-html: copy ""
parse/all a [
any [
copy blob
[[thru {href="}] | [thru {src="}]]
(
new-html: join new-html blob
new-html: join new-html "neu"
)
to {"}
]
copy blob
to end
(new-html: join new-html blob)
(print new-html)
]
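The same build-a-new-string idea can be wrapped in a function so that the input is left untouched and the replacement text is a parameter. This is only a sketch under REBOL 2 assumptions; the name replace-urls and its arguments are hypothetical, and append is used here in place of join.

```rebol
REBOL []

replace-urls: func [
    "Return a new string with every href/src value replaced (hypothetical helper)."
    html [string!]
    new-url [string!]
    /local out blob
][
    out: copy ""
    parse/all html [
        any [
            ; Copy everything up to and including the attribute opener.
            copy blob [thru {href="} | thru {src="}]
            (append out blob  append out new-url)
            to {"}      ; skip the old URL without copying it
        ]
        copy blob to end
        (if blob [append out blob])     ; blob is none when nothing is left
    ]
    out
]

a: {assas href="http://www.ann.lu" dfdfdf src="http://bla.org" fdfdf}
print replace-urls a "neu"      ; assas href="neu" dfdfdf src="neu" fdfdf
```

One limitation carries over from the rule as posted: thru {href="} is tried first, so a src that appears before the next href is skipped over rather than replaced.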
--Scott Jones
[4/4] from: jean:holzammer:faedv-n:bayern at: 5-Sep-2002 9:33
Hi Andrew, hi Scott. Your suggestions work for me. Thanks especially for
explaining not only the how but also the why!
Jean
Notes
- Quoted lines have been omitted from some messages.