Parse doing weird things...
[1/8] from: mat:eurogamer at: 19-Mar-2001 15:20
Heya,
While I'm at it;
I've noticed this issue repeatedly in my data mining scripts. The
parse command sometimes just will not work with the following format:
webpage: read http://www.yaddayadda.com
datamine: make string! ""
parse webpage [thru "something" copy datamine to "somethingelse"]
In the vast bulk of cases it works, but right now I'm sitting in my
Rebol console with a bagload of HTML fragments in a variable and parse
turns up false every time. The rules DO match the content in the
HTML!
Specifically:
parse tmp/content [thru "Etymology:" copy DictEnty to "<br>"]
Here's everything I've tried that I can remember.
tmp/content comes from HTTP-TOOLS and I've tried loading it into
another variable, no luck.
Etymology: *does* exist. If I put 'to end' instead of the <br> tag
then it parses true. Find turns up Etymology: as well.
It doesn't appear to matter WHAT I put in the quotes at the end, i.e. I
can hunt for a couple of plain letters that are in the string directly
following Etymology. It doesn't appear to make any difference that
this is a tag.
Finally, all of this worked before and now it doesn't. In the past to
cure this random parse blowout, I've had to cut down on the data I'm
trying to parse.
All of the above leads me to think there is some sort of restriction
on how much data can be parsed before some sort of
feature/limitation/bug is run into.
Anyone seen this before? Comments?
--
Mat Bettinson - EuroGamer's Gaming Evangelist with a Goatee
http://www.eurogamer.net | http://www.eurogamer-network.com
[2/8] from: mat:eurogamer at: 19-Mar-2001 15:26
Heya Mat,
MB> All of the above leads me to think there is some sort of restriction
MB> to how much data can be parsed before some sort of
MB> feature/limitation/bug is run into.
So I tried to fix this myself;
>> parse copy/part find tmp/content "Etymology:" 256 [thru "Etymology:" copy DictEnty to "<br>"]
== false
The above line should limit the parsing to a 256 character length
string starting from exactly where we were going to 'thru' to anyhow.
It *still* doesn't work.
I mean look for yourself!
>> print copy/part find tmp/content "Etymology:" 256
Etymology: Portuguese<i>peão</i>&French<i>pion,</i>fromMedievalLatin<i>pedon-,pedo</i>footsoldier--moreat<ahref="dictionary?book=Dictionary&va=pawn"><fontsize="-1">PAWN</font></a><br>Date: 1609<br><b>1</b><b>:</b>anyofvariousworkersinIndia,SriLa
>>
There's the Etymology: and there's the goddamn <br>! Why the hell
doesn't it work?! Argh!
[3/8] from: petr:krenzelok:trz:cz at: 19-Mar-2001 17:23
seems to work here....
->> parse str [thru "Etymology:" copy DictEntry to "<br>" to end]
== true
->> print dictentry
Portuguese<i>peão</i>&French<i>pion,</i>fromMedievalLatin<i>pedon-,pedo</i>footsoldier--moreat<ahref="dictionary?book=Dictionary&va=pawn"><fontsize="-1">PAWN</font></a>
->>
Maybe you could try parse/all to take spaces into account.
Cheers,
-pekr-
[4/8] from: ptretter:norcom2000 at: 19-Mar-2001 10:19
Instead of using make string! "", try make string! 0.
Paul Tretter
[5/8] from: jelinem1:nationwide at: 19-Mar-2001 10:34
If the problem really is that you are looking for a TRUE return from
'parse, then the explanation is simple. 'parse will return TRUE only if it
can successfully match the entire string according to the given rules.
Placing "to end" as your final rule should give you the result you desire.
Even though 'parse is returning FALSE, your word should still be set to the
substring between "something" and "somethingelse". What I do in your case
is set 'datamine to NONE before the 'parse, then check its value
afterward to determine whether it was set to a string!.
- Michael Jelinek
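
The check Michael describes might look roughly like the sketch below. It assumes REBOL 2; the sample string html and the word entry are made up for illustration and stand in for tmp/content and DictEnty. parse/all is used so that whitespace is not skipped, which is a choice of this sketch rather than something Michael specified.

```rebol
REBOL []

; Hypothetical sample input standing in for tmp/content
html: {<b>pawn</b> Etymology: Portuguese peao<br>Date: 1609<br>}

; Reset the word first so a stale value cannot be mistaken for a match
entry: none

; There is no `to end` here, so parse returns false even though
; the copy itself succeeds
parse/all html [thru "Etymology:" copy entry to "<br>"]

; Check the captured value rather than parse's return value
either string? entry [
    print ["Captured:" trim entry]
] [
    print "Nothing captured"
]
```

Testing the word instead of the return value keeps the rule short, at the cost of having to reset the word before every parse.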
[6/8] from: mat:eurogamer at: 19-Mar-2001 17:07
Heya jelinem1,
jnc> If the problem really is that you are looking for a TRUE return from
jnc> 'parse, then the explanation is simple. 'parse will return TRUE only if it
jnc> can successfully match the entire string according to the given rules.
jnc> Placing "to end" as your final rule should give you the result you desire.
Hmm, OK, but it always appeared to return true when it worked, i.e. the
correct value was loaded into the variable. When it returns false, I
generally get nothing.
[7/8] from: mat:eurogamer at: 19-Mar-2001 17:07
Heya Paul,
PT> instead of using make string! "" try make string! 0
Result is the same.
[8/8] from: ingo:2b1 at: 20-Mar-2001 0:15
Hi Mat, Petr,
Have you noticed?
Mat's rule:
> > >> parse copy/part find tmp/content "Etymology:" 256 [thru "Etymology:"
> copy DictEnty to "<br>"]
> > == false
Petr's rule:
> ->> parse str [thru "Etymology:" copy DictEntry to "<br>" to end]
> == true
Petr silently included "to end" at the end of his rule. With the
original rule, once <br> is found and the rule has been worked through,
there are still characters left in the string to be parsed, so parse
thinks "Nah, that can't be right" and returns false. Include "to end"
to always get to the end of the string, and you are done.
(Why did it work before? I don't know; maybe the "<br>" tag was at the
end of the string?)
kind regards,
Ingo
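
The difference between the two rules can be reproduced with a short script. This is only a sketch, assuming REBOL 2; the string html and the word entry are invented for the example and are not Mat's actual data.

```rebol
REBOL []

; Hypothetical sample input with text remaining after the first <br>
html: {Etymology: Portuguese peao<br>Date: 1609<br>}

; The rule stops in front of the first "<br>"; the rest of the input
; is left unparsed, so parse reports false
print parse/all html [thru "Etymology:" copy entry to "<br>"]

; Adding `to end` consumes the remainder, so parse reports true
print parse/all html [thru "Etymology:" copy entry to "<br>" to end]

; In both cases the word holds the text between the two markers
print entry
```

Run in a REBOL 2 console, the first parse should print false and the second true, while entry holds the same captured text either way.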