Parse doing weird things...

[1/8] from: mat:eurogamer at: 19-Mar-2001 15:20

Heya,

While I'm at it; I've noticed this issue repeatedly in my data mining scripts. The parse command sometimes just will not work with the following format;

webpage: read http://www.yaddayadda.com
datamine: make string! ""
parse webpage [thru "something" copy datamine to "somethingelse"]

In the vast bulk of cases it works but right now I'm sitting in my Rebol console with a bag load of HTML fragment in a variable and parse turns up false every time. Now the rules DO match the content in the html! Specifically;

parse tmp/content [thru "Etymology:" copy DictEnty to " "]

Here's everything I've tried that I can remember.

tmp/content comes from HTTP-TOOLS and I've tried loading it into another variable, no luck.

Etymology: *does* exist. If I put 'to end' instead of the tag then it parses true. Find also turns up Etymology as well.

It doesn't appear to matter WHAT I put in the quotes at the end. IE I can hunt for a couple of plain letters that are in the string directly following Etymology. It doesn't appear to make any difference that this is a tag.

Finally, all of this worked before and now it doesn't. In the past to cure this random parse blowout, I've had to cut down on the data I'm trying to parse.

All of the above leads me to think there is some sort of restriction to how much data can be parsed before some sort of feature/limitation/bug is run into.

Anyone seen this before? Comments?

--
Mat Bettinson - EuroGamer's Gaming Evangelist with a Goatee
http://www.eurogamer.net | http://www.eurogamer-network.com

[2/8] from: mat:eurogamer at: 19-Mar-2001 15:26

Heya Mat,

MB> All of the above leads me to think there is some sort of restriction
MB> to how much data can be parsed before some sort of
MB> feature/limitation/bug is run into.

So I tried to fix this myself;

>> parse copy/part find tmp/content "Etymology:" 256 [thru "Etymology:" copy DictEnty to " "]

== false

The above line should limit the parsing to a 256 character length string starting from exactly where we were going to 'thru' to anyhow. It *still* doesn't work. I mean look for yourself!

>> print copy/part find tmp/content "Etymology:" 256

Etymology: Portuguesepeão&Frenchpion,fromMedievalLatinpedon-,pedofootsoldier--moreat<ahref="dicti onary?book=Dictionary&va=pawn"><fontsize="-1">PAWN</a> Date: 1609 1:anyofvariousworkersinIndia,SriLa

There's the Etymology: and there's the goddamn ! Why the hell doesn't it work?! Argh!

--
Mat Bettinson - EuroGamer's Gaming Evangelist with a Goatee
http://www.eurogamer.net | http://www.eurogamer-network.com

[3/8] from: petr:krenzelok:trz:cz at: 19-Mar-2001 17:23

----- Original Message -----
From: "Mat Bettinson" <[mat--eurogamer--net]>
To: <[rebol-list--rebol--com]>
Sent: Monday, March 19, 2001 4:26 PM
Subject: [REBOL] Re: Parse doing weird things...

> Heya Mat,
>
> MB> All of the above leads me to think there is some sort of restriction
> MB> to how much data can be parsed before some sort of
> MB> feature/limitation/bug is run into.
>
> So I tried to fix this myself;
>
> >> parse copy/part find tmp/content "Etymology:" 256 [thru "Etymology:" copy DictEnty to " "]
> == false
> The above line should limit the parsing to a 256 character length

<<quoted lines omitted: 3>>

> >> print copy/part find tmp/content "Etymology:" 256
> Etymology: Portuguesepeão&Frenchpion,fromMedievalLatinpedon -,pedofootsoldier--moreat<ahref="dicti onary?book=Dictionary&va=pawn"><fontsize="-1">PAWN</a> Date: 1609 1:anyofvariousworkersinIndia,SriLa
> >>
>
> There's the Etymology: and there's the goddamn ! Why the hell
> doesn't it work?! Argh!

seems to work here....

->> parse str [thru "Etymology:" copy DictEntry to " " to end]
== true
->> print dictentry
Portuguesepeão&Frenchpion,fromMedievalLatinpedon -,pedofootsoldier--moreat<ahref="dicti onary?book=Dictionary&va=pawn"><fontsize="-1">PAWN</a>
->>

maybe you could try to use parse/all to take spaces into account ....

Cheers,
-pekr-

[4/8] from: ptretter:norcom2000 at: 19-Mar-2001 10:19

instead of using make string! "" try make string! 0

Paul Tretter

----- Original Message -----
From: "Mat Bettinson" <[mat--eurogamer--net]>
To: <[rebol-list--rebol--com]>
Sent: Monday, March 19, 2001 9:26 AM
Subject: [REBOL] Re: Parse doing weird things...

> copy DictEnty to " "]
> == false
> The above line should limit the parsing to a 256 character length

<<quoted lines omitted: 3>>

> >> print copy/part find tmp/content "Etymology:" 256
> Etymology: Portuguesepeão&Frenchpion,fromMedievalLatinpedon -,pedofootsoldier--moreat<ahref="dicti

[5/8] from: jelinem1:nationwide at: 19-Mar-2001 10:34

If the problem really is that you are looking for a TRUE return from 'parse, then the explanation is simple. 'parse will return TRUE only if it can successfully match the entire string according to the given rules. Placing "to end" as your final rule should give you the result you desire.

Even though 'parse is returning FALSE, your word should be set to the substring between "something" and "somethingelse". What I do in your case is to set 'datamine to NONE previous to the 'parse, then check its value afterward to determine whether it was set to a string!.

- Michael Jelinek
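Michael's suggestion (reset the word to NONE, run parse, then test the word instead of the return value) might look like the sketch below. It assumes REBOL 2 and uses parse/all so that whitespace is not skipped. The sample string, the "<br>" end marker and the names page, entry and matched-all? are hypothetical, chosen only for illustration.

```rebol
; Hypothetical sample data; "<br>" is an assumed end marker.
page: "junk Etymology: Portuguese<br>Date: 1609"

entry: none    ; reset first, so a stale value cannot be mistaken for a match

; The rule stops at "<br>", leaving "<br>Date: 1609" unconsumed,
; so parse returns false even though the copy has already happened.
matched-all?: parse/all page [thru "Etymology:" copy entry to "<br>"]

either string? entry [
    print ["captured:" mold entry "parse returned:" matched-all?]
][
    print "no match"
]
```

Checking the captured word this way separates "did the rule find my data?" from "did the rule consume the whole input?", which are two different questions.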

[6/8] from: mat:eurogamer at: 19-Mar-2001 17:07

Heya jelinem1,

jnc> If the problem really is that you are looking for a TRUE return from
jnc> 'parse, then the explanation is simple. 'parse will return TRUE only if it
jnc> can successfully match the entire string according to the given rules.
jnc> Placing "to end" as your final rule should give you the result you desire.

Hmm OK, but it always appeared to return true if it worked. IE there's the correct value loaded into the var. When it returns false, I generally get nothing.

--
Mat Bettinson - EuroGamer's Gaming Evangelist with a Goatee
http://www.eurogamer.net | http://www.eurogamer-network.com

[7/8] from: mat:eurogamer at: 19-Mar-2001 17:07

Heya Paul,

PT> instead of using make string! "" try make string! 0

Result is the same.

--
Mat Bettinson - EuroGamer's Gaming Evangelist with a Goatee
http://www.eurogamer.net | http://www.eurogamer-network.com

[8/8] from: ingo:2b1 at: 20-Mar-2001 0:15

Hi Mat, Petr,

have you noticed?

Mat's rule:

> > >> parse copy/part find tmp/content "Etymology:" 256 [thru "Etymology:" copy DictEnty to " "]
> > == false

Petr's rule:

> ->> parse str [thru "Etymology:" copy DictEntry to " " to end]
> == true

Petr silently included the "to end" at the end. In the original rule, when the tag is found and the rule has been worked through, there are still characters in the string left to be parsed, so parse thinks "Nah, that can't be right". Include "to end" to always get to the end of the string, and you are done.

(Why did it work before? Don't know, maybe the " " tag was at the end of the string?)

kind regards,

Ingo
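The difference Ingo points out can be reproduced with a small side-by-side test. This is only a sketch, assuming REBOL 2 (hence parse/all); the sample string and the "<br>" end marker are hypothetical stand-ins, not the data from the thread.

```rebol
; Hypothetical sample data; "<br>" is an assumed end marker.
str: "junk Etymology: Portuguese<br>Date: 1609"

; Rule ends at "<br>": input remains after the rule, so parse reports false.
print parse/all str [thru "Etymology:" copy entry to "<br>"]

; Same rule plus "to end": the remaining input is consumed, so parse reports true.
print parse/all str [thru "Etymology:" copy entry to "<br>" to end]
```

In both calls entry is set to the same substring; only the return value differs, because it reflects whether the rules reached the end of the input.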

Notes

Quoted lines have been omitted from some messages.