Help on parsing

[1/6] from: gchillemi:aliceposta:it at: 13-Mar-2004 16:25

Hi,

I am a newbie to REBOL and need a little help with parsing. I have these two possible inputs:

1) "KW1 555 KW1 333 KW2 444 "
2) "KW1 555 KW2 666 KW2 444 "

I need to extract the values of KW1 and KW2, or of KW1 alone.

If I parse using this rule:

any [[to "KW1" copy myresult to ] | [to "KW1" to "KW2" copy myresult2 to ]]

input (1) is parsed correctly by the first block of the rule, but input (2) is not (and the reason is clear).

If I swap the two blocks of the rule, the result is still wrong, because

any [[to "KW1" to "KW2" copy myresult2 to ] | [to "KW1" copy myresult to ]]

parses string (2) correctly, but string (1)

KW1 555 KW1 333 KW2 444

is parsed from the value of the first KW1 to KW2 (which does not belong to the first record!) and finally to

Thank you in advance for your answers!
Giuseppe Chillemi

[2/6] from: Izkata:comcast at: 13-Mar-2004 12:47

I learned how to parse at http://www.codeconscious.com/rebol/parse-tutorial.html, so you may want to check that out. Be forewarned, it is kind of long, but there is a table of contents so you can jump around in it.

[3/6] from: tomc:darkwing:uoregon at: 13-Mar-2004 12:39

On Sat, 13 Mar 2004, Giuseppe Chillemi wrote:

> Hi,
>
> I am a newbie to Rebol. I need a little help on parsing:
>
> I have these 2 possible inputs:
>
> 1) "KW1 555 KW1 333 KW2 444 "
>
> 2) "KW1 555 KW2 666 KW2 444 "
>
> I need to extract the value of KW1 and KW2 , or KW1 itself.

^ value?

> If I parse using (1):
>

hmmm

any [
    [to "KW1" copy myresult to ]
    | [to "KW1" to "KW2" copy myresult2 to ]
]

The second block can never be reached, since the first one will always accept any string the second one could.

This solution may not work for you directly, since it is keyed to your example, which is apt to be a simplification of your actual problem, but it may help:

;;; a recursive parse rule to copy the value from an unknown number of
;;; consecutive "KW value" pairs in a string
;;; possibly separated with
rule: [
    "KW" ["1 " | "2 "]     ; what the parser needs to recognize
    copy result integer!   ; may not be integer in real case
    (append store result)  ; store/use result immediately
    opt                    ; there might be a trailing
    opt rule               ; there might be another KW to recognize
]

>> store: copy [] parse s1 rule store
== ["555" "333" "444"]
>> store: copy [] parse s2 rule store
== ["555" "666" "444"]

where s1 & s2 are your input strings. Hope that helps.

> (1) is parsed correctly by the first block of the rule but (2) is not (and
> the reason is clear)
>
> If I exchange the blocks of the rule, changing block1 to block2 and vice
> versa the result is still wrong because
>
> any[[to "KW1" to "KW2" copy myresult2 to ]
> | [to "KW1" copy myresult to ] ]
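A variation on the rule above, as a sketch: it assumes every value is a run of digits followed by a single space (the real data may differ), matches the digits with a charset instead of integer!, and uses some instead of recursion. It also keeps each keyword next to its value, so KW1 and KW2 values can be told apart. The words digits, pair, kw, val and store are only illustrative.

```rebol
REBOL [Title: "Keyword/value pairs (sketch)"]

digits: charset "0123456789"
store: copy []

; one "KWn value " pair; assumes values are digit runs ending in a space
pair: [
    copy kw ["KW1" | "KW2"] " "
    copy val some digits " "
    (append store reduce [kw val])   ; keep keyword and value together
]

s1: "KW1 555 KW1 333 KW2 444 "
s2: "KW1 555 KW2 666 KW2 444 "

parse/all s1 [some pair]
probe store   ; ["KW1" "555" "KW1" "333" "KW2" "444"]

clear store
parse/all s2 [some pair]
probe store   ; ["KW1" "555" "KW2" "666" "KW2" "444"]
```

Because each pair is matched at the current position rather than searched for with to or thru, the rule cannot jump ahead to a keyword that belongs further along in the string.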

[4/6] from: gchillemi:aliceposta:it at: 14-Mar-2004 9:36

> > 1) "KW1 555 KW1 333 KW2 444 "
> >

<<quoted lines omitted: 4>>

> > I need to extract the value of KW1 and KW2 , or KW1 itself.
> ^ value?

Yes. The whole problem is to parse a page of addresses with phone and fax numbers. The input string is formed like this:

UNUSEFULL-TEXT
name-prefix-keyword (type 1 or type 2) NAME
address-keyword ADDRESS-DATA
one or more of: telephone-prefix-keyword TELEPHONE-NUM
zero or more of: "-" fax-prefix-keyword FAX-NUM
and finally a

The whole sequence repeats until the end of the page is reached.

So, let me extend the strings of the previous message:

KW1 555 KW1 333 KW2 444
KW1 555 KW2 666 KW2 444

with KW1 = "Tel.:" and KW2 = "Fax.".

Using this new information, the address block can appear in two ways:

1) "name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE unusefulltext Tel.: NNNN NNNNNNNN "

or

2) "name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE unusefulltext Tel.: NNNN NNNNNNNN - Fax.: NNNN NNNNNNN "

Each run of "N" means: 3 or more digits, up to an undefined number of digits.

Suppose I parse the following text...

--- TEXT TO PARSE ---
name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE unusefulltext Tel.: NNNN NNNNNNNN
(some unusefulltext)
name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE unusefulltext Tel.: NNNN NNNNNNNN - Fax.: NNNN NNNNNNN
--- END TEXT TO PARSE ---

.....using logic like this:

> ;;; a recursive parse rule to copy the value from an unknown number of
> ;;; consecutive "KW value" pairs in a string

<<quoted lines omitted: 6>>

> opt rule ; there might be another KW to reconize
> ]

What prevents the parse from running on into the next address block? Whatever way I try to search for "FAX.:", the search lets the parse move on to the next address block, where it can find a FAX keyword! I need a way to tell REBOL to search for "FAX.:" before the keyword and not after. However I phrase the rule, REBOL implicitly searches the whole text and not only until

A simple solution is to split the problem in two: search for "tel." to and then parse the resulting string (which may contain a FAX) with another routine.

But is there a way to solve this problem with a single parse instruction?

Thanks again,
Giuseppe Chillemi
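The two-step idea above can be sketched like this, assuming (because the real record terminator was lost from the archive) that each address record ends with a newline. The outer parse copies one record at a time; the inner parse only sees that copy, so thru "Fax.: " cannot reach into the next record. The sample page and the words record, tel, fax and records are illustrative.

```rebol
REBOL [Title: "Two-step record parsing (sketch)"]

; assumption: one address record per line
page: rejoin [
    "name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE unusefulltext Tel.: 1234 12345678" newline
    "name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE unusefulltext Tel.: 1234 567891011 - Fax.: 1234 110198765" newline
]

records: copy []

parse/all page [
    some [
        copy record to newline skip (
            tel: fax: none
            ; second step: parse only this record, so the
            ; fax search stops at the end of the record
            parse/all record [
                thru "Tel.: " copy tel [to " - " | to end]
                opt [thru "Fax.: " copy fax to end]
            ]
            append/only records reduce [tel fax]
        )
    ]
]

probe records   ; [["1234 12345678" none] ["1234 567891011" "1234 110198765"]]
```

A record without "Tel.: " simply leaves tel as none here; a stricter version could check the result of the inner parse before storing anything.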

[5/6] from: inetw3:mindspring at: 14-Mar-2004 3:37

Hi Giuseppe,

I don't know how you want this data returned, but to print it out you can do...

get-k: parse/all "KW1 555 KW1 333 KW2 444 " " "
== ["KW1 555 " "" "" "" " KW1 333 KW2 444 " "" "" ""]
get-k: replace/all mold get-k {" K} {"K}
get-k: do get-k
== ["KW1 555 " "" "" "" "KW1 333 KW2 444 " "" "" ""]
foreach get-ks get-k [if find get-ks "K" [print get-ks]]
KW1 555
KW1 333 KW2 444

Instead of using if find get-ks "K" you can do something like this...

switch get-ks [
    KW1 [if find get-ks "KW2" [do something here]]
    KW2 [some code here]
    etc...
]
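One detail about switch: it looks the value up among the keys of its block, so a string value needs string keys; the words KW1 and KW2 would not match a string. A sketch that splits on spaces and walks the tokens two at a time, assuming the input strictly alternates keyword and value; tokens and found are illustrative names.

```rebol
REBOL [Title: "Keyword dispatch with switch (sketch)"]

; split on spaces; assumes strictly alternating keyword/value tokens
tokens: parse/all "KW1 555 KW1 333 KW2 444 " " "

found: copy []
foreach [key value] tokens [
    switch key [
        "KW1" [append found join "Tel.: " value]
        "KW2" [append found join "Fax.: " value]
    ]
]
probe found   ; ["Tel.: 555" "Tel.: 333" "Fax.: 444"]
```

Any leftover empty token at the end simply matches no key and is ignored.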

[6/6] from: tomc:darkwing:uoregon at: 15-Mar-2004 20:36

Hi Giuseppe Chillemi,

What you are asking for is not too much. I can't be sure where your lines were supposed to end, but I assume it is after the after the phone number.

One way to be sure that you do not skip over too much looking for the fax number is to break the input into individual lines when you read it (if it is stored with one record per row), something like:

foreach line read/lines %file [parse line rule]

But you can also parse the whole thing by building a parse rule that looks at each line one at a time. The rule below could be made more flexible by writing rules to handle whitespace, i.e.

ws: [any [" " | tab]]

and sticking it in different places, but hopefully this will help.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; the sequence of "N" means: 3 or more digits, up to an
;;; undefined number of digits
digit: charset {0123456789}
phone: [3 digit any digit " " 3 digit any digit]
;;; could also say
;;; phone: [3 4 digit " " 7 9 digit]
;;; or whatever numbers, if you knew the ranges

;;; an object to store a record ... could also use a simple block
mark: make object! [
    name: copy ""
    address: copy ""
    phone: copy ""
    fax: copy ""
]

;;; a block to store the objects in
marks: copy []

;;; parse rule for a line -- I just use the word 'token' out of habit
line: [
    (m: make mark [])
    any newline   ; there may not be one at the start/end
    "name-keyword " copy token to " " (m/name: token)
    " " thru "address-keyword" copy token to (m/address: token)
    thru "Tel.: " copy token phone (m/phone: token)
    opt [" - Fax.: " copy token phone (m/fax: token)]
    any " "
    (append marks m)
]

;;; mind the wrap
page: {name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE unusefulltext Tel.: 1234 12345678
name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE unusefulltext Tel.: 1234 567891011 - Fax.: 1234 110198765
}

parse/all page [some line]
probe marks
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

On Sun, 14 Mar 2004, Giuseppe Chillemi wrote:
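On the single-parse question: the key in the rule above is that the fax part is tried with opt at the current position, right after the phone number, instead of being searched for with to or thru, so it can never run ahead into the next record. A reduced, self-contained sketch of just that idea, leaving out the name and address parts; the sample page and the words tel, fax and records are illustrative.

```rebol
REBOL [Title: "Single-pass phone/fax extraction (sketch)"]

digit: charset "0123456789"
; 3 or more digits, a space, 3 or more digits
phone: [3 digit any digit " " 3 digit any digit]

page: rejoin [
    "name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE unusefulltext Tel.: 1234 12345678" newline
    "name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE unusefulltext Tel.: 1234 567891011 - Fax.: 1234 110198765" newline
]

records: copy []
parse/all page [
    some [
        (tel: fax: none)
        thru "Tel.: " copy tel phone
        ; tried only here, immediately after the phone number
        opt [" - Fax.: " copy fax phone]
        (append/only records reduce [tel fax])
    ]
    to end
]
probe records   ; [["1234 12345678" none] ["1234 567891011" "1234 110198765"]]
```

The loop ends when thru "Tel.: " finds no further record; the trailing to end then lets parse succeed.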

Notes

Quoted lines have been omitted from some messages.