[REBOL] Re: Re: Help on parsing
From: gchillemi:aliceposta:it at: 14-Mar-2004 9:36
> > 1) "KW1 555 <br> KW1 333 KW2 444 <br>"
> > 2) "KW1 555 KW2 666 <br> KW2 444 <br>"
> > I need to extract the value of KW1 and KW2, or of KW1 alone.
> ^ value?
The whole problem is to parse a page listing addresses with phone and fax numbers.
I may have an input string formed in this way:

    UNUSEFULL-TEXT
    name-prefix-keyword (type 1 or type 2) NAME
    address-keyword ADDRESS-DATA
    one or more of: telephone-prefix-keyword TELEPHONE-NUM
    zero or more of: "-" fax-prefix-keyword FAX-NUM
    and finally

The whole sequence repeats until the end of the page is reached.
So, let me extend the strings of the previous message:
KW1 555 <br> KW1 333 KW2 444 <br>
KW1 555 KW2 666 <br> KW2 444 <br>
KW1 = "Tel.:"
KW2 = "Fax.:"
With this new information, these are the 2 ways an address block can appear:
1) "name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE <BR>
unusefulltext Tel.: NNNN NNNNNNNN <BR>"
2) "name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE <br>
unusefulltext Tel.: NNNN NNNNNNNN - Fax.: NNNN NNNNNNN <BR>"
Note that each run of "N" means 3 or more digits, up to an undefined number of
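A run of "N" like this maps naturally onto PARSE repetition counts. The following is only a sketch, assuming REBOL 2 `parse/all` (so whitespace is not skipped) and that digit groups are separated by single spaces as in the examples; the rule names `digit`, `digit-run` and `phone-number` are hypothetical.

```rebol
REBOL [Title: "Phone number rule sketch"]

; Hypothetical rule names; assumes REBOL 2 PARSE/ALL semantics
digit: charset "0123456789"

; one run of "N": at least 3 digits, then any number of further digits
digit-run: [3 digit any digit]

; a number such as "0123 4567890": digit runs separated by single spaces
phone-number: [digit-run any [" " digit-run]]

print parse/all "0123 4567890" phone-number
```

PARSE only returns true when the rule consumes the whole input, so a group shorter than 3 digits makes the match fail.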
If I parse the following text...
--- TEXT TO PARSE ---
name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE <BR>
unusefulltext Tel.: NNNN NNNNNNNN <BR>
name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE <br>
unusefulltext Tel.: NNNN NNNNNNNN - Fax.: NNNN NNNNNNN <BR>
--- END TEXT TO PARSE ---
...using logic like this:
> ;;; a recursive parse rule to copy the value from an unknown number of
> ;;; consecutive "KW value" pairs in a string,
> ;;; possibly separated with <br>
> digit: charset "0123456789" ; string parsing cannot match integer! directly
> store: copy [] ; collects the copied values
> rule: [
> "KW" ["1 " | "2 "] ; what the parser needs to recognize
> copy result some digit ; may not be digits in the real case
> (append store result) ; store/use result immediately
> opt <br> ; there might be a trailing <br>
> opt rule ; there might be another KW to recognize
> ]
What prevents the sequence from running on into the next address block?
Take the search for "Fax.:": whatever way I try to search for it, the parse
instruction moves on to the next address block, where it can find a fax number.
I need a way to tell REBOL to search for "Fax.:" before the <BR> keyword and
not after it. However I formulate the rule, REBOL implicitly searches the
whole text and does not stop at <BR>.
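The overshoot can be reproduced with a small sketch. It assumes REBOL 2 `parse/all` with its default case-insensitive matching, and uses made-up sample data shaped like the two address blocks above; `page`, `tel`, `fax` and `phone-char` are hypothetical names.

```rebol
REBOL [Title: "Why THRU crosses the <BR>"]

; Made-up sample data shaped like the two address blocks in the message
page: {name-Keyword Rossi xx address-keyword Via Roma 1 <BR>
xx Tel.: 0123 4567890 <BR>
name-Keyword Bianchi xx address-keyword Via Po 2 <br>
xx Tel.: 0123 4567891 - Fax.: 0123 4567892 <BR>}

; digits plus the space that separates digit groups
phone-char: charset "0123456789 "

; THRU scans the rest of the input, not just the current line, so starting
; from the first block's "Tel.:" it finds the second block's "Fax.:" and
; pairs that fax with the first block's phone number
parse/all page [
    thru "Tel.:" any " " copy tel some phone-char
    thru "Fax.:" any " " copy fax some phone-char
    to end
]
print ["Tel:" trim tel "Fax:" trim fax]
```

The first block has no fax at all, yet the rule reports one, which is exactly the problem described above.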
However, a simple solution is to split the problem in two: search from
"Tel.:" to <br>, then parse the resulting string (which may contain a fax
number) with another routine.
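The two-step approach could look roughly like the sketch below, assuming REBOL 2 `parse/all`, case-insensitive matching (so "<br>" also matches "<BR>"), well-formed input with a number after every "Tel.:", and made-up sample data; `parse-line`, `page` and `results` are hypothetical names.

```rebol
REBOL [Title: "Two-step sketch"]

; Made-up sample data shaped like the two address blocks in the message
page: {name-Keyword Rossi xx address-keyword Via Roma 1 <BR>
xx Tel.: 0123 4567890 <BR>
name-Keyword Bianchi xx address-keyword Via Po 2 <br>
xx Tel.: 0123 4567891 - Fax.: 0123 4567892 <BR>}

phone-char: charset "0123456789 "
results: copy []

; Step 2 (hypothetical helper): parse one telephone line on its own.
; The string ends where the <br> was, so THRU "Fax.:" cannot reach
; the next address block. Returns [tel fax], with fax = none if absent.
parse-line: func [line [string!] /local tel fax] [
    fax: none
    parse/all line [
        any " " copy tel some phone-char
        opt [thru "Fax.:" any " " copy fax some phone-char]
        to end
    ]
    reduce [trim tel either fax [trim fax] [none]]
]

; Step 1: cut out each piece from "Tel.:" up to the next <br>
parse/all page [
    some [thru "Tel.:" copy line to "<br>" (append results parse-line line)]
    to end
]
probe results
```

Inside the extracted string, THRU is harmless because the string itself ends at the <br>.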
But is there a way to solve this problem with a single parse instruction?
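One possible single-parse approach, offered only as a sketch: rather than searching for the fax keyword with THRU, try to match it with OPT exactly where it would appear, right after the phone number and before the <BR>. This assumes the fax part, when present, always follows the phone number as "- Fax.:" on the same line (as in example 2), REBOL 2 `parse/all` with case-insensitive matching, and made-up sample data; `entry`, `page` and `results` are hypothetical names.

```rebol
REBOL [Title: "Single PARSE sketch"]

; Made-up sample data shaped like the two address blocks in the message
page: {name-Keyword Rossi xx address-keyword Via Roma 1 <BR>
xx Tel.: 0123 4567890 <BR>
name-Keyword Bianchi xx address-keyword Via Po 2 <br>
xx Tel.: 0123 4567891 - Fax.: 0123 4567892 <BR>}

phone-char: charset "0123456789 "
results: copy []

; One address block's phone line. OPT only tries "- Fax.:" at the current
; position, so if the next thing is the <br> the fax part is skipped
; instead of being searched for further down the page.
entry: [
    thru "Tel.:" any " " copy tel some phone-char
    (fax: none)
    opt ["- Fax.:" any " " copy fax some phone-char]
    thru "<br>"
    (repend results [trim tel either fax [trim fax] [none]])
]

parse/all page [some entry to end]
probe results
```

Each block contributes a phone/fax pair to `results`, with `none` where the block has no fax.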