Mailing List Archive: R: Re: Help on parsing

[REBOL] R: Re: Help on parsing

From: gchillemi:aliceposta:it at: 14-Mar-2004 9:36


> > 1)  "KW1 555 <br> KW1 333 KW2 444 <br>"
> >
> > 2)  "KW1 555 KW2 666 <br> KW2 444 <br>"
> >
> >
> >
> > I need to extract the value of KW1 and KW2 , or KW1 itself.
>                                                      ^ value?

(Yes)

The whole problem is to parse a page and extract its addresses, with their phone and fax numbers.

I may have an input string formed in this way:

UNUSEFULL-TEXT name-prefix-keyword (type 1 or type 2) NAME address-keyword
ADDRESS-DATA
    one or more of:   telephone-prefix-keyword TELEPHONE-NUM
    zero or more of:  "-" fax-prefix-keyword FAX-NUM
and finally a <BR>

The whole sequence repeats until the end of page is reached.

So, let me extend the strings of the previous message:

KW1 555 <br> KW1 333 KW2 444 <br>

KW1 555 KW2 666 <br> KW2 444 <br>

KW1 = "Tel.:"
KW2 = "Fax.:"

Let's use the new information. The two forms in which an address block can
appear are:

1) "name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE <BR>
unusefulltext Tel.: NNNN NNNNNNNN <BR>"

Or

2) "name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE <br>
unusefulltext Tel.: NNNN NNNNNNNN - Fax.: NNNN NNNNNNN <BR>"

Note that each sequence of "N" means a run of three or more digits, with no
upper limit on its length.
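
As a minimal sketch of the "three or more digits" constraint, assuming REBOL 2
(where parse/all stops parse from skipping whitespace) and assuming each number
is written as two digit groups separated by one space, as in "NNNN NNNNNNNN".
The names digit, num-group and phone are made up for this example:

```rebol
REBOL []

digit: charset "0123456789"

; a group is three digits followed by any number of further digits
num-group: [3 digit any digit]

; assumed layout: two groups separated by a single space
phone: [num-group " " num-group]

print parse/all "0123 4567890" phone   ; both groups have 3+ digits
print parse/all "01 4567890" phone     ; first group is too short
```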

If I parse the following text...

--- TEXT TO PARSE ---
name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE <BR>
unusefulltext Tel.: NNNN NNNNNNNN <BR>

(some unusefulltext)

name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE <br>
unusefulltext Tel.: NNNN NNNNNNNN - Fax.: NNNN NNNNNNN <BR>
--- END TEXT TO PARSE ---

...using logic like this:

> ;;; a recursive parse rule to copy the value from an unknown number of
> ;;; consecutive "KW value" pairs in a string
> ;;; possibly separated with <br>
>
> rule: [
>     "KW" ["1 " | "2 "]       ; what the parser needs to recognize
>     copy result integer!     ; may not be integer in real case
>     (append store result)    ; store/use result immediately
>     opt <br>                 ; there might be a trailing <br>
>     opt rule                 ; there might be another KW to recognize
> ]

What prevents the sequence from going on to the next address block?

Searching for "Fax.:", in whatever way I try to do it, lets the parse
instruction move on to the next address block, where it can find a Fax
keyword!

I need to find a way to tell REBOL to search for "Fax.:" only before the <BR>
keyword and not after it. However I formulate the rule, it is implicit that
REBOL will search the whole text rather than stopping at <BR>.
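
The effect can be reproduced with a small sketch, assuming REBOL 2 and a
made-up two-block sample (the numbers and the word "filler" are invented for
illustration). The first block has no fax, yet thru "Fax.:" still succeeds by
scanning past the first <BR> into the second block:

```rebol
REBOL []

; made-up sample: block 1 has only Tel., block 2 has Tel. and Fax.
page: {Tel.: 0123 4567890 <BR> filler Tel.: 0987 6543210 - Fax.: 0987 1112223 <BR>}

tel: fax: none
parse/all page [
    thru "Tel.:" copy tel to "<br>"   ; telephone of the first block
    thru "Fax.:" copy fax to "<br>"   ; thru scans forward with no regard for <BR>
    to end
]

probe tel   ; taken from the first block
probe fax   ; taken from the second block
```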

However, a simple solution is to split the problem in two: first copy the
text from "Tel.:" up to <br>, then parse the resulting string (which may
contain a Fax) using another routine.
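
A rough sketch of that two-step idea, assuming REBOL 2. The sample page, the
helper name split-phones and the words phone-chars, seg and results are all
invented for this example, and real pages may need a looser rule for what
counts as a number:

```rebol
REBOL []

; made-up sample page, for illustration only
page: {name-Keyword NAME1 filler address-keyword ADDR1 <BR> filler Tel.: 0123 4567890 <BR> filler name-Keyword NAME2 filler address-keyword ADDR2 <br> filler Tel.: 0987 6543210 - Fax.: 0987 1112223 <BR>}

phone-chars: charset "0123456789 "   ; digits plus the separating space

; Step 2: parse one fragment, i.e. the text between "Tel.:" and <br>.
; The fax can only be found inside this fragment, never in a later block.
split-phones: func [seg [string!] /local tel fax] [
    tel: fax: none
    parse/all seg [
        copy tel some phone-chars
        opt ["-" any " " "Fax.:" copy fax some phone-chars]
        to end
    ]
    reduce [
        either tel [trim tel] [none]
        either fax [trim fax] [none]
    ]
]

; Step 1: cut out every "Tel.: ... <br>" fragment and hand it to step 2.
; parse is case-insensitive by default, so "<br>" also matches <BR>.
results: copy []
parse/all page [
    any [
        thru "Tel.:" copy seg to "<br>" "<br>"
        (if seg [append/only results split-phones seg])
    ]
    to end
]

probe results   ; one [tel fax] pair per address block, fax is none if absent
```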

But is there a way to solve this problem using a single parse instruction?

Thanks again

Giuseppe Chillemi