
[REBOL] R: Re: Help on parsing

From: gchillemi:aliceposta:it at: 14-Mar-2004 9:36

> > 1) "KW1 555 <br> KW1 333 KW2 444 <br>"
> >
> > 2) "KW1 555 KW2 666 <br> KW2 444 <br>"
> >
> > I need to extract the value of KW1 and KW2 , or KW1 itself.
> ^ value?

(Yes)

The whole problem is to parse a page and extract its addresses, with phone and fax numbers. The input string may be formed in this way:

  UNUSEFULL-TEXT
  name-prefix-keyword (type 1 or type 2) NAME
  address-keyword ADDRESS-DATA
  one or more of: telephone-prefix-keyword TELEPHONE-NUM
  zero or more of: "-" fax-prefix-keyword FAX-NUM
  and finally a <BR>

The whole sequence repeats until the end of the page is reached.

So, let me extend the strings of the previous message:

  KW1 555 <br> KW1 333 KW2 444 <br>
  KW1 555 KW2 666 <br> KW2 444 <br>

  KW1 = "Tel.:"
  KW2 = "Fax.:"

Let's use the new information. The two ways the address block can appear are:

1) "name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE <BR> unusefulltext Tel.: NNNN NNNNNNNN <BR>"

or

2) "name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE <br> unusefulltext Tel.: NNNN NNNNNNNN - Fax.: NNNN NNNNNNN <BR>"

Note that each sequence of "N" means three or more digits, with no upper limit.

If I parse the following text...

--- TEXT TO PARSE ---
name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE <BR> unusefulltext Tel.: NNNN NNNNNNNN <BR>
(some unusefulltext)
name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE <br> unusefulltext Tel.: NNNN NNNNNNNN - Fax.: NNNN NNNNNNN <BR>
--- END TEXT TO PARSE ---

...using a logic like this:

> ;;; a recursive parse rule to copy the value from an unknown number of
> ;;; consecutive "KW value" pairs in a string
> ;;; possibly separated with <br>
>
> rule: [
>     "KW" ["1 " | "2 "]      ; what the parser needs to recognize
>     copy result integer!    ; may not be integer in real case
>     (append store result)   ; store/use result immediately
>     opt <br>                ; there might be a trailing <br>
>     opt rule                ; there might be another KW to recognize
> ]

What prevents the rule from running on into the next address block? However I try to search for "FAX.:", the parse instruction is free to move on to the next address block, where it can find a FAX keyword!

I need a way to tell REBOL to search for "FAX.:" before the <BR> keyword and not after it. However I formulate the rule, it is implicit that REBOL will search the whole text, and not just up to <BR>.

A simple solution is to split the problem in two: search from "Tel.:" to <br>, and then parse the resulting string (which may have a FAX inside it) with another routine.

But is there a way to solve this problem using a single parse instruction?

Thanks again,
Giuseppe Chillemi
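
One way the bounded search asked about above could be expressed in a single parse call is to avoid searching with `to "Fax.:"` at all. Instead, match the phone number with a character set, and make the fax part an optional continuation that must begin exactly where the phone number ends. Since digits and spaces can never match "<", the rule cannot run past the closing <BR>. The following is only a minimal sketch, assuming REBOL 2 semantics (`parse/all`, case-insensitive string matching). The names `num-chars`, `block-rule`, `store`, `tel`, `fax` and the sample digits in `text` are made up for illustration. The name and address fields are left out because their keywords are not given, and the "three or more digits" constraint is not enforced.

```rebol
REBOL [Title: "Bounded Tel/Fax extraction (illustrative sketch)"]

; Hypothetical sample input, shaped like the two block forms in the message.
text: {name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE <BR> unusefulltext Tel.: 0123 45678901 <BR> (some unusefulltext) name-Keyword NAME-VALUE unusefulltext address-keyword ADDRESS-VALUE <br> unusefulltext Tel.: 0999 1234567 - Fax.: 0999 7654321 <BR>}

; Characters allowed inside a number: digits and the spaces that group them.
num-chars: charset "0123456789 "

store: copy []   ; collects one [tel fax] pair per address block

block-rule: [
    thru "Tel.:"                 ; skip ahead to the next phone keyword
    copy tel some num-chars      ; stops at the first non-digit, non-space
    (fax: none)                  ; reset, so an earlier fax is not reused
    opt [
        "-" any " " "Fax.:"      ; fax must follow the phone number directly
        copy fax some num-chars
    ]
    thru "<br>"                  ; consume the block terminator
    (append/only store reduce [trim tel either fax [trim fax] [none]])
]

parse/all text [some block-rule to end]

probe store
```

With the sample above, `store` should end up holding one pair per address block, with `none` in the fax slot of the first block. The design choice is that nothing in the optional part uses `to` or `thru`, so the parser never gets the chance to scan forward into the next block looking for a fax keyword.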