Mailing List Archive: 49091 messages

[REBOL] 5 simple pattern matching questions Re:(2)

From: rebol::techscribe::com at: 15-Sep-2000 2:04

Hi Andrew, you wrote:
>> 5. match any char: I think this is done by creating a bitset from a
>charset from hex 000 to hex 255 and parsing on that, but it doesnt work,
>e.g.,
>> bset: charset [ #"^(00)" - #"^(FF)" ]
>> parse " " [ some bset ]
>>
>> fails
>
>Any one?

The reason it fails is that parse without the /all refinement ignores spaces. Since you are using some in the rule, you are insisting that at least one of the characters in bset must be contained in the string being parsed. Now bset does contain a space. And so does the string. But parse without /all ignores the space in the string and effectively treats it like an empty string. Accordingly parse behaves as though you had demanded at least one - even whitespace - character to be contained in an empty string:
>> parse "" [ some bset ]
== false

Same result. The remedy: You have two choices. Either have parse also include spaces by using the /all refinement:
>> parse/all " " [ some bset ]
== true

This would exclude an empty string, but it does permit strings that consist only of one or more spaces. Or - if you also want to permit empty strings - relax your requirement that the string must contain at least one of the characters, i.e. permit an empty string by using any instead of some:
>> parse " " [ any bset ]
== true

With respect to the remainder, I wonder if princepawn may have meant "Given a multiline string, in which the lines consist of 1 .. 4, how do you match the specific line that fulfills one or the other of these criteria ..." For instance, in the first case, my guess is that he meant to say: intercept all of the lines that contain "cat" at the beginning of the line. If this is what he meant, then you would start with a multiline string with a few lines containing some stuff that has nothing to do with cats, then identify the line or lines that contain "cat" at the beginning, and do something useful with each such line. The same goes for the other tasks.

I'm not sure that's what he means. Let's wait and see (time to catch a few hours of sleep ... ;-).

At 08:00 PM 9/15/00 +1200, you wrote:
>> I am having problems switching my understanding of regular expressions to
>the REBOL parse dialect. Could someone please tell me how to do each of the
>following with parse?
>>
>> 1. match "cat" at the beginning of a line
>
>>> line: "cat mat"
>== "cat mat"
>>> parse line ["cat" to end]
>== true
>>> line: "mercata"
>== "mercata"
>>> parse line ["cat" to end]
>== false
>>> line: "cat"
>== "cat"
>>> parse line ["cat" to end]
>== true
>
>> 2. match "cat", immediately preceded and followed by a word boundary ,
>e.g., match "the cat in" or "the cat" but not "mercata"
>
>>> line: "mercata"
>== "mercata"
>>> parse line [thru " cat " to end]
>== false
>>> line: "the cat in"
>== "the cat in"
>>> parse line [thru " cat " to end]
>== true
>
>> 3. match "cat" on a line all by itself
>
>>> line: "cat"
>== "cat"
>>> parse line ["cat"]
>== true
>>> line: "cat "
>== "cat "
>>> parse line ["cat"]
>== false
>
>> 4. match the empty string: I think this is
>> parse string ""
>
>>> Line: ""
>== ""
>>> empty? line
>== true
>
>> 5. match any char: I think this is done by creating a bitset from a
>charset from hex 000 to hex 255 and parsing on that, but it doesnt work,
>e.g.,
>> bset: charset [ #"^(00)" - #"^(FF)" ]
>> parse " " [ some bset ]
>>
>> fails
>
>Any one?
>
>Andrew Martin
>4/5 can't be tooooo bad...
>ICQ: 26227169
>http://members.ncbi.com/AndrewMartin/
>http://members.xoom.com/AndrewMartin/
>-><-
>

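Coming back to the multiline reading I guessed at further up: here is a minimal sketch of it for the first task, assuming REBOL 2, where parse/all with a string rule splits the input at the given delimiter. The words text and lines and the sample data are made up for illustration, and this is only one possible approach, not necessarily what princepawn had in mind.

```rebol
REBOL []

; Hypothetical sample data: four lines, two of which begin with "cat".
text: {dog house
cat mat
mercata
cat}

; Assumption: REBOL 2 behaviour, where parse/all with a string rule
; splits the text at each newline and returns a block of strings.
lines: parse/all text "^/"

; Keep only the lines that start with "cat"; "to end" accepts whatever follows.
foreach line lines [
    if parse/all line ["cat" to end] [print line]
]
```

The same loop shape should carry over to the other tasks by swapping in the rules from the quoted message, for example [thru " cat " to end] for the second one.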
;- Elan [ : - ) ]
author of REBOL: THE OFFICIAL GUIDE
REBOL Press: The Official Source for REBOL Books
http://www.REBOLpress.com
visit me at http://www.TechScribe.com