Mailing List Archive: 49091 messages

5 simple pattern matching questions

 [1/6] from: princepawn:mailandnews at: 15-Sep-2000 2:53


I am having problems switching my understanding of regular expressions to the REBOL parse dialect. Could someone please tell me how to do each of the following with parse?

1. Match "cat" at the beginning of a line.
2. Match "cat" immediately preceded and followed by a word boundary, e.g., match "the cat in" or "the cat" but not "mercata".
3. Match "cat" on a line all by itself.
4. Match the empty string. I think this is:
     parse string ""
5. Match any char. I think this is done by creating a bitset from a charset covering hex 00 to hex FF and parsing on that, but it doesn't work, e.g.:
     bset: charset [ #"^(00)" - #"^(FF)" ]
     parse " " [ some bset ]
   fails.

 [2/6] from: al:bri:xtra at: 15-Sep-2000 19:41


princepawn wrote:
> I am having problems switching my understanding of regular expressions to
> the REBOL parse dialect.

It would be nice to have a regular expression dialect in Rebol.

Andrew Martin
ICQ: 26227169
http://members.ncbi.com/AndrewMartin/
http://members.xoom.com/AndrewMartin/

 [3/6] from: al:bri:xtra at: 15-Sep-2000 20:00


> I am having problems switching my understanding of regular expressions to
> the REBOL parse dialect. Could someone please tell me how to do each of the
> following with parse?
> 1. match "cat" at the beginning of a line

>> line: "cat mat"
== "cat mat"
>> parse line ["cat" to end]
== true
>> line: "mercata"
== "mercata"
>> parse line ["cat" to end]
== false
>> line: "cat"
== "cat"
>> parse line ["cat" to end]
== true
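The rule above tests only the start of the whole string. If the input holds several lines (the reading Elan suggests further on), one possible sketch, assuming REBOL/Core 2.x, is to split the text on newlines using PARSE's string-delimiter mode and then apply the same rule to each line. The names `text` and `found` and the sample data are illustrative only.

```rebol
REBOL [Title: "Lines that begin with cat (sketch)"]

; Sample multi-line input (hypothetical data for illustration)
text: "cat nap^/the cat^/category^/dog"

found: copy []
; PARSE/ALL with a string rule splits the text at each delimiter
; (here only the newline) and returns a block of strings.
foreach line parse/all text "^/" [
    ; PARSE returns TRUE only if the whole line is consumed,
    ; so ["cat" to end] means "this line starts with cat".
    if parse/all line ["cat" to end] [append found line]
]
probe found    ; ["cat nap" "category"]
```

Like the regex `^cat`, this also accepts "category"; a word-boundary test is needed if that is unwanted.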

> 2. match "cat", immediately preceded and followed by a word boundary,
> e.g., match "the cat in" or "the cat" but not "mercata"
>> line: "mercata"
== "mercata"
>> parse line [thru " cat " to end]
== false
>> line: "the cat in"
== "the cat in"
>> parse line [thru " cat " to end]
== true
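Note that `thru " cat "` needs a literal space on both sides, so it rejects "the cat" (cat at the end) and "cat in" (cat at the start), both of which the question wanted to accept. Below is a sketch of a boundary-aware rule, assuming REBOL/Core 2.x and treating whitespace plus the punctuation from Gabriele's reply as separators. The separator set and the function name `has-cat?` are illustrative choices, not built-ins.

```rebol
REBOL [Title: "cat as a whole word (sketch)"]

; Characters treated as word boundaries (an assumption; extend as needed)
sep-char: charset " ,;.:!()?^-^/"
word-char: complement sep-char

has-cat?: func [
    "Return TRUE if TEXT contains cat as a whole word (case-insensitive)"
    text [string!]
    /local found
][
    found: false
    parse/all text [
        any sep-char                              ; leading separators
        some [
            ; at a word start: "cat" followed by a separator or the end
            ["cat" [some sep-char | end] (found: true)]
            ; otherwise consume one whole word plus what follows it
            | [some word-char [some sep-char | end]]
        ]
    ]
    found
]

foreach s ["the cat in" "the cat" "cat" "mercata" "category" "concat"] [
    print [mold s has-cat? s]
]
```

Because each pass of SOME consumes a whole word, "cat" is only tried at the start of a word, so the first three samples should report true and "mercata", "category" and "concat" false.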

> 3. match "cat" on a line all by itself

>> line: "cat"
== "cat"
>> parse line ["cat"]
== true
>> line: "cat "
== "cat "
>> parse line ["cat"]
== false
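`parse line ["cat"]` checks a single string. For a multi-line string, a pure-PARSE sketch (REBOL/Core 2.x assumed) that counts the lines consisting of exactly "cat", including a final line with no trailing newline, might look like this. `non-nl`, `hits` and the sample text are hypothetical names and data. String matching in PARSE is case-insensitive unless /case is used, so "CAT" would count as well.

```rebol
REBOL [Title: "Lines that are exactly cat (sketch)"]

text: "the cat^/cat^/cats^/^/cat"    ; sample data (hypothetical)
non-nl: complement charset "^/"      ; any character except newline
hits: 0

parse/all text [
    any [
        ; "cat" followed directly by a line break or the end of input
        ["cat" [newline | end] (hits: hits + 1)]
        ; any other non-empty line, with or without its newline
        | [some non-nl [newline | end]]
        ; an empty line
        | newline
    ]
]
print ["lines that are exactly cat:" hits]    ; 2 for this sample
```

Every alternative consumes at least one character, so the ANY loop always makes progress and stops cleanly at the end of the input.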

> 4. match the empty string: I think this is
> parse string ""

>> Line: ""
== ""
>> empty? line
== true
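`empty?` answers the question directly, but inside a grammar the rule for "no input left" is END, as Gabriele points out below. Also, as far as REBOL 2 goes, `parse string ""` does not test anything: when the rules argument is a string (or none), PARSE switches to its splitting mode and returns a block of substrings rather than true/false, which is the mode Gabriele's `find parse "the cat in" ",;.:!()?" "cat"` relies on. A small sketch, assuming REBOL/Core 2.x:

```rebol
REBOL [Title: "Matching the empty string (sketch)"]

; END matches only when no input remains
print parse/all "" [end]           ; true
print parse/all "x" [end]          ; false: the "x" is never consumed
print parse/all " " [end]          ; false: with /all the space counts

; A string rule is not a pattern: PARSE splits and returns a block
probe type? parse "the cat" ""     ; block!
```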

> 5. match any char: I think this is done by creating a bitset from a
> charset from hex 00 to hex FF and parsing on that, but it doesn't work, e.g.,
> bset: charset [ #"^(00)" - #"^(FF)" ]
> parse " " [ some bset ]
> fails

Anyone?

Andrew Martin
4/5 can't be tooooo bad...
ICQ: 26227169
http://members.ncbi.com/AndrewMartin/
http://members.xoom.com/AndrewMartin/

 [4/6] from: d95-mjo:nada:kth:se at: 15-Sep-2000 10:48


On Fri, 15 Sep 2000 [princepawn--MailAndNews--com] wrote:
> I am having problems switching my understanding of regular expressions to the
> REBOL parse dialect. Could someone please tell me how to do each of the
<<quoted lines omitted: 4>>
> parse " " [ some bset ]
> fails
The problem here is that if you don't use parse/all, the space is ignored.
>> help parse
USAGE:
    PARSE input rules /all /case
    ...
REFINEMENTS:
    /all -- Parses all chars including spaces.
    ...

Try this:

    bset: charset [ #"^(00)" - #"^(FF)" ]
    parse/all " " [ some bset ]

Or if you don't want to have to type all those weird chars:

    bset: complement charset ""
    parse/all " " [ some bset ]

/Martin Johannesson, [d95-mjo--nada--kth--se]
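Putting this together with Elan's explanation: with /all, the full 256-character set really does match a space; SOME needs at least one character, while ANY also accepts the empty string. A small sketch, assuming REBOL/Core 2.x; `any-char` is just an illustrative name.

```rebol
REBOL [Title: "Matching any character with a charset (sketch)"]

; All 256 byte values, the same set as charset [#"^(00)" - #"^(FF)"]
any-char: complement charset ""

print parse/all " " [some any-char]        ; true: /all keeps the space
print parse/all "" [some any-char]         ; false: SOME needs one match
print parse/all "" [any any-char]          ; true: ANY allows zero
print parse/all "a^-b^/" [some any-char]   ; true: tabs and newlines too
```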

 [5/6] from: rebol::techscribe::com at: 15-Sep-2000 2:04


Hi Andrew, you wrote:
>> 5. match any char: I think this is done by creating a bitset from a
>charset from hex 00 to hex FF and parsing on that, but it doesn't work,
<<quoted lines omitted: 4>>
>> fails
>Anyone?
The reason it fails is that parse without the /all refinement ignores spaces. Since you are using SOME in the rule, you are insisting that at least one of the characters in bset must be contained in the string being parsed. Now bset does contain a space, and so does the string. But parse without /all ignores the space in the string and effectively treats it like an empty string. Accordingly, parse behaves as though you had demanded that at least one character, even a whitespace character, be contained in an empty string:

>> parse "" [ some bset ]
== false

Same result. The remedy: you have two choices. Either have parse also include spaces by using the /all refinement:

>> parse/all " " [ some bset ]
== true

This would exclude an empty string, but it does permit strings that consist only of one or more spaces. Or, if you also want to permit empty strings, relax your requirement that the string must contain at least one of the characters, i.e., permit an empty string by using ANY instead of SOME:

>> parse " " [ any bset ]
== true

With respect to the remainder, I wonder if princepawn may have meant: "Given a multiline string, in which the lines consist of 1 .. 4, how do you match the specific line that fulfills one or the other of these criteria?" For instance, in the first case, my guess is that he meant to say: intercept all of the lines that contain "cat" at the beginning of the line. If this is what he meant, then you would start with a multiline string with a few lines containing some stuff that has nothing to do with cats, then identify the line(s) that contain "cat" at the beginning of the line, and do something useful with them. The same goes for the other tasks. I'm not sure that's what he means. Let's wait and see (time to catch a few hours of sleep ... ;-).

At 08:00 PM 9/15/00 +1200, you wrote:
>> I am having problems switching my understanding of regular expressions to
>the REBOL parse dialect. Could someone please tell me how to do each of the
<<quoted lines omitted: 52>>
>http://members.xoom.com/AndrewMartin/
>-><-
;- Elan [ : - ) ]
author of REBOL: THE OFFICIAL GUIDE
REBOL Press: The Official Source for REBOL Books
http://www.REBOLpress.com
visit me at http://www.TechScribe.com

 [6/6] from: g:santilli:tiscalinet:it at: 16-Sep-2000 18:11


Hello [princepawn--MailAndNews--com]!

On 15-Sep-00, you wrote:

p> I am having problems switching my understanding of regular
p> expressions to the REBOL parse dialect.

REBOL's parse dialect is mainly designed to parse "grammars" rather than to do pattern matching. So there are a lot of things that are very simple to do with regexps and quite difficult with PARSE, but there are also a lot of things that are incredibly simple to do with PARSE but almost impossible with regexps.

p> 1. match "cat" at the beginning of a line

    lines: [ "cat" (print "found cat") thru newline | thru newline ]
    parse/all string [some lines]

p> 2. match "cat",
p> immediately preceded and followed by a word boundary, e.g.,
p> match "the cat in" or "the cat" but not "mercata"

The simplest way:
>> found? find parse "the cat in" ",;.:!()?" "cat"
== true
>> found? find parse "the cat" ",;.:!()?" "cat"
== true
>> found? find parse "mercata" ",;.:!()?" "cat"
== false

Using the parse dialect only:

    text: [some words]
    words: [ "cat" separator (print "found cat") | some word-char separator ]
    separator: [some sep-char | end]
    word-char: complement sep-char: charset " ,;.:!()?"

>> parse/all "the cat in" text
found cat
== true
>> parse/all "the cat" text
found cat
== true
>> parse/all "mercata" text
== true

p> 3. match "cat" on a line all by itself

Similar to 1.:

    lines: [ "cat" newline (print "found cat") | thru newline ]
    parse/all string [some lines]

(Omit the /ALL refinement if you don't care about spaces.)

p> 4. match the empty string: I think this is
p> parse string ""

Actually it is PARSE/ALL STRING [END].

p> 5. match any char: I think this is done by creating a bitset
p> from a charset from hex 00 to hex FF and parsing on that,
p> but it doesn't work, e.g.,
p> bset: charset [ #"^(00)" - #"^(FF)" ]
p> parse " " [ some bset ]

SKIP will match any char, so [SOME SKIP] will go to the end (like [TO END]). Anyway, the above works too: it's just that without /ALL, PARSE ignores spaces, so it treats " " as an empty string, and an empty string does not contain any character...

HTH,
Gabriele.
--
Gabriele Santilli <[giesse--writeme--com]> - Amigan - REBOL programmer
Amiga Group Italia sez. L'Aquila -- http://www.amyresource.it/AGI/
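SKIP avoids building a charset at all. It plays roughly the role of the regex `.`, except that it also matches a newline. A sketch, assuming REBOL/Core 2.x, that uses SKIP both as a "match anything" rule and to count characters; `n` is an illustrative name.

```rebol
REBOL [Title: "SKIP matches any single character (sketch)"]

print parse/all " " [skip]          ; true: exactly one char
print parse/all "abc" [some skip]   ; true: like [to end], but needs 1+
print parse/all "" [some skip]      ; false
print parse/all "abc" [skip skip]   ; false: one char is left over

; Count every character, spaces included (this needs /all)
n: 0
parse/all "the cat" [any [skip (n: n + 1)]]
print n                             ; 7
```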

Notes
  • Quoted lines have been omitted from some messages.