Mailing List Archive: 5 simple pattern matching questions Re:(2)

From: rebol::techscribe::com at: 15-Sep-2000 2:04


Hi Andrew,

you wrote:
>> 5. match any char: I think this is done by creating a bitset from a
>charset from hex 000 to hex 255 and parsing on that, but it doesnt work,
>e.g.,
>>  bset: charset [ #"^(00)" - #"^(FF)" ]
>>  parse " " [ some bset ]
>>
>> fails
>
>Any one?

The reason it fails is that parse without the /all refinement ignores spaces.

Since you are using some in the rule, you are insisting that at least one
of the characters in bset must be contained in the string being parsed. Now
bset does contain a space. And so does the string. But parse without /all
ignores the space in the string and effectively treats it like an empty
string. Accordingly parse behaves as though you had demanded at least one -
even whitespace - character to be contained in an empty string:

>> parse "" [ some bset ]
== false

Same result. The remedy:

You have two choices. Either have parse include spaces as well, by using
the /all refinement:

>> parse/all " " [ some bset ]
== true

This would exclude an empty string, but it does permit strings that consist
only of one or more spaces.

Or - if you also want to permit empty strings - relax your requirement that
the string must contain at least one of the characters, i.e. permit an
empty string by using any instead of some:

>> parse " " [ any bset ]
== true

With respect to the remainder, I wonder if princepawn may have meant: "Given
a multiline string, in which the lines consist of cases 1 .. 4, how do you
match the specific line that fulfills one or the other of these criteria?"

For instance, in the first case, my guess is that he meant to say:
intercept all of the lines that contain "cat" at the beginning of the line.
If this is what he meant, then you would start with a multiline string with
a few lines containing some stuff that has nothing to do with cats, then
identify the line or lines that contain "cat" at the beginning of the line,
and do something useful with each such line. The same goes for the other
tasks.
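Under that reading, the first task might be sketched as follows. This is a guess at the intent, not a definitive answer, and it assumes REBOL 2, where parse/all with a delimiter string splits a string into a block. The names text and lines and the sample data are made up for illustration; the rule ["cat" to end] is the one from Andrew's answer to question 1.

```rebol
REBOL [Title: "Lines beginning with cat (sketch)"]

; Hypothetical multiline input; the lines are kept flush left so that
; no leading spaces end up inside the string.
text: {dog house
cat mat
mercata
cat}

; Split on newlines. /all keeps the spaces inside each line intact.
lines: parse/all text "^/"

; Test each line with the rule from question 1 and print the ones that match.
foreach line lines [
    if parse/all line ["cat" to end] [print line]
]
```

With this sample data the loop is meant to print only the lines "cat mat" and "cat". In place of print you would do whatever is useful with the matching line.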

I'm not sure that's what he means. Let's wait and see (time to catch a few
hours of sleep ... ;-).

At 08:00 PM 9/15/00 +1200, you wrote:
>> I am having problems switching my understanding of regular expressions to
>the REBOL parse dialect. Could someone please tell me how to do each of the
>following with parse?
>>
>> 1. match "cat" at the beginning of a line
>
>>> line: "cat mat"
>== "cat mat"
>>> parse line ["cat" to end]
>== true
>>> line: "mercata"
>== "mercata"
>>> parse line ["cat" to end]
>== false
>>> line: "cat"
>== "cat"
>>> parse line ["cat" to end]
>== true
>
>> 2. match "cat", immediately preceded and followed by a word boundary ,
>e.g., match "the cat in" or "the cat" but not "mercata"
>
>>> line: "mercata"
>== "mercata"
>>> parse line [thru " cat " to end]
>== false
>>> line: "the cat in"
>== "the cat in"
>>> parse line [thru " cat " to end]
>== true
>
>> 3. match "cat" on a line all by itself
>
>>> line: "cat"
>== "cat"
>>> parse line ["cat"]
>== true
>>> line: "cat "
>== "cat "
>>> parse line ["cat"]
>== false
>
>> 4. match the empty string: I think this is
>>   parse string ""
>
>>> Line: ""
>== ""
>>> empty? line
>== true
>
>> 5. match any char: I think this is done by creating a bitset from a
>charset from hex 000 to hex 255 and parsing on that, but it doesnt work,
>e.g.,
>>  bset: charset [ #"^(00)" - #"^(FF)" ]
>>  parse " " [ some bset ]
>>
>> fails
>
>Any one?
>
>Andrew Martin
>4/5 can't be tooooo bad...
>ICQ: 26227169
>http://members.ncbi.com/AndrewMartin/
>http://members.xoom.com/AndrewMartin/
>-><-
>

;- Elan [ : - ) ]
    author of REBOL: THE OFFICIAL GUIDE
    REBOL Press: The Official Source for REBOL Books
    http://www.REBOLpress.com
    visit me at http://www.TechScribe.com