[REBOL] Re: 5 simple pattern matching questions
From: g:santilli:tiscalinet:it at: 16-Sep-2000 18:11
Hello [princepawn--MailAndNews--com]!
On 15-Sep-00, you wrote:
p> I am having problems switching my understanding of regular
p> expressions to the REBOL parse dialect.
REBOL's parse dialect is designed mainly for parsing "grammars"
rather than for pattern matching. So there are many things that
are very simple to do with regexps and quite difficult with PARSE,
but there are also many things that are incredibly simple to do
with PARSE and almost impossible with regexps.
p> 1. match "cat" at the beginning of a line
lines: [
"cat" (print "found cat") thru newline
| thru newline
]
parse/all string [some lines]
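To try the rule, here is a small script. It is a sketch assuming REBOL/Core 2.x. The rule is copied unchanged from above; the sample text and the variable name SAMPLE are made up for illustration.

```rebol
REBOL [Title: "Sketch: cat at the beginning of a line"]

; Rule from above: "cat" at the start of a line, else skip the line.
lines: [
    "cat" (print "found cat") thru newline
    | thru newline
]

; Hypothetical sample text; every line ends with a newline (^/).
sample: "cat food^/the cat^/catalog^/"

; Prints "found cat" twice ("cat food" and "catalog": like ^cat in a
; regexp, the rule does not check what follows "cat"), then "true".
print parse/all sample [some lines]
```

Since both alternatives end with THRU NEWLINE, a last line without a trailing newline makes the whole PARSE return false: PARSE/ALL "cat" [SOME LINES] prints "found cat" but returns false.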
p> 2. match "cat",
p> immediately preceded and followed by a word boundary, e.g.,
p> match "the cat in" or "the cat" but not "mercata"
The simplest way:
>> found? find parse "the cat in" ",;.:!()?" "cat"
== true
>> found? find parse "the cat" ",;.:!()?" "cat"
== true
>> found? find parse "mercata" ",;.:!()?" "cat"
== false
Using the parse dialect only:
text: [some words]
words: [
"cat" separator (print "found cat")
| some word-char separator
]
separator: [some sep-char | end]
word-char: complement sep-char: charset " ,;.:!()?"
>> parse/all "the cat in" text
found cat
== true
>> parse/all "the cat" text
found cat
== true
>> parse/all "mercata" text
== true
p> 3. match "cat" on a line all by itself
Similar to #1:
lines: [
"cat" newline (print "found cat")
| thru newline
]
parse/all string [some lines]
(Omit the /ALL refinement if you don't care about spaces.)
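A quick check of this rule, again as a sketch assuming REBOL/Core 2.x; the sample text and the name SAMPLE are hypothetical.

```rebol
REBOL [Title: "Sketch: cat on a line by itself"]

; Rule from above: "cat" at the start of a line, directly followed
; by the newline; any other line is skipped.
lines: [
    "cat" newline (print "found cat")
    | thru newline
]

; Hypothetical sample text: only the first line is exactly "cat".
sample: "cat^/the cat^/cat food^/"

; Prints "found cat" once, then "true".
print parse/all sample [some lines]
```

On the "cat food" line, "cat" matches but NEWLINE does not, so PARSE backtracks to the second alternative and skips the line.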
p> 4. match the empty string: I think this is
p> parse string ""
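Actually, it is PARSE/ALL STRING [END]. PARSE STRING "" does
something else: with a string as the rule, PARSE splits STRING into
a block of strings, as in the FIND examples above.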
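A minimal sketch, assuming REBOL/Core 2.x, showing that END alone only accepts an empty input:

```rebol
REBOL [Title: "Sketch: matching the empty string"]

; END matches only when no input is left, so this rule succeeds
; on an empty string and fails on anything else.
print parse/all "" [end]    ; true
print parse/all "x" [end]   ; false
```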
p> 5. match any char: I think this is done by creating a bitset
p> from a charset from hex 00 to hex FF and parsing on that,
p> but it doesn't work, e.g.,
p> bset: charset [ #"^(00)" - #"^(FF)" ]
p> parse " " [ some bset ]
SKIP matches any char, so [SOME SKIP] will go to the end (like
[TO END]). Anyway, your rule above works too: it's just that without
/ALL, PARSE ignores spaces, so it treats " " as an empty string, and
an empty string does not contain any character...
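A sketch of both points, assuming REBOL/Core 2.x (in REBOL 3, PARSE no longer skips spaces by default, so the first line would behave differently there). BSET is the charset from the question; the other inputs are made up for illustration.

```rebol
REBOL [Title: "Sketch: matching any char"]

; The charset from the question: every char from 00 to FF.
bset: charset [#"^(00)" - #"^(FF)"]

; Without /ALL the space is skipped, so nothing is left for BSET.
print parse " " [some bset]         ; false
; With /ALL the space is kept, and BSET matches it.
print parse/all " " [some bset]     ; true

; SKIP matches any single char, so SOME SKIP consumes the whole input...
print parse/all "abc" [some skip]   ; true
; ...but SOME needs at least one match, so unlike TO END it fails
; on an empty string.
print parse/all "" [some skip]      ; false
print parse/all "" [to end]         ; true
```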
HTH,
Gabriele.
--
Gabriele Santilli <[giesse--writeme--com]> - Amigan - REBOL programmer
Amiga Group Italia sez. L'Aquila -- http://www.amyresource.it/AGI/