[REBOL] Re: Parse versus Regular Expressions
From: joel:neely:fedex at: 5-Apr-2003 7:46
Hi, Ladislav,
Thanks for the enlightening discussion!
Ladislav Mecir wrote:
> > in Perl would be something like
> >
> > $somestring =~ /^(a*)(b*)$/ and length ($1) == length ($2)
>
> , which is Perl, not RE...
>
Well, since there is no "pure RE" language, I have to use RE as
embedded in *some* programming language. I could have just as well
used Python, Ruby, Java, awk, etc... (The thought of using C for
this is just too painful to contemplate! ;-)
> >
> > $somestring =~ /^(a*)(b*)(c*)$/ and
> > length ($1) == length ($2) and
> > length ($2) == length ($3)
> >
> > and so on for four or more "pieces".
>
> Again, Perl, not RE. Parse dialect:
>
Again, I had to use some programming language if I wanted to show
a fragment of a program. The point was that most languages that use
RE (with which I'm familiar) make it easy to specify a declarative
pattern, then ask additional questions about the data if that pattern
is successfully matched (in this case, comparing the lengths of the
runs of "a", "b", and "c").
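To make the "match a declarative pattern, then ask further questions" idea concrete in one of the other host languages mentioned, here is a minimal Python sketch of the three-run case. The helper name `equal_runs` is made up for illustration, and `fullmatch` stands in for the `^`/`$` anchors.

```python
import re

# Same declarative pattern as the Perl fragment: a run of "a", then "b", then "c".
PATTERN = re.compile(r"(a*)(b*)(c*)")

def equal_runs(s: str) -> bool:
    """Return True if s is a run of a's, b's, and c's of equal length.

    Hypothetical helper, not from the original thread. Note that Perl's $
    also tolerates a trailing newline, which fullmatch does not.
    """
    m = PATTERN.fullmatch(s)
    # The regex only checks the shape; the length comparison is the
    # "additional question" asked after a successful match.
    return bool(m) and len(m.group(1)) == len(m.group(2)) == len(m.group(3))
```

As in the Perl version, the pattern alone accepts any a*b*c* string; the equal-length requirement lives entirely in the follow-up comparison.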
> x: [start: any #"a" end: (n: offset? start end) n #"b" n #"c"]
>
Very nice! Thanks for the solution!
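For comparison, the strategy behind that parse rule (count the "a"s first, then demand exactly that many "b"s and "c"s) can be sketched by hand in Python. This is only an illustration of the idea under my reading of the rule, not a translation of REBOL's parse semantics, and `parse_like` is a made-up name.

```python
def parse_like(s: str) -> bool:
    """Count the leading a's, then require exactly n b's and n c's.

    Hypothetical sketch of the count-then-repeat strategy; assumes the
    whole input must be consumed for the match to succeed.
    """
    n = 0
    # Equivalent in spirit to: start: any #"a" end: (n: offset? start end)
    while n < len(s) and s[n] == "a":
        n += 1
    # Equivalent in spirit to: n #"b" n #"c", followed by end of input
    return s[n:] == "b" * n + "c" * n
```

The difference from the regex approach is that the count is captured midway and then drives the rest of the match, rather than being checked after the fact.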
Now let me scale differently: suppose I want to match consecutive,
equal-length runs of those three letters anywhere within the target
string? For example, all of the targets
"my dog has aaabbbccc fleas"
"aaadddeeeabc"
"abcccccccc"
"aabbaaaaabbcccc"
meet that criterion.
The previous solution transforms easily, as follows:
1) allow matching anywhere in the target -- this is implemented by
removing the BOS/EOS anchors (^ and $) from the pattern;
2) require at least one of each character (since zero of each is
an empty string that can be found anywhere in any target!) --
this is implemented by changing the * qualifier ("any") to
+ ("some") on all subpatterns;
3) recognize that extra "a"s at the beginning and/or extra "c"s at
the end don't disqualify the group -- this is implemented by
requiring only that there are at least as many "a" as "b" and
at least as many "c" as "b".
I'm "thinking out loud" here to show the thought process involved
in moving from one problem/solution to the next. The changes above
give us:
$somestring =~ /(a+)(b+)(c+)/ and
length ($1) >= length ($2) and
length ($2) <= length ($3)
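A minimal Python sketch of the same relaxed test follows; the name `contains_equal_runs` is hypothetical. One caveat worth stating: a single unanchored match inspects only the leftmost a/b/c group, so a target such as "abbc aabbcc" would be rejected on its first group even though a later group qualifies. The sketch therefore checks every non-overlapping match, which is an assumption about the intent rather than a literal transcription of the Perl.

```python
import re

# Unanchored pattern with + ("some") instead of * ("any") on each run.
GROUP = re.compile(r"(a+)(b+)(c+)")

def contains_equal_runs(s: str) -> bool:
    """Return True if s contains a^n b^n c^n (n >= 1) somewhere.

    Hypothetical helper. Extra a's before and extra c's after the group
    are allowed, so the test is len(a) >= len(b) <= len(c).
    """
    return any(
        len(m.group(1)) >= len(m.group(2)) <= len(m.group(3))
        for m in GROUP.finditer(s)
    )

# The four sample targets from the message all satisfy the criterion.
samples = [
    "my dog has aaabbbccc fleas",
    "aaadddeeeabc",
    "abcccccccc",
    "aabbaaaaabbcccc",
]
results = [contains_equal_runs(t) for t in samples]
```

Because each greedy run swallows a maximal stretch of its letter, scanning the non-overlapping matches covers every candidate group in the string.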
So, let me ask here, how would you go about solving this next
variation on the theme? Would you transform the definition of X
above, or would you address it as a fresh problem with a different
strategy for solving?
-jn-
--
Polonius: ... What do you read, my lord?
Hamlet: Words, words, words.
_Hamlet_, Act II, Scene 2