[REBOL] Re: Parse versus Regular Expressions
From: lmecir:mbox:vol:cz at: 5-Apr-2003 23:50
Hi Joel,
> Thanks for the enlightening discussion!
I enjoy it too, although I am a little bit busy.
> > > in Perl would be something like
> > >
> > > $somestring =~ /^(a*)(b*)$/ and length ($1) == length ($2)
> >
> > , which is Perl, not RE...
> >
>
> Well, since there is no "pure RE" language, I have to use RE as
> embedded in *some* programming language. I could have just as well
> used Python, Ruby, Java, awk, etc... (The thought of using c for
> this is just too painful to contemplate! ;-)
I just wanted to underline the fact (not obvious from your explanations)
that RE provably cannot do such things. Only programming languages that
depart from the proper definition of RE can.
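
To make the point concrete, here is a minimal sketch in Python (one of the host languages mentioned in the quoted text) of the same kind of check as the Perl line above. The function name equal_runs is made up for illustration. The pattern itself only recognizes "some a's followed by some b's"; the equal-length requirement is enforced by ordinary code outside the pattern.

```python
import re

def equal_runs(s):
    """Return True if s is n copies of "a" followed by n copies of "b"."""
    # The regular expression only checks the shape a*b* ...
    m = re.fullmatch(r"(a*)(b*)", s)
    # ... while the counting is done by the host language, not by the RE.
    return m is not None and len(m.group(1)) == len(m.group(2))
```

Dropping the length comparison leaves a pattern that accepts "aab" just as readily as "aabb".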
> Now let me scale differently: suppose I want to match consecutive,
> equal-length runs of those three letters anywhere within the target
> string? For example, all of the targets
>
> "my dog has aaabbbccc fleas"
> "aaadddeeeabc"
> "abcccccccc"
> "aabbaaaaabbcccc"
>
> meet that criterion.
I would use my TO-RULE function and the previous rule X as follows:
    x: [start: any #"a" end: (n: offset? start end) n #"b" n #"c"]
    z: to-rule x
    parse "my dog has aaabbbccc fleas" [z to end]
The function:
    fail: [end skip]
    to-rule: function [
        {generate a to A parse rule}
        a [block!]
    ] [c f] [
        compose/deep [
            any [
                [
                    (reduce [a]) ([(c: fail f: none) | (c: none f: fail) skip])
                ] c
            ] f
        ]
    ]
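
For readers who do not use REBOL, the search strategy can be sketched in Python. This is only an illustration of the idea (try the rule at the current position, otherwise skip one character and try again), not a translation of the parse dialect. The names match_x and find_x and the at_least parameter are assumptions introduced here. With at_least=0 the sketch mirrors the "any" in rule X, so, as the rule appears to do, it already succeeds with an empty match at the first position; at_least=1 corresponds to requiring "some" of each letter.

```python
def match_x(s, i, at_least=0):
    """Try rule X at index i; return the index just past the match, or None."""
    n = 0
    # Count the run of "a" starting at i (the 'any #"a"' part, giving n).
    while i + n < len(s) and s[i + n] == "a":
        n += 1
    if n < at_least:
        return None
    j = i + n
    # Exactly n times "b" must follow ...
    if s[j:j + n] != "b" * n:
        return None
    j += n
    # ... and then exactly n times "c".
    if s[j:j + n] != "c" * n:
        return None
    return j + n

def find_x(s, at_least=0):
    """Scan left to right; return (start, end) of the first match, or None."""
    for i in range(len(s) + 1):  # the tail position is tried as well
        end = match_x(s, i, at_least)
        if end is not None:
            return i, end
    return None

print(find_x("my dog has aaabbbccc fleas", at_least=1))
print(find_x("aabbaaaaabbcccc", at_least=1))
```

Because the scan restarts one character further on after each failure, surplus "a"s in front of a qualifying group are skipped one at a time until the counts line up.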
> The previous solution transforms easily, as follows:
>
> 1) allow matching anywhere in the target -- this is implemented by
> removing the BOS/EOS anchors (^ and $) from the pattern;
> 2) require at least one of each character (since zero of each is
> an empty string that can be found anywhere in any target!) --
> this is implemented by changing the * qualifier ("any") to
> + ("some") on all subpatterns;
> 3) recognize that extra "a"s at the beginning and/or extra "c"s at
> the end don't disqualify the group -- this is implemented by
> requiring only that there are at least as many "a" as "b" and
> at least as many "c" as "b".
>
> I'm "thinking out loud" here to show the thought process involved
> in moving from one problem/solution to the next. The changes above
> give us:
>
> $somestring =~ /(a+)(b+)(c+)/ and
> length ($1) >= length ($2) and
> length ($2) <= length ($3)
>
> So, let me ask here, how would you go about solving this next
> variation on the theme? Would you transform the definition of X
> above, or would you address it as a fresh problem with a different
> strategy for solving?
>
> -jn-
Let me ask you a question. What would be the result of your expression for:
"aabbbcccabc"
Regards
-Ladislav
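
One way to explore the closing question about "aabbbcccabc" is to transliterate the quoted Perl expression into Python. This is a sketch under the assumption that Python's re.search, like an unanchored Perl match, reports only the leftmost match; the function name joel_check is made up for illustration.

```python
import re

def joel_check(s):
    """Python rendering of: /(a+)(b+)(c+)/ and len($1) >= len($2) <= len($3)."""
    # Only the leftmost match of the pattern is examined.
    m = re.search(r"(a+)(b+)(c+)", s)
    if m is None:
        return False
    a, b, c = (len(g) for g in m.groups())
    return a >= b and b <= c

print(joel_check("aabbbcccabc"))
print(joel_check("abc"))
```

For "aabbbcccabc" the leftmost match is "aabbbccc", with run lengths 2, 3 and 3, so the length test fails and the later "abc" is never examined.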