Mailing List Archive: Re: Parse versus Regular Expressions

[REBOL] Re: Parse versus Regular Expressions

From: lmecir:mbox:vol:cz at: 5-Apr-2003 23:50


Hi Joel,

> Thanks for the enlightening discussion!

I enjoy it too, although I am a little bit busy.

> > > in Perl would be something like
> > >
> > >     $somestring =~ /^(a*)(b*)$/ and length ($1) == length ($2)
> >
> > , which is Perl, not RE...
> >
>
> Well, since there is no "pure RE" language, I have to use RE as
> embedded in *some* programming language.  I could have just as well
> used Python, Ruby, Java, awk, etc...  (The thought of using c for
> this is just too painful to contemplate! ;-)

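I just wanted to underline the fact (not obvious from your explanations)
that RE in the formal sense provably cannot do such things: comparing the
lengths of two runs needs counting, which a regular expression cannot do.
Only programming languages that go beyond the proper definition of RE can.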
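
To make the split concrete, here is a minimal sketch of the quoted Perl test in Python (Python is a substitution on my part, and the name equal_runs is hypothetical). The pattern only checks the shape "some a's, then some b's"; the equal-length condition is checked afterwards by the host language, which is the part a pure RE cannot express.

```python
import re

def equal_runs(s):
    """True when s is n "a"s followed by exactly n "b"s (n may be 0)."""
    # Regular part: the pattern checks the shape only.
    m = re.fullmatch(r"(a*)(b*)", s)
    # Non-regular part: the host language compares the two lengths.
    return m is not None and len(m.group(1)) == len(m.group(2))
```

For example, equal_runs("aabb") holds while equal_runs("aab") does not, although both strings match the pattern itself.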

> Now let me scale differently:  suppose I want to match consecutive,
> equal-length runs of those three letters anywhere within the target
> string?  For example, all of the targets
>
>     "my dog has aaabbbccc fleas"
>     "aaadddeeeabc"
>     "abcccccccc"
>     "aabbaaaaabbcccc"
>
> meet that criterion.

I would use my TO-RULE function and the previous rule X as follows:

    x: [start: any #"a" end: (n: offset? start end) n #"b" n #"c"]
    z: to-rule x
    parse "my dog has aaabbbccc fleas" [z to end]

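The function:

fail: [end skip]    ; a rule that can never succeed
to-rule: function [
    {generate a to A parse rule}
    a [block!]
] [c f] [
    ; Try A at each position. On a match, C fails the iteration, so ANY
    ; stops in front of the match and F lets the whole rule succeed.
    ; Otherwise SKIP one character and try again; at the end F fails.
    compose/deep [
        any [
            [
                (reduce [a]) ([(c: fail f: none) | (c: none f: fail) skip])
            ] c
        ] f
    ]
]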
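
For readers who do not use PARSE, the same idea can be sketched in Python (a substitution, not the REBOL code above; match_x and to_rule are hypothetical names). One assumption differs from rule X as written: the sketch requires at least one "a", following point 2 of the quoted text, whereas ANY also accepts zero.

```python
def match_x(s, i):
    """Match n "a"s, n "b"s, n "c"s at index i (n >= 1).

    Returns the index just past the match, or None.
    """
    n = 0
    while i + n < len(s) and s[i + n] == "a":
        n += 1          # count the whole run of "a"s, like ANY #"a"
    if n == 0:
        return None     # assumption: at least one "a" is required
    j = i + n
    if s[j:j + n] == "b" * n and s[j + n:j + 2 * n] == "c" * n:
        return j + 2 * n
    return None

def to_rule(rule, s):
    """Return the first index at which rule matches, or None."""
    for i in range(len(s) + 1):
        if rule(s, i) is not None:
            return i    # stop in front of the match
    return None
```

Trying the rule again one character further on is what lets extra leading "a"s through: in "aabbaaaaabbcccc" the attempts at the first few positions fail, and the one starting at the last two "a"s of the second run succeeds.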

> The previous solution transforms easily, as follows:
>
> 1)  allow matching anywhere in the target -- this is implemented by
>     removing the BOS/EOS anchors (^ and $) from the pattern;
> 2)  require at least one of each character (since zero of each is
>     an empty string that can be found anywhere in any target!) --
>     this is implemented by changing the * qualifier ("any") to
>     + ("some") on all subpatterns;
> 3)  recognize that extra "a"s at the beginning and/or extra "c"s at
>     the end don't disqualify the group -- this is implemented by
>     requiring only that there are at least as many "a" as "b" and
>     at least as many "c" as "b".
>
> I'm "thinking out loud" here to show the thought process involved
> in moving from one problem/solution to the next.  The changes above
> give us:
>
>     $somestring =~ /(a+)(b+)(c+)/ and
>         length ($1) >= length ($2)   and
>         length ($2) <= length ($3)
>
> So, let me ask here, how would you go about solving this next
> variation on the theme?  Would you transform the definition of X
> above, or would you address it as a fresh problem with a different
> strategy for solving?
>
> -jn-

Let me ask you a question. What would be the result of your expression for:

    "aabbbcccabc"
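
The question can be tried out with a rough Python rendering of the quoted expression (Python is a substitution for the Perl, and joel_test is a hypothetical name). The sketch assumes, as I believe is the case in Perl too, that the pattern is matched once, leftmost first, and that the length comparisons run only after that match.

```python
import re

def joel_test(s):
    """Rough rendering of: /(a+)(b+)(c+)/ and len($1) >= len($2) <= len($3)."""
    m = re.search(r"(a+)(b+)(c+)", s)   # leftmost match, greedy groups
    return (m is not None
            and len(m.group(1)) >= len(m.group(2))
            and len(m.group(2)) <= len(m.group(3)))
```

With this sketch the result for "aabbbcccabc" is false: the pattern settles on the leftmost run "aabbbccc" with lengths 2, 3 and 3, the first comparison fails, and the qualifying "abc" at the end is never tried.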

Regards
-Ladislav