Mailing List Archive: Re: Parse versus Regular Expressions

[REBOL] Re: Parse versus Regular Expressions

From: lmecir:mbox:vol:cz at: 9-Apr-2003 17:12


Hi all,

Joel wrote:

> PARSE seems to partake of the same issue as Prolog; its seems to be
> implemented in a way that makes it a hybrid between declarative and
> procedural programming, with the result that sometimes things have
> side effects or behaviors that can be difficult to figure out.


I have read some grammar articles and came to an opinion, that PARSE the most closely 
resembles lambda calculus, which I would call *purely procedural*. It looks so natural, 
that it confused not just Joel, I think.

> I've just tripped over a tidy example of a task that's trivial to
> solve if backtracking is available.
>
> Find a repeated group of characters at the end of a string (one or more
> characters, repeated two or more times consecutively).  For example,
>
>     "my doggyoggy"
>          11112222
>
>     "has as as as as "
>       111222333444555
>
>     "fleasssssss"
>          1234567
>
> Note that some cases can be solved in multiple ways:
>
>     "fleassssss"
>          111222
>
>     "fleassssss"
>          112233
>
>     "fleassssss"
>          123456
>
> in which case any solution is OK.
>
> The Perl/Python/etc-style regular expression which tests for this
> situation and provides the first occurrence of the repeated group
> of characters is
>
>     /(.+)\1+$/
>
> which, for those not familar with RE "line noise" ;-) can be read
>
>      (.+)      a group (the parens) of one or more (the plus)
>                    arbitrary characters (the dot)
>          \1    ... followed by the same group previously matched
>            +   ... one or more times
>             $  ... at the end of the string
>
> I'm really interested in how a PARSE expert would approach the same
> task.  Any takers?

To not start any flame wars, I would call the above expression an Enhanced

 Regular Expression, because as we will see below, it isn't a Regular Expression at all.


We have seen the PARSE solution. I used my freshly acquired "knowledge" of grammar stuff 
and tried to analyse the grammar. Here are the rules I came to(supposing, that the base 
alphabet is just "ab", to simplify the expressions, S is the starting symbol):

First, some Regular Rules:

S->aS
S->bS
S->T
T->aa
T->bb

Context Free Rules:

T->aVA
T->bVB
V->Ca
V->Db
V->aVa
V->bVb

Context Preserving Rules:

Ca->CE
Cb->CF
Da->DE
Db->DF

And, at last, Context Changing Rules:

Ea->aE
Eb->bE
Fa->aF
Fb->bF
EA->Aa
EB->Ba
FA->Ab
FB->Bb
CA->aa
CB->ba
DA->ab
DB->bb


Uff! This points to the fact, that the discussed grammar is neither Regular, nor Context 
Free, and not even Context Preserving and it can hardly be called "simple".

As opposed to this, the earlier PARSE rule example:

z: [#"a" z #"b" | none]

can be expressed as:

Regular Rule:

S->""

Context Free Rule:

S->aSb

i.e. it is a Context Free Grammar.

Regards
-Ladislav