
[REBOL] Re: Parse versus Regular Expressions

From: lmecir:mbox:vol:cz at: 9-Apr-2003 17:12

Hi all, Joel wrote:
> PARSE seems to partake of the same issue as Prolog; it seems to be
> implemented in a way that makes it a hybrid between declarative and
> procedural programming, with the result that sometimes things have
> side effects or behaviors that can be difficult to figure out.

I have read some articles on grammars and have come to the opinion that PARSE most closely resembles lambda calculus, which I would call *purely procedural*. It looks so natural that I think it has confused more people than just Joel.

> I've just tripped over a tidy example of a task that's trivial to
> solve if backtracking is available.
>
> Find a repeated group of characters at the end of a string (one or more
> characters, repeated two or more times consecutively). For example,
>
>     "my doggyoggy"
>           11112222
>
>     "has as as as as "
>       111222333444555
>
>     "fleasssssss"
>           1234567
>
> Note that some cases can be solved in multiple ways:
>
>     "fleassssss"
>           111222
>
>     "fleassssss"
>           112233
>
>     "fleassssss"
>           123456
>
> in which case any solution is OK.
>
> The Perl/Python/etc-style regular expression which tests for this
> situation and provides the first occurrence of the repeated group
> of characters is
>
>     /(.+)\1+$/
>
> which, for those not familiar with RE "line noise" ;-) can be read
>
>     (.+)   a group (the parens) of one or more (the plus)
>            arbitrary characters (the dot)
>     \1     ... followed by the same group previously matched
>     +      ... one or more times
>     $      ... at the end of the string
>
> I'm really interested in how a PARSE expert would approach the same
> task. Any takers?

To avoid starting a flame war, I will call the above expression an Enhanced Regular Expression because, as we will see below, it isn't a Regular Expression at all. We have seen the PARSE solution. I used my freshly acquired "knowledge" of grammar stuff and tried to analyse the grammar. Here are the rules I came up with (supposing that the base alphabet is just "ab", to simplify the expressions; S is the starting symbol):

First, some Regular Rules:

    S->aS   S->bS   S->T
    T->aa   T->bb

Context Free Rules:

    T->aVA  T->bVB
    V->Ca   V->Db
    V->aVa  V->bVb

Context Preserving Rules:

    Ca->CE  Cb->CF
    Da->DE  Db->DF

And, at last, Context Changing Rules:

    Ea->aE  Eb->bE  Fa->aF  Fb->bF
    EA->Aa  EB->Ba  FA->Ab  FB->Bb
    CA->aa  CB->ba  DA->ab  DB->bb

Uff! This shows that the discussed grammar is neither Regular nor Context Free, and not even Context Preserving, and it can hardly be called "simple".

In contrast, the earlier PARSE rule example:

    z: [#"a" z #"b" | none]

can be expressed as:

Regular Rule:

    S->""

Context Free Rule:

    S->aSb

i.e. it is a Context Free Grammar.

Regards
-Ladislav
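
The rule z: [#"a" z #"b" | none] can be read as a small recursive-descent recognizer for the grammar S->aSb, S->"". The following Python sketch illustrates that reading. It assumes that PARSE tries the alternatives in order and requires the whole input to be consumed; the function names `match_z` and `parse_z` are hypothetical and are not part of REBOL.

```python
def match_z(text, pos=0):
    """Return the position reached after matching rule z at `pos`.

    Mirrors z: [#"a" z #"b" | none]. The rule never fails, because the
    second alternative (none) matches the empty string.
    """
    # First alternative: #"a" z #"b"
    if pos < len(text) and text[pos] == "a":
        after_inner = match_z(text, pos + 1)
        if after_inner < len(text) and text[after_inner] == "b":
            return after_inner + 1
    # Second alternative: none, which consumes nothing.
    return pos


def parse_z(text):
    """True if rule z consumes the whole of `text`."""
    return match_z(text) == len(text)


if __name__ == "__main__":
    for sample in ["", "ab", "aabb", "aab", "abab", "ba"]:
        print(repr(sample), "->", parse_z(sample))
```

The recursion depth serves as the only memory the recognizer needs: each pending call remembers one "a" that still has to be paired with a "b". That is what a Context Free Grammar requires, and it is far less machinery than the marker-shuffling rules needed for the repeated-group task.
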