Mailing List Archive: 49091 messages

[REBOL] Re: Parse versus Regular Expressions

From: lmecir:mbox:vol:cz at: 5-Apr-2003 23:50

Hi Joel,
> Thanks for the enlightening discussion!
I enjoy it too, although I am a little bit busy.
> > > in Perl would be something like
> > >
> > >     $somestring =~ /^(a*)(b*)$/ and length ($1) == length ($2)
> >
> > , which is Perl, not RE...
>
> Well, since there is no "pure RE" language, I have to use RE as
> embedded in *some* programming language. I could have just as well
> used Python, Ruby, Java, awk, etc... (The thought of using c for
> this is just too painful to contemplate! ;-)
I just wanted to underline the fact (not obvious from your explanations) that RE provably cannot do such things. Only programming languages that do not use the proper definition of RE can.
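To make that split concrete, here is a minimal Perl sketch that separates what the pattern does from what the host language does. The helper names `shape_ok` and `balanced` are hypothetical, introduced only for this illustration. The pattern on its own only checks the shape "a run of a's followed by a run of b's"; the equal-length test is ordinary Perl code applied to the captures.

```perl
use strict;
use warnings;

# Hypothetical helper: the regular expression alone.
# It accepts any run of a's followed by any run of b's.
sub shape_ok {
    my ($s) = @_;
    if ($s =~ /^(a*)(b*)$/) {
        return 1;
    }
    return 0;
}

# Hypothetical helper: the full expression quoted above.
# The counting is done by length() in Perl, not by the pattern.
sub balanced {
    my ($s) = @_;
    if ($s =~ /^(a*)(b*)$/ and length($1) == length($2)) {
        return 1;
    }
    return 0;
}

for my $s ("aabb", "aab", "ba") {
    printf "%-5s shape_ok=%d balanced=%d\n", $s, shape_ok($s), balanced($s);
}
```

"aabb" passes both checks, "aab" passes the pattern but fails the length comparison, and "ba" fails both.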
> Now let me scale differently: suppose I want to match consecutive,
> equal-length runs of those three letters anywhere within the target
> string? For example, all of the targets
>
>     "my dog has aaabbbccc fleas"
>     "aaadddeeeabc"
>     "abcccccccc"
>     "aabbaaaaabbcccc"
>
> meet that criterion.
I would use my TO-RULE function and the previous rule X as follows:

    x: [start: any #"a" end: (n: offset? start end) n #"b" n #"c"]
    z: to-rule x
    parse "my dog has aaabbbccc fleas" [z to end]

The function:

    fail: [end skip]
    to-rule: function [
        {generate a to A parse rule}
        a [block!]
    ] [c f] [
        compose/deep [
            any [
                [
                    (reduce [a])
                    ([(c: fail f: none) | (c: none f: fail) skip])
                ] c
            ] f
        ]
    ]
> The previous solution transforms easily, as follows:
>
> 1) allow matching anywhere in the target -- this is implemented by
>    removing the BOS/EOS anchors (^ and $) from the pattern;
> 2) require at least one of each character (since zero of each is
>    an empty string that can be found anywhere in any target!) --
>    this is implemented by changing the * qualifier ("any") to
>    + ("some") on all subpatterns;
> 3) recognize that extra "a"s at the beginning and/or extra "c"s at
>    the end don't disqualify the group -- this is implemented by
>    requiring only that there are at least as many "a" as "b" and
>    at least as many "c" as "b".
>
> I'm "thinking out loud" here to show the thought process involved
> in moving from one problem/solution to the next. The changes above
> give us:
>
>     $somestring =~ /(a+)(b+)(c+)/ and
>     length ($1) >= length ($2) and
>     length ($2) <= length ($3)
>
> So, let me ask here, how would you go about solving this next
> variation on the theme? Would you transform the definition of X
> above, or would you address it as a fresh problem with a different
> strategy for solving?
>
> -jn-
Let me ask you a question. What would be the result of your expression for:

    "aabbbcccabc"

Regards
-Ladislav
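For readers who want to try the question, here is a minimal Perl sketch that wraps the quoted expression in a function and runs it on the four targets from the thread plus the string asked about. The wrapper name `joel_test` and the output formatting are assumptions made for this illustration; only the pattern and the two length comparisons come from the quoted message.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the expression quoted in the thread.
sub joel_test {
    my ($s) = @_;
    if ($s =~ /(a+)(b+)(c+)/
        and length($1) >= length($2)
        and length($2) <= length($3)) {
        return 1;
    }
    return 0;
}

my @targets = (
    "my dog has aaabbbccc fleas",
    "aaadddeeeabc",
    "abcccccccc",
    "aabbaaaaabbcccc",
    "aabbbcccabc",
);

for my $s (@targets) {
    printf "%-28s => %s\n", qq("$s"), joel_test($s) ? "true" : "false";
}
```

The first four targets come out true. For "aabbbcccabc" the pattern matches at the leftmost position, capturing "aa", "bbb" and "ccc", so the first length comparison fails and the whole expression is false, even though the string ends in "abc". Perl does not go back to look for another match after the length test fails, because that test sits outside the pattern.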