Mailing List Archive: 49091 messages

[REBOL] Re: Parse versus Regular Expressions

From: joel:neely:fedex at: 5-Apr-2003 7:46

Hi, Ladislav,

Thanks for the enlightening discussion!

Ladislav Mecir wrote:
> > in Perl would be something like
> >
> > $somestring =~ /^(a*)(b*)$/ and length ($1) == length ($2)
>
> , which is Perl, not RE...
>
Well, since there is no "pure RE" language, I have to use RE as embedded in *some* programming language. I could have just as well used Python, Ruby, Java, awk, etc... (The thought of using C for this is just too painful to contemplate! ;-)
> >
> > $somestring =~ /^(a*)(b*)(c*)$/ and
> >     length ($1) == length ($2) and
> >     length ($2) == length ($3)
> >
> > and so on for four or more "pieces".
>
> Again, Perl, not RE. Parse dialect:
>
Again, I had to use some programming language if I wanted to show a fragment of a program. The point was that most languages that use RE (with which I'm familiar) make it easy to specify a declarative pattern, then ask additional questions about the data if that pattern is successfully matched (in this case, comparing the lengths of the runs of "a", "b", and "c").
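As a rough illustration of that "declarative pattern, then additional questions" style in another host language, here is a minimal Python sketch of the anchored three-run check. The function name `equal_runs` is hypothetical, and `re.fullmatch` stands in for the `^`/`$` anchors of the Perl fragment:

```python
import re

def equal_runs(s):
    """True if s is a run of a's, then b's, then c's, all of equal length."""
    # Declarative part: the whole string must be a*, then b*, then c*.
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    if m is None:
        return False
    # Follow-up question about the matched data: are the runs equally long?
    return len(m.group(1)) == len(m.group(2)) == len(m.group(3))
```

For instance, `equal_runs("aabbcc")` holds, while `equal_runs("aabbc")` does not, because the pattern matches but the length comparison fails.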
> x: [start: any #"a" end: (n: offset? start end) n #"b" n #"c"] >
Very nice! Thanks for the solution!

Now let me scale differently: suppose I want to match consecutive, equal-length runs of those three letters anywhere within the target string? For example, all of the targets

    "my dog has aaabbbccc fleas"
    "aaadddeeeabc"
    "abcccccccc"
    "aabbaaaaabbcccc"

meet that criterion. The previous solution transforms easily, as follows:

1) allow matching anywhere in the target -- this is implemented by removing the BOS/EOS anchors (^ and $) from the pattern;

2) require at least one of each character (since zero of each is an empty string that can be found anywhere in any target!) -- this is implemented by changing the * qualifier ("any") to + ("some") on all subpatterns;

3) recognize that extra "a"s at the beginning and/or extra "c"s at the end don't disqualify the group -- this is implemented by requiring only that there are at least as many "a" as "b" and at least as many "c" as "b".

I'm "thinking out loud" here to show the thought process involved in moving from one problem/solution to the next. The changes above give us:

    $somestring =~ /(a+)(b+)(c+)/ and
        length ($1) >= length ($2) and
        length ($2) <= length ($3)

So, let me ask here, how would you go about solving this next variation on the theme? Would you transform the definition of X above, or would you address it as a fresh problem with a different strategy for solving?

-jn-

--
Polonius: ... What do you read, my lord?
Hamlet: Words, words, words.
_Hamlet_, Act II, Scene 2
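
The three changes described in the message above (no anchors, "+" in place of "*", and inequalities in place of equalities) can be sketched in Python as follows. The names `RUNS` and `has_equal_runs` are hypothetical. One deliberate departure, stated as an assumption: the Perl fragment tests only the first place the pattern matches, whereas this sketch uses `re.finditer` to test every match, so a failing group early in the string does not hide a qualifying group later on.

```python
import re

# Unanchored pattern: at least one of each letter, in order.
RUNS = re.compile(r"(a+)(b+)(c+)")

def has_equal_runs(s):
    """True if s contains n a's, n b's, n c's consecutively for some n >= 1."""
    for m in RUNS.finditer(s):
        a, b, c = (len(g) for g in m.groups())
        # Extra leading a's or trailing c's are allowed, so only require
        # at least as many a's as b's and at least as many c's as b's.
        if a >= b <= c:
            return True
    return False
```

All four sample targets from the message satisfy `has_equal_runs`. A string such as "abbc" does not, since its single "a" cannot cover two "b"s.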