
[REBOL] Re: Parse versus Regular Expressions

From: joel:neely:fedex at: 4-Apr-2003 6:57

Hi, Romano,

Romano Paolo Tenca wrote:

> parse "ba" ["b" | "ba"] ;== false
>
> I think that this result is right, because the rule ["b" | "ba"]
> is matched by "b", the failure occurs after the rule matched,
> when parse find "a", so no backtracing happens here, because there
> are not open alternative of failed rule.
>
> Do you think that parse should recover to any alternative rule
> also in correctly matched ones?
>
Yes. How would you read the rule above (aloud)? Wouldn't you read

    ["b" | "ba"]

as

    The string containing "b" or the string containing "ba".

or some equivalent variation? Wouldn't most people assume that the
specific string "ba" matches that description?

The above (false) result is troubling to me for several reasons:

1) It makes the PARSE dialect a strange hybrid between declarative
   and procedural descriptions (as mentioned in my earlier post).

2) It totally breaks normal usage that "or" is symmetrical, even
   though we read the | as "or".

3) There are many aspects of REBOL design that are justified or
   defended in discussion on the basis that "REBOL was not designed
   for computing scientists or programmers, but for human beings",
   yet if you verbalize the above expression

       parse "ba" ["b" | "ba"]

   in "everyday people language", e.g. something like

       Is the sequence of letters "BA" accepted by a rule
       that will match "B" or "BA"?

   I believe that most people would say "Of course!"

4) It breaks the obvious, intuitive attempts to describe the way
   the parts of the PARSE dialect interact. I think most folks
   would find it convenient to be able to "reason by analogy"
   using such familiar rules as

       a * (b + c) == (a * b) + (a * c)

   about the relationships between, e.g., concatenation and choice.

Notice, however, the differences in behavior of the following
attempts at parse rules.

    parse "ba" ["b" | "ba"]            ; == false
    parse "ba" ["b" ["" | "a"]]        ; == false
    parse "ba" ["b" [end | "a"]]       ; == true
    parse "ba" ["b" end | "ba"]        ; == true
    parse "ba" ["b" end | "ba" end]    ; == true
    parse "ba" [["b" | "ba"] end]      ; == false

AFAIK the only answers to "Why are these different?" would be

a) a complicated description of the implementation, or

b) a rationale based on emphasizing some other detail than the
   obvious "what would most people expect" issue.

-jn-

-- 
Polonius: ... What do you read, my lord?
Hamlet: Words, words, words.
_Hamlet_, Act II, Scene 2
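
For readers who want to experiment with the two readings of | contrasted above, here is a minimal sketch in Python. It is a toy model, not REBOL's actual implementation: the helper names (lit, seq, alt, END, parse_ordered, parse_backtracking) are invented for illustration, and it assumes that "committed" choice means the first alternative that matches is never revisited. Under that assumption the committed matcher gives the same six results listed in the post, while the fully backtracking matcher accepts "ba" in every case.

```python
# Toy model of two readings of "|". NOT REBOL's implementation;
# every name below is a hypothetical helper made up for this sketch.

def lit(s):
    return ("lit", s)

def seq(*rules):
    return ("seq", rules)

def alt(*rules):
    return ("alt", rules)

END = ("end",)

def match_ordered(rule, text, pos):
    """Return the position after a match, or None.
    A choice commits to the first alternative that matches."""
    kind = rule[0]
    if kind == "lit":
        return pos + len(rule[1]) if text.startswith(rule[1], pos) else None
    if kind == "end":
        return pos if pos == len(text) else None
    if kind == "seq":
        for r in rule[1]:
            pos = match_ordered(r, text, pos)
            if pos is None:
                return None
        return pos
    if kind == "alt":
        for r in rule[1]:
            p = match_ordered(r, text, pos)
            if p is not None:
                return p  # committed: later alternatives are never retried
        return None
    raise ValueError(kind)

def match_all(rule, text, pos):
    """Yield every position where the rule could finish (full backtracking)."""
    kind = rule[0]
    if kind == "lit":
        if text.startswith(rule[1], pos):
            yield pos + len(rule[1])
    elif kind == "end":
        if pos == len(text):
            yield pos
    elif kind == "seq":
        items = rule[1]
        if not items:
            yield pos
        else:
            for mid in match_all(items[0], text, pos):
                yield from match_all(("seq", items[1:]), text, mid)
    elif kind == "alt":
        for r in rule[1]:
            yield from match_all(r, text, pos)
    else:
        raise ValueError(kind)

def parse_ordered(text, rule):
    # Succeeds only if the single committed match consumes all input.
    return match_ordered(rule, text, 0) == len(text)

def parse_backtracking(text, rule):
    # Succeeds if any combination of alternatives consumes all input.
    return any(p == len(text) for p in match_all(rule, text, 0))

# The six rules from the post, in the same order.
RULES = [
    alt(lit("b"), lit("ba")),
    seq(lit("b"), alt(lit(""), lit("a"))),
    seq(lit("b"), alt(END, lit("a"))),
    alt(seq(lit("b"), END), lit("ba")),
    alt(seq(lit("b"), END), seq(lit("ba"), END)),
    seq(alt(lit("b"), lit("ba")), END),
]

for rule in RULES:
    print(parse_ordered("ba", rule), parse_backtracking("ba", rule))
```

In this model the distributive analogy holds only for the backtracking matcher: there, a rule of the form a followed by (b or c) accepts exactly what (a then b) or (a then c) accepts, whereas the committed matcher can tell the two forms apart.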