[REBOL] Re: Parse versus Regular Expressions
From: joel:neely:fedex at: 4-Apr-2003 6:57
Hi, Romano,
Romano Paolo Tenca wrote:
> parse "ba" ["b" | "ba"] ;== false
>
> I think that this result is right, because the rule ["b" | "ba"]
> is matched by "b"; the failure occurs after the rule has matched,
> when parse finds "a", so no backtracking happens here, because
> there are no open alternatives of a failed rule.
>
> Do you think that parse should also fall back to an alternative
> after a rule has matched correctly?
>
Yes.
How would you read the rule above (aloud)? Wouldn't you read
    ["b" | "ba"]
as
    The string containing "b" or the string containing "ba".
or some equivalent variation? Wouldn't most people assume that the
specific string "ba" matches that description? The above (false)
result is troubling to me for several reasons:
1) It makes the PARSE dialect a strange hybrid between declarative
and procedural descriptions (as mentioned in my earlier post).
2) It totally breaks the normal expectation that "or" is
symmetrical, even though we read the | as "or".
3) There are many aspects of REBOL design that are justified or
defended in discussion on the basis that "REBOL was not designed
for computing scientists or programmers, but for human beings",
yet if you verbalize the above expression
    parse "ba" ["b" | "ba"]
in "everyday people language", e.g. something like
    Is the sequence of letters "BA" accepted by a rule that
    will match "B" or "BA"?
I believe that most people would say "Of course!"
4) It breaks the obvious, intuitive attempts to describe the way
the parts of the PARSE dialect interact. I think most folks
would find it convenient to be able to "reason by analogy"
using such familiar rules as
    a * (b + c) == (a * b) + (a * c)
about the relationships between, e.g., concatenation and choice.
Notice, however, the differences in behavior of the following
attempts at parse rules:
    parse "ba" ["b" | "ba"]            ; == false
    parse "ba" ["b" ["" | "a"]]        ; == false
    parse "ba" ["b" [end | "a"]]       ; == true
    parse "ba" ["b" end | "ba"]        ; == true
    parse "ba" ["b" end | "ba" end]    ; == true
    parse "ba" [["b" | "ba"] end]      ; == false
AFAIK the only answers to "Why are these different?" would be
a) a complicated description of the implementation, or
b) a rationale based on emphasizing some detail other than
the obvious "what would most people expect" issue.
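To make answer (a) concrete, here is a minimal sketch in Python (not REBOL, and not the actual PARSE implementation) of two toy matchers. One commits to the first alternative that matches, which is my assumption about what PARSE is doing; the other retries later alternatives when the rest of the rule fails, the way most regular-expression engines do. The names Choice, END, parse_committed and parse_backtracking are hypothetical, invented only for this illustration.

```python
END = object()  # hypothetical stand-in for PARSE's `end` keyword


class Choice:
    """A rule block: alternatives separated by |, each a list of items."""

    def __init__(self, *alternatives):
        self.alternatives = alternatives


def step_committed(item, text, pos):
    """Match one item at pos; return the new position, or None on failure."""
    if item is END:
        return pos if pos == len(text) else None
    if isinstance(item, str):
        return pos + len(item) if text.startswith(item, pos) else None
    return match_committed(item, text, pos)


def match_committed(rule, text, pos):
    """Ordered choice: keep the first alternative that matches."""
    for seq in rule.alternatives:
        p = pos
        for item in seq:
            p = step_committed(item, text, p)
            if p is None:
                break  # this alternative failed; try the next one
        else:
            return p  # committed: later alternatives are never revisited
    return None


def step_backtracking(item, text, pos):
    """Yield every position at which one item can finish."""
    if item is END:
        if pos == len(text):
            yield pos
    elif isinstance(item, str):
        if text.startswith(item, pos):
            yield pos + len(item)
    else:
        for seq in item.alternatives:
            yield from match_seq(seq, text, pos)


def match_seq(seq, text, pos):
    """Yield every position at which the whole sequence can finish."""
    if not seq:
        yield pos
        return
    for p in step_backtracking(seq[0], text, pos):
        yield from match_seq(seq[1:], text, p)


def parse_committed(text, rule):
    return match_committed(rule, text, 0) == len(text)


def parse_backtracking(text, rule):
    return any(p == len(text) for p in step_backtracking(rule, text, 0))


# The six rules from the list above, in the same order.
RULES = [
    Choice(["b"], ["ba"]),
    Choice(["b", Choice([""], ["a"])]),
    Choice(["b", Choice([END], ["a"])]),
    Choice(["b", END], ["ba"]),
    Choice(["b", END], ["ba", END]),
    Choice([Choice(["b"], ["ba"]), END]),
]

committed = [parse_committed("ba", r) for r in RULES]
backtracking = [parse_backtracking("ba", r) for r in RULES]

print(committed)     # [False, False, True, True, True, False]
print(backtracking)  # [True, True, True, True, True, True]

# Order matters only for the committed matcher: "ba" | "b" succeeds.
print(parse_committed("ba", Choice(["ba"], ["b"])))  # True
```

Under these assumptions the committed matcher reproduces the six results listed above, and swapping the alternatives changes the answer, which is the asymmetry in point 2. The backtracking matcher accepts all six, which is the behavior I would have expected from reading | as "or".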
-jn-
--
Polonius: ... What do you read, my lord?
Hamlet: Words, words, words.
_Hamlet_, Act II, Scene 2