[REBOL] Re: Parse versus Regular Expressions
From: joel:neely:fedex at: 5-Apr-2003 8:40
Hi, Gabriele,
Before we get too far down this road (and have to backtrack ;-), let
me state clearly that I'm not agitating for PARSE to be redesigned.
I am simply suggesting that each paradigm has its own "sweet spot".
I've always found that laying alternative strategies down side-by-
side and figuring out exactly where/why/how they differ has helped
me use both more effectively. That's my intent in this conversation.
Now let me try to reply to your questions/objections.
Gabriele Santilli wrote:
> JN> The specific example at hand required the programmer to take
> JN> action to make the first alternative fail before PARSE would
> JN> consider the second. It would be nice if failure at the end
> JN> of the *entire* rule also caused backtracking so the programmer
> JN> didn't have to do any extra work.
>
> That would mean that we can no more use subblocks for parsing.
>
> I.e., if PARSE worked as you suggest, how would one write this?
>
> rule1: ["a" | "b"]
> rule2: ["c" rule1 "d"]
>
> On "cad", based on your suggestion, rule1 would not reach the end
> of the string and thus backtrack.
>
By "*entire* rule" I meant the top-level one invoked by PARSE itself.
In other words, when failure occurs, try a different alternative.
Under this view, saying
parse "cad" rule2
would be equivalent to saying
Find a path from start to finish through
(start)--- "c" ---+--- "a" ---+--- "d" ---(finish)
\ /
'- "b" -'
that spells "cad".
This doesn't require any backtracking over RULE1, since failure only
occurs when there is no viable alternative for the present position
(and REBOL defines being at the end of the rule with more input
remaining to be matched as not viable).
Another way of saying this is that your RULE1 and RULE2 could be
regarded (under a pure, declarative, backtrackable model) as being
a little grammer that matches/generates the set of strings that
are constructed by following all possible paths from start to
finish:
"cad"
"cbd"
in this case.
> Notice that you can't backtrack the whole rule because it can be
> composed of a complex network of subrules, each one having '|
> alternatives. Which one should backtrack?
>
The standard answer is "the most recent". For example, nested rules
such as
["a" ["b" | "bc"] ["d" | "de"] "f"]
that contain more than one decision point can still be "explored" in
a systematic way. At the risk of the illustration being more bulky
than I'd like, we can trace the attempt to match a string to the
above rule using underlining to show which portions of the string and
rule have been matched:
String Rule Comment
"abcdef" ["a" ["b" | "bc"] ["d" | "de"] "f"] just beginning
"abcdef" ["a" ["b" | "bc"] ["d" | "de"] "f"] match, no choice
- -
"abcdef" ["a" ["b" | "bc"] ["d" | "de"] "f"] match 1st choice
-- - - in 1st group
"abcdef" ["a" ["b" | "bc"] ["d" | "de"] "f"] fail 1st choice
--* - - * in 2nd group
"abcdef" ["a" ["b" | "bc"] ["d" | "de"] "f"] fail 2nd choice
--* - - * in 2nd group
...at which point there are no alternatives left, so back up and
reconsider the previous decision...
"abcdef" ["a" ["b" | "bc"] ["d" | "de"] "f"] match 2nd choice
--- - -- in 1st group
"abcdef" ["a" ["b" | "bc"] ["d" | "de"] "f"] match 1st choice
---- - -- - in 2nd group
"abcdef" ["a" ["b" | "bc"] ["d" | "de"] "f"] fail next pattern
----* - -- - * after 2nd group
"abcdef" ["a" ["b" | "bc"] ["d" | "de"] "f"] match 2nd choice
----- - -- -- in 2nd group
"abcdef" ["a" ["b" | "bc"] ["d" | "de"] "f"] match last part
------ - -- -- - SUCCESS!
I hesitate to draw the picture of that rule (for fear of adding more
bulk) but feel it may be helpful to clarify:
+- "b" --+ +- "d" --+
/ \ / \
(start)--- "a" -+ +--+ +- "e" ---(finish)
\ / \ /
+- "bc" -+ +- "de" -+
so the tracing laid out (in painful detail ;-) above is just exploring
systematically (left-to-right, top-to-bottom) through the above map
searching for a road from start to finish. Hitting a dead-end just
causes the last decision to be reconsidered.
> What happens to the code in parens?
>
That's my point. If we mix in the possibility of side-effects that
may occur while moving forward, then backtracking is *certainly*
more complicated to automate. One could either:
1) save the state of anything modified by a procedural action, and
restore that state upon backtracking across the action, or
2) explore the declarative pattern looking for a successful path,
and then take/commit all included actions at the end once a
successful path is located.
For those of us who use databases, this amounts to the same kind of
commit/rollback pattern that is used with a database transaction.
All I'm trying to point out is that in a procedural model, the
programmer has to do more work to implement backtracking, and it is
harder to do that if other procedural effects are intertwined within
the pure pattern matching.
> With REs you can do this, mainly because they are just patterns
> (and not grammars with possibly embedded code).
>
Agreed. Identification is done first, and then any actions are taken
after a match has been achieved.
Thanks for the interesting discussion!
-jn-
--
Polonius: ... What do you read, my lord?
Hamlet: Words, words, words.
_Hamlet_, Act II, Scene 2