Mailing List Archive: 49091 messages
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search

[REBOL] Re: Parse versus Regular Expressions

From: joel:neely:fedex at: 5-Apr-2003 8:40

Hi, Gabriele, Before we get too far down this road (and have to backtrack ;-), let me state clearly that I'm not agitating for PARSE to be redesigned. I am simply suggesting that each paradigm has its own "sweet spot". I've always found that laying alternative strategies down side-by- side and figuring out exactly where/why/how they differ has helped me use both more effectively. That's my intent in this conversation. Now let me try to reply to your questions/objections. Gabriele Santilli wrote:
> JN> The specific example at hand required the programmer to take > JN> action to make the first alternative fail before PARSE would > JN> consider the second. It would be nice if failure at the end > JN> of the *entire* rule also caused backtracking so the programmer > JN> didn't have to do any extra work. > > That would mean that we can no more use subblocks for parsing. > > I.e., if PARSE worked as you suggest, how would one write this? > > rule1: ["a" | "b"] > rule2: ["c" rule1 "d"] > > On "cad", based on your suggestion, rule1 would not reach the end > of the string and thus backtrack. >
By "*entire* rule" I meant the top-level one invoked by PARSE itself. In other words, when failure occurs, try a different alternative. Under this view, saying parse "cad" rule2 would be equivalent to saying Find a path from start to finish through (start)--- "c" ---+--- "a" ---+--- "d" ---(finish) \ / '- "b" -' that spells "cad". This doesn't require any backtracking over RULE1, since failure only occurs when there is no viable alternative for the present position (and REBOL defines being at the end of the rule with more input remaining to be matched as not viable). Another way of saying this is that your RULE1 and RULE2 could be regarded (under a pure, declarative, backtrackable model) as being a little grammer that matches/generates the set of strings that are constructed by following all possible paths from start to finish: "cad" "cbd" in this case.
> Notice that you can't backtrack the whole rule because it can be > composed of a complex network of subrules, each one having '| > alternatives. Which one should backtrack? >
The standard answer is "the most recent". For example, nested rules such as ["a" ["b" | "bc"] ["d" | "de"] "f"] that contain more than one decision point can still be "explored" in a systematic way. At the risk of the illustration being more bulky than I'd like, we can trace the attempt to match a string to the above rule using underlining to show which portions of the string and rule have been matched: String Rule Comment "abcdef" ["a" ["b" | "bc"] ["d" | "de"] "f"] just beginning "abcdef" ["a" ["b" | "bc"] ["d" | "de"] "f"] match, no choice - - "abcdef" ["a" ["b" | "bc"] ["d" | "de"] "f"] match 1st choice -- - - in 1st group "abcdef" ["a" ["b" | "bc"] ["d" | "de"] "f"] fail 1st choice --* - - * in 2nd group "abcdef" ["a" ["b" | "bc"] ["d" | "de"] "f"] fail 2nd choice --* - - * in 2nd group which point there are no alternatives left, so back up and reconsider the previous decision... "abcdef" ["a" ["b" | "bc"] ["d" | "de"] "f"] match 2nd choice --- - -- in 1st group "abcdef" ["a" ["b" | "bc"] ["d" | "de"] "f"] match 1st choice ---- - -- - in 2nd group "abcdef" ["a" ["b" | "bc"] ["d" | "de"] "f"] fail next pattern ----* - -- - * after 2nd group "abcdef" ["a" ["b" | "bc"] ["d" | "de"] "f"] match 2nd choice ----- - -- -- in 2nd group "abcdef" ["a" ["b" | "bc"] ["d" | "de"] "f"] match last part ------ - -- -- - SUCCESS! I hesitate to draw the picture of that rule (for fear of adding more bulk) but feel it may be helpful to clarify: +- "b" --+ +- "d" --+ / \ / \ (start)--- "a" -+ +--+ +- "e" ---(finish) \ / \ / +- "bc" -+ +- "de" -+ so the tracing laid out (in painful detail ;-) above is just exploring systematically (left-to-right, top-to-bottom) through the above map searching for a road from start to finish. Hitting a dead-end just causes the last decision to be reconsidered.
> What happens to the code in parens? >
That's my point. If we mix in the possibility of side-effects that may occur while moving forward, then backtracking is *certainly* more complicated to automate. One could either: 1) save the state of anything modified by a procedural action, and restore that state upon backtracking across the action, or 2) explore the declarative pattern looking for a successful path, and then take/commit all included actions at the end once a successful path is located. For those of us who use databases, this amounts to the same kind of commit/rollback pattern that is used with a database transaction. All I'm trying to point out is that in a procedural model, the programmer has to do more work to implement backtracking, and it is harder to do that if other procedural effects are intertwined within the pure pattern matching.
> With REs you can do this, mainly because they are just patterns > (and not grammars with possibly embedded code). >
Agreed. Identification is done first, and then any actions are taken after a match has been achieved. Thanks for the interesting discussion! -jn- -- Polonius: ... What do you read, my lord? Hamlet: Words, words, words. _Hamlet_, Act II, Scene 2