Mailing List Archive: 49091 messages
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

[REBOL] Re: Parse versus Regular Expressions

From: lmecir:mbox:vol:cz at: 4-Apr-2003 22:14

Hi Joel,
> Ladislav Mecir wrote: > > > > I've heard, that PARSE was more general, than RE, although I > > didn't know why... > > > > I've yet to see any demonstration of that claim.
As I mentioned here, it is: z: [#a z #b | none] , which is a rule, that cannot be written using Regular Expressions. ...
> 1) It makes the PARSE dialect a strange hybrid between declarative > and procedural descriptions (as mentioned in my earlier post).
... Yes, PARSE dialect looks procedural. Nevertheless, all the "language stuff" is a hybrid between procedural and declarative descriptions: e.g. Regular Expressions and FSM's describe the same - Regular Languages. Similarly Grammars (more declarative) and Turing Machines (procedural) describe the same languages.
> 2) It totally breaks normal usage that "or" is symmetrical, even > though we read the | as "or".
In the Regular Languages article I am trying to read has been stated, that the Regular Languages can be described using the following minimal set of operator symbols used in addition to the usual characters of the base alphabet: 0 - an empty language symbol. The meaning of the symbol is, that no word is compatible with it. Analogical PARSE rule: [end skip] 1 - an empty word symbol. The meaning of the symbol is, that an empty word is compatible with it. Analogical PARSE rule: [] + - the addition operator. The meaning of it is, that a word is compatible with the A+B rule, if it is compatible with either the A or the B rule. This operator is symmetrical as opposed to analogical '| in a PARSE rule. . - the concatenation operator. While this operator looks like multiplication, it isn't symmetrical. The meaning: a word is compatible with the A.B rule, if it can be written as a concatenation of two words, where the first one is compatible with the A rule and the second one is compatible with the B rule. Analogical PARSE rule: [a b] * - the repetition operator. This is a unary operator. The meaning: a word is compatible with A*, if it is consists of any number of words, each of which is compatible with A. Analogical PARSE rule: [any a] ( ) - the parentheses to specify the priority of operators. The meaning: (A+B).C differs from A+(B.C) as usual. We can use square brackets in PARSE analogically. Why am I writing it here? These declarative rules have got only one symmetrical operator. I don't think, that many Rebol users miss its symmetry. The shape | of the analogical PARSE symbol is pointing at the difference. If somebody doesn't like the fact, that the word "or" hints, that it should be symmetrical, then he can use a different word "else", which is more procedural, less symmetrical and it is a better match. If you really want a symmetrical OR parse rule, it can be programmed: or-rule: func [ {Generate an A or B symmetrical PARSE rule} a b ] [ use [f g ma mb a' b' s a'' b''] copy/deep [ a': a b': b a'': [a' f: (ma: true)] b'': [b' g: (mb: true)] [ f: g: s: (ma: mb: false) opt a'' :s opt b'' ( if not ma [f: g] if not mb [g: f] f: either (index? f) <= (index? g) [f] [g] ma: either ma or mb [[:f]] [[end skip]] ) ma ] ] ] A test: oab: or-rule "a" "ab" oba: or-rule "ab" "a" parse "ab" oab ; == false parse "ab" oba ; == false parse "a" oab ; == true parse "a" oba ; == true Summary: PARSE rules are quite intuitive procedural description. Exceptions: 1) The stack overflow shall be handled as an exception, because it is highly unprobable, that PARSE will be able to produce a "reliable" result. 2) A simple thing, that "bothers" me is: parse "aa" [["a"] (print "match")] Every beginner reads it as follows: the rule ["a"] is matched against the string and the match is found. It is hard to "explain" to a beginner, why (despite of the match) the rule fails. 3) The fact, that "past tail positions mean failure" is an unintuitive property. It is even related to some ugly crashes. 4) A Garbage Collector bug causes PARSE crashes. 5) The TO and THRU words in PARSE should accept any rules. 6) There can be a NOT PARSE word. 7) There should be a LIT PARSE word for block parsing. 8) PARSE should behave like the normal natives with respect to RETURN and THROW. Regards -Ladislav