Mailing List Archive: Re: Parse versus Regular Expressions

[REBOL] Re: Parse versus Regular Expressions

From: joel:neely:fedex at: 3-Apr-2003 6:22


Hi, Ladislav,

Ladislav Mecir wrote:
> I've heard, that PARSE was more general, than RE, although I
> didn't know why...
>

I've yet to see any demonstration of that claim.  My perspective is
that each one has some things that it expresses conveniently, and
those two sets overlap (but are not identical). However, convenience
is not the same as generality.

PARSE seems to partake of the same issue as Prolog; its seems to be
implemented in a way that makes it a hybrid between declarative and
procedural programming, with the result that sometimes things have
side effects or behaviors that can be difficult to figure out.

> Noticeably, the "dumb" approach to the equation rewriting:
>
>     y: [[y #"a"] | #"b"]
>
> didn't "fire" any error, stack overflow, or any unusual exception.
> Nevertheless, it didn't work:
>
>     parse "b" y ; == true
>     parse "ba" y ; == false
>
> Any ideas?
>

    >> trace on  parse "ba" y  trace off
    Result: (unset)
    Trace:  parse (word)
    Trace:  "ba" (string)
    Trace:  y (word)
    Match: [y #"a"]
    Input: ba
    Match: [[y #"a"] | #"b"]
    Input: ba
    Match: [y #"a"]
    Input: ba
    Match: [[y #"a"] | #"b"]
    Input: ba
    Match: [y #"a"]

... after many more repetitions, followed by

    Match: [y #"a"]
    Input: ba
    Match: [[y #"a"] | #"b"]
    Input: ba
    Match: [y #"a"]
    Input: ba
    recover: #"b"
    Match: #"b"
    Input: ba
    Match: #"a"
    Input: aa
    Match: #"a"
    Input: aa
    recover: #"b"

... and many more repetitions of the last cycle, then

    Input: aa
    Match: #"a"
    Input: aa
    recover: #"b"
    Match: #"b"
    Input: ba
    Result: false (logic)
    Trace:  trace (word)
    Trace:  off (word)
    >>

Which leads to a couple of observations:

1)  The seemingly equivalent

        >> y: [#"b" | [y #"a"]]
        == [#"b" | [y #"a"]]

    doesn't work either.

        >> parse "b" y ; == true
        == true
        >> parse "ba" y ; == false
        == false
        >> trace on  parse "ba" y  trace off
        Result: (unset)
        Trace:  parse (word)
        Trace:  "ba" (string)
        Trace:  y (word)
        Match: #"b"
        Input: ba
        Result: false (logic)
        Trace:  trace (word)
        Trace:  off (word)

    which leads to the suspicion that backtracking in the case of
    failure (a standard technique in RE) isn't supported so well
    in (the current implementation of) PARSE .  IOW, an imaginary
    solution would traverse the tree below, allowing each use of
    Y to start a fresh backtrackable branch:

        y --+-- #"b"       failure (target string not exausted)
            |
            +-- [y #"a"]   alternative, expand Y
                 |
                 +-- #"b"  success, continue most recent alternative
                 |
                 +-- #"a"  success, end reached

2)  The fact that the nested occurrence of Y is taken as an
    imperative command instead of an alternative to be reconsidered
    later if necessary seems to support the PARSE/PROLOG analogy
    above.

3)  The trace of the earlier definition appears to descend and then
    "recover" without trying all possible alternatives.  The limit
    on this cycling is not documented AFAIK.  It would be useful to
    know about such limits.

-jn-

--
Polonius: ... What do you read, my lord?
Hamlet:       Words, words, words.
                                               _Hamlet_, Act II, Scene 2