Mailing List Archive: Re: parse parsing algorithm

[REBOL] Re: parse parsing algorithm

From: brian:hawley at: 13-Aug-2001 19:02


At 12:11 AM 8/14/01 +0200, Sander Hahn wrote:
>Does anyone know what parsing algorithm the "parse" buildin uses?

Recursive decent, LL(1), with limited backtracking and embedded
code blocks that allow you to do extremely clever hacks.

>When researching parsing strategies i came across this document:
>Parsing Techniques - A Practical Guide
>http://www.cs.vu.nl/~dick/PTAPG.html (out of print book of +/- 300 pages in
>pdf)

Thanks for the link!

>I have found that this grammar makes Rebol go in an infinite loop (Chapter
>4.2.3 from the book):
>
> >> S: [L #'a' M]
>== [L #'a' M]
> >> L: [L M | none]
>== [L M | none]
> >> M: [M M | none]
>== [M M | none]
> >> parse "a" S

Those rules use left-recursion, more suitable for LR parsing.
Also, both the L and M rules have no recursive fix-point. You
should use right-recursion in your rules, and make sure that
they will eventually hit a token.

Watch out for the backtracking as well. It's very powerful but
you need to plan carefully. In a set of alternate rules, put
the longest ones first, like this:
   [a b | a | none]
Since parse doesn't backtrack through embedded code blocks you
should put those at the end of the alternate, after the point
you are sure you have reached the right choice, like this:
   [a b (print "a b") | a (print "a")]
not like this:
   [a (print "a b") b | a (print "a")]

Other than backtracking, the trickiest part to watch out for
is parse rules that modify their input. When you change the
input in front of the current point of the parse, no problem.
When you change input that you have already passed, you have
to reset the parse location, or parse will get confused, i.e.:
   [to "<" a: thru ">" b: (remove/part a b)]
won't work right, but this will:
   [to "<" a: thru ">" b: (b: remove/part a b) :b]

>But the memory usage on my W2K machine doesn't seem to grow (2.936 K
>constant), no recursion?. Rebol does take 99% of the CPU time though.

Interesting. Table-based parser, maybe?

Brian Hawley