parse parsing algorithm

[1/2] from: shahn::cs::vu::nl at: 14-Aug-2001 0:11

Hello Everyone! I am new in the list and i hope this question wasn't asked before. Does anyone know what parsing algorithm the "parse" buildin uses? I really like the much more powerful context free parsing that Rebol uses instead of regular expressions used by Perl / PHP / Python / etc. I am trying to build something simular in another language (Python), just for fun. When researching parsing strategies i came across this document: Parsing Techniques - A Practical Guide http://www.cs.vu.nl/~dick/PTAPG.html (out of print book of +/- 300 pages in pdf) Does Rebol use the "Recursive Descent" Parsing Method? This method has the most direct mapping between the grammar and the parse method. Does one of you know more? --- I have found that this grammar makes Rebol go in an infinite loop (Chapter 4.2.3 from the book):

>> S: [L #'a' M]

== [L #'a' M]

>> L: [L M | none]

== [L M | none]

>> M: [M M | none]

== [M M | none]

>> parse "a" S

But the memory usage on my W2K machine doesn't seem to grow (2.936 K constant), no recursion?. Rebol does take 99% of the CPU time though. With kind regards, Sander Hahn [shahn--cs--vu--nl]

[2/2] from: brian:hawley at: 13-Aug-2001 19:02

At 12:11 AM 8/14/01 +0200, Sander Hahn wrote:

>Does anyone know what parsing algorithm the "parse" buildin uses?

Recursive decent, LL(1), with limited backtracking and embedded code blocks that allow you to do extremely clever hacks.

>When researching parsing strategies i came across this document: >Parsing Techniques - A Practical Guide >http://www.cs.vu.nl/~dick/PTAPG.html (out of print book of +/- 300 pages in >pdf)

Thanks for the link!

>I have found that this grammar makes Rebol go in an infinite loop (Chapter >4.2.3 from the book):

<<quoted lines omitted: 5>>

>== [M M | none] > >> parse "a" S

Those rules use left-recursion, more suitable for LR parsing. Also, both the L and M rules have no recursive fix-point. You should use right-recursion in your rules, and make sure that they will eventually hit a token. Watch out for the backtracking as well. It's very powerful but you need to plan carefully. In a set of alternate rules, put the longest ones first, like this: [a b | a | none] Since parse doesn't backtrack through embedded code blocks you should put those at the end of the alternate, after the point you are sure you have reached the right choice, like this: [a b (print "a b") | a (print "a")] not like this: [a (print "a b") b | a (print "a")] Other than backtracking, the trickiest part to watch out for is parse rules that modify their input. When you change the input in front of the current point of the parse, no problem. When you change input that you have already passed, you have to reset the parse location, or parse will get confused, i.e.: [to "<" a: thru ">" b: (remove/part a b)] won't work right, but this will: [to "<" a: thru ">" b: (b: remove/part a b) :b]

>But the memory usage on my W2K machine doesn't seem to grow (2.936 K >constant), no recursion?. Rebol does take 99% of the CPU time though.

Interesting. Table-based parser, maybe? Brian Hawley

Notes

Quoted lines have been omitted from some messages.
View the message alone to see the lines that have been omitted