Mailing List Archive: 49091 messages
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search

[REBOL] Re: parse parsing algorithm

From: brian:hawley at: 13-Aug-2001 19:02

At 12:11 AM 8/14/01 +0200, Sander Hahn wrote:
>Does anyone know what parsing algorithm the "parse" buildin uses?
Recursive decent, LL(1), with limited backtracking and embedded code blocks that allow you to do extremely clever hacks.
>When researching parsing strategies i came across this document: >Parsing Techniques - A Practical Guide > (out of print book of +/- 300 pages in >pdf)
Thanks for the link!
>I have found that this grammar makes Rebol go in an infinite loop (Chapter >4.2.3 from the book): > > >> S: [L #'a' M] >== [L #'a' M] > >> L: [L M | none] >== [L M | none] > >> M: [M M | none] >== [M M | none] > >> parse "a" S
Those rules use left-recursion, more suitable for LR parsing. Also, both the L and M rules have no recursive fix-point. You should use right-recursion in your rules, and make sure that they will eventually hit a token. Watch out for the backtracking as well. It's very powerful but you need to plan carefully. In a set of alternate rules, put the longest ones first, like this: [a b | a | none] Since parse doesn't backtrack through embedded code blocks you should put those at the end of the alternate, after the point you are sure you have reached the right choice, like this: [a b (print "a b") | a (print "a")] not like this: [a (print "a b") b | a (print "a")] Other than backtracking, the trickiest part to watch out for is parse rules that modify their input. When you change the input in front of the current point of the parse, no problem. When you change input that you have already passed, you have to reset the parse location, or parse will get confused, i.e.: [to "<" a: thru ">" b: (remove/part a b)] won't work right, but this will: [to "<" a: thru ">" b: (b: remove/part a b) :b]
>But the memory usage on my W2K machine doesn't seem to grow (2.936 K >constant), no recursion?. Rebol does take 99% of the CPU time though.
Interesting. Table-based parser, maybe? Brian Hawley