World: r3wp

[Parse] Discussion of PARSE dialect

Geomol 1-May-2011 [5794]	Having skip as a keyword means you can't use that word as a variable.
BrianH 1-May-2011 [5795]	That doesn't work with string parsing.
Geomol 1-May-2011 [5796]	ok
BrianH 1-May-2011 [5797]	Most people tend to not use 'skip as a variable anyways, because of the SKIP function.
Geomol 1-May-2011 [5798]	In general I very much like the idea that many REBOL functions can take different datatypes and work anyway. But I was wondering: if parsing blocks and parsing strings are so different, should they be two functions?
Maxim 1-May-2011 [5799x2]	and I always prefix my rules to have them stand out from keywords.
Maxim 1-May-2011 [5799x2]	nah, it would just use up another word. There is no ambiguity in the case of parse, unlike, let's say, ADD, where the same datatype may mean two things.
BrianH 1-May-2011 [5801x2]	For the mezzanine version, two functions might be better, though they can share code in the same module. Maybe just have one exported word for a dispatch function though.
BrianH 1-May-2011 [5801x2]	(or the context equivalent of modules for R2)
Geomol 1-May-2011 [5803x2]	yes
Geomol 1-May-2011 [5803x2]	When programming it, I also wondered why the or keyword is | and not OR. Do you know the reason?
BrianH 1-May-2011 [5805]	Parsing tradition. And it's not really OR, it's backtracking alternation.
Geomol 1-May-2011 [5806]	Right, I just wondered, since REBOL calls e.g. floats decimals etc.; there are many attempts to make the language more humane.
BrianH 1-May-2011 [5807]	Considering that the space character is the closest thing to AND if | is OR, we should consider ourselves to have gotten off lucky :)
Geomol 1-May-2011 [5808]	parse [a b c] ['aAND'bAND'cEND] hmm, yeah, you've got a point.
BrianH 1-May-2011 [5809]	We used up that luck though when we called the lookahead-match operation AND, and the lookahead-non-match operation NOT.
Geomol 1-May-2011 [5810]	& and ! maybe?
BrianH 1-May-2011 [5811]	We're probably fine with the wording we got. Though strangely enough, | is the ELSE of the IF operation. ELSE is a more descriptive name for | than OR in general.
Ladislav 1-May-2011 [5812]	Geomol: [to rule skip] does not mean the same as [thru rule], as can be demonstrated by comparing the behaviour of thru rule for rule = "abc". It is quite a surprise to me that you don't see the difference.
Geomol 2-May-2011 [5813]	In R2 parsing a block: >> parse ["abc"] [to "abc" skip] == true >> parse ["abc"] [thru "abc"] == true I know it's different when parsing a string instead of a block. My comparison of [thru rule] to the alternatives was meant as a loose comparison, not to be taken literally. So it's easy to think of THRU as working this way, because it does in many cases; hence the confusion.
Ladislav 2-May-2011 [5814]	"because it does in many cases" should rather be "because THRU is so limited that it is unable to handle many cases"
Geomol 2-May-2011 [5815]	yeah :)
Ladislav 2-May-2011 [5816]	But the recursive description: a: [b | skip a] is quite natural.
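A minimal sketch of the point being made, assuming R2-style string parsing. The names target and thru-target are hypothetical and only illustrate the recursive description; they are not from the discussion.

```rebol
; Sketch only: `target` and `thru-target` are hypothetical names.
target: "abc"
thru-target: [target | skip thru-target]   ; the recursive form a: [b | skip a]

parse "xxabc" [thru target]       ; == true
parse "xxabc" thru-target         ; == true, same result as THRU here
parse "xxabc" [to target skip]    ; == false, SKIP consumes only the "a" of "abc"
```

When parsing a block, the matched value is a single element, so [to rule skip] and [thru rule] happen to agree; on a string the match can span several characters, which is where they diverge.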
Geomol 2-May-2011 [5817]	Yes, and that should work in all cases if the b rule is found, complex or not. And this will return true if b is END, because END is a repeatable rule (you can't go past it with SKIP). NONE is also repeatable, and if you look in the code, I have to take care of this separately too. This means we can't parse a none of datatype none! by using the NONE keyword, but we can by using a datatype: >> parse reduce [none] [none] == false >> parse reduce [none] [none!] == true So it raises the question of whether the NONE keyword should be there. What are the consequences if we drop NONE as a keyword? And are there other repeatable rules besides END and NONE, in R2 or R3?
Ladislav 2-May-2011 [5818]	The "empty string rule" (represented by the NONE keyword in REBOL) is absolutely necessary to have. All other members of the Top Down Parsing Language family have it as well.
Geomol 2-May-2011 [5819]	Ok, what is a good source of information to read about parsing in general? The Top Down Parsing Language family etc.?
Ladislav 2-May-2011 [5820]	You can find something in the Wikipedia: http://en.wikipedia.org/wiki/Parsing_expression_grammar http://en.wikipedia.org/wiki/Top-down_parsing_language
Geomol 2-May-2011 [5821]	Is the "empty string rule" covered by putting a | without anything after it? Like in: >> parse [] ['a |] == true >> parse [] ['a | none] == true
Ladislav 2-May-2011 [5822]	Hmm, as it looks, we could do without the empty string, we could use the rule like: empty: []
Geomol 2-May-2011 [5823]	It could be interesting to create an absolutely minimal PARSE function that can handle all we expect from such a function, but with as little code as possible (as few keywords as possible).
Ladislav 2-May-2011 [5824x2]	For strings, the empty: "" should work as well, but it does not.
Ladislav 2-May-2011 [5824x2]	Another variant that comes to mind is empty: quote ()
Geomol 2-May-2011 [5826]	From your idioms it can also be seen, that OPT can be dropped easily.
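A small sketch of how OPT could be replaced by the empty-alternative idiom from the messages above, assuming R2 block parsing. The name maybe-a is hypothetical.

```rebol
; Sketch only: `maybe-a` is a hypothetical name for the OPT-free form.
maybe-a: ['a |]              ; match 'a, or else match nothing

parse [a b] [opt 'a 'b]      ; == true
parse [a b] [maybe-a 'b]     ; == true
parse [b]   [opt 'a 'b]      ; == true
parse [b]   [maybe-a 'b]     ; == true
```

The form ['a | none] should behave the same way, going by the console results quoted earlier in the thread.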
Ladislav 2-May-2011 [5827]	BTW (this looks unlucky to me), do you know that in REBOL the NONE rule can fail?
Geomol 2-May-2011 [5828]	Can't remember. Give me an example.
Ladislav 2-May-2011 [5829]	Never mind, I do not remember. The NONE rule is described in the wikibook, so it can be found in there, I guess.
Geomol 2-May-2011 [5830]	Maybe the last section here: http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse/Parse_expressions#Troubleshooting
Ladislav 2-May-2011 [5831x3]	That is not related
Ladislav 2-May-2011 [5831x3]	Nevertheless, I messed it up. The NONE rule probably cannot fail, but it can consume some input.
Ladislav 2-May-2011 [5831x3]	(which does not look good either)
Geomol 2-May-2011 [5834]	With bparse, this hangs: bparse [a b c] [some [none]] but it can be stopped by hitting <Esc>.
Ladislav 2-May-2011 [5835x2]	Yes, but that is OK, it is just an infinite cycle
Ladislav 2-May-2011 [5835x2]	Nobody should expect an infinite cycle to stop.
Geomol 2-May-2011 [5837x3]	It can't be stopped using PARSE, it seems.
Geomol 2-May-2011 [5837x3]	In parse, NONE is a keyword unless it comes after TO or THRU, then it's looked up. >> parse [#[none!]] [none] ; as a keyword == false >> parse [#[none!]] [thru none] ; looked up == true Same behaviour in R2 and R3.
Geomol 2-May-2011 [5837x3]	Maybe it would be a good idea to make all these combinations trigger an invalid argument error: [any end], [some end], [opt end], [into end], [set end ...], [copy end ...], [thru end], and then only let [to end] be valid.
BrianH 2-May-2011 [5840x2]	[set var end] sets the var to none; [copy var end] sets to none in R2, the empty string/block in R3; [thru end] doesn't match, so it should just get a warning in case the rules were written to expect that; [opt end] is definitely legit; perhaps [any end] and [some end] should get warnings for R2, but keep in mind that rules like [any [end]] and [some [end]] are much more common, have the same effect, and are more difficult to detect; [into end] properly triggers an error in R2 and R3 because the end is not in a block, while [into [end]] is legit and safe.
BrianH 2-May-2011 [5840x2]	So you want to allow COPY, SET and OPT. Warn about THRU (because of the bug), ANY and SOME, because of R3 compatibility. Trigger an error for INTO if its argument rule isn't a block or a word referring to a block, but nothing special if that rule is END.
Geomol 4-May-2011 [5842x2]	[any end] and [some end]: As we don't have warnings, I suggest these produce errors. They can produce endless loops, and that should be pointed out in the docs if they don't produce errors. [opt end]: Yes, it's legit, but what's the point of this combination? At best, the programmer knows what she is doing, and the combination does nothing but slow the program down. At worst, the programmer misinterprets this combination, and since it doesn't produce an error or anything, it's a source of confusion. I suggest making it produce an error. [into end]: Produces an error today, so fine. [set end ...] and [copy end ...]: I wasn't thinking of [set var end], but about setting a var named end to something, like [set end integer!]. The problem with this is that the var, end, can now be used and looks exactly like the keyword, end, maybe leading to confusion. But on second thought, maybe allowing this is ok. [thru end]: Making this produce an error will solve the confusion around what this combination means. And in the first place, it's a bad way to produce a 'fail' rule (in R2; in R3 it has the value true, and parsing continues). It's slow compared to e.g. [end skip].
Geomol 4-May-2011 [5842x2]	These are just suggestions to make a better PARSE. I've learnt it's a good idea to not allow most combinations of keywords in R2 parse. Another example: >> parse [] [opt into ['a]] == false >> bparse [] [opt into ['a]] ** User Error: Invalid argument: into The PARSE result is wrong, as I see it. My BPARSE produces an error. Better?