World: r3wp

[Parse] Discussion of PARSE dialect

MichaelB 23-Oct-2005 [529]	I just found out that I can't do the following: s: "a b c" s: "a c b" parse s ["a" to ["b" | "c"] to end] The two strings are only meant to show that b and c can come in either order. But 'to and 'thru don't work with subrules. The documentation doesn't say they should, but wouldn't it be natural? Or am I missing some complication for the parser if it supported this (in the general case the parser would need indefinite look-ahead - is this the problem?)? How are other people doing things like this - what if you want to parse something like "a bla bla bla c" or "a bla bla bla d" and you are interested in the "bla bla bla", which might be arbitrary text and thus can't be put into rules?
Volker 23-Oct-2005 [530x2]	Carl mentioned performance-problems. Although everyone asks for it.
Volker 23-Oct-2005 [530x2]	I use: "a" any [ "b" | "c" | skip ] to end - even slower and less elegant, but it works.
MichaelB 23-Oct-2005 [532]	OK, thanks. Didn't know this. But this solution will work for me as well. In a sense this is interesting, as skip isn't a real token, but a command - but it's treated as a token. :-)
Chris 23-Oct-2005 [533x2]	Volker, that will return true for "a" as well as "a b c".
Chris 23-Oct-2005 [533x2]	I tend to use charsets for this, though again, there is probably a performance cost...
Volker 23-Oct-2005 [535x2]	true. never thought about that.
Volker 23-Oct-2005 [535x2]	'some would do half the trick. except if nothing is found, it eats all and counts as true here.
Chris 23-Oct-2005 [537]	non-bc: complement charset "bc" parse "a b c" ["a" any non-bc ["b" | "c"] to end]
Volker 23-Oct-2005 [538]	but if it is longer, like "be", every "bi" would fail too.
Chris 23-Oct-2005 [539]	In that case, you need to elaborate a little.
Volker 23-Oct-2005 [540x2]	s: "abibet" parse s ["a" any non-bc ["be" | "ce"] to end]
Volker 23-Oct-2005 [540x2]	More complex than i thought, with that missing thing.
Chris 23-Oct-2005 [542]	Complex indeed.
Volker 23-Oct-2005 [543]	Maybe I used this? parse s ["a" some [ "be" break | "ce" break | skip ] p: to end] If nothing is found, it skips to the end. It returns true, but if you require something after it, that fails (because it is already at the end).
Izkata 23-Oct-2005 [544]	Michael>I just found out that I can't do the following: s: "a b c" s: "a c b" parse s ["a" to ["b" | "c"] to end]< parse s ["a" [to "b" | to "c"] to end]
Chris 23-Oct-2005 [545]	Izkata, that breaks if the "c" comes before the "b".
Izkata 23-Oct-2005 [546]	I agree, it should work the other way, too, though..
Chris 23-Oct-2005 [547]	Iz -- d'oh...
Izkata 23-Oct-2005 [548x2]	^.^
Izkata 23-Oct-2005 [548x2]	But isn't that what is wanted? (to ["b" | "c"])
Chris 23-Oct-2005 [550x3]	V: perhaps better in this case to use 'while and 'find rather than 'parse?
	Izkata:
	>> non-bc: complement bc: charset "bc"
	== make bitset! 64#{////////////////8/////////////////////////8=}
	>> s1: "a b c"
	== "a b c"
	>> s2: "a c b"
	== "a c b"
	>> parse s1 ["a" [to "b" | to "c"] mk: to end] mk
	== "b c"
	>> parse s2 ["a" [to "b" | to "c"] mk: to end] mk
	== "b"
	>> parse s1 ["a" any non-bc mk: ["b" | "c"] to end] mk
	== "b c"
	>> parse s2 ["a" any non-bc mk: ["b" | "c"] to end] mk
	== "c b"
	Note the difference when parsing 's2...
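The difference in the session above can be modelled outside REBOL. The following is a Python sketch only (not REBOL, and not a description of how PARSE is implemented); both function names are made up for illustration. It contrasts the ordered choice of [to "b" | to "c"], which takes the first alternative that succeeds anywhere in the input, with what to ["b" | "c"] was meant to express, namely stopping at whichever target occurs earliest.

```python
def to_ordered_choice(text, targets):
    """Model of [to "b" | to "c"]: try each target in order and stop at
    the first one that occurs anywhere in text. Returns the remainder of
    text from the match onward, or None if no target occurs."""
    for target in targets:
        pos = text.find(target)
        if pos != -1:
            return text[pos:]
    return None


def to_earliest(text, targets):
    """Model of the wished-for to ["b" | "c"]: stop at whichever target
    occurs earliest in text. Returns the remainder, or None."""
    positions = [text.find(t) for t in targets]
    positions = [p for p in positions if p != -1]
    if not positions:
        return None
    return text[min(positions):]


print(to_ordered_choice("a c b", ["b", "c"]))  # remainder starts at "b"
print(to_earliest("a c b", ["b", "c"]))        # remainder starts at "c"
```

For "a c b" the first call leaves "b" and the second leaves "c b", the same two values of mk shown for s2 in the session. For "a b c" both leave "b c".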
Izkata 23-Oct-2005 [553x2]	ack... well.. it was worth a try =P
Izkata 23-Oct-2005 [553x2]	as you can see, I know some, but am not too strong in parse ^.^
MichaelB 23-Oct-2005 [555]	=image file: images/a picture.gif size: 200x300 caption: some caption below the picture desc: some description for the picture I'm trying to extend Makedoc2 for a project to generate a xml dialect and I need much more information to certain elements - e.g. images - so I'm trying to make it as easy as possible for the user. The above is what I actually wanted to parse - but the order of the information is supposed to be free and I can't and don't want to use rebol datatypes which might be the first thought to make the parsing easier, because normal people don't want to learn too many rules for all these things. So the b and c in the example corresponded more to the caption and desc in the above example.
Volker 23-Oct-2005 [556x2]	So you want to handle both, not only one of them? Something like some [ ( caption: desc: none ) set caption caption-rule | set desc desc-rule ] ( if all [caption desc] [handle-them] )
Volker 23-Oct-2005 [556x2]	No, initialisation before some.. ( caption: desc: none ) some [
MichaelB 23-Oct-2005 [558]	but aren't these only block parsing rules? (because of set)
Izkata 23-Oct-2005 [559]	I'm gonna try again:
	>> s: {=image
	{ file: images/a picture.gif
	{ size: 200x300
	{ caption: some caption below the picture
	{ desc: some description for the picture}
	== {=image file: images/a picture.gif size: 200x300 caption: some caption below the picture desc: some description for the pictu...
	>> parse head append s {^/} [
	[ some [
	[ thru {file: } copy file to {^/} |
	[ thru {size: } copy size to {^/} |
	[ thru {caption: } copy cap to {^/} |
	[ thru {desc: } copy desc to {^/}
	[ ]
	[ ]
Volker 23-Oct-2005 [560]	right, mistake. with strings that is copy, not set.
Izkata 23-Oct-2005 [561]	err wait.. then they can't have newline inside the description/caption (x_x)
MichaelB 23-Oct-2005 [562]	ok - have to try these ideas
Volker 23-Oct-2005 [563x3]	IMHO 'to and 'thru are only for simple cases. You need a real BNF. Or you can use two parses: the first takes only the lines after image, then a second processes the lines.
	http://polly.rebol.it/test/test/parse-images.r
	updated with pure parse-rule. but better support for such cases would be nice, should not be guru-level.
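The two-pass idea can be sketched independently of REBOL. The Python below is an illustration only and is not the code behind the link above; the function name parse_image_block and the fixed list of keys are assumptions made for the example. The first step takes the lines that follow the =image marker, the second handles each "key: value" line on its own, so the fields may appear in any order.

```python
def parse_image_block(text, keys=("file", "size", "caption", "desc")):
    """Collect the 'key: value' lines that follow an '=image' marker.

    Returns a dict of the fields found, in whatever order they appear,
    or None if text does not start with the marker.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "=image":
        return None
    fields = {}
    for line in lines[1:]:
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in keys:
            break  # first line that is not a known field ends the block
        fields[key] = value.strip()
    return fields


sample = (
    "=image\n"
    "caption: some caption below the picture\n"
    "file: images/a picture.gif\n"
    "size: 200x300\n"
    "desc: some description for the picture\n"
)
print(parse_image_block(sample))
```

As with the thru/to version earlier in the thread, a value cannot span several lines in this sketch.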
Graham 23-Oct-2005 [566x2]	Has clean-script been updated for the new version of Core?
Graham 23-Oct-2005 [566x2]	It barfs on data/(...)
Graham 31-Oct-2005 [568]	How to exit a parse rule in the middle and return true ? ( to allow the next rule to be applied ... )
Volker 31-Oct-2005 [569]	'break
Henrik 31-Oct-2005 [570]	interesting... will write that in the wikibook :-)
Volker 31-Oct-2005 [571]	or "end skip". with break the parsed part counts as success. with end skip it counts as failure and backtracks.
Graham 31-Oct-2005 [572x4]	This is part of my scheduler dialect.
	away-days is a block of [ start-date end-date reasons ]
	current-date is the date I am looking at
	The syntax is:
	away 25-Dec-2005 on holiday
	away 25-Dec-2005
	away from 25-Dec-2005 to 7-Jan-2006 on "summer holidays"
	I want to add:
	away every Wednesday at "golf course"
	away-rule: [
	    'away [
	        set awaydate date! ( repend away-days [ awaydate awaydate ] )
	        | 'from set awayfrom date! 'to set awayto date! ( repend away-days [ awayfrom awayto ] )
	        | 'every set day word! (
	            either day = to-word pick system/locale/days current-date/weekday [
	                repend away-days [ current-date current-date ]
	            ][
	                ...break out of rule...
	            ]
	        )
	    ]
	    ( reason: copy "" )
	    opt [ [ 'on | 'at ] set reason [ word! | string! ] ]
	    ( append away-days to-string reason )
	]
	Now if the current-date matches a Wednesday, I am okay. But if not, I want to leave the rule at that point, and move on to the next rule.
	'break can only be used within the parse dialect, so that won't work.
Volker 31-Oct-2005 [576]	the general way: rule: [ ( dummy-rule: [] if not ok? [ dummy-rule: [end skip] ] ) dummy-rule ]
Graham 31-Oct-2005 [577]	oh ... looks ugly.
Volker 31-Oct-2005 [578]	It is.
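The effect Graham is after, a rule that gives up part-way so the next rule gets a chance, can be shown in a language-neutral way. The Python below is a rough analogy, not REBOL and not Graham's scheduler; away_every, DAY_NAMES and the token list are hypothetical. Returning False after the weekday check plays the role that [end skip] plays in Volker's dummy-rule: the rule fails and nothing is recorded.

```python
import datetime

# Index matches datetime.date.weekday(): Monday is 0.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def away_every(tokens, current_date, away_days):
    """Hypothetical rule for: away every <day-name>.

    Returns True and records current_date when the rule matches.
    Returns False, leaving away_days untouched, when it does not,
    so the caller can move on to its next rule.
    """
    if len(tokens) != 3 or tokens[0] != "away" or tokens[1] != "every":
        return False
    if DAY_NAMES[current_date.weekday()] != tokens[2]:
        return False  # the guard: fail here instead of carrying on
    away_days.append((current_date, current_date))
    return True


away_days = []
rule = ["away", "every", "Wednesday"]
print(away_every(rule, datetime.date(2005, 12, 28), away_days))  # a Wednesday
print(away_every(rule, datetime.date(2005, 12, 25), away_days))  # a Sunday
```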