World: r3wp

Join the discussions in the REBOL3 world...

[Parse] Discussion of PARSE dialect

Sunanda 6-Nov-2008 [2855]	My suggested improvement to parse would be a trace (or debug) refinement:
		trace-output-word: copy []
		parse/trace string rules trace-output-word
	I'm not entirely sure how it would work. That would depend in part on how parse works internally, and so on what trace points are possible. But, as a minimum, I'd expect it to show me each rule that triggers a match, and the current position in the string being parsed. parse would append the trace info to the trace-output word. Otherwise, parse is too big a black box for anyone other than very patient experts.
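To make the request concrete, here is a minimal sketch of what such a trace could record: each rule tried and the input position at that moment. It is written in Python rather than REBOL, the function name `parse_trace` and the (rule, position, matched) tuple layout are hypothetical, and rules are limited to flat string literals; a real PARSE trace would also have to report nested blocks, alternates and repetition.

```python
def parse_trace(text, rules, trace):
    """Toy matcher for a flat sequence of string literals (hypothetical, not
    REBOL's PARSE). Appends (rule, position, matched) to `trace` for every rule
    it tries, where `position` is the index in `text` before the attempt.
    Returns True only if every rule matches and the whole input is consumed."""
    pos = 0
    for rule in rules:
        matched = text.startswith(rule, pos)
        trace.append((rule, pos, matched))
        if not matched:
            return False
        pos += len(rule)
    return pos == len(text)


trace = []
print(parse_trace("abx", ["a", "b", "c"], trace))  # False
print(trace)  # shows that "c" was tried at position 2 and failed
```

Because the caller supplies the trace list, the trace survives a failed parse, which is exactly when it is most useful.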
Anton 6-Nov-2008 [2856]	Buffered parse - that should be added to the Parse Project DocBase page. It's a big one, though.
Dockimbel 6-Nov-2008 [2857x2]	Streamed parsing with backtracking: sure, it's possible. I've been doing that in the PostgreSQL driver since 2001 and, more recently, in the experimental async MySQL driver released last year. (It's not done in an easily reusable way, though.)
	Tracing parse: IMHO, it would be more efficient to add a PARSE mezzanine for debugging parse rules. (That requires emulating the PARSE command, which is not a difficult task.)
BrianH 6-Nov-2008 [2859x2]	Tracing support would be good to add, but start it and stop it with the TRACE native.
	Tomc, parsing an open port has been a wish of mine for years. In theory you could handle backtracking with buffering.
Pekr 6-Nov-2008 [2861]	BrianH: how do you know how much to store in cache and when to flush it?
Robert 6-Nov-2008 [2862]	Just cache the whole thing ;-)
BrianH 6-Nov-2008 [2863]	Well, when PARSE enters a block it saves the position at the start of that block. If you have to backtrack, that is as far back as you would need to cache. PARSE could optimize this by not saving backtracking info unless there is an alternate later on in the block. You could then minimize caching in some cases by rearranging your rules to use as little backtracking as possible, and none at the top level.
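A small Python model of this point: on entry to a block the start position is saved, and it is only read again when an alternate has to be tried, so it marks how far back a streaming buffer would need to keep data. This is an illustration under simplifying assumptions (rules are lists of string literals, alternates are a list of such lists); `match_block` is a hypothetical name, not part of REBOL.

```python
def match_block(text, pos, alternates):
    """Hypothetical model of entering a rule block: `alternates` is a list of
    alternatives, each a list of string literals matched in order.
    Returns the position after the first alternative that matches, or None."""
    start = pos  # saved on entry: the furthest point a backtrack can return to
    for alt in alternates:
        pos = start  # backtrack to the block start before trying each alternate
        for literal in alt:
            if not text.startswith(literal, pos):
                break
            pos += len(literal)
        else:
            return pos  # every literal in this alternate matched
    return None


# The first alternate fails on "c"; the block rewinds to 0 and tries the second.
print(match_block("abd", 0, [["a", "b", "c"], ["a", "b", "d"]]))  # 3
```

With a single alternate, `start` is never rewound to after a failure inside this block, which is the case where saving backtracking info could be skipped.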
Pekr 6-Nov-2008 [2864]	:-) Very useful for xy MB streamed video :-)
BrianH 6-Nov-2008 [2865]	I was thinking streamed XML, but yes :)
Anton 6-Nov-2008 [2866x2]	Interesting, Brian. I was going to suggest:
	DISPENSE: a PARSE command to mark points in the data that don't need to be backtracked past. PARSE can use this information to dispense with older buffer data that is no longer needed; otherwise it holds and accumulates the data. This would be used for very large or unbounded-length data streams, e.g. internet radio.
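A minimal Python sketch of the buffer-management side of the DISPENSE idea, assuming data arrives as string chunks and positions are absolute offsets into the stream. `StreamBuffer`, `char_at` and `dispense` are hypothetical names for illustration; DISPENSE itself is only a proposal here, not an existing PARSE command.

```python
class StreamBuffer:
    """Holds a window of a stream in memory (hypothetical sketch). Absolute
    offsets stay valid after older data is discarded by `dispense`."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)  # source of incoming str chunks
        self._buf = ""               # data currently held in memory
        self._base = 0               # absolute offset of self._buf[0]

    def char_at(self, pos):
        """Return the character at absolute offset `pos`, reading more chunks
        as needed; returns None at end of stream."""
        if pos < self._base:
            raise IndexError("data before offset %d was dispensed" % self._base)
        while pos - self._base >= len(self._buf):
            chunk = next(self._chunks, None)
            if chunk is None:
                return None
            self._buf += chunk
        return self._buf[pos - self._base]

    def dispense(self, pos):
        """Discard buffered data before absolute offset `pos`."""
        if pos > self._base:
            drop = min(pos - self._base, len(self._buf))
            self._buf = self._buf[drop:]
            self._base += drop


buf = StreamBuffer(["ab", "cd", "ef"])  # e.g. chunks arriving from a stream
print(buf.char_at(3))  # d
buf.dispense(2)        # no backtracking before offset 2 from here on
print(buf.char_at(5))  # f
```

Reading before the last dispense point raises an error in this sketch; a PARSE that tried to backtrack past a DISPENSE would face the same problem.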
BrianH 6-Nov-2008 [2868]	Also, I was thinking of parsing file ports opened with seek mode.
Pekr 6-Nov-2008 [2869]	exactly ....
Anton 6-Nov-2008 [2870]	Yes, that's another mode, suitable for files (but not internet radio).
Pekr 6-Nov-2008 [2871]	I was thinking about Amiga-like datatypes, done in REBOL. Such decoders could be slow though ....
BrianH 6-Nov-2008 [2872]	Interesting, but you wouldn't need DISPENSE if your rules don't have alternates to backtrack to (statically determinable).
Anton 6-Nov-2008 [2873x2]	Yeah, I hadn't thought of that.
	Perhaps there are cases where alternates are not a good method of determining when to dispense buffer data?
BrianH 6-Nov-2008 [2875x2]	Of course "statically determinable" means that you wouldn't be able to modify the rule block that PARSE is currently working on (which would likely crash PARSE anyway).
	Well, if you have no alternate, you have no backtracking, so you can dispose of data as you go.
Anton 6-Nov-2008 [2877]	What about REVERSE?
BrianH 6-Nov-2008 [2878]	Ah, that would require buffering. Darn.
Anton 6-Nov-2008 [2879]	And also set-words..
BrianH 6-Nov-2008 [2880]	We're back to Robert's "cache the whole thing" :(
Anton 6-Nov-2008 [2881]	and Anton's DISPENSE when you know you ain't goin' backwards from here.
BrianH 6-Nov-2008 [2882x2]	And crashing PARSE if you DISPENSE something that you need to go back to. That might be better done with incremental parsing.
	Like this:
		parse port rule1  ; cache gone
		parse port rule2  ; picks up where the previous left off
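The incremental example relies on the port remembering its own position between PARSE calls. A rough Python analogue, using `io.StringIO` as a stand-in port and a hypothetical one-literal `parse_literal` in place of a real rule, shows the effect: the second call continues where the first stopped, and because neither call reads backwards, nothing consumed earlier has to be retained.

```python
import io


def parse_literal(port, literal):
    """Hypothetical stand-in for `parse port rule` with a one-literal rule:
    reads forward from the port's current position and reports a match.
    It never seeks back, so nothing already consumed needs to stay cached;
    on a mismatch the consumed characters are simply gone."""
    return port.read(len(literal)) == literal


port = io.StringIO("HEADERbody")      # stand-in for an open, forward-only port
print(parse_literal(port, "HEADER"))  # True: rule1, consumed data can be dropped
print(parse_literal(port, "body"))    # True: rule2 starts where rule1 stopped
```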
Anton 6-Nov-2008 [2884x3]	Yes, I was just thinking what would happen if you var:, DISPENSE, then :var afterwards. Should DISPENSE update the index of var (and any other vars) when the internal parse index shortens ?
	Does your incremental parse example assume that there is enough data in the cache to complete each rule?
	Maybe we should study Doc's postgresql driver :)
BrianH 6-Nov-2008 [2887x4]	Oh, set-words and get-words would not work the same with R3 ports. You wouldn't be able to use them the same way in the code blocks for instance. This is because for R3 ports the position is a property of the port rather than the port reference, so those set-words would be setting the word to the numeric index of the current position of that port, and the get-word would correspond to seeking to that index. In the code blocks the words would only contain integer! values - you would need a separate reference to the port for them to make sense.
	The new port model would make PARSE on ports completely different. You would only be able to parse seekable ports if you want to use set-words and get-words, and you might just be able to rely on the internal port caching. This might be easier than we thought.
	In theory you could even do something like block parsing on event ports, like SAX pull. Same seekable restrictions apply - no backtracking or position setting or getting unless the port supports seeking.
	That would shunt the cache management into the port scheme :)
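To illustrate the point that on a seekable R3 port a set-word would only capture a numeric index and a get-word would amount to a seek, here is a Python sketch using `io.BytesIO` as a stand-in for a seekable port; `tell()` and `seek()` are standard Python file-object methods. The function name is hypothetical, and the REBOL rule quoted in the docstring is the string-parsing equivalent, not port PARSE, which is only being proposed here.

```python
import io


def match_a_then_b_twice(port):
    """Hypothetical sketch: match b"a", record the position, match b"b",
    seek back to the recorded position and match b"b" again, then require
    end of input. Roughly the string-PARSE rule ["a" mark: "b" :mark "b"]."""
    if port.read(1) != b"a":
        return False
    mark = port.tell()   # like a set-word on a port: just an integer offset
    if port.read(1) != b"b":
        return False
    port.seek(mark)      # like a get-word on a port: seek to that offset
    return port.read(1) == b"b" and port.read(1) == b""


print(match_a_then_b_twice(io.BytesIO(b"ab")))  # True
```

Note that `mark` is meaningless without `port`: it is a plain integer, which is why code blocks would need a separate reference to the port.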
Anton 6-Nov-2008 [2891x2]	Ah, that makes sense. My model of how parse would handle ports was wrong. I was assuming it would work just like string parse, except working on a limited buffer, supplied by the port.
	Block parsing? How are you going to do that when you can't even see the final ']' in the buffer yet?
BrianH 6-Nov-2008 [2893x2]	With seekable ports the buffering is handled by the ports, rather than provided by them. I wonder if there will be cache control APIs :)
	By "something like block parsing", I mean ports that return other REBOL values than bytes or characters can be parsed as if the values were contained in a block and being parsed there. Any buffering of these values would be handled by the port scheme code. Only whole REBOL values would be returned by such ports, so any inner blocks returned would be parsed by INTO as actual blocks.
Anton 6-Nov-2008 [2895]	Hmm.. that could work. I suppose the outermost block that usually encompasses loaded rebol data would have to be "ignored".
BrianH 6-Nov-2008 [2896x2]	No, it would be virtual :)
	Actually, there are no [ and ] in REBOL blocks once they are loaded. Block parse works on data structures.
Anton 6-Nov-2008 [2898]	'Virtual' is the right word.
Pekr 6-Nov-2008 [2899]	I was thinking along the same lines as Anton - that it would work like parsing a string, using some limited buffer ...
BrianH 6-Nov-2008 [2900x2]	Ports don't work like series in R3. If anything, port PARSE would simplify port handling by making seekable ports act more like series.
	I gotta suggest this to Carl :)
Anton 6-Nov-2008 [2902]	At least if you could add "3.12 port parsing" to the Parse_Project page... :)
Pekr 6-Nov-2008 [2903]	OTOH, I've never done any binary format parsing. Oldes has some experience here IIRC. Dunno how encoders/decoders will be built; maybe those will be in native C code anyway ...
Tomc 6-Nov-2008 [2904]	The potential for backtracking is initiated by setting a placeholder (i.e. here:), so caching only as far back as the earliest current placeholder may be sufficient.
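This rule of thumb can be sketched in Python: the buffer keeps data only from the earliest placeholder still set (or from the current position when there is none) and trims itself as placeholders are released or the position moves on. `MarkedBuffer` and its methods are hypothetical names; how PARSE would know that a placeholder is no longer needed (here an explicit `release`) is an open assumption, since a set-word's value can outlive the rule that set it.

```python
class MarkedBuffer:
    """Hypothetical buffer that keeps data only from the earliest live
    placeholder onward (or from the current position if none are set).
    mark/rewind/release loosely mirror here:, :here and forgetting `here`."""

    def __init__(self, data=""):
        self.base = 0     # absolute stream offset of buf[0]
        self.buf = data   # data currently held in memory
        self.pos = 0      # current absolute parse position
        self.marks = {}   # placeholder name -> absolute offset

    def feed(self, data):
        self.buf += data

    def advance(self, n):
        if self.pos + n > self.base + len(self.buf):
            raise ValueError("cannot advance past buffered data")
        self.pos += n
        self._trim()

    def mark(self, name):
        self.marks[name] = self.pos

    def rewind(self, name):
        self.pos = self.marks[name]

    def release(self, name):
        del self.marks[name]
        self._trim()

    def current(self):
        return self.buf[self.pos - self.base]

    def _trim(self):
        # Nothing before the earliest placeholder can be returned to, so drop it.
        keep_from = min([self.pos, *self.marks.values()])
        drop = keep_from - self.base
        if drop > 0:
            self.buf = self.buf[drop:]
            self.base = keep_from


b = MarkedBuffer("abcdef")
b.advance(2)        # no placeholders: "ab" is dropped
b.mark("here")      # like here: at offset 2
b.advance(3)        # "cde" is kept because "here" may still be returned to
b.rewind("here")    # like :here
print(b.current())  # c
b.advance(3)
b.release("here")   # placeholder gone: everything before offset 5 is dropped
print(b.buf)        # f
```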