World: r3wp

Join the discussions in the REBOL3 world...

[Parse] Discussion of PARSE dialect

older newer	first last
BrianH 28-May-2007 [1885x2]	I'm more concerned about the first and follow sets - doing that wrong could mean a slowdown of orders of magnitude.
BrianH 28-May-2007 [1885x2]	Actually, big O slowdowns.
Rebolek 28-May-2007 [1887]	BrianH: "How does your reverse code handle "aa+", or "aa+a"" - damn... :) Anyway, today I was thinking about different way how to do it, had no time to code it yet. It's slow, I know...
BrianH 28-May-2007 [1888x2]	Are you familiar with the theories behind parser generators? That is what you are doing. I studied the theories in college - hence the talks about first and follow sets.
BrianH 28-May-2007 [1888x2]	I've been thinking about ways to do this too - you got me going. I was also thinking of a way to cache charsets for later reuse.
Maxim 28-May-2007 [1890x3]	didn't you guys try using a charset? I would guess its an order of magnitude faster still... no?
	well, checking with one letter didn't give any changes (bitset and char seem the same) but just change that with an OR of two letter and the char goes to 4 times slower and the bitset stays exactly the same... so do use bitsets as often as you can.
	>> t0: now/time/precise loop 10 [parse data [some #"a"]] now/time/precise - t0 == 0:00:06.389 >> t0: now/time/precise loop 10 [parse data [some [#"b" \| #"a"]]] now/time/precise - t0 == 0:00:25.496 >> b: charset "ab" >> t0: now/time/precise loop 10 [parse data [some b]] now/time/precise - t0 == 0:00:06.379
Rebolek 29-May-2007 [1893]	I'm not sure how can I test for end in case like this: r: ["a" "b" "c" end] parse "abc" [some [r/1 (r: next r)]] anybody?
Oldes 29-May-2007 [1894]	r: ["a" "b" "c" break] parse "abc" [some [r/1 (r: next r)]]
Rebolek 29-May-2007 [1895]	great, thanks
Oldes 29-May-2007 [1896]	Here are C sources for Regex engine used in MySQL5.0.9 http://leithal.cool-tools.co.uk/sourcedoc/mysql509/html/dir_000079.html
Rebolek 29-May-2007 [1897x5]	So I left tail-parse becuase of the "aa+" problem mentioned by BrianH. I wrote 'lazy-parse that works different and it seems to handle both cases "a+a" and "aa+" well.
	>> lazy-parse "aaa" [[#"a"][some #"a"][#"a"]] == true >> lazy-parse "aaa" [[#"a"][some #"a"]] == true >> lazy-parse "aaa" [[some #"a"][#"a"]]
	lazy-parse: func [string rules /local parsed?][ ;does not support ANY yet (wrong results) parsed?: false rules: copy rules parse string [ some [ b: rules/1 e: ( parsed?: true rules: next rules if empty? e [ either empty? rules [ parsed?: true rules: copy [break] ] [ e: next b if empty? e [parsed?: false] ] ] ) :e ] ] parsed? ]
	Warning: this does not suport ANY rule yet.
	Grrr, I should make more test before posting...forget last four posts
Rebolek 31-May-2007 [1902]	'regset is now on rebol.org, rest of parser is still in works.
btiffin 31-May-2007 [1903]	Rebolek...You posted the magic prime 797'th Script. Well Done! :)
Rebolek 31-May-2007 [1904]	Hm thanks, but I was aiming for 800th one... ;)
btiffin 31-May-2007 [1905]	Yep...I've been waiting... :)
Rebolek 7-Jun-2007 [1906]	just a quick idea: FORALL is implemented as mezzanine function. It calls FORSKIP which is mezzanine also. As you can see, it's probably not the fastest method. So here's the idea. Cannnot be FORALL rewritten to make it faster and is it possible to use PARSE to do this? So I tried and came up with this simple function: parall: func [ 'word body /local data ][ data: get :word parse data compose/deep [ some [(to set-word! word) any-type! (to paren! [do bind body :word])] ] ] (parall is just a shortcut for parse version of forall). this is very simple function written in five minutes and not very well checked, it needs some additional work (eg. it does not return same value as 'forall etc). So let's do some test (using Ladislav's %timblk.r): >> n: copy [] repeat i 100 [append n i] == [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 4... >> time-block [forall n [b: n/1 + 1]] 0.05 == 3.623046875E-4 >> time-block [parall n [b: n/1 + 1]] 0.05 == 3.814697265625E-6 >> 3.62e-4 / 3.81e-6 == 95.0131233595801 95x faster? whooo.... and what about bigger block? >> n: copy [] repeat i 10000 [append n i] == [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 4... >> time-block [forall n [b: n/1 + 1]] 0.05 == 3.540625E-2 >> time-block [parall n [b: n/1 + 1]] 0.05 == 3.7994384765625E-6 >> 3.54e-2 / 3.8e-6 == 9315.78947368421 9000x ? omg... comments?
Gabriele 7-Jun-2007 [1907]	the devil is in the details... you need to handle break :)
Rebolek 7-Jun-2007 [1908]	oh yes, break...didn't use it for years, so almost completely forget about it :)
BrianH 7-Jun-2007 [1909x4]	Don't forget continue.
	First tip: You don't have to bind the body - it just uses the existing binding of the word.
	Netx tip: What native does forskip decompose down to? Try using that first.
	; Try against this, the forskip code with the skip part taken out forall: func [ "Evaluates a block for every value in a series." [catch throw] 'word [word!] {Word set to each position in series and changed as a result} body [block!] "Block to evaluate each time" /local orig result ][ if not any [ series? get word port? get word ] [throw make error! {forall expected word argument to refer to a series or port!}] orig: get word while [any [not tail? get word (set word orig false)]] [ set/any 'result do body set word next get word get/any 'result ] ]
Rebolek 7-Jun-2007 [1913x2]	BrianH: I know it can be done using while loop in forskip, it was more an experiment on using parse as foreach/forall kind of function.
Rebolek 7-Jun-2007 [1913x2]	to see how fast it can be
BrianH 7-Jun-2007 [1915x3]	Well, if you can figure out how to make it handle break and continue, let us know.
	Or for that matter, ports.
	If you want to test the speed of parse, replace the any-type! with a skip - the forall you are comparing it to doesn't do that test.
Chris 7-Jun-2007 [1918]	What is 'continue?
BrianH 7-Jun-2007 [1919]	Sort of like the opposite of break. It breaks one iteration of a loop and continues at the top of the next iteration. Many languages have it, as does REBOL 3.
Gabriele 8-Jun-2007 [1920]	notice that, forall was initially done that way, and converted to use forskip for simplicity (ie. avoiding having the same code twice in the source)
Rebolek 8-Jun-2007 [1921]	I tried to enclose parse in loop 1 [] and it seems to handle break. I guess you'll probably prove me wrong, Brian :) parall: func [ 'word body /loc data ][ loop 1 [ data: get :word parse data compose/deep [ some [(to set-word! word) skip (to paren! [do body])] ] ] ] re: continue - this is not r3 ;) >> n: [1 2 3 4 5] parall n [if n/1 = 4 [break/return "break"] if n/1 > 4 [print "bad"]] == "break"
Gabriele 8-Jun-2007 [1922x4]	good :) now... if Ladislav was here he'd probably find at least ten wrong things in that function. ;)
	you need to add [throw] for example, and return the last return value of the body
	but... as long as we're not replacing forall... this could be a very fast alternative to use in cases when you don't need all the features...
	and actually... i'd call parse directly in that case ;)
Rebolek 8-Jun-2007 [1926]	forall replacement not...an experiment in using parse for maybe not obvious purposes - yes :)
Volker 8-Jun-2007 [1927x2]	You could make something new. like this: parall [p: set x integer! set y integer!] [?? p ?? x ?? y]
Volker 8-Jun-2007 [1927x2]	parall [p: set x integer! set y integer!] DATA [?? p ?? x ?? y] ;..
Oldes 10-Jun-2007 [1929]	To be able escape from infinite loops use () somewhere in the parse rules as for example: >> parse "abc" [any [()]] * YOU CAN PRESS [ESC] NOW TO STOP THE LOOP * >> parse "abc" [any []] * HANGS REBOL *
BrianH 11-Jun-2007 [1930]	Cool!
PeterWood 11-Jun-2007 [1931]	Oldes: does using () inside a parse loop slow things down? I'm wary of () as they do seem to slow the interpreter down in many cases.
Rebolek 11-Jun-2007 [1932]	Peter it does slow interpreter down, it's better to use this trick when designing your rules. If you're happy with them, remove the parens to make your rule faster.
Steeve 27-Jun-2007 [1933x2]	What about a new word �all� which would allow to parse several successives conditions whatever their order ?
Steeve 27-Jun-2007 [1933x2]	parse [a: 1 b: 2] [all ['a: integer! \| 'b: integer!]] would be true only if it exists successivly [a: integer!] and [b: integer! ] whatever their order
older newer	first last