r3wp [groups: 83 posts: 189283]
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

World: r3wp

[Parse] Discussion of PARSE dialect

BrianH
28-May-2007
[1885x2]
I'm more concerned about the first and follow sets - doing that wrong 
could mean a slowdown of orders of magnitude.
Actually, big O slowdowns.
Rebolek
28-May-2007
[1887]
BrianH: "How does your reverse code handle "aa+", or "aa+a"" - damn... 
:)

Anyway, today I was thinking about different way how to do it, had 
no time to code it yet.
It's slow, I know...
BrianH
28-May-2007
[1888x2]
Are you familiar with the theories behind parser generators? That 
is what you are doing. I studied the theories in college - hence 
the talks about first and follow sets.
I've been thinking about ways to do this too - you got me going. 
I was also thinking of a way to cache charsets for later reuse.
Maxim
28-May-2007
[1890x3]
didn't you guys try using a charset?  I would guess its an order 
of magnitude faster still... no?
well, checking with one letter didn't give any changes (bitset and 
char seem the same) but just change that with an OR of two letter 
and the char goes to 4 times slower and the bitset stays exactly 
the same... so do use bitsets as often as you can.
>> t0: now/time/precise loop 10 [parse data [some #"a"]] now/time/precise 
- t0
== 0:00:06.389


>> t0: now/time/precise loop 10 [parse data [some [#"b" | #"a"]]] 
now/time/precise - t0
== 0:00:25.496

>> b: charset "ab"

>> t0: now/time/precise loop 10 [parse data [some b]] now/time/precise 
- t0
== 0:00:06.379
Rebolek
29-May-2007
[1893]
I'm not sure how can I test for end in case like this:
r: ["a" "b" "c" end] parse "abc" [some [r/1 (r: next r)]]
anybody?
Oldes
29-May-2007
[1894]
r: ["a" "b" "c" break] parse "abc" [some [r/1 (r: next r)]]
Rebolek
29-May-2007
[1895]
great, thanks
Oldes
29-May-2007
[1896]
Here are C sources for Regex engine used in MySQL5.0.9 http://leithal.cool-tools.co.uk/sourcedoc/mysql509/html/dir_000079.html
Rebolek
29-May-2007
[1897x5]
So I left tail-parse becuase of the "aa+" problem mentioned by BrianH. 
I wrote 'lazy-parse that works different and it seems to handle both 
cases "a+a" and "aa+" well.
>> lazy-parse "aaa" [[#"a"][some #"a"][#"a"]]
== true
>> lazy-parse "aaa" [[#"a"][some #"a"]]
== true
>> lazy-parse "aaa" [[some #"a"][#"a"]]
lazy-parse: func [string rules /local parsed?][
;does not support ANY yet (wrong results)
	parsed?: false
	rules: copy rules
	parse string [
		some [
			b: rules/1 e:
			(
				parsed?: true
				rules: next rules
				if empty? e [
					either empty? rules [
						parsed?: true
						rules: copy [break]
					] [
						e: next b
						if empty? e [parsed?: false]
					]
				]
			)
			:e
		]
	]
	parsed?
]
Warning: this does not suport ANY rule yet.
Grrr, I should make more test before posting...forget last four posts
Rebolek
31-May-2007
[1902]
'regset is now on rebol.org, rest of parser is still in works.
btiffin
31-May-2007
[1903]
Rebolek...You posted the magic prime 797'th Script.  Well Done!  
:)
Rebolek
31-May-2007
[1904]
Hm thanks, but I was aiming for 800th one... ;)
btiffin
31-May-2007
[1905]
Yep...I've been waiting...  :)
Rebolek
7-Jun-2007
[1906]
just a quick idea:

FORALL is implemented as mezzanine function. It calls FORSKIP which 
is mezzanine also. As you can see, it's probably not the fastest 
method. So here's the idea. Cannnot be FORALL rewritten to make it 
faster and is it possible to use PARSE to do this?
So I tried and came up with this simple function:

parall: func [
	'word body
	/local data
][
	data: get :word
	parse data compose/deep [

  some [(to set-word! word) any-type! (to paren! [do bind body :word])]
	]
]

(parall is just a shortcut for parse version of forall).


this is very simple function written in five minutes and not very 
well checked, it needs some additional work (eg. it does not return 
same value as 'forall etc).

So let's do some test (using Ladislav's %timblk.r):
>> n: copy [] repeat i 100 [append n i]

== [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 4...
>> time-block [forall n [b: n/1 + 1]] 0.05
== 3.623046875E-4
>> time-block [parall n [b: n/1 + 1]] 0.05
== 3.814697265625E-6
>> 3.62e-4 / 3.81e-6
== 95.0131233595801

95x faster? whooo....

and what about bigger block?

>> n: copy [] repeat i 10000 [append n i]

== [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 4...
>> time-block [forall n [b: n/1 + 1]] 0.05
== 3.540625E-2
>> time-block [parall n [b: n/1 + 1]] 0.05
== 3.7994384765625E-6
>> 3.54e-2 / 3.8e-6
== 9315.78947368421

9000x ? omg...

comments?
Gabriele
7-Jun-2007
[1907]
the devil is in the details... you need to handle break :)
Rebolek
7-Jun-2007
[1908]
oh yes, break...didn't use it for years, so almost completely forget 
about it :)
BrianH
7-Jun-2007
[1909x4]
Don't forget continue.
First tip: You don't have to bind the body - it just uses the existing 
binding of the word.
Netx tip: What native does forskip decompose down to? Try using that 
first.
; Try against this, the forskip code with the skip part taken out
forall: func [
    "Evaluates a block for every value in a series."
    [catch throw]

    'word [word!] {Word set to each position in series and changed as 
    a result}
    body [block!] "Block to evaluate each time"
    /local orig result
][
    if not any [
        series? get word
        port? get word

    ] [throw make error! {forall expected word argument to refer to a 
    series or port!}]
    orig: get word
    while [any [not tail? get word (set word orig false)]] [
        set/any 'result do body
        set word next get word
        get/any 'result
    ]
]
Rebolek
7-Jun-2007
[1913x2]
BrianH: I know it can be done using while loop in forskip, it was 
more an experiment on using parse as foreach/forall kind of function.
to see how fast it can be
BrianH
7-Jun-2007
[1915x3]
Well, if you can figure out how to make it handle break and continue, 
let us know.
Or for that matter, ports.
If you want to test the speed of parse, replace the any-type! with 
a skip - the forall you are comparing it to doesn't do that test.
Chris
7-Jun-2007
[1918]
What is 'continue?
BrianH
7-Jun-2007
[1919]
Sort of like the opposite of break. It breaks one iteration of a 
loop and continues at the top of the next iteration. Many languages 
have it, as does REBOL 3.
Gabriele
8-Jun-2007
[1920]
notice that, forall was initially done that way, and converted to 
use forskip for simplicity (ie. avoiding having the same code twice 
in the source)
Rebolek
8-Jun-2007
[1921]
I tried to enclose parse in loop 1 [] and it seems to handle break. 
I guess you'll probably prove me wrong, Brian :)

parall: func [
	'word body
	/loc data
][
	loop 1 [
		data: get :word
		parse data compose/deep [
			some [(to set-word! word) skip (to paren! [do body])]
		]
	]
]

re: continue - this is not r3 ;)


>> n: [1 2 3 4 5] parall n [if n/1 = 4 [break/return "break"] if 
n/1 > 4 [print "bad"]]
== "break"
Gabriele
8-Jun-2007
[1922x4]
good :) now... if Ladislav was here he'd probably find at least ten 
wrong things in that function. ;)
you need to add [throw] for example, and return the last return value 
of the body
but... as long as we're not replacing forall... this could be a very 
fast alternative to use in cases when you don't need all the features...
and actually... i'd call parse directly in that case ;)
Rebolek
8-Jun-2007
[1926]
forall replacement not...an experiment in using parse for maybe not 
obvious purposes - yes :)
Volker
8-Jun-2007
[1927x2]
You could make something new. like this:
 parall [p: set x integer! set y integer!] [?? p ?? x ?? y]
parall [p: set x integer! set y integer!]  DATA [?? p ?? x ?? y] 
;..
Oldes
10-Jun-2007
[1929]
To be able escape from infinite loops use () somewhere in the parse 
rules as for example:

 >> parse "abc" [any [()]]
 *** YOU CAN PRESS [ESC] NOW TO STOP THE LOOP ***

>> parse "abc" [any []]
*** HANGS REBOL ***
BrianH
11-Jun-2007
[1930]
Cool!
PeterWood
11-Jun-2007
[1931]
Oldes: does using () inside a parse loop slow things down? I'm wary 
of () as they do seem to slow the interpreter down in many cases.
Rebolek
11-Jun-2007
[1932]
Peter it does slow interpreter down, it's better to use this trick 
when designing your rules. If you're happy with them, remove the 
parens to make your rule faster.
Steeve
27-Jun-2007
[1933x2]
What about a new word “all” which would allow  to parse several successives 
conditions whatever their order  ?
parse [a: 1 b: 2] [all ['a: integer! | 'b: integer!]] would be true 
only if it exists successivly [a: integer!] and [b: integer! ] whatever 
their order