
AltME groups: search



results window for this page: [start: 17301 end: 17400]

world-name: r3wp

Group: Parse ... Discussion of PARSE dialect [web-public]
Gabriele:
25-Dec-2006
not a bug - you are not skipping the newline, so to "^/" will always 
match. you are not getting to the end.
BrianH:
27-Dec-2006
Nevermind, failing isn't a problem here.
Maxim:
28-Dec-2006
hi,  yesterday I realized I have a 1400 line single parse ruleset 
 which amounts to ~40kb of code !   :-)  I was wondering what are 
your largest Parse rulesets,  I'm just curious at how people here 
are pushing REBOL into extremes.


I might also say that parse is wildly efficient in this case, allowing 
my server to decipher 600 bytes of binary data through all of this 
huge parse rule within 0.01 - 0.02 seconds (spitting out a nested 
rebol block tree of  rebxml2xml ready data).
Maxim:
28-Dec-2006
like bounds checking, making sure some items are not within a specific 
area, etc.
Geomol:
28-Dec-2006
My largest Parse rulesets are in NicomDoc. The scripts nicomdoc.r 
and ndmath.r parse from text to rebxml format. They are 20k and 24k. 
ndrebxml2html.r parses from NicomDoc rebxml format to html, and that 
is a 28k script, mostly parse rules. I once built an html dialect, 
and that was 24k.
Geomol:
28-Dec-2006
And yes, parse is a great tool!
Tomc:
28-Dec-2006
Max complement charset  ...  can often be used as a sort of NOT
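A minimal sketch of the complement idiom mentioned here, assuming REBOL 2. The names digit and not-digit are illustrative and not from the thread. It only negates single characters, which is the limitation raised in the reply that follows.

```rebol
REBOL []

; Illustrative names; not from the thread.
digit:     charset "0123456789"
not-digit: complement digit          ; every character that is NOT a digit

; Copy everything up to the first digit, then the digits themselves.
parse/all "abc123" [copy word some not-digit copy num some digit]

probe word   ; "abc"
probe num    ; "123"
```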
Maxim:
28-Dec-2006
true, but that does not match many cases. obviously one can build 
NOT-A NOT-B and so on.. but man.. that gets tedious, which is not 
what parse should be.
Robert:
29-Dec-2006
And don't forget to take a look at Gab's compile-rule function.
Ladislav:
29-Dec-2006
lad, maybe, but if you change the name of the variable to copy to 
you have then to change it twice in the rule.

 - right. That is a general problem of procedural programming style. 
 OTOH, the "opt skip" variant has got another problem - the "opt skip" 
 code is related only to the first alternative, which seems to me 
 like the reason why Joe doesn't like it
Ladislav:
29-Dec-2006
have a look at: http://www.compkarori.com/vanilla/display/compile-rules.r

my contribution is at: http://www.compkarori.com/vanilla/display/TO%2C+THRU+And+NOT+PARSE+Rules
Graham:
29-Dec-2006
Me too .. but it must be a vanilla problem. Are you logged in when 
you read the page, or as guest?
Oldes:
19-Jan-2007
Isn't this a bug?

>> b: "1234^@567" parse/all b [copy i to {^@} 1 skip b: to end] probe i probe b
1234
567

BUT:

>> b: "1234^@567" parse/all b [copy i to #{00} 1 skip b: to end] probe i probe b
1234
1234^@567
Maxim:
19-Jan-2007
I wish it did too.  it would make some things a little bit simpler.
Volker:
19-Jan-2007
not perfect, but a way to use numbers
Oldes:
28-Feb-2007
how to parse such a string:
   {some.string/(a + (b.a * c()))/end} 
to get:
   ["some" "string" "(a + (b.a * c()))" "end"]
Maxim:
28-Feb-2007
here is a full script  :-) 


rebol []

paren-start: charset "("
paren-end: charset ")"
parens: union paren-start paren-end

separator: charset [#"/" #"."]
label: complement union separator parens

content: complement parens
blk: copy []

str: {some.string/(a + 1 / 2 (b.a * c()))/end}

expression: [paren-start  any [content | expression] paren-end]

parse/all str [
	some [
		separator
		| here: some [label] there: (append blk copy/part here there)
		| here: expression there: (append blk copy/part here there)
	]
]


probe blk


ask "..."
Steeve:
7-Mar-2007
I made a pause during which I raised sheep in the French Alps
Steeve:
7-Mar-2007
the reality ?, i'm just a nerd, i must admit this weakness, i can't 
live without technology
Maxim:
7-Mar-2007
hehe not enough buttons on a sheep I guess  ;-)
Sunanda:
8-Mar-2007
I've never heard of such a script, Steeve.

It does not seem to be on REBOLtech (a forerunner to REBOL.org). 
You could try some more detailed searches than I did if you want 
to look further
http://www.reboltech.com/library/scripts/
****

Sadly, a lot of good stuff gets published on personal websites, and 
when the enthusiasm for REBOL wanes, or the site is taken offline 
for some reason, the scripts are lost to the wider community.
sqlab:
8-Mar-2007
Steeve, you can have a look at the scripts archive

http://www.rebol.org/cgi-bin/cgiwrap/rebol/ml-display-thread.r?m=rmlYCHQ
Maxim:
13-Apr-2007
I am having a hard time with using REMOVE on a parsed string.
Maxim:
13-Apr-2007
symbol: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9" "_-?!*+~"]
nstr: "...aa.....a.....a.....h." 

parse/all nstr [
	some [
		symbol | end skip | 
		[

			here: (
				probe here
				either none? here/1 [
					print nstr print "!!"
				][
					print here/1 print nstr remove here here: back here
				]
			) ; :here skip
		]
		;:here
	]
] probe nstr
Maxim:
13-Apr-2007
doh.... was about to give a better example... then I realised the 
error... there is nothing to match in the last rule, just an expression, 
so even when nothing matches, that rule still succeeds by matching nothing !
btiffin:
13-Apr-2007
It's nice just thinking out loud once in a while...we're here for 
you Maxim.  Cerebration.  :)
Maxim:
13-Apr-2007
well, I was trying to say that I had not realized this was possible... 
and it's quite cool... we can actually use that in some ways  ...


 make rules which make parse become an event handler for example ! 
  the moment you feed a string some value, parse will start treating 
 it...
Maxim:
13-Apr-2007
and then fall back to silence... (just inserting a little wait in 
the loop will take care of cpu load)
Maxim:
13-Apr-2007
it would be nice if the result from the expression could be used 
to determine if the rule is a match or not...
btiffin:
13-Apr-2007
Off topic but...that was what intrigued me with SNOBOL and Icon...succeed, 
fail and a result.
btiffin:
13-Apr-2007
If you haven't, take a read through Icon pattern matching...mondo 
powerful.  Off topic...sorry.
Maxim:
13-Apr-2007
here is the solution... complement the valid symbols and match them 
explicitly.

rebol []

symbol: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9" "_-?!*+~"]
invalid: complement symbol
nstr: "...aa.....a.....a.....h." 

end-rule: []

parse/all nstr [
	some [
		symbol | [here: invalid (remove here) ] :here
	]
]
btiffin:
13-Apr-2007
More off topic...I wept a little bit when I heard of Dr. Ralph Griswold 
passing, back in October.  Never met him, much respect.
btiffin:
13-Apr-2007
Final off topic;  Now I'm slowly replacing all my computer heroes...Names 
like Kernighan, Pike, Moore, Griswold, Lovelace... are now Sassenrath, 
DocKimbel, Anton, Cyphre, Graham, Maxim, Ladislav, Henrik, Oldes...et 
al.  Thanks guys.  You are making my world a better place.
Ladislav:
13-Apr-2007
Max: "it would be nice if the result from the expression could be 
used to determine if the rule is a match or not" - that is of course 
possible as follows:
Ladislav:
13-Apr-2007
right, but the value of the expression *can be used* to determine 
if a rule is a match
Ladislav:
13-Apr-2007
otherwise, I am for addition of a rule which would take the result 
of the paren! expression directly into account without us having 
to resort to this (more complicated) way
Ladislav:
13-Apr-2007
if you use a more appropriate rule name like check-result, you have 
got a more readable:
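The snippet being referred to is not part of this excerpt. What follows is a sketch of one commonly used REBOL 2 idiom for letting a paren! expression decide whether a rule matches. It is an assumption, not necessarily what was posted. The rule name check-result comes from the message above, while small-ints and n are illustrative.

```rebol
REBOL []

; check-result is set inside the paren to a rule that either
; always succeeds ([]) or always fails ([end skip]).
check-result: []

small-ints: [
    some [
        set n integer!
        (check-result: either n < 10 [[]] [[end skip]])
        check-result
    ]
]

print parse [1 2 3] small-ints    ; true
print parse [1 20 3] small-ints   ; false
```

The word check-result is looked up only when parse reaches it, so the value assigned in the paren just before is the one that gets applied.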
btiffin:
13-Apr-2007
guru question;  Will a utype! definition be allowed to wrap builtins? 
 SNMP MIBs require a fairly heavy weight tuple!  But will a short 
MIB conflict with internal scans of tuple! or do utype! scans take 
some form of precedent?  I've become curious, yet remain dumb enough 
to not know.
Oldes:
14-Apr-2007
Isn't this a bug?
>> parse [a/b] [a/b]
** Script Error: a has no value
** Near: parse [a/b] [a/b]
Oldes:
14-Apr-2007
I don't want the a to be evaluated in the parse rules!
Oldes:
14-Apr-2007
hm... ech.. I'm stupid.. normally it is evaluated as well, so it's not 
a bug.. but is there any way to parse a specific path! ?
Oldes:
14-Apr-2007
I mean:
>> parse [a] ['a]
== true
>> parse [a/b] ['a/b]
== false
ChristianE:
14-Apr-2007
>> parse [a/b] [(path: 'a/b) path]
== true
>> parse [a/c] [(path: 'a/b) path]
== false
Gabriele:
14-Apr-2007
older versions did not evaluate paths. since newer versions do, we 
need 'a/b to work. dunno if this is in RAMBO... but it needs to be 
fixed.
Oldes:
14-Apr-2007
Yes I know they were not evaluated before, but I'm not sure if it's 
not a feature, that they are evaluated now.
Oldes:
14-Apr-2007
I just think, that maybe it would be good to have parse [a/b] ['a/b] 
== true as is parse [a] ['a]
Oldes:
14-Apr-2007
..because it would not be useful anyway as I would have to write 
a special rule for each refinement.
Gabriele:
14-Apr-2007
it's not a bug that they are evaluated (in fact it was requested 
in a rambo ticket). it's a bug that - since now they are evaluated 
- lit-paths are not used to match paths.
Anton:
14-Apr-2007
Maybe if the result of parens were parsed, we could use a paren to 
evaluate a path (and don't use a paren to leave as is).
Gabriele:
16-Apr-2007
it looks like that 3.0 won't have a new parse, but i don't have any 
details and i'm just guessing.
PeterWood:
16-Apr-2007
Does that imply there won't be a Unicode Charset with which to parse 
unicode strings?
btiffin:
16-Apr-2007
There is going to be a unicode! datatype
Henrik:
17-Apr-2007
Perhaps vector! will play a part in solving the unicode problem
Gabriele:
17-Apr-2007
you can make a bitset with 65000 bits in r2... so why not in r3?
Pekr:
17-Apr-2007
I don't know, as for me, I just wanted to|thru [a | b | c] :-)
Gabriele:
17-Apr-2007
we won't stop at 3.0... there will be a 3.1 and so on... at least 
we hope so :)
Rebolek:
24-May-2007
Is there some way to make this work: parse "aaa" [some "a" "a"] or 
does PARSE just not work this way?
Geomol:
24-May-2007
What do you mean?
>> parse "aaa" [some "a"]
== true

Why the second "a"?
Geomol:
24-May-2007
Parsing for [some "a" "a"] will return false, because you've already 
parsed past the "a"s.
Geomol:
24-May-2007
A clumsy way of doing it:
>> parse "aaa" [some "a" p: (p: skip p -1) :p "a"]
== true
BrianH:
24-May-2007
parse "aaa" [some [p: "a"] :p "a"]
BrianH:
24-May-2007
Not in my version. The p is set before the position advances past 
the "a", so it is already back.
BrianH:
24-May-2007
The p is reset before "a" is consumed - that is why I put [p: "a"] 
in [].
BrianH:
24-May-2007
Interesting. It seems to be setting the last p before it fails on 
the last iteration of "a".
BrianH:
24-May-2007
Clearly I need a temporary.
BrianH:
24-May-2007
parse "aaa" [some [p1: "a" (p2: :p1)] :p2 "a"]
BrianH:
24-May-2007
A temporary will work better with parts of unknown size, and be faster 
too.
BrianH:
24-May-2007
Still, you might want to apply rewrite rules to your generated parse 
rules - that code seems a little sloppy.
Oldes:
24-May-2007
that you will not have [some "a" "a"] but just [some "a"]
BrianH:
24-May-2007
By rewrite rules, I mean something like what Gabriele came up with 
for the rebcode assembler a while ago. Since I helped refine his 
work, I may still have a copy somewhere. I'll take a look.
Geomol:
24-May-2007
Define readable! ;-) Maybe you could use a combination of to-string, 
to-binary, debase and things like that.
Rebolek:
24-May-2007
if i do (a: charset "abc") i want to do also (decharset a) to get 
"abc" :) that's readable ;)
Geomol:
24-May-2007
Rebolek, use my hokus-pokus function:

hokus-pokus: func [
	value
	/local a out
][
	either bitset? value [
		a: enbase/base to-binary value 2
		out: copy ""
		forall a [
			if a/1 = #"1" [append out to-char (index? a) - 4]
		]
		out
	][
		42
	]
]

>> a: charset "abc"
>> hokus-pokus a
== "abc"
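A more direct sketch of the same idea, assuming REBOL 2 and 8-bit charsets: test each of the 256 character values against the bitset instead of decoding its binary form. The name decharset is taken from the request above; the function itself is hypothetical and not part of REBOL.

```rebol
REBOL []

; Hypothetical helper: list the characters contained in a bitset.
; Assumes REBOL 2, where a charset covers the 256 byte values.
decharset: func [bits [bitset!] /local out ch] [
    out: copy ""
    repeat i 256 [
        ch: to-char (i - 1)
        if find bits ch [append out ch]
    ]
    out
]

probe decharset charset "abc"          ; "abc"
probe decharset charset [#"0" - #"3"]  ; "0123"
```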
Gregg:
24-May-2007
Yes, Brett has built a lot of very cool stuff. Haven't seen him around 
for a while though.
Oldes:
26-May-2007
and... it would be good to have just a function which returns the 
translated Rebol parse block
Rebolek:
26-May-2007
And yes, function returning just parse rules will be done, this is 
just a work in progress
Oldes:
26-May-2007
and anyway... 12 or 8 million google results is not a big difference 
if your page is not listed among the first 20 pages:)
Oldes:
26-May-2007
you can use... http://www.googlefight.com/ or make a Rebol version... 
it's quite easy
Rebolek:
26-May-2007
in the file i posted is a function REGSET that converts a small bit 
of regex to a bitset, its syntax seems to be easier than charset's 
syntax (charset [#"a" - #"z" #"0" - #"9"] vs regset "a-z0-9")
Gregg:
26-May-2007
Very nice Boleslav! What regex engine/syntax are you going for compatibility 
with (if any)?


Charset syntax is probably that way because it's a dialect, and Carl 
wanted a string as input to be easy, without escapes and such; just 
my guess.
BrianH:
26-May-2007
You should wrap your code in a context.
BrianH:
26-May-2007
You should separate the regex compilation phase from its application 
phase, and just write a wrapper that calls both in order. The compilation 
phase is often more complex than just applying the results, so if 
you are using the regex repeatedly you should just compile it once.
Rebolek:
26-May-2007
Oldes, I thought about just a translator from regex to parse rules 
and I'm not sure it will be easy, I'm using my 'tail-parse that matches 
rules in reversed order that is better for regex syntax. Maybe there's 
some other way.
Rebolek:
26-May-2007
this is the problem with [some "a" "a"]. This is equivalent of "a*a" 
in regex which is perfectly valid, but problematic in parse. This 
is simple example, but it can get quite complicated so I'm not sure 
I can handle all cases. The reversed order seemed simpler. But you 
will probably prove me wrong :)
BrianH:
26-May-2007
BTW, "a*a" is directly equivalent to [any "a" "a"], not some.
BrianH:
26-May-2007
Most of the changes were made to make it faster and to use less memory 
overhead.

- It is faster for parse to match a one-character string than a character 
value.
- Insert is faster than union, and makes no temporaries.

- If you are capturing a single character, I think [a: skip (a: first 
a)] is faster than [copy a skip (a: first a)].

- Path access is slower than the equivalent native, so [first a] 
instead of [a/1].

- The fastest loop is loop, even with the math to calculate the number 
of times.
BrianH:
26-May-2007
Aside from the one-time bind, repeat may be faster than loop with 
a self-incremented index.
BrianH:
26-May-2007
It might be a good idea to run a peephole optimizer on the patterns 
before compiling them, to convert ones like "a*a" to "aa*".
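In parse terms, the rewrite suggested here amounts to moving the mandatory match in front of the greedy repetition. A minimal sketch, assuming REBOL 2:

```rebol
REBOL []

; Regex a*a as a literal translation: ANY is greedy and never gives
; characters back, so nothing is left for the final "a".
print parse "aaa" [any "a" "a"]    ; false

; Same language with the mandatory "a" first (regex aa*).
print parse "aaa" ["a" any "a"]    ; true
print parse "a"   ["a" any "a"]    ; true
```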
Rebolek:
27-May-2007
Hi Brian, thanks for support, I was out for a sleep :)
BrianH:
27-May-2007
Yeah, so it does. I wonder why the docs don't say (will be local) 
like it does for foreach. It still ends up faster than loop when 
you have to keep track of an index or a counter.
Dockimbel:
27-May-2007
Brian, you've stated that  "It is faster for parse to match a one-character 
string than a character value." It seems to me that the opposite 
statement is true. (matching a char! is faster than matching a one-character 
string!)
BrianH:
27-May-2007
It seems to me that the opposite _should_ be true, but parse converts 
the character to a string before matching it - no conversion is performed 
for string values. It's just one of those weird things.
Ladislav:
28-May-2007
my measurements show:

>> time-block [parse "a" ["a"]] 0.05
== 3.83615493774414E-7
>> time-block [parse "a" [#"a"]] 0.05
== 3.61204147338867E-7

, i.e. the opposite
BrianH:
28-May-2007
Which version? Nevermind, my timing differences may just be a multitasking 
artifact.
BrianH:
28-May-2007
Too small a sample for a busy computer.
BrianH:
28-May-2007
Rebolek, I gather you made the parse go in reverse to handle rules 
like "a+a" better. How does your reverse code handle "aa+", or "aa+a" 
- same problem?
Dockimbel:
28-May-2007
Here's another benchmark:

>> data: head insert/dup make string! 10'000'000 #"a" 10'000'000

>> t0: now/time/precise loop 10 [parse data [some "a"]] now/time/precise - t0
== 0:00:06.078

>> t0: now/time/precise loop 10 [parse data [some #"a"]] now/time/precise - t0
== 0:00:04.296


Running this test several times shows that char! matching is, on 
average, 30% faster than string! matching.
BrianH:
28-May-2007
Well there you go. That's different numbers than last time, but more 
dramatic. It's just a #, easy fix :)
Dockimbel:
28-May-2007
Didn't want to sound "dramatic", but just wanted to provide a more 
accurate measure. Sure, whichever datatype is used (char! or string!) 
in regex.r, that won't change the overall speed much. ;-)