World: r3wp

[Parse] Discussion of PARSE dialect

BrianH
5-Dec-2006
[1552]
Such a thing has been on my todo list for a while, but I've been 
a little busy lately with non-REBOL projects :(
Gregg
5-Dec-2006
[1553]
I don't want to deal with XML beyond simple well-formed XML, too 
complex. I don't, personally, have any interest in doing generic 
XML toolkit stuff at this point. I can see value in it for some people, 
but I'd rather write REBOL dialects. :-)
Maxim
8-Dec-2006
[1554x2]
geomol's xml2rebxml handles XML pretty well.  one might want to change 
the parse rules a little to adapt the output, but it actually loads 
all the xml tags, empty tags and attributes.  it even handles utf-8, 
CDATA chunks, and converts some of the & chars.
I am using an adapted form of it commercially so far.  I have implemented 
full schema validation and loading (in rebol) but it's proprietary 
code I can't release.  So guys, it can be done!
Allen
10-Dec-2006
[1556]
I'm starting to see some abandonment of XML in favour of JSON .. 
mainly in web 2.0 .  but it will not replace xml where validation 
 is required.
BrianH
11-Dec-2006
[1557]
You really have to trust your source when using JSON to a browser 
though. Standard usage is to load with eval - only safe to use on 
https sites because of script injection.
[unknown: 9]
11-Dec-2006
[1558]
XML and JSON sucks...
Maxim
11-Dec-2006
[1559]
is there a way to make block parsing case sensitive?

this doesn't seem to work:
parse/case [A a] [some ['A (print "upper") | 'a (print "lower")]]
Gabriele
11-Dec-2006
[1560x2]
words are not case sensitive.
>> strict-equal? 'A 'a
== true
Maxim
11-Dec-2006
[1562x3]
I was just hoping case could have been an exception... it would be 
very useful, especially when parsing code from other languages...
(I meant using /case within parse)
well, seems like I'll be doing string parsing then  :-)
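A minimal sketch of the string-parsing route, assuming the input can be handled as a string instead of a block; with /case, string literals in the rules are matched case-sensitively:

```rebol
; string parsing honours /case, unlike word matching in block parsing
parse/case "Aa" [
    some [
        "A" (print "upper")
        | "a" (print "lower")
    ]
]
```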
Gabriele
11-Dec-2006
[1565x3]
you could take advantage of this bug:
>> alias 'a "aa"
== aa
>> strict-equal? 'A 'a
== false
but it will be fixed eventually :P
Maxim
11-Dec-2006
[1568x2]
hehe... I would not want the bug to get too comfortable,  lest it 
becomes a feature  ;-)
you know what they say...  "features are bugs with experience"
Josh
11-Dec-2006
[1570x2]
I don't know
Whoops
Joe
24-Dec-2006
[1572x4]
s: 		"str"
s2:		"str 1^/ str 2 ^/ str 3"


rules:	[
		any [
			end break
			| copy value [to "^/" | to end]		(print value)
		]
		]
	

parse		s rules
print		"---"
parse		s2 rules
i run the above on core 2.6 and it loops forever . This was a bug 
fixed in 2.3 but it looks like the bug still exists
sorry, not a bug. I was inspired by the example in the changes page 
and it is missing the  thru "^/" after the to "^/"
parse item [
    any [
        "word" (print "got word")
        | copy value [to "abc" | to end]
            (print value) break
    ]
]
Gabriele
25-Dec-2006
[1576x2]
not a bug - you are not skipping the newline, so to "^/" will always 
match. you are not getting to the end.
>> rules: [
[    any [
[        end break
[        |
[        copy value [to newline | to end] (print value) opt skip
[        ]
[    ]
== [
    any [
        end break
        |
        copy value [to newline | to end] (print value) opt skip
    ]
]
>> parse s2 rules
str 1
str 2
str 3
== true
Joe
25-Dec-2006
[1578x2]
yes, thanks gabriele - happy holidays ! i find the opt skip not very 
intuitive !
wouldn't  to newline thru newline be easier to understand than opt 
skip
Volker
25-Dec-2006
[1580]
could be opt newline
Gabriele
26-Dec-2006
[1581x5]
joe, if you don't care about parse returning true you can just use 
skip (without opt, which is there for the end case)
also, if you don't care about your value having the newline in it, 
you can just replace to newline with thru newline.
another possibility (but gives more maintenance problems) is to split 
the copy rule into two, one for newline and one for end.
copy value to newline skip
copy value to end
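The thru newline variant could be sketched roughly like this, reusing s2 and the rule shape from Joe's example (a sketch only; note that value then carries the trailing newline, so print emits an extra blank line):

```rebol
s2: "str 1^/ str 2 ^/ str 3"

rules: [
    any [
        end break
        ; thru newline consumes the newline, so the loop always advances
        | copy value [thru newline | to end] (print value)
    ]
]

parse s2 rules
```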
Ladislav
27-Dec-2006
[1586x2]
Joe: another option is to use:

rules: [
    any [
        copy value [to newline | to end] (print value) skip
    ]
    to end
]
but, as Gabriele said, that is equivalent to:

rules: [
    any [
        copy value [to newline | to end] (print value) skip
    ]
]

if you ignore the parse result
BrianH
27-Dec-2006
[1588x2]
to end skip will always fail. move the skip after the to newline.
Nevermind, failing isn't a problem here.
Ladislav
28-Dec-2006
[1590]
another possibility (but gives more maintenance problems) is to split 
the copy rule into two, one for newline and one for end.

 - I am curious whether this isn't actually better when the maintenance 
 is taken into account - suppose e.g. that we want to add yet another 
 alternative...
Gabriele
28-Dec-2006
[1591x2]
lad, maybe, but if you change the name of the variable to copy to 
you have then to change it twice in the rule.
generally, i'd prefer [copy value [rule1 | rule2]] to [copy value 
rule1 | copy value rule2], however it is not always that easy, so 
many times you have to do the latter.
Anton
28-Dec-2006
[1593]
I agree.
Maxim
28-Dec-2006
[1594x2]
hi,  yesterday I realized I have a 1400 line single parse ruleset 
 which amounts to ~40kb of code !   :-)  I was wondering what are 
your largest Parse rulesets,  I'm just curious at how people here 
are pushing REBOL into extremes.


I might also say that parse is wildly efficient in this case, allowing 
my server to decipher 600 bytes of binary data through all of this 
huge parse rule within 0.01 - 0.02 seconds (spitting out a nested 
rebol block tree of  rebxml2xml ready data).
to anyone not yet accustomed to 'PARSE, really do take the time to 
look it through and use it.
Pekr
28-Dec-2006
[1596]
when working so extensively with Parse, you might try to write down 
your enhancement/fix proposals and submit them to the R3 team :-)
Maxim
28-Dec-2006
[1597x4]
The one real limitation I always saw, which makes some rules harder 
to write for nothing, is the first-of-any search (to and thru)
and THE MOST PROFOUND limitation... why the hell isn't 'NOT within 
the dialect?
50% of the time it's easier to match something which is not and then 
within that choice select something which is.
like bounds checking, making sure some items are not within a specific 
area, etc.
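For reference, a negative match can be emulated in REBOL 2 with a rule that is switched between "always fails" and "always succeeds". This is only a sketch of that workaround, not a built-in; the names digit, not-digit and flag are made up for illustration:

```rebol
digit: charset "0123456789"
flag: none

; if digit matches, flag becomes [end skip], which can never succeed;
; otherwise flag becomes [], which succeeds without consuming input
not-digit: [[digit (flag: [end skip]) | (flag: [])] flag]

print parse "a1" [not-digit skip digit]   ; first char is not a digit: true
print parse "11" [not-digit skip digit]   ; first char is a digit: false
```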
Geomol
28-Dec-2006
[1601]
My largest Parse rulesets are in NicomDoc. The scripts nicomdoc.r 
and ndmath.r parse from text to rebxml format. They are 20k and 24k. 
ndrebxml2html.r parses from NicomDoc rebxml format to html, and that 
is a 28k script, mostly parse rules. I once built an html dialect, 
and that was 24k.