World: r3wp

[Parse] Discussion of PARSE dialect

Janko
2-Dec-2009
[4641x2]
I just started talking about this as a general limitation of parse 
that I have met a lot of times, and I suppose Paul could have met it when 
trying to parse CSV
janko

,"some\"thing92!","graham" I am not sure but I think here you have 
the same problem
Gregg
2-Dec-2009
[4643x3]
It's not necessarily a PARSE limitation, but there are things we'd 
like PARSE to do that aren't always reasonable. :-)


TO and THRU can work very well, but that doesn't mean they'll work 
for every situation. You may have to use rules where you check for 
your target value or just SKIP, marking locations in the input as 
you go.
CSV parsing is an issue, because REBOL handles some inputs well, 
but fails for what may be a common way things are formatted. "CSV" 
isn't always as simple as it sounds.
That said, if you know the format (e.g. WRT quotes and escapes), 
it can be done with PARSE. It just may not be a one-liner.
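
A minimal sketch of the point above, written in Python rather than REBOL so it can be run as-is. It assumes one particular CSV flavour (comma delimiter, double-quote quoting, backslash escapes inside quotes), which is only a guess at the format of Janko's sample line. The function name `split_csv_line` and its parameters are made up for illustration.

```python
def split_csv_line(line, delimiter=",", quote='"', escape="\\"):
    """Split one CSV line into fields, honoring quotes and backslash escapes."""
    fields = []
    current = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        # Inside quotes, an escape character makes the next character literal.
        if in_quotes and ch == escape and i + 1 < len(line):
            current.append(line[i + 1])
            i += 2
            continue
        if ch == quote:
            # Toggle quoted mode; the quote itself is not part of the field.
            in_quotes = not in_quotes
        elif ch == delimiter and not in_quotes:
            # A delimiter outside quotes ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


print(split_csv_line(r',"some\"thing92!","graham"'))
```

Once the format is pinned down, the job is a small state machine, not a one-liner. A different quoting convention (for example doubled quotes instead of backslashes) would need a different branch.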
Janko
2-Dec-2009
[4646x2]
I know parsing CSV can be messy ... at least at this high level I 
don't know how to do it with escapes, embedded commas, etc.
and I know everything has limitations ... this functionality (an OR that 
takes the first alternative that appears) would in practice solve many 
cases for me
Graham
2-Dec-2009
[4648]
you have to turn off parse's default delimiters and use bitsets
Janko
2-Dec-2009
[4649]
(aha bitsets.. I was calling them charsets up there)
Graham
2-Dec-2009
[4650x2]
BTW, Bolek wrote a regex engine in Rebol ...
http://www.mail-archive.com/[rebol-bounce-:-rebol-:-com]/msg01983.html
Ladislav
2-Dec-2009
[4652]
Janko: the only problem is, that you cannot use:

C: [to [A | B]]

, where A and B are "general rules", but you can always write:

C: [here: [A | B] :here | skip C]

, which would do what you want
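
The idea behind that idiom (try A or B at the current position and, if neither matches, skip one character and try again) can be sketched in Python. This is an illustration only, not REBOL semantics in full: rules are modelled as hypothetical functions taking `(text, pos)` and returning an end position or `None`, and the recursion in `C` is written as a loop.

```python
def literal(s):
    """Build a rule that matches the literal string s at a given position."""
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule


def to_first(text, pos, rules):
    """Advance pos until any rule matches there.

    Returns the position where the first match starts (the input is left
    before the match, as with TO), or None if no rule ever matches.
    """
    while pos <= len(text):
        for rule in rules:
            if rule(text, pos) is not None:
                return pos
        pos += 1  # neither alternative matched here: skip one character
    return None


text = "I like Apple . I like Windows !"
start = len("I like ")
stop = to_first(text, start, [literal(" ."), literal(" !")])
print(text[start:stop])
```

Whichever alternative appears first in the input wins, regardless of the order in which the rules are listed.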
Oldes
2-Dec-2009
[4653x2]
Just would like to remind you that there is something like R3, where:

>> parse "I like Apple . I like Windows ! I like Linux . I like Amiga ." [any ["I like " copy x to [" ." | " !"] (probe x) to "I like "]]
Apple
Windows
Linux
Amiga
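
For comparison, a rough Python analogue of the same extraction using a regular expression instead of PARSE. This is a different technique and is shown only to make the "copy up to whichever terminator comes first" behaviour concrete. The function name `liked_things` is invented for the example.

```python
import re

# Capture lazily after "I like " up to the first " ." or " !" (lookahead,
# so the terminator itself is not consumed or captured).
LIKE_PATTERN = re.compile(r"I like (.*?)(?= \.| !)")


def liked_things(text):
    """Return every captured item following 'I like ' in text."""
    return LIKE_PATTERN.findall(text)


sample = "I like Apple . I like Windows ! I like Linux . I like Amiga ."
for item in liked_things(sample):
    print(item)
```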
And Janko... if you don't use charsets at all, I think you should 
give them a try. It's not so difficult. I think that if I can write 
a parser to colorize PHP code, then you can parse anything.
Janko
3-Dec-2009
[4655x4]
Ladislav, thanks.. I didn't know you could set the position back 
with :here , that is interesting and probably expands what you can 
do with parse a lot.
Oldes if that is in R3 >> copy x to [" ." | " !"]  << this is exactly 
as I was proposing above :) , very nice!


I know I have to .. I haven't really needed them yet I guess, I solved 
some things less elegantly in other ways without them. I intend to 
take the plunge next time I need them.
yes, you are right .. if you can write a parser for php then you can 
make anything with it. I always supposed parse with charsets is like 
a low level step by one char in a loop and call "events" and change 
states , with which you can parse anything from xml to languages 
.. well but parse with charsets is still much more elegant
but it is a level less simple and nice to use than simple parse modes 
that's why the simple ones should be powerful *if possible* too 
- you can't get a newbie impressed with charset parsing because he 
probably won't understand it.
Ladislav
3-Dec-2009
[4659x3]
Just to complete the list of possible equivalents to the

    C: [to [A | B]]

rule, here is a way how to do it in Rebol3 parse:

    C: [while [and [A | B] break | skip | reject]]


you can find other equivalent idioms at http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse#Parse_idioms
I didn't know you could set the position back with :here

 - you can set the position back even without :here, the choice operator 
 is sufficient for you to be able to do that, see the above idioms 
 as an example
It looks like I could have used:

    C: [while [and [A | B] accept | skip | reject]]
Graham
3-Dec-2009
[4662x2]
Janko, charset is short for make bitset! so you can call them bitsets 
or charsets :)
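
As a loose illustration of why the two names are interchangeable, here is a Python sketch that represents a set of characters as a bitmask, one bit per character code. It makes no claim about how REBOL stores a bitset! internally. The helper names `charset`, `in_charset` and `take_while` are hypothetical.

```python
def charset(chars):
    """Build an integer bitmask with one bit set per character code."""
    bits = 0
    for ch in chars:
        bits |= 1 << ord(ch)
    return bits


def in_charset(bits, ch):
    """Test whether the bit for character ch is set."""
    return (bits >> ord(ch)) & 1 == 1


def take_while(text, pos, bits):
    """Consume characters belonging to the set; return the end position."""
    while pos < len(text) and in_charset(bits, text[pos]):
        pos += 1
    return pos


digits = charset("0123456789")
end = take_while("2009-Dec", 0, digits)
print("2009-Dec"[:end])
```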
Ladislav, what 'choice operator?
BrianH
4-Dec-2009
[4664]
|
jack-ort
11-Dec-2009
[4665]
Help!  Still struggling to understand parse.  How could I replace 
any and all SINGLE occurrences of  the single-quote character anywhere 
in a string (beginning, middle or end) with TWO single-quotes?  But 
if there are already TWO single-quotes together, I want to leave 
them alone.

TIA for any and all help for a newbie!
Maxim
11-Dec-2009
[4666x2]
easy, actually.  you match double quotes first then fall back to single 
quotes, adding a new one and skipping one char... 

give me a minute I should get something working...
R2?
jack-ort
11-Dec-2009
[4668]
yes, View 2.7.6 under Windows XP
Steeve
11-Dec-2009
[4669x2]
>> parse/all str [ any [thru {"} [{"} | p: (insert p {"} skip) ]]]
something like this (not tested)
i think i misunderstood something, replace {"} by {'} maybe
Maxim
11-Dec-2009
[4671x2]
>> str: {1 ''2 '3 4 ' '5 ''6 '7 8 9 '0'}

>> parse/all str [some [{''} | [{'} here: (insert here {'}) skip] | skip]]
>> print str
== {1 ''2 ''3 4 '' ''5 ''6 ''7 8 9 ''0''}
note all ticks... ( ' ) are single quote chars in the above.
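
The order of the alternatives in that rule is what makes it work: an existing pair is tried first, then a lone tick, then any other character. A Python sketch of the same ordering follows. It builds a new string instead of modifying the input in place as the PARSE rule does, and the function name is made up.

```python
def double_single_quotes(text):
    """Double every lone single-quote; leave existing '' pairs alone."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith("''", i):
            # Already a pair: copy it through unchanged.
            out.append("''")
            i += 2
        elif text[i] == "'":
            # A lone tick: emit it twice.
            out.append("''")
            i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(double_single_quotes("1 ''2 '3 4 ' '5 ''6 '7 8 9 '0'"))
```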
Steeve
11-Dec-2009
[4673]
same as mine, except i use THRU to speed up the process
jack-ort
11-Dec-2009
[4674]
Thanks!  I'm going to have to look @ this for awhile to understand 
why you even need to worry about the double-quote character.  Much 
to learn....

Thanks Maxim and Steeve for the prompt replies!
Maxim
11-Dec-2009
[4675]
print it out in the rebol console... you will see that my example 
doesn't have any double quote characters.. they just look like so 
in altme's font  ;-)
Steeve
11-Dec-2009
[4676]
corrected version with thru:

>> parse/all str [ any [thru {'} [{'} | p: (insert p {'} ) skip ]]]
jack-ort
11-Dec-2009
[4677]
Ah!  when you said "...you match double quotes first then fallback 
to single quotes, ..." I was thinking double-quote character, not 
double single-quotes.  Need more coffee...

Thanks very much!
Maxim
11-Dec-2009
[4678]
( I can see that being misleading when read hehehe :-)
Rebolek
11-Dec-2009
[4679]
Just curious, I tested both versions and Steeve's version is about 
2 times faster than Maxim's :)
Steeve
11-Dec-2009
[4680]
we should add a DONATE account somewhere, linked with Altme.

I'm sure people would be glad to add 1 dollar for such fast assistance.
Then, we could finance some interesting projects
Maxim
11-Dec-2009
[4681x3]
actually, having a paypal account linked with your login and a "donate" 
button would be really nice :-)  right in the chat tool.
I sure would use it... some people have helped save days of work 
with free code and insight.
I'd gladly give back a few $ for their efforts
Reichart
11-Dec-2009
[4684]
Jack, Parse is my fav REBOL command.  If I ever have time, this is 
the one function I would like to create hundreds of examples for 
in a Wiki.
WuJian
11-Dec-2009
[4685]
newbie's solution, without PARSE:
>> s2: {1 ''2 '3 4 ' '5 ''6 '7 8 9 '0'}

>> replace/all s2 {''} {'}     replace/all s2 {'} {''}      print str
1 ''2 ''3 4 '' ''5 ''6 ''7 8 9 ''0''
>> str == s2
== true
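
The two-pass trick works because the first pass collapses every existing pair to a single tick, so the second pass can double every tick without caring which ones were pairs. A small Python sketch of the same idea, with a hypothetical function name:

```python
def double_quotes_by_replace(text):
    """Collapse existing '' pairs to ', then double every remaining tick."""
    collapsed = text.replace("''", "'")
    return collapsed.replace("'", "''")


print(double_quotes_by_replace("1 ''2 '3 4 ' '5 ''6 '7 8 9 '0'"))
```

This makes two full passes and builds an intermediate string, so it trades some speed for brevity.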
Maxim
12-Dec-2009
[4686x4]
I just adopted a new notation standard for parse rules... the goal 
is to make rules a bit more verbose as to the type of each rule token... 
I find this reads well in any direction, since we encounter the "=" 
character when reading from left to right or right to left... and 
parse rules often have to be read from right to left.

example:

=terminal=: [

 =quote= copy terminal to =quote= skip (print ["found terminal: " terminal])
]


on very large rules, and with the syntax highlighting in my editor 
making the "=" signs very distinct, I can instantly detect what parts 
of my rules are other rules or character patterns... it also helps 
out in the declarations... I see when blocks are intended to be used 
as rules quite instantly wherever they are in my code.


in my current little parser, I find I can edit my rules almost twice 
as fast and lose MUCH less time scanning my blocks to find the rule 
tokens, and switching them around.

wonder what you guys think about it...
another example.... in this dense block of text, I can spot the =eol= 
 (end of line) token instantly in both x and y dimensions of the 
rule paragraph:

=line-comment=: [
	=comment-symbol= [
		[thru =eol= (print "comment to end of line")]
		|[to end]
	]
	(print "success")
]
when using rules in other contexts, they also stick out...

=alphabet=: rejoin [=digit= =letter= bits "_"]


here I immediately see that bits isn't a rule, but a function or 
a word.
with syntax highlighting it's quite amazing how    bits   stands 
out. ... in my editor at least.
Graham
12-Dec-2009
[4690]
use color instead :)