AltME groups: search

world-name: r3wp

Group: Parse ... Discussion of PARSE dialect [web-public]
Steeve:
3-Oct-2009
And you all missed my (N Fail) proposal.
Steeve:
3-Oct-2009
I just rewrote the math expressions resolver.

digit: charset "0123456789"
num: [some digit opt [#"." any digit]] 
term: [num | #"(" any lv1 term #")" | #"-" any lv3 term]
calc: [
	remove [copy num1 term copy op skip copy num2 term]
	(expr: do reform select [
		"+"  [num1 op num2]
		"-"  [num1 op num2]
		"*"  [num1 op num2]
		"/"  [num1 op num2]
		"^^" [num1 "**" num2]
		"%"  [num1 "//" num2]
		
	] op)
	stay insert expr (probe e)
]
lv4: [term #"%" term then fail | break | calc]
lv3: [any lv4 term #"^^" any lv4 term then fail | break | calc]

lv2: [any lv3 term [#"*" | #"/"] any lv3 term then fail | break | calc]

lv1: [any lv2 term [#"+" | #"-"] any lv2 term then fail | break | calc]

I just think it's more clear like that.
Moreover, it's prepared to use the further AND command.

Because this nasty trick i use:
[rule THEN FAIL | BREAK | calc]
will be replaced by:
[AND rule calc]
Pekr:
4-Oct-2009
What is your take on simple mode parsing? It is handy for simple 
CSV parsing, and the idiom is common:

parse/all row ";"


The trouble is, that if there is no data in last column, parse mistakenly 
makes the resulting block shorter, so you have to use common idiom:

rec: parse/all append row  ";" ";"

I always wondered, if it could be regarded being a parse bug?
Henrik:
4-Oct-2009
enline and deline will help somewhat.
Pekr:
4-Oct-2009
Ladislav - in comment to ticket #1248, you write:


According to the documentation, that can be found in http://www.rebol.net/wiki/Parse_Project

parse "b" [not #"a"]


yields FALSE correctly. If you want to obtain TRUE, you can try e.g.:

parse "b" [not #"a" to end] 


My question is - what is the advantage of actually not advancing the 
input on the rule match? It does not look natural and I would expect 
it to match the rule and hence move past it:

>> parse "b" [not #"a" ??]
end!: "b"
== false

... as can be seen, it does not advance ...
Ladislav:
4-Oct-2009
What is the advantage?:


1) by not consuming input this would be a direct inversion of the 
rule. Example:

    parse "a" [not end ...]


is a meaningful rule, and it is quite trivial to see, that any rule 
consuming input would not be a direct inversion of this rule.


NOT SOMETHING actually means, that at the current input position 
the SOMETHING rule shall not match. That does not give us any information, 
that NOT should skip any input (how far should it?).

2) This version of NOT is compatible with PEG

3) It is consistent with the AND operation:

   [AND rule] is equivalent to [NOT [NOT rule]]
Ladislav:
4-Oct-2009
Yet another example:


    [NOT skip] is equivalent to the [END] rule and is meaningful only, 
    when NOT does not skip any input
Ladislav:
4-Oct-2009
...I would expect it to match the rule and hence move past it...

 - that is trivially wrong. If the RULE matches, the [NOT RULE] cannot 
 match, therefore it cannot even advance. The only case, when (theoretically) 
 we could think of advancing is, when the rule does not match. But 
 then, it is not known, how far.
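
A rough illustration of the points above, written in Python rather than REBOL. It is a toy model, not PARSE's actual implementation: a rule is a function taking (text, pos) and returning the new position, or None on failure. The helper names (lit, skip, not_, and_) are hypothetical.

```python
# Toy model of PEG-style lookahead. A rule is a function
# (text, pos) -> new position on success, or None on failure.
# All names here are hypothetical.

def lit(s):
    """Match the literal string s and consume it."""
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule

def skip(text, pos):
    """Consume exactly one character; fails at the end of input."""
    return pos + 1 if pos < len(text) else None

def not_(rule):
    """Succeed, consuming nothing, only when `rule` fails at pos."""
    def wrapped(text, pos):
        return pos if rule(text, pos) is None else None
    return wrapped

def and_(rule):
    """Positive lookahead, defined as NOT [NOT rule]."""
    return not_(not_(rule))

end = not_(skip)  # [NOT skip] behaves like END

print(not_(lit("a"))("b", 0))    # 0: matched, but the position did not move
print(and_(lit("b"))("b", 0))    # 0: lookahead succeeded without consuming
print(end("b", 0), end("b", 1))  # None 1
```

In this model NOT never has to decide how far to advance, which is why [NOT #"a"] on "b" succeeds yet still needs something like TO END before the whole parse can return TRUE.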
Maxim:
5-Oct-2009
pekr, I had the same initial reaction, then realized that it would 
not be consistent wrt fail or no fail... when NOT would succeed a 
match (and fail the rule), the input would be beyond what the not 
is useful for.


when I started thinking about it, if you really want you can simply 
use a set word/get word pair to advance when the not finds a match 
to ignore a rule, but then it's like not using 'NOT in the first place, 
so it's pointless  :-)
Steeve:
6-Oct-2009
I can have a look, but the purpose of NOT is not to have better perfs 
than complemented charset, but to allow some simplification when 
writing rules.

Actually, It's the case of most other improvements, easier to write, 
not inevitably faster.

And don't forget that safe complemented charset in R3 are a pain 
in the ass to construct, because of UTF-8
PeterWood:
6-Oct-2009
Which is why I was disappointed that I apparently misunderstood from 
Carl's blog:


Changes that are critical, but not highly complicated. For example, 
providing a NOT command seems easy enough, and it is now critical 
because using complemented charsets is problematic (due to the Unicode 
enhancements). 
Steeve:
6-Oct-2009
Well, i saw your script, i don't know if it can be faster, i only 
can say I would have written it differently.
Probably, using parse and load/next for all normal rebol values.

I can see that your rule about matching binaries is wrong. Cause 
[#{" thru #"}"] is wrong (what if the binary contains the #"^}" 
char ?)
BrianH:
6-Oct-2009
Peter, Steeve, the original problem that started the parse proposals 
was the problem of complementing charsets. However, it quickly changed 
to improving PARSE in general. Then, while we were waiting for the 
parse proposals to come up on the todo list, we came up with a better 
solution to complementing charsets, which is not yet implemented 
and which is not limited to PARSE.
BrianH:
6-Oct-2009
Using a bit in the charset that would mark it as "complemented", 
and then all of its matching algorithms would do an internal not.
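
A minimal sketch of that idea in Python (not REBOL, and not how R3 bitsets are implemented): the charset stores its members once plus a "complemented" flag, and every membership test applies the internal not. The class and method names are hypothetical.

```python
# Sketch of a charset that carries a "complemented" flag instead of
# materialising every excluded code point. Class and method names are
# hypothetical.

class Charset:
    def __init__(self, chars, complemented=False):
        self.chars = frozenset(chars)
        self.complemented = complemented

    def complement(self):
        """O(1): share the same members and flip the flag."""
        return Charset(self.chars, not self.complemented)

    def __contains__(self, ch):
        # The "internal not": invert the membership test when flagged.
        return (ch in self.chars) != self.complemented

digit = Charset("0123456789")
non_digit = digit.complement()

print("7" in digit, "7" in non_digit)                    # True False
print("\u00e9" in non_digit, "\U0001F600" in non_digit)  # True True
```

The complemented set costs no more memory than the original, whatever the size of the Unicode range it implicitly covers.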
BrianH:
6-Oct-2009
I want to write more port code first and refine the model based on 
what I learn.
BrianH:
12-Oct-2009
Behavior of BREAK, ANY and SOME decided, finally: http://www.rebol.net/r3blogs/0270.html
BrianH:
12-Oct-2009
And it's finally break from a loop, rather than break from a block 
(supposedly).
Maxim:
12-Oct-2009
but it's a hell of a powerful addition to parse and to general code 
control.  I don't see why Carl can't see any use for it.
BrianH:
12-Oct-2009
And you can do that with CATCH.
Steeve:
12-Oct-2009
yep, and for functions, you still got THROW/CATCH and RETURN, which 
are enough to my mind.
BrianH:
12-Oct-2009
The BREAK, THROW, RETURN, EXIT, HALT and QUIT functions are implemented 
the same way, just with different error codes.
Maxim:
12-Oct-2009
but n BREAK allows us to leverage smaller rules reuse, as if they 
were large complex rules and still benefit from the same speed of 
a root rule backtrack.
BrianH:
12-Oct-2009
I think that Carl is trying to balance speed, ease of use, and debugability. 
In practice n BREAK would be tricky to debug, and doesn't actually 
reflect what PARSE does internally. Apparently PARSE isn't actually 
recursive descent - it just fakes it with a state machine.
BrianH:
12-Oct-2009
Because you can't go through the end, not even with THRU END. And once 
you reach the end, END always succeeds.
BrianH:
12-Oct-2009
And TO "abc" will also continue to succeed, matching the same "abc" 
every time. THRU "abc" skips past the "abc" like you say.
Pekr:
13-Oct-2009
So according to his doc, we should get BREAK/RETURN and DO?
Pekr:
13-Oct-2009
But generally - the level of feedback is lower and lower. We need 
to get R3 into beta with requested features in a few months, as we 
are starting to lose ppl being interested ...
Pekr:
13-Oct-2009
well, otoh we lived without OF for so long. I think it can be done 
in a conventional (recent) way :-) I think that Carl should dedicate 
few more days to finish parse and move on to Extensions :-)
BrianH:
13-Oct-2009
The only still-missing proposals that aren't easy or efficient to 
work around are OF and REVERSE. They will be missed if not included. 
Unfortunately, the same reasons why they will be hard to work around 
if missing, are the reasons why they would be difficult to implement 
:(
Graham:
14-Oct-2009
Tim Berners-Lee is quoted today to say that he can't think of a good 
reason to keep the // in http://, and that if he did it again, he 
would have done without them.  I wonder if he spoke to people who 
write parsers ....
Gabriele:
15-Oct-2009
the reason for the // is to allow relative paths like: //www.rebol.com/ 
  where the scheme is the same as the base url. Nobody has ever used 
this; also, it could have been achieved by using :www.rebol.com/ 
instead... so, yeah, it was not really a good idea. I also don't 
think ftp:file.txt (meaning, change scheme, but keep host and path) 
has ever been used and not sure it's supported by software. so in 
practice http:www.rebol.com/ would have worked.
BrianH:
15-Oct-2009
It's an operator, like |, and mentioned in that section near the 
top.
Pekr:
15-Oct-2009
isn't AND an operator too, for example?
Maxim:
17-Oct-2009
I really want to do it... but I'm so deep into parsing right now 
I don't want to lose the few GB of information in my brain's cache. 
 I'm writing self-modifying parse rules and it's pretty nightmarish. 
 although it works.
Pekr:
17-Oct-2009
An=And
Pekr:
17-Oct-2009
So - we don't need complementing to be enhanced? Because we talked 
about it, but it is not defined in proposal, it is not part of Carl's 
feature table, and I also got no reaction on R3 Chat ....
Maxim:
17-Oct-2009
laden with many paren expressions and a stack on top of it.
Maxim:
17-Oct-2009
since I use binding to map inner rules which are also constructed 
on the fly but have to be pushed and popped from the stack as I traverse 
data... it's a lot of fun  :-D
BrianH:
17-Oct-2009
If the self-modifying rules are strung-together basic blocks, you 
can use the rule compiler to generate the blocks. And the R3 changes 
make self-modifying rules less necessary, so you can have even larger 
basic blocks.
Maxim:
17-Oct-2009
and it's not simple parsing since I use parsing index manipulation, 
which is also dictated by the source data it encounters.  it's like 
swatting flies using a fly swatter at the end of a rope, while riding 
a roller coaster which changes layout every time you ride it  ;-)
BrianH:
17-Oct-2009
Which is what a rule compiler does :)  Actually, it sounds like you 
could adapt the tricks of the rule compiler to *your* rule compiler, 
which would let you use the new operations in your rule source and 
have the workarounds generated in the output.
Maxim:
17-Oct-2009
well, build it and I will try it  ;-)
Pekr:
18-Oct-2009
ah, got reply on Chat from Carl towards complementing:


"Re #5718: Pekr, that's a good question, and I think the answer must 
be YES. We need to be able to complement bitmaps in a "nice way". 
Otherwise, Unicode bitmaps, even if simply used on ASCII chars, would 
take a lot of memory.


This change should be listed on the project sheet, and if not, I'll 
add it there."
Chris:
22-Oct-2009
Both w1 and w+ appear to be very large values.  Would it be smart 
to perhaps do:

	[[aw1 | w1] any [aw+ | w+]]

Where 'aw1 and 'aw+ are limited to ascii values?
Steeve:
22-Oct-2009
Uses R3 (and its optimized complemented bitsets)
Chris:
22-Oct-2009
Allowing 'into to look inside strings can break current usage of 
'into, requiring [and any-block! into ...]
Chris:
22-Oct-2009
An example: a nested d: [k v] structure where 'k is a word and 'v 
is 'd or any other type:

	data: [k [k "s"]]

R2, you can validate with d: [word! [into d | skip]]


Now you have to specify: d: [word! [and any-block! into d | skip]] 
otherwise you get an error if 'v is a string!
BrianH:
26-Oct-2009
Chris, there can be an advantage in R3 to breaking up a bitset into 
more than one bitset on occasion, mostly memory savings. However, 
it might not work as well as you might like since offset and/or sparse 
bitsets aren't supported. Bitsets that involve high codepoints will 
take a lot of RAM no matter what you do.
JoshF:
17-Nov-2009
The second one failed when I tried to extend the dialect with multiply 
(*) and divide (/). After further experimentation, it seems that 
you can't escape the "/". Google has not been helpful here... Does 
anybody have any ideas? I could parse for just a word! instead of 
the +, -, etc., but I wanted parse to do the work of deciding what 
was a valid operation or not. Sorry for the multiple messages, I'm 
still trying to figure this client out... Thanks for any advice!
JoshF:
17-Nov-2009
Both tdiv and lit-div type? to a word!...
Henrik:
17-Nov-2009
And also hence the expression "a block is or isn't loadable"
JoshF:
17-Nov-2009
OK... Mechanically, I see what you're saying, but what's the difference 
between a lit-word and a word? The spirit eludes me...
JoshF:
17-Nov-2009
I thought there was only word!'s and then everything else were more 
concrete types. I guess what I am asking is what is the purpose of 
lit-words?
JoshF:
17-Nov-2009
The difference between what I'm doing and what you linked to is that 
it's working against a string, while I'm doing a dialect, no?
Janko:
2-Dec-2009
I know I was stopped by parse on some occasions. I think every time 
the problem would have been solvable if I had, for example, >> 
to [ "A" | "B" ] where the parser would check where A is and where 
B is and go to the closest one.
Janko:
2-Dec-2009
I was trying to show an example where you have two possible endings 
and you want to process both (and you can process them differently 
with parens) but you don't know in what order they will come or anything
Janko:
2-Dec-2009
yes, then you have to do charset parsing (but I don't know that 
yet :) ) .. I was just trying to say that if there were a way to 
say something like to any [ "A" | "B" ] and it would go to the closest 
one, A LOT of problems with parse would be easily solvable
Graham:
2-Dec-2009
and see which has the best fit ?
Janko:
2-Dec-2009
The pattern is known ... the sentence starts with "this is" and can 
end with . or ! but they can come in any order .. if you try to parse 
with "." first you will get 
---- oops, some errors up there  .. just a sec
Janko:
2-Dec-2009
this is common to all the problems that I am describing .. 
if I had  > to [ "." | "!" ] and parse would find both and go to 
the one that is closer it would be solved.
Graham:
2-Dec-2009
Janko, best thing to do is show us a  string you can't parse ... 
and someone will show you how to do it.
Janko:
2-Dec-2009
I don't have real example right now :) I had them few times before 
and I also asked here about them and I solved with your help somehow
Janko:
2-Dec-2009
I just started talking about this as a general limitation of parse 
that I meet a lot of times and I suppose Paul could have met it when 
trying to parse CSV
Gregg:
2-Dec-2009
It's not necessarily a PARSE limitation, but there are things we'd 
like PARSE to do that aren't always reasonable. :-)


TO and THRU can work very well, but that doesn't mean they'll work 
for every situation. You may have to use rules where you check for 
your target value or just SKIP, marking locations in the input as 
you go.
Gregg:
2-Dec-2009
That said, if you know the format (e.g. WRT quotes and escapes), 
it can be done with PARSE. It just may not be a one-liner.
Janko:
2-Dec-2009
I know parsing csv can be messy ... at least at this high level I 
don't know how to do it with escapes and commas in etc
Janko:
2-Dec-2009
and I know everything has limitations ... this functionality OR with 
taking the first that appears would just in practice solve me many 
cases
Graham:
2-Dec-2009
you have to turn off parse's default delimiters and use bitsets
Ladislav:
2-Dec-2009
Janko: the only problem is, that you cannot use:

C: [to [A | B]]

, where A and B are "general rules", but you can always write:

C: [here: [A | B] :here | skip C]

, which would do what you want
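
The same scan can be modelled in Python (a sketch of the idiom, not REBOL code and not PARSE internals): at each position, try the alternatives as a lookahead; if none matches, move one character and retry. Rules are functions (text, pos) returning an end position or None, and the names lit and to_first are hypothetical.

```python
# Sketch of "advance to whichever alternative comes first".
# Rules are functions (text, pos) -> end position or None.
# All names are hypothetical.

def lit(s):
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule

def to_first(alternatives):
    """Return the first position where any alternative matches, without
    consuming the match, or None if no position matches."""
    def rule(text, pos):
        while pos <= len(text):
            # [here: [A | B] :here]: try the alternatives, then reset.
            if any(alt(text, pos) is not None for alt in alternatives):
                return pos
            pos += 1  # [| skip C]: move one character and retry
        return None
    return rule

to_end_mark = to_first([lit("."), lit("!")])
print(to_end_mark("this is it! or not.", 0))  # 10
print(to_end_mark("no terminator", 0))        # None
```

Because every alternative is tried at each position before moving on, the result is always the closest match, regardless of the order in which the alternatives are listed.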
Oldes:
2-Dec-2009
And Janko... if you don't use charsets at all, I think you should 
give it a try. It's not so difficult. I think that if I can write 
a parser to colorize PHP code, then you can parse everything.
Janko:
3-Dec-2009
Ladislav, thanks.. I didn't know you could set the position back 
with :here , that is interesting and probably expands what you can 
do with parse a lot.
Janko:
3-Dec-2009
yes, you are right .. if you can write a parser for php then you can 
make anything with it. I always supposed parse with charsets is like 
a low level step by one char in a loop and call "events" and change 
states, with which you can parse anything from xml to languages 
.. well but parse with charsets is still much more elegant
Janko:
3-Dec-2009
but it is a level less simple and nice to use than simple parse modes, 
that's why the simple ones should be powerful *if possible* too 
- you can't get a newbie impressed with charset parsing because he 
probably won't understand it.
Ladislav:
3-Dec-2009
Just to complete the list of possible equivalents to the

    C: [to [A | B]]

rule, here is a way how to do it in Rebol3 parse:

    C: [while [and [A | B] break | skip | reject]]


you can find other equivalent idioms at http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse#Parse_idioms
Ladislav:
3-Dec-2009
It looks, that I could have used:

    C: [while [and [A | B] accept | skip | reject]]
jack-ort:
11-Dec-2009
Help!  Still struggling to understand parse.  How could I replace 
any and all SINGLE occurrences of  the single-quote character anywhere 
in a string (beginning, middle or end) with TWO single-quotes?  But 
if there are already TWO single-quotes together, I want to leave 
them alone.

TIA for any and all help for a newbie!
Maxim:
11-Dec-2009
easy, actually.  you match double quotes first then fallback to single 
quotes, adding a new one and skipping one char... 

give me a minute I should get something working...
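
The approach described here (test for a pair first, then fall back to a lone quote) can be sketched in Python. This is not the REBOL code posted in the chat, the function name is hypothetical, and it assumes that a run of three quotes counts as a pair followed by a lone quote.

```python
def double_lone_quotes(text):
    """Double every lone single quote; leave existing '' pairs untouched.
    Hypothetical helper; a pair is tested before a single quote."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith("''", i):   # already doubled: copy both, skip two
            out.append("''")
            i += 2
        elif text[i] == "'":           # lone quote: emit two
            out.append("''")
            i += 1
        else:                          # any other character: copy as-is
            out.append(text[i])
            i += 1
    return "".join(out)

print(double_lone_quotes("it's"))         # it''s
print(double_lone_quotes("it''s"))        # it''s
print(double_lone_quotes("'start end'"))  # ''start end''
```

Checking for the pair first is what keeps an already doubled quote from being doubled again.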
jack-ort:
11-Dec-2009
Thanks!  I'm going to have to look @ this for awhile to understand 
why you even need to worry about the double-quote character.  Much 
to learn....

Thanks Maxim and Steeve for the prompt replies!
Rebolek:
11-Dec-2009
Just curious, I tested both versions and Steeve's version is about 
2 times faster than Maxim's :)
Maxim:
11-Dec-2009
actually, having a paypal account linked with your login and a "donate" 
button would be really nice :-)  right in the chat tool.
Maxim:
11-Dec-2009
I sure would use it... some people have helped save days of work 
with free code and insight.
Maxim:
12-Dec-2009
I just adopted a new notation standard for parse rules... the goal 
is to make rules a bit more verbose as to the type of each rule token... 
I find this reads well in any direction, since we encounter the "=" 
character when reading from left to right or right to left... and 
parse rules often have to be read from right to left.

example:

=terminal=: [

 =quote= copy terminal to =quote= skip (print ["found terminal: " 
 terminal])
]


on very large rules, and with the syntax highlighting in my editor 
making the "=" signs very distinct, I can instantly detect what parts 
of my rules are other rules or character patterns... it also helps 
out in the declarations... I see when blocks are intended to be used 
as rules quite instantly where ever they are in my code.


in my current little parser, I find I can edit my rules almost twice 
as fast and lose MUCH less time scanning my blocks to find the rule 
tokens, and switching them around.

wonder what you guys think about it...
Maxim:
12-Dec-2009
another example.... in this dense block of text, I can spot the =eol= 
 (end of line) token instantly in both x and y dimensions of the 
rule paragraph:

=line-comment=: [
	=comment-symbol= [
		[thru =eol= (print "comment to end of line")]
		|[to end]
	]
	(print "success")
]
Maxim:
12-Dec-2009
syntax highlighting colorizes words ... stuff is colorized... but 
user words aren't colorized and they all get mixed up between functions, 
variables and rules... and having colors which are too strong next 
to each other and in relative distribution ... cancels out.
Graham:
12-Dec-2009
so you could write a parser that reads your rules and colorises them 
...
Maxim:
12-Dec-2009
I'm just trying to get a feel for what others think about the idea. 
 and sharing a bit of a discovery at the same time, if it may help 
others. the goal isn't to be popular or convince others... and sorry, 
if my last line may have looked harsh, it wasn't.  :-)


I was just summarizing your reaction plainly and relaunching the question 
to be sure others realize I want a few opinions.
PeterWood:
12-Dec-2009
any others care to comment?


I'm afraid it looks very messy to me and reminded me of Perl for some 
reason.
Gregg:
13-Dec-2009
For a long time I've added = to the end of my parse rules, and = 
to the beginning of parse variables. I think it matches the production 
rule grammar well, and also emulates set-word/get-word syntax.
Maxim:
13-Dec-2009
I've used word=  for other things before and I liked it.
Gregg:
14-Dec-2009
Yup. Different mindset.


I just looked at your BNF compiler earlier. Good stuff. I did an 
ABNF-to-parse generator some time back. ABNF is used in a lot of 
IETF RFCs and such.
Maxim:
14-Dec-2009
that is nice, is your ABNF parser still accessible somewhere?  it 
could improve the quality and ease of integrating the protocols to 
R3 IMO.

ABNF also seems much more aligned to parse
Maxim:
15-Dec-2009
I've been rewriting bnf generated parse rules (and often a bit cryptically) 
into proper parse ordered rules for 3 days now... <sigh>  

C is sooo complex for what it really does.  I've discovered a few 
quite mind-boggling language capabilities... 
stuff like:    

char *( *(*var)() )[10];


it takes 7 steps to define what that really is and there are other 
"fun" examples which end up being interpretation nightmares, but 
look really simple.


one thing is certain at this point... although I will be able to 
build a C to rebol converter with relative precision under specific 
goals, some of the crazy stuff just will have to be finished manually 
by humans.


at least I rarely see such twisted C code in most of what I've been 
reading so far.
BrianH:
16-Dec-2009
BNF is just a syntax form, with a *lot* of variation. The real difference 
that matters between Yacc and PARSE is the parsing model. Yacc implements 
an LR parser (or some variant thereof), and PARSE implements a variant 
of TDPL parsing (related to PEG), though more powerful and with a 
completely different syntax. How you structure the parse rules depends 
on the parsing model, not the syntax.


For instance, LR parsers tend to do recursion rather than iteration, 
and when they recurse the recursive call tends to be on the left, 
with the distinguishing clause on the right. For PEG parsers, recursion 
goes the other way. This is not an error, this is a difference in 
parsing model.


If you are translating from Yacc to PARSE, it's not just a syntax 
change. You have to reorganize the rules to match the new model. 
And watch out: Certain patterns are easier to express in some parsing 
models than in others. Some patterns aren't supported at all in some 
models, and so no amount of translation will help you. We chose the 
TDPL model for PARSE because it is more expressive than the LR model, 
so in theory you should be able to translate LR rules to PARSE with 
some topological twists (redoing the structure of the rules). However, 
there are patterns that you can express in PARSE that can't be translated 
to LR, even with topological changes.
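
A small illustration of the kind of reorganisation involved, as a hand-written top-down parser in Python (not Yacc or PARSE code; the grammar and function names are made up for the example). A left-recursive list rule, natural for an LR parser, is restructured as iteration so that a top-down parser does not recurse forever.

```python
# Left-recursive grammar, natural for an LR parser such as Yacc:
#     list : list ',' item | item
# A top-down (TDPL/PEG-style) parser cannot call `list` first without
# looping forever, so the rule is restructured as iteration:
#     list : item (',' item)*
# Function names are hypothetical.

def parse_item(text, pos):
    """Match one or more digits; return (value, new_pos) or None."""
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    return (int(text[pos:end]), end) if end > pos else None

def parse_list(text, pos=0):
    """Match item (',' item)* and return (values, new_pos) or None."""
    first = parse_item(text, pos)
    if first is None:
        return None
    values, pos = [first[0]], first[1]
    while pos < len(text) and text[pos] == ",":
        nxt = parse_item(text, pos + 1)
        if nxt is None:
            break  # a trailing comma is not part of the list
        values.append(nxt[0])
        pos = nxt[1]
    return values, pos

print(parse_list("1,22,333"))  # ([1, 22, 333], 8)
print(parse_list("4,5,"))      # ([4, 5], 3)
```

Both grammars accept the same lists; only the shape of the rule changes to suit the parsing model.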
Maxim:
16-Dec-2009
my goal is to get the host code and OpenGL headers past the parsing 
phase.  once that is done, I'll start work on adding the production 
phase.


I still have to write the pre-processor, but that in fact is pretty 
straight forward.  there are little rules and they are much more 
static and well defined on the MS web site.
Maxim:
16-Dec-2009
the funny thing is that the C language reference on the MSDN is actually 
pretty well done... there are a lot of evil C examples for some of 
the more obscure parts of  the language like pointers, structs and 
unions.


funny thing is that some of the most complex things to express were 
the literal constants!  integers, with octal, hex notation... not 
as simple as some [digits]  ;-)
Henrik:
24-Dec-2009
Looking at the new WHILE keyword and I was quite baffled by Carl's 
use of it in his latest blog example. Then I read the docs and it 
didn't get much better:

- WHILE is a variant of ANY
- ANY stops, if input does not change
- WHILE doesn't stop, even if input does not change

What does "input does not change" mean?

Is it about changing the parse series length during parse?

Is it actively moving the parse index back or forth using special 
commands?

Is it normal progression of parse index with each cycle of WHILE 
or ANY?

Is it alteration of the parse series content while maintaining length 
during parse?
Pekr:
24-Dec-2009
Henrik - according to docs explanation, 'parse contains some internal 
protection for the case, when input stream does not advance its position. 
In R2, following code causes infinite loop, in R3, it returns false:

parse str [some [to "abc"]]


(I am not sure I like that it returns false - normally I expect it 
to cause infinite loop. This is imo overprotecting programmer, and 
you have to think, why your code returns false anyway, which for 
me is the same, as if it would cause an infinite loop)

Further from docs:


To avoid infinite looping, a special internal rule is triggered based 
on the fact that the rule did not change the input position.

However, this shows a problem with this rule:

parse str [some [to "a" remove thru "b"]]


Here the input did not appear to advance, but something useful happened. 
In such cases, the some word should not be used, and the while word 
is better:

parse str [while [to "a" remove thru "b"]]
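
A simplified Python model of the difference described in the quoted docs. It is not how PARSE is implemented: the guarded loop stands in for a loop that stops when the position does not change, the unguarded loop for WHILE, and the rule edits a list of characters in place. All names are hypothetical.

```python
# Simplified model of the two loops described in the quoted docs; this is
# not how PARSE is implemented. A rule takes (buf, pos), may edit buf (a
# list of characters) in place, and returns the new position or None on
# failure. All names are hypothetical.

def to_a_remove_thru_b(buf, pos):
    """Model of [to "a" remove thru "b"]."""
    try:
        start = buf.index("a", pos)
        stop = buf.index("b", start) + 1
    except ValueError:
        return None
    del buf[start:stop]
    return start  # position sits where the removed span began

def loop_guarded(rule, buf, pos=0):
    """Repeat `rule`; stop on failure or when the position did not change."""
    while True:
        new = rule(buf, pos)
        if new is None or new == pos:
            return
        pos = new

def loop_unguarded(rule, buf, pos=0):
    """Repeat `rule` until it fails, whether or not the position moved."""
    while (new := rule(buf, pos)) is not None:
        pos = new

guarded = list("a1b--a2b--")
loop_guarded(to_a_remove_thru_b, guarded)
print("".join(guarded))    # --a2b--

unguarded = list("a1b--a2b--")
loop_unguarded(to_a_remove_thru_b, unguarded)
print("".join(unguarded))  # ----
```

In the guarded run the first removal leaves the position at 0, so the loop stops even though the input changed; the unguarded loop keeps going until the rule itself fails.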
Pekr:
24-Dec-2009
I probably don't understand the usefulness of 'while at all. Because 
now I have to think, if my code would cause infinite loop, or not, 
and use 'some or 'while accordingly ...
Pekr:
24-Dec-2009
Running the above examples, my opinion is that in fact adding 'while 
was probably not a good decision. I can understand that now we have 
more power - our code will not easily cause infinite loops, but 
otoh you now have to think, if it can happen or not, and 'some becomes 
your enemy ...
Fork:
28-Dec-2009
?? not initialized after first match?  And secondly, how do I match 
thru a series of things (e.g. integer! integer!, but just wondering 
about the thte.  ?? problem before the first match?)