AltME groups: search
results window for this page: [start: 23101 end: 23200]
world-name: r3wp
Group: Parse ... Discussion of PARSE dialect [web-public] | ||
Steeve: 3-Oct-2009 | And you all missed my (N Fail) proposal. | |
Steeve: 3-Oct-2009 | I just rewrote the math expressions resolver.
digit: charset "0123456789"
num: [some digit opt [#"." any digit]]
term: [num | #"(" any lv1 term #")" | #"-" any lv3 term]
calc: [
    remove [copy num1 term copy op skip copy num2 term]
    (expr: do reform select [
        "+" [num1 op num2]
        "-" [num1 op num2]
        "*" [num1 op num2]
        "/" [num1 op num2]
        "^^" [num1 "**" num2]
        "%" [num1 "//" num2]
    ] op)
    stay insert expr (probe e)
]
lv4: [term #"%" term then fail | break | calc]
lv3: [any lv4 term #"^^" any lv4 term then fail | break | calc]
lv2: [any lv3 term [#"*" | #"/"] any lv3 term then fail | break | calc]
lv1: [any lv2 term [#"+" | #"-"] any lv2 term then fail | break | calc]
I just think it's clearer like that. Moreover, it's prepared to use the further AND command. Because this nasty trick I use: [rule THEN FAIL | BREAK | calc] will be replaced by: [AND rule calc] | |
Pekr: 4-Oct-2009 | What is your take on simple mode parsing? It is handy for simple CSV parsing, and the idiom is common: parse/all row ";" The trouble is, that if there is no data in last column, parse mistakenly makes the resulting block shorter, so you have to use common idiom: rec: parse/all append row ";" ";" I always wondered, if it could be regarded being a parse bug? | |
Henrik: 4-Oct-2009 | enline and deline will help somewhat. | |
Pekr: 4-Oct-2009 | Ladislav - in comment to ticket #1248, you write: According to the documentation, that can be found in http://www.rebol.net/wiki/Parse_Project parse "b" [not #"a"] yields FALSE correctly. If you want to obtain TRUE, you can try e.g.: parse "b" [not #"a" to end] My question is - what is the advantage of actually not advancing the input on the rule match? It does not look natural and I would expect it to match the rule and hence move past it: >> parse "b" [not #"a" ??] end!: "b" == false ... as can be seen, it does not advance ... | |
Ladislav: 4-Oct-2009 | What is the advantage?: 1) By not consuming input this would be a direct inversion of the rule. Example: parse "a" [not end ...] is a meaningful rule, and it is quite trivial to see that any rule consuming input would not be a direct inversion of this rule. NOT SOMETHING actually means that at the current input position the SOMETHING rule shall not match. That does not give us any information that NOT should skip any input (how far should it?). 2) This version of NOT is compatible with PEG. 3) It is consistent with the AND operation: [AND rule] is equivalent to [NOT [NOT rule]] | |
Ladislav: 4-Oct-2009 | Yet another example: [NOT skip] is equivalent to the [END] rule and is meaningful only, when NOT does not skip any input | |
Ladislav: 4-Oct-2009 | ...I would expect it to match the rule and hence move past it... - that is trivially wrong. If the RULE matches, the [NOT RULE] cannot match, therefore it cannot even advance. The only case, when (theoretically) we could think of advancing is, when the rule does not match. But then, it is not known, how far. | |
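A minimal sketch of the non-consuming NOT described above, written in Python rather than REBOL. It assumes a rule is a function taking (text, pos) and returning the new position on success or None on failure; the helper names lit, skip, end, not_ and and_ are hypothetical and are not PARSE internals.

```python
# Sketch of PEG-style predicates. A rule takes (text, pos) and returns the
# new position on success, or None on failure. All names are hypothetical.

def lit(s):
    """Match the literal string s and consume it."""
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule

def skip(text, pos):
    """Match any single character, like PARSE's SKIP."""
    return pos + 1 if pos < len(text) else None

def end(text, pos):
    """Succeed only at the end of input, consuming nothing."""
    return pos if pos == len(text) else None

def not_(rule):
    """Succeed, without consuming input, only where rule fails."""
    def pred(text, pos):
        return pos if rule(text, pos) is None else None
    return pred

def and_(rule):
    """Positive lookahead, built as NOT of NOT."""
    return not_(not_(rule))
```

With these definitions, not_(lit("a")) succeeds on "b" at position 0 and leaves the position at 0, which is why the original rule needs a trailing `to end` to reach the end of input. Also, not_(skip) agrees with end at every position, matching the [NOT skip] example.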
Maxim: 5-Oct-2009 | pekr, I had the same initial reaction, then realized that it would not be consistent wrt fail or no fail... when NOT would succeed a match (and fail the rule), the input would be beyond what the NOT is useful for. When I started thinking about it, if you really want you can simply use a set-word/get-word pair to advance when the NOT finds a match to ignore a rule, but then it's like not using 'NOT in the first place, so it's pointless :-) | |
Steeve: 6-Oct-2009 | I can have a look, but the purpose of NOT is not to have better perfs than complemented charset, but to allow some simplification when writing rules. Actually, It's the case of most other improvements, easier to write, not inevitably faster. And don't forget that safe complemented charset in R3 are a pain in the ass to construct, because of UTF-8 | |
PeterWood: 6-Oct-2009 | Which is why I was disappointed that I apparently misunderstood from Carl's blog: Changes that are critical, but not highly complicated. For example, providing a NOT command seems easy enough, and it is now critical because using complemented charsets is problematic (due to the Unicode enhancements). | |
Steeve: 6-Oct-2009 | Well, I saw your script, I don't know if it can be faster, I can only say I would have written it differently. Probably, using parse and load/next for all normal rebol values. I can see that your rules about matching binaries are false. Cause [#{" thru #"}"] is wrong (what if the binary contains the #"^}" char ?) | |
BrianH: 6-Oct-2009 | Peter, Steeve, the original problem that started the parse proposals was the problem of complementing charsets. However, it quickly changed to improving PARSE in general. Then, while we were waiting for the parse proposals to come up on the todo list, we came up with a better solution to complementing charsets, which is not yet implemented and which is not limited to PARSE. | |
BrianH: 6-Oct-2009 | Using a bit in the charset that would mark it as "complemented", and then all of its matching algorithms would do an internal not. | |
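The "complemented" bit idea can be sketched as follows, in Python rather than REBOL. This only illustrates the technique as described in the message above; the Charset class and its method names are hypothetical and say nothing about how R3 bitsets are actually laid out.

```python
class Charset:
    """Hypothetical charset: stores only the listed characters plus a flag."""

    def __init__(self, chars, negated=False):
        self.chars = frozenset(chars)
        self.negated = negated

    def complement(self):
        # Flip the flag instead of building a set over all of Unicode.
        return Charset(self.chars, not self.negated)

    def __contains__(self, ch):
        # The "internal not": invert the plain membership test when flagged.
        return (ch in self.chars) != self.negated


digits = Charset("0123456789")
non_digits = digits.complement()
```

Here non_digits still stores only ten characters, yet it matches any non-digit, including high codepoints, which is the memory saving the proposal is after.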
BrianH: 6-Oct-2009 | I want to write more port code first and refine the model based on what I learn. | |
BrianH: 12-Oct-2009 | Behavior of BREAK, ANY and SOME decided, finally: http://www.rebol.net/r3blogs/0270.html | |
BrianH: 12-Oct-2009 | And it's finally break from a loop, rather than break from a block (supposedly). | |
Maxim: 12-Oct-2009 | but it's a hell of a powerful addition to parse and to general code control. I don't see why Carl can't see any use for it. | |
BrianH: 12-Oct-2009 | And you can do that with CATCH. | |
Steeve: 12-Oct-2009 | yep, and for functions, you still got THROW/CATCH and RETURN, which are enough to my mind. | |
BrianH: 12-Oct-2009 | The BREAK, THROW, RETURN, EXIT, HALT and QUIT functions are implemented the same way, just with different error codes. | |
Maxim: 12-Oct-2009 | but n BREAK allows us to leverage smaller rules reuse, as if they were large complex rules and still benefit from the same speed of a root rule backtrack. | |
BrianH: 12-Oct-2009 | I think that Carl is trying to balance speed, ease of use, and debugability. In practice n BREAK would be tricky to debug, and doesn't actually reflect what PARSE does internally. Apparently PARSE isn't actually recursive descent - it just fakes it with a state machine. | |
BrianH: 12-Oct-2009 | Because you can't go through the end, not even with THRU END. And once you reach the end, END always succeeds. | |
BrianH: 12-Oct-2009 | And TO "abc" will also continue to succeed, matching the same "abc" every time. THRU "abc" skips past the "abc" like you say. | |
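A small Python sketch of the TO versus THRU difference just described, assuming plain substring targets only. The function names to and thru are stand-ins for the PARSE keywords, not a real API.

```python
def to(text, pos, target):
    """Advance to the start of the next target; None if it is not found."""
    i = text.find(target, pos)
    return None if i < 0 else i

def thru(text, pos, target):
    """Advance just past the next target; None if it is not found."""
    i = text.find(target, pos)
    return None if i < 0 else i + len(target)
```

Calling to repeatedly from its own result keeps returning the same position, because the target is still right there, while thru moves past it each time and eventually fails.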
Pekr: 13-Oct-2009 | So according to his doc, we should get BREAK/RETURN and DO? | |
Pekr: 13-Oct-2009 | But generally - the level of feedback is lower and lower. We need to get R3 into beta with requested features in a few months, as we are starting to lose people who are interested ... | |
Pekr: 13-Oct-2009 | well, otoh we lived without OF for so long. I think it can be done in a conventional (recent) way :-) I think that Carl should dedicate few more days to finish parse and move on to Extensions :-) | |
BrianH: 13-Oct-2009 | The only still-missing proposals that aren't easy or efficient to work around are OF and REVERSE. They will be missed if not included. Unfortunately, the same reasons why they will be hard to work around if missing are the reasons why they would be difficult to implement :( | |
Graham: 14-Oct-2009 | Tim Berners-Lee is quoted today to say that he can't think of a good reason to keep the // in http://, and that if he did it again, he would have done without them. I wonder if he spoke to people who write parsers .... | |
Gabriele: 15-Oct-2009 | the reason for the // is to allow relative paths like: //www.rebol.com/ where the scheme is the same as the base url. Nobody has ever used this; also, it could have been achieved by using :www.rebol.com/ instead... so, yeah, it was not really a good idea. I also don't think ftp:file.txt (meaning, change scheme, but keep host and path) has ever been used and not sure it's supported by software. so in practice http:www.rebol.com/ would have worked. | |
BrianH: 15-Oct-2009 | It's an operator, like |, and mentioned in that section near the top. | |
Pekr: 15-Oct-2009 | isn't AND operator too for e.g.? | |
Maxim: 17-Oct-2009 | I really want to do it... but I'm so deep into parsing right now I don't want to lose the few GB of information in my brain's cache. I'm writing self-modifying parse rules and it's pretty nightmarish, although it works. | |
Pekr: 17-Oct-2009 | An=And | |
Pekr: 17-Oct-2009 | So - we don't need complementing to be enhanced? Because we talked about it, but it is not defined in proposal, it is not part of Carl's feature table, and I also got no reaction on R3 Chat .... | |
Maxim: 17-Oct-2009 | laden with many paren expressions and a stack on top of it. | |
Maxim: 17-Oct-2009 | since I use binding to map inner rules which are also constructed on the fly but have to be pushed and popped from the stack as I traverse data... it's a lot of fun :-D | |
BrianH: 17-Oct-2009 | If the self-modifying rules are strung-together basic blocks, you can use the rule compiler to generate the blocks. And the R3 changes make self-modifying rules less necessary, so you can have even larger basic blocks. | |
Maxim: 17-Oct-2009 | and it's not simple parsing since I use parsing index manipulation, which is also dictated by the source data it encounters. It's like swatting flies using a fly swatter at the end of a rope, while riding a roller coaster which changes layout every time you ride it ;-) | |
BrianH: 17-Oct-2009 | Which is what a rule compiler does :) Actually, it sounds like you could adapt the tricks of the rule compiler to *your* rule compiler, which would let you use the new operations in your rule source and have the workarounds generated in the output. | |
Maxim: 17-Oct-2009 | well, build it and I will try it ;-) | |
Pekr: 18-Oct-2009 | ah, got reply on Chat from Carl towards complementing: "Re #5718: Pekr, that's a good question, and I think the answer must be YES. We need to be able to complement bitmaps in a nice way. Otherwise, Unicode bitmaps, even if simply used on ASCII chars, would take a lot of memory. This change should be listed on the project sheet, and if not, I'll add it there." | |
Chris: 22-Oct-2009 | Both w1 and w+ appear to be very large values. Would it be smart to perhaps do: [[aw1 | w1] any [aw+ | w+]] Where 'aw1 and 'aw+ are limited to ascii values? | |
Steeve: 22-Oct-2009 | Uses R3 (and his optimized complemented bitsets) | |
Chris: 22-Oct-2009 | Allowing 'into to look inside strings can break current usage of 'into, requiring [and any-block! into ...] | |
Chris: 22-Oct-2009 | An example: a nested d: [k v] structure where 'k is a word and 'v is 'd or any other type: data: [k [k "s"]] R2, you can validate with d: [word! [into d | skip]] Now you have to specify: d: [word! [and any-block! into d | skip]] otherwise you get an error if 'v is a string! | |
BrianH: 26-Oct-2009 | Chris, there can be an advantage in R3 to breaking up a bitset into more than one bitset on occasion, mostly memory savings. However, it might not work as well as you might like since offset and/or sparse bitsets aren't supported. Bitsets that involve high codepoints will take a lot of RAM no matter what you do. | |
JoshF: 17-Nov-2009 | The second one failed when I tried to extend the dialect with multiply (*) and divide (/). After further experimentation, it seems that you can't escape the "/". Google has not been helpful here... Does anybody have any ideas? I could parse for just a word! instead of the +, -, etc., but I wanted parse to do the work of deciding what was a valid operation or not. Sorry for the multiple messages, I'm still trying to figure this client out... Thanks for any advice! | |
JoshF: 17-Nov-2009 | Both tdiv and lit-div type? to a word!... | |
Henrik: 17-Nov-2009 | And also hence the expression "a block is or isn't loadable" | |
JoshF: 17-Nov-2009 | OK... Mechanically, I see what you're saying, but what's the difference between a lit-word and a word? The spirit eludes me... | |
JoshF: 17-Nov-2009 | I thought there was only word!'s and then everything else were more concrete types. I guess what I am asking is what is the purpose of lit-words? | |
JoshF: 17-Nov-2009 | The difference between what I'm doing and what you linked to is that it's working against a string, while I'm doing a dialect, no? | |
Janko: 2-Dec-2009 | I know I was stopped by parse on some occasions. I think every time the problem would have been solvable if I had, for example, >> to [ "A" | "B" ] where the parser would check where A is and where B is and go to the closest one. | |
Janko: 2-Dec-2009 | I was trying to show an example where you have two possible endings and you want to process both (and you can process them differently with parens) but you don't know in what order they will come or anything | |
Janko: 2-Dec-2009 | yes, then you have to do charset parsing (but I don't know that yet :) ) .. I was just trying to say that if there were a way to say something like to any [ "A" | "B" ] and it would go to the closest one, A LOT of problems with parse would be easily solvable | |
Graham: 2-Dec-2009 | and see which has the best fit ? | |
Janko: 2-Dec-2009 | The pattern is known ... the sentence starts with "this is" and can end with . or ! but they can come in any order .. if you try to parse with "." first you will get ---- oops, some errors up there .. just a sec | |
Janko: 2-Dec-2009 | this is common to all the problems that I am describing .. if I had > to [ "." | "!" ] and parse would find both and go to the one that is closer, it would be solved. | |
Graham: 2-Dec-2009 | Janko, best thing to do is show us a string you can't parse ... and someone will show you how to do it. | |
Janko: 2-Dec-2009 | I don't have real example right now :) I had them few times before and I also asked here about them and I solved with your help somehow | |
Janko: 2-Dec-2009 | I just started talking about this as a general limitation of parse that I meet a lot of times and I suppose Paul could have met it when trying to parse CSV | |
Gregg: 2-Dec-2009 | It's not necessarily a PARSE limitation, but there are things we'd like PARSE to do that aren't always reasonable. :-) TO and THRU can work very well, but that doesn't mean they'll work for every situation. You may have to use rules where you check for your target value or just SKIP, marking locations in the input as you go. | |
Gregg: 2-Dec-2009 | That said, if you know the format (e.g. WRT quotes and escapes), it can be done with PARSE. It just may not be a one-liner. | |
Janko: 2-Dec-2009 | I know parsing csv can be messy ... at least at this high level I don't know how to do it with escapes and commas in etc | |
Janko: 2-Dec-2009 | and I know everything has limitations ... this functionality OR with taking the first that appears would just in practice solve me many cases | |
Graham: 2-Dec-2009 | you have to turn off parse's default delimiters and use bitsets | |
Ladislav: 2-Dec-2009 | Janko: the only problem is, that you cannot use: C: [to [A | B]] , where A and B are "general rules", but you can always write: C: [here: [A | B] :here | skip C] , which would do what you want | |
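The idea behind C: [here: [A | B] :here | skip C] can be sketched in Python as a scan that tries every alternative at each position and stops at the first position where any of them matches. This is an illustration only; to_first and lit are hypothetical names, and rules are assumed to be functions of (text, pos) returning a new position or None.

```python
def lit(s):
    """Rule matching the literal string s at the given position."""
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule

def to_first(text, pos, rules):
    """Return the first position >= pos where any rule matches, else None."""
    while pos <= len(text):
        if any(rule(text, pos) is not None for rule in rules):
            return pos  # like ':here': report the match position, consume nothing
        pos += 1        # like 'skip C': advance one character and retry
    return None
```

For the "." or "!" case discussed earlier, to_first(s, 0, [lit("."), lit("!")]) lands on whichever terminator comes first, regardless of the order the alternatives are listed in.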
Oldes: 2-Dec-2009 | And Janko... if you don't use charsets at all, I think you should give it a try. It's not so difficult. I think that if I can write a parser to colorize PHP code, then you can parse everything. | |
Janko: 3-Dec-2009 | Ladislav, thanks.. I didn't know you could set the position back with :here , that is interesting and probably expands what you can do with parse a lot. | |
Janko: 3-Dec-2009 | yes, you are right .. if you can write a parser for php then you can make anything with it. I always supposed parse with charsets is like a low-level step by one char in a loop, call "events" and change states, with which you can parse anything from xml to languages .. well, but parse with charsets is still much more elegant | |
Janko: 3-Dec-2009 | but it is a level less simple and nice to use than the simple parse modes, that's why the simple ones should be powerful *if possible* too - you can't get a newbie impressed with charset parsing because he probably won't understand it. | |
Ladislav: 3-Dec-2009 | Just to complete the list of possible equivalents to the C: [to [A | B]] rule, here is a way how to do it in Rebol3 parse: C: [while [and [A | B] break | skip | reject]] you can find other equivalent idioms at http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse#Parse_idioms | |
Ladislav: 3-Dec-2009 | It looks, that I could have used: C: [while [and [A | B] accept | skip | reject]] | |
jack-ort: 11-Dec-2009 | Help! Still struggling to understand parse. How could I replace any and all SINGLE occurrences of the single-quote character anywhere in a string (beginning, middle or end) with TWO single-quotes? But if there are already TWO single-quotes together, I want to leave them alone. TIA for any and all help for a newbie! | |
Maxim: 11-Dec-2009 | easy, actually. you match double quotes first then fall back to single quotes, adding a new one and skipping one char... give me a minute I should get something working... | |
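The approach Maxim outlines (try the already-doubled pair first, then fall back to a lone quote) can be sketched in Python like this. It is not the REBOL code that was posted in the group; double_single_quotes is a hypothetical name.

```python
def double_single_quotes(s):
    """Double every lone single quote; leave existing pairs untouched."""
    out = []
    i = 0
    while i < len(s):
        if s.startswith("''", i):
            out.append("''")   # already doubled: copy the pair unchanged
            i += 2
        elif s[i] == "'":
            out.append("''")   # lone quote: emit two
            i += 1
        else:
            out.append(s[i])   # any other character is copied as-is
            i += 1
    return "".join(out)
```

The order of the two tests matters: checking for the pair first is what keeps an existing '' from being turned into four quotes.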
jack-ort: 11-Dec-2009 | Thanks! I'm going to have to look @ this for awhile to understand why you even need to worry about the double-quote character. Much to learn.... Thanks Maxim and Steeve for the prompt replies! | |
Rebolek: 11-Dec-2009 | Just curious, I tested both versions and Steeve's version is about 2times faster than Maxim's :) | |
Maxim: 11-Dec-2009 | actually, having a paypal account linked with your login and a "donate" button would be really nice :-) right in the chat tool. | |
Maxim: 11-Dec-2009 | I sure would use it... some people have helped save days of work with free code and insight. | |
Maxim: 12-Dec-2009 | I just adopted a new notation standard for parse rules... the goal is to make rules a bit more verbose as to the type of each rule token... I find this reads well in any direction, since we encounter the "=" character when reading from left to right or right to left... and parse rules often have to be read from right to left. example: =terminal=: [ =quote= copy terminal to =quote= skip (print ["found terminal: " terminal]) ] on very large rules, and with the syntax highlighting in my editor making the "=" signs very distinct, I can instantly detect what parts of my rules are other rules or character patterns... it also helps out in the declarations... I see when blocks are intended to be used as rules quite instantly wherever they are in my code. in my current little parser, I find I can edit my rules almost twice as fast and lose MUCH less time scanning my blocks to find the rule tokens, and switching them around. wonder what you guys think about it... | |
Maxim: 12-Dec-2009 | another example.... in this dense block of text, I can spot the =eol= (end of line) token instantly in both x and y dimensions of the rule paragraph: =line-comment=: [ =comment-symbol= [ [thru =eol= (print "comment to end of line")] |[to end] ] (print "success") ] | |
Maxim: 12-Dec-2009 | syntax highlighting colorizes words ... stuff is colorized... but user words aren't colorized and they all get mixed up between functions, variables and rules... and having colors which are too strong next to each other and in relative distribution ... cancels out. | |
Graham: 12-Dec-2009 | so you could write a parser that reads your rules and colorises them ... | |
Maxim: 12-Dec-2009 | I'm just trying to get a feel for what others think about the idea. and sharing a bit of a discovery at the same time, if it may help others. the goal isn't to be popular or convince others... and sorry, if my last line may have looked harsh, it wasn't. :-) I was just resuming your reaction plainly and relaunching the question to be sure others realize I want a few opinions. | |
PeterWood: 12-Dec-2009 | any others care to comment? I'm afraid it looks very messy to me and reminded me of Perl for some reason. | |
Gregg: 13-Dec-2009 | For a long time I've added = to the end of my parse rules, and = to the beginning of parse variables. I think it matches the production rule grammar well, and also emulates set-word/get-word syntax. | |
Maxim: 13-Dec-2009 | I've used word= for other things before and I liked it. | |
Gregg: 14-Dec-2009 | Yup. Different mindset. I just looked at your BNF compiler earlier. Good stuff. I did an ABNF-to-parse generator some time back. ABNF is used in a lot of IETF RFCs and such. | |
Maxim: 14-Dec-2009 | that is nice, is your ABNF parser still accessible somewhere? it could improve the quality and ease of integrating the protocols to R3 IMO. ABNF also seems much more aligned to parse | |
Maxim: 15-Dec-2009 | I've been rewriting bnf generated parse rules (and often a bit cryptically) into proper parse ordered rules for 3 days now... <sigh> C is sooo complex for what it really does. I've discovered a few quite mind-boggling language capabilities... stuff like: char *( *(*var)() )[10]; it takes 7 steps to define what that really is and there are other "fun" examples which end up being interpretation nightmares, but look really simple. one thing is certain at this point... although I will be able to build a C to rebol converter with relative precision under specific goals, some of the crazy stuff just will have to be finished manually by humans. at least I rarely see such twisted C code in most of what I've been reading so far. | |
BrianH: 16-Dec-2009 | BNF is just a syntax form, with a *lot* of variation. The real difference that matters between Yacc and PARSE is the parsing model. Yacc implements an LR parser (or some variant thereof), and PARSE implements a variant of TDPL parsing (related to PEG), though more powerful and with a completely different syntax. How you structure the parse rules depends on the parsing model, not the syntax. For instance, LR parsers tend to do recursion rather than iteration, and when they recurse the recursive call tends to be on the left, with the distinguishing clause on the right. For PEG parsers, recursion goes the other way. This is not an error, this is a difference in parsing model. If you are translating from Yacc to PARSE, it's not just a syntax change. You have to reorganize the rules to match the new model. And watch out: Certain patterns are easier to express in some parsing models than in others. Some patterns aren't supported at all in some models, and so no amount of translation will help you. We chose the TDPL model for PARSE because it is more expressive than the LR model, so in theory you should be able to translate LR rules to PARSE with some topological twists (redoing the structure of the rules). However, there are patterns that you can express in PARSE that can't be translated to LR, even with topological changes. | |
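As a rough illustration of the reorganization mentioned above, here is a Python sketch of a top-down parser for left-associative subtraction. A Yacc-style grammar would typically write this left-recursively (expr: expr "-" num | num); a top-down parser cannot call itself on the left without looping, so the sketch uses iteration, roughly num any ["-" num] in PARSE terms. The grammar and the function names are assumptions made for this example, not taken from the discussion.

```python
def parse_number(text, pos):
    """Parse one or more digits; return (value, new_pos) or None."""
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    if end == pos:
        return None
    return int(text[pos:end]), end

def parse_difference(text, pos=0):
    """expr: num any ["-" num], folding to the left while iterating."""
    first = parse_number(text, pos)
    if first is None:
        return None
    value, pos = first
    while pos < len(text) and text[pos] == "-":
        nxt = parse_number(text, pos + 1)
        if nxt is None:
            break              # a dangling "-" is left unconsumed
        rhs, pos = nxt
        value -= rhs           # left-associative: (10 - 3) - 2
    return value, pos
```

The loop accumulates the result as it goes, which gives the same left-to-right grouping that the left-recursive grammar expresses structurally.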
Maxim: 16-Dec-2009 | my goal is to get the host code and OpenGL headers past the parsing phase. once that is done, I'll start work on adding the production phase. I still have to write the pre-processor, but that in fact is pretty straight forward. there are little rules and they are much more static and well defined on the MS web site. | |
Maxim: 16-Dec-2009 | the funny thing is that the C language reference on the MSDN is actually pretty well done... there are a lot of evil C examples for some of the more obscure parts of the language like pointers, structs and unions. funny thing is that some of the most complex things to express were the literal constants! integers, with octal, hex notation... not as simple as some [digits] ;-) | |
Henrik: 24-Dec-2009 | Looking at the new WHILE keyword and I was quite baffled by Carl's use of it in his latest blog example. Then I read the docs and it didn't get much better: - WHILE is a variant of ANY - ANY stops, if input does not change - WHILE doesn't stop, even if input does not change What does "input does not change" mean? Is it about changing the parse series length during parse? Is it actively moving the parse index back or forth using special commands? Is it normal progression of parse index with each cycle of WHILE or ANY? Is it alteration of the parse series content while maintaining length during parse? | |
Pekr: 24-Dec-2009 | Henrik - according to docs explanation, 'parse contains some internal protection for the case, when input stream does not advance its position. In R2, following code causes infinite loop, in R3, it returns false: parse str [some [to "abc"]] (I am not sure I like that it returns false - normally I expect it to cause infinite loop. This is imo overprotecting programmer, and you have to think, why your code returns false anyway, which for me is the same, as if it would cause an infinite loop) Further from docs: To avoid infinite looping, a special internal rule is triggered based on the fact that the rule did not change the input position. However, this shows a problem with this rule: parse str [some [to "a" remove thru "b"]] Here the input did not appear to advance, but something useful happened. In such cases, the some word should not be used, and the while word is better: parse str [while [to "a" remove thru "b"]] | |
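The difference quoted from the docs can be sketched in Python. This assumes the guard is simply a comparison of the input position before and after each iteration, as the quoted explanation suggests; repeat and remove_a_thru_b are hypothetical names, and the input is a mutable list of characters so that the rule can remove data.

```python
def remove_a_thru_b(buf, pos):
    """Like [to "a" remove thru "b"]: delete from the next "a" thru the next "b"."""
    try:
        a = buf.index("a", pos)
        b = buf.index("b", a)
    except ValueError:
        return None            # rule fails: no "a", or no "b" after it
    del buf[a:b + 1]
    return a                   # position after the removal

def repeat(rule, buf, pos, stop_if_stuck):
    """Apply rule until it fails; optionally stop when the position did not move."""
    while True:
        new_pos = rule(buf, pos)
        if new_pos is None:
            return pos
        if stop_if_stuck and new_pos == pos:
            return pos         # guarded loop: no progress, so give up
        pos = new_pos
```

On the input "abab" starting at position 0, the guarded loop (stop_if_stuck=True) performs one removal and stops with "ab" left over, because the position is still 0. The unguarded loop (stop_if_stuck=False) keeps going until the rule fails and leaves the buffer empty. The unguarded form will spin forever if the rule keeps succeeding without changing anything, which is the trade-off being debated here.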
Pekr: 24-Dec-2009 | I probably don't understand the usefulness of 'while at all. Because now I have to think about whether my code would cause an infinite loop or not, and use 'some or 'while accordingly ... | |
Pekr: 24-Dec-2009 | Running above examples, my opinion is, that in fact adding 'while was probably not a good decision. I can understand, that now we have more power - our code will not easily cause an infinite loops, but otoh you now have to think, if it can happen or not, and 'some becomes your enemy ... | |
Fork: 28-Dec-2009 | ?? not initialized after first match? And secondly, how do I match thru a series of things (e.g. integer! integer!, but just wondering about the thte. ?? problem before the first match?) |