AltME groups: search
world-name: r3wp
Group: Parse ... Discussion of PARSE dialect [web-public] | ||
BrianH: 16-Dec-2009 | You might be better off translating a C grammar for a PEG or TDPL parser generator into PARSE - fewer topological shifts needed. | |
Maxim: 16-Dec-2009 | All in all, there are only two or three rules whose transformation I'm unsure of, as some aspects of the C syntax are a bit obscure to represent. | |
BrianH: 16-Dec-2009 | No, really. The syntax of C is so complex that you would need a lot of data to test all of the common variations. | |
Maxim: 16-Dec-2009 | the funny thing is that the C language reference on the MSDN is actually pretty well done... there are a lot of evil C examples for some of the more obscure parts of the language like pointers, structs and unions. funny thing is that some of the most complex things to express were the literal constants! integers, with octal, hex notation... not as simple as some [digits] ;-) | |
Henrik: 24-Dec-2009 | Looking at the new WHILE keyword, I was quite baffled by Carl's use of it in his latest blog example. Then I read the docs and it didn't get much better: (1) WHILE is a variant of ANY; (2) ANY stops if input does not change; (3) WHILE doesn't stop, even if input does not change. What does "input does not change" mean? Is it about changing the parse series length during parse? Is it actively moving the parse index back or forth using special commands? Is it normal progression of parse index with each cycle of WHILE or ANY? Is it alteration of the parse series content while maintaining length during parse? | |
Pekr: 24-Dec-2009 | Henrik - according to the docs, 'parse contains some internal protection for the case when the input stream does not advance its position. In R2, the following code causes an infinite loop; in R3, it returns false: parse str [some [to "abc"]] (I am not sure I like that it returns false - normally I expect it to cause an infinite loop. This is imo overprotecting the programmer, and you still have to work out why your code returns false, which for me is the same as if it had caused an infinite loop.) Further from the docs: To avoid infinite looping, a special internal rule is triggered based on the fact that the rule did not change the input position. However, this shows a problem with this rule: parse str [some [to "a" remove thru "b"]] Here the input did not appear to advance, but something useful happened. In such cases, the some word should not be used, and the while word is better: parse str [while [to "a" remove thru "b"]] | |
Pekr: 24-Dec-2009 | Running the above examples, my opinion is that adding 'while was in fact probably not a good decision. I can understand that we now have more power - our code will not easily cause infinite loops - but otoh you now have to think about whether it can happen or not, and 'some becomes your enemy ... | |
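The SOME versus WHILE difference described in the docs passage quoted above can be tried in a short console session. This is only a sketch, assuming an R3 alpha build in which `while` and `remove` are PARSE keywords; the sample string `str` is made up for illustration, and no output is shown because the behaviour was still changing between builds.

```rebol
; R3 only: WHILE and REMOVE are not PARSE keywords in R2.
; Hypothetical input: every "a...b" span starts exactly at the current
; position once the previous span has been removed.
str: copy "a1ba2ba3b"

; SOME applies the no-progress check from the docs: after TO "a" and
; REMOVE THRU "b" the input position is unchanged, so the loop is
; expected to stop after its first pass.
parse str [some [to "a" remove thru "b"]]
probe str

; WHILE skips that check and repeats the rule until the rule itself
; fails, i.e. until TO "a" no longer finds an "a".
str: copy "a1ba2ba3b"
parse str [while [to "a" remove thru "b"]]
probe str
```

Comparing the two `probe` results shows whether the position check cut the SOME loop short on a given build.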
Ladislav: 25-Dec-2009 | The WHILE keyword is the simplest possible cycle. The rule: a: [while b] is equivalent to recursive: a: [b a] | |
Ladislav: 25-Dec-2009 | sorry, I meant a: [b a |] | |
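The corrected recursive form needs nothing beyond plain rule blocks, so it can be compared with a looping keyword directly. A small sketch, assuming a rule `b` that matches the single character "x"; both `b` and the input string are illustrative and not taken from the discussion.

```rebol
b: ["x"]       ; hypothetical rule: matches one "x"
a: [b a |]     ; match B, then recurse; the empty alternative ends the recursion

print parse "xxx" a          ; recursive form, plain rule blocks only
print parse "xxx" [any b]    ; ANY behaves the same here, since B always advances
; R3 only: print parse "xxx" [while b]
```

The empty alternative after `|` is what makes the recursion terminate: when `b` no longer matches, the rule succeeds without consuming input.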
Fork: 28-Dec-2009 | ?? not initialized after first match? And secondly, how do I match thru a series of things (e.g. integer! integer!, but just wondering about the thte. ?? problem before the first match?) | |
Pekr: 28-Dec-2009 | what do you mean by "match thru a series of things"? | |
Fork: 28-Dec-2009 | Is a sequence of things one of the complex rules that you can't use in a thru? | |
BrianH: 28-Dec-2009 | Yes. You can express a sequence of characters in a string as a string literal, but not a sequence of types in a block. You are going to need first sets and the other LL tricks for that. | |
Fork: 28-Dec-2009 | >> parse [a b c] [(value: none) copy value to 3 skip to end (probe value)] [a b] == true >> parse [a b c] [(value: none) copy value thru 3 skip to end (probe value)] [a b] == true | |
Fork: 28-Dec-2009 | Should the latter be [a b c] ? | |
Pekr: 28-Dec-2009 | >> parse [a b c][?? 3 skip ??] 3: [a b c] end!: [] == true | |
Pekr: 28-Dec-2009 | to/thru were reimplemented to allow multiple options. There are cases where they are not supposed to work, but in the above case I would regard it as a bug .... unless some guru finds a theory showing us why it should be regarded as a correct result :-) | |
BrianH: 28-Dec-2009 | Fork, the fact that both of those examples work incorrectly instead of throwing an error is a bug in PARSE. It should be CureCoded. | |
Fork: 28-Dec-2009 | >> parse [a b c] [?? copy value thru 1 skip to end] co? : [a b c] == true | |
BrianH: 28-Dec-2009 | Seems like a Unicode to ANSI translation error. | |
Fork: 28-Dec-2009 | >> parse [a b c] [?? copy value thru 1 skip to end] coo:: [a b c] == true | |
Fork: 28-Dec-2009 | Well, I should find a way to reproduce it before doing that. Left a note about how getting a CureCode account didn't work the other day. | |
kcollins: 29-Dec-2009 | Fork, are you seeing these outputs "coo", "thte", etc. on a Linux build of R3? I have seen similar corrupted output with Linux R3 when testing TCP client code, as documented in Curecode #1322. | |
Fork: 29-Dec-2009 | kcollins: I'm using OS/X, I still haven't found a way to reproduce it. Comes and goes. | |
Ladislav: 29-Dec-2009 | e.g. parse [a b c] [?? copy value thru 1 skip to end] should have preferably been parse [a b c] [?? copy value 1 skip to end] | |
Ladislav: 30-Dec-2009 | Carl made a distinction in R3 blog, but they currently work the same, as far as I can tell, so, the only difference I see is, that ACCEPT is more self-explanatory. | |
Carl: 31-Dec-2009 | In the rewrite of DECODE-CGI, that behavior of ANY forces me to write: parse "" [any [end break | copy tmp to end]] This seems wrong to me if we define ANY as a MATCHing function, not as a LOOP function. This topic has been debated a bit between a few of us, but I think it deserves more attention. | |
Carl: 31-Dec-2009 | In other words, is ANY smart about the input? If there is no input, why should it even try? Of course, in the past we've used ANY a bit like WHILE -- as a LOOPing method, not really as a MATCHing method. | |
Carl: 31-Dec-2009 | It's a small thing, and maybe too late to change. I wanted to point it out. | |
Steeve: 31-Dec-2009 | We have so many alternatives that I don't see this as a burden | |
Carl: 31-Dec-2009 | There are a few ways to do it, but that is not my point. | |
BrianH: 6-Jan-2010 | BenBran: Not sure where to put this so asking here: I downloaded a web script and it has a snippet I don't understand: buffer: make string! 1024 ;; contains the browser request file: "index.html" parse buffer ["get" ["http" | "/ " | copy file to " " ]] what does: copy file to " " mean or do? tia | |
BrianH: 6-Jan-2010 | Sort of. The actual code is a little more complex, more like this: either tmp: find data " " [file: if 0 < offset? data tmp [copy/part data tmp]] [break] | |
BrianH: 6-Jan-2010 | The break being a parse match fail, and file being set to none for a zero-length match. | |
BrianH: 6-Jan-2010 | That would return the file instead of setting a variable and not return false because of leftover input. | |
Graham: 14-Jan-2010 | >> parse [ <tag> ] [ copy t tag! ] == true >> t == [<tag>] never noticed it made a block! before | |
ChristianE: 14-Jan-2010 | There's a difference between COPY and SET in block parsing mode. | |
ChristianE: 14-Jan-2010 | From the docs: SET - set the next value to a variable COPY - copy the next match sequence to a variable | |
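The SET/COPY distinction from the docs can be seen side by side in block parsing mode. A minimal sketch; the variable name `t` follows Graham's example above.

```rebol
; SET stores the single value that matched.
parse [<tag>] [set t tag!]
probe t     ; <tag>

; COPY stores the matched sequence, so in block mode the result is a block.
parse [<tag>] [copy t tag!]
probe t     ; [<tag>]
```

When exactly one value is expected, SET avoids having to unwrap a one-element block afterwards.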
Graham: 29-Jan-2010 | <?xml version="1.0"?> <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"><SOAP-ENV:Body><SelectResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/"><SelectResult><Item><Name>2010-01-29T09:54:48.000ZI3s3NjIxRjZERDI1MUY0QzQyMDk4M0JDMzkwMERGOEQxQTVDRDY5MzEwfQ==</Name><Attribute><Name>Subject</Name><Value>hello?</Value></Attribute><Attribute><Name>Userid</Name><Value>Guest</Value></Attribute><Attribute><Name>UTCDate</Name><Value>2010-01-29T09:54:48.000Z</Value></Attribute></Item><Item><Name>2010-01-29T09:58:36.000ZI3swMTZBODg3QjAxNDQ2NEU5OENCNTA3OTc5OTg0Mjc1MTJGQzkxQTc0fQ==</Name><Attribute><Name>Subject</Name><Value>First Message</Value></Attribute><Attribute><Name>Userid</Name><Value>Graham</Value></Attribute><Attribute><Name>UTCDate</Name><Value>2010-01-29T09:58:36.000Z</Value></Attribute></Item><Item><Name>2010-01-29T11:06:18.000ZI3tFREFCRUYwNTY4OTdBMzcwODM2NzJGQUE5MzAwRUE3NjYwMTMwMTY5fQ==</Name><Attribute><Name>Subject</Name><Value>Index working</Value></Attribute><Attribute><Name>Userid</Name><Value>Graham</Value></Attribute><Attribute><Name>UTCDate</Name><Value>2010-01-29T11:06:18.000Z</Value></Attribute></Item></SelectResult><ResponseMetadata><RequestId>14873461-626a-44bf-2d7d-c1b23694b2e0</RequestId><BoxUsage>0.0000411449</BoxUsage></ResponseMetadata></SelectResponse></SOAP-ENV:Body></SOAP-ENV:Envelope> | |
Steeve: 29-Jan-2010 | Is that result a block or string ? | |
Steeve: 29-Jan-2010 | because in a string you can't find tag! values | |
Graham: 29-Jan-2010 | It's a string ... | |
Graham: 29-Jan-2010 | Yes, tags are a type of string ... | |
Steeve: 29-Jan-2010 | >> parse "<a><item>" [thru <a> ??] end!: "item>" == false | |
Steeve: 29-Jan-2010 | a bug | |
Steeve: 29-Jan-2010 | It should say: >> parse "<a><item>" [thru <a> ??] end!: "<item>" == false | |
Steeve: 29-Jan-2010 | parsing thru a tag eats one more char | |
Graham: 29-Jan-2010 | Ah .. ?? is a new debugging function | |
Steeve: 29-Jan-2010 | you can, just replace <tag> by a real string "<tag>" | |
BrianH: 29-Jan-2010 | And there is a great likelihood of the bugs being fixed in R3. And there aren't many in PARSE, just that tag bug afaik. | |
BrianH: 29-Jan-2010 | Partially - it used to be worse. That's why it's marked a "problem". | |
Graham: 29-Jan-2010 | only eats one char instead of two ... so that's a 50% improvement | |
BrianH: 29-Jan-2010 | The worst was when someone "fixed" #10 to make it compatible with R2's buggy behavior. Bad fixes get marked as a problem. | |
Graham: 29-Jan-2010 | I looked for a previous report on this bug but couldn't find it .. 4 pages of bugs with parse in them. I wonder if they can be filtered to only show active bugs | |
BrianH: 7-Feb-2010 | TO and THRU have limited argument syntax, and don't support full rules. Both R2 and R3 support literal value arguments (that don't count as rules). R3 also supports a block of literal values delimited by |, and those values are less limited. | |
Steeve: 7-Feb-2010 | Something weird! Using a simple charset with TO or THRU should work, but it fails here with R3. >> digits: charset "134567890" >> parse "azaz 34" [to digits ??] end!: "azaz 34" | |
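The block form of TO mentioned for R3 might look like the following. This is a hedged sketch for R3 only; the input string and the variable names `name` and `val` are hypothetical, and no output is shown because the TO/THRU implementation was still being fixed at the time of this discussion.

```rebol
; R3 only: a block of literal alternatives after TO, delimited by |.
; TO stops at whichever listed delimiter is found first.
parse "key=value" [copy name to ["=" | ":"] skip copy val to end]
probe name
probe val
```

In R2 the same effect needs an explicit loop over a complemented charset, since TO accepts only a literal value there.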
BrianH: 7-Feb-2010 | Steeve, that's a bug that I reported yesterday. | |
BrianH: 7-Feb-2010 | Oh crap. Well, it was reported as a bug, and it's staying that way until Carl says otherwise :) | |
Gabriele: 7-Feb-2010 | given that to and thru do "more" in R3, it probably is not bad to consider it a bug. (maybe it should be considered a bug in R2 as well, given that FIND does work with charsets...) | |
Graham: 8-Feb-2010 | and finally a parse rule that works under r2 and r3 parse/all txt [ some [ [ end | any nondigits ] [ date-rule | some digits ] ] ] | |
Sunanda: 13-Apr-2010 | He does ask a lot of simpler questions :) | |
Ladislav: 13-Apr-2010 | Yes, "it's faster than anything else, until it's not" is a perfect statement, and you got my agreement :-p | |
Henrik: 13-Apr-2010 | a short string is one that is not long. :-) | |
Ladislav: 13-Apr-2010 | Now, I can make a bold statement: for any method distinct from the one using PARSE and CHANGE/PART combo holds, that it is faster than the above method, until it's not :-p | |
Maxim: 13-Apr-2010 | it's not a single change/part which is the issue, it's managing the stack, allocating all those blocks over and over... the sheer speed of the parse loop blows away all the other looped/recursive algorithms in my usage so far. | |
Gregg: 15-Apr-2010 | Petr, it may be more than fast enough for small cases, or where you don't need maximum performance (which is most of the time). The inefficiency comes from REBOL having to move things around when you insert things into a series (list! being a possible exception). | |
Ladislav: 16-Apr-2010 | Please, if somebody finds a good refinement name, let us know. | |
ChristianE: 16-Apr-2010 | Not being a native speaker I think you "change something in something", so that gives >> CHANGE/TO "ABC" "123" == 123 | |
ChristianE: 16-Apr-2010 | But it doesn't communicate very well the idea of changing to only a part of the second argument. | |
Maxim: 17-Apr-2010 | /take is a new, very useful function in R3; it's a good idea to use it as a refinement too... IMHO | |
Maxim: 17-Apr-2010 | Gab YESSS!!! it would also be nice if we could actually just set a soft-range to ANY series, removing the need for a specific datatype. | |
Maxim: 17-Apr-2010 | and extra speed consideration of having to allocate/copy/destroy a series | |
ChristianE: 17-Apr-2010 | That's saying too much; I think it's more that CHANGE/PART behaves as advertised and the /PART refinement just happens to have a different meaning for INSERT or APPEND. None of /WITH, /TO, /SPAN and /RANGE communicates very well that it refers to the second argument though, and /TAKE has the drawback of suggesting that it's taking away from the second argument like TAKE instead of leaving the second argument untouched. CHANGE/FROM, however, seems to work: >> head change/from #abcdef #123456 3 == #123def >> head change/part/from #abcdef #12345 1 3 == #123bcdef All that under the assumption that, for compatibility, /PART in its current meaning will stay as it is. | |
Steeve: 19-Apr-2010 | Gregg, I used to use append/part to avoid the memory overhead of copy/part in many cases. Instead of doing as in Ladislav's example: >> change/part something copy/part something-else range part I used to do: >> change/part something append/part clear #{} something-else range part It's not faster, but it saves memory. So, I don't know if it's a good idea to discard this use case from append and insert. | |
Ladislav: 19-Apr-2010 | It does not matter that it is rare: if you can find any unexpected behaviour of the GC, you should put it in CureCode as a major bug | |
Steeve: 19-Apr-2010 | It's not a bug to my mind, the GC never acted smoothly. | |
Ladislav: 19-Apr-2010 | maybe I just misunderstood, then. If it is not a bug, then you are actually saying, that the GC collects everything as expected? If that is the case, then why the trouble to "save memory"? | |
florin: 24-May-2010 | Is there a place for the newbie questions on parsing? | |
florin: 24-May-2010 | I've created my very first script. The script loops through a list of email (Kerio) log files, extracts the IP addresses, compiles them in a list and adds them to a (Peerblock) list in order to limit incoming spam. I find rebol perfect for this. | |
florin: 24-May-2010 | A rule can be: "=," etc. How do I "escape" the space character so that I can include in my rule? | |
florin: 24-May-2010 | And the IP addresses are separated by a space? | |
florin: 24-May-2010 | Yes, parse/all is great, and this is why I want to include the space not as a delimiter but as a character in the rule. As if, sometimes I want to find two strings separated by a character. | |
PeterWood: 24-May-2010 | >> a: "a b" == "a b" >> parse/all a ["a" " " "b"] == true | |
florin: 24-May-2010 | My script works, but you know how it goes. Once a question creeps in the brain, it needs an answer. Thank you. | |
Pekr: 24-May-2010 | I would use #" ", or define a space rule first: spaces: charset " ^-" (eventually include tab) | |
florin: 24-May-2010 | Then, I said, read only from the last read, and parse the date/time. I wanted to parse date AND time at the same time: [15/May/2010 17:59:56] But I hit a snag because of the space in between. I don't want date and time separate because rebol can parse the string into a date-time easily. The space gave me trouble, and the brackets too. | |
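The three ways of matching a space suggested above can be put next to each other. A minimal sketch using PARSE/ALL, so that whitespace is treated as ordinary input; the input strings are illustrative.

```rebol
spaces: charset " ^-"    ; space and tab, as in Pekr's suggestion

print parse/all "a b" ["a" " " "b"]             ; one-character string literal
print parse/all "a b" ["a" #" " "b"]            ; char! literal
print parse/all "a ^-b" ["a" some spaces "b"]   ; one or more characters from the charset
```

All three rules should match. The charset form is the most forgiving when the amount or kind of whitespace between the two strings varies.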
Anton: 30-Jul-2010 | Ok, continuing the discussion from "Performance" group, I'd like to ask for some help with parsing rebol format files. Basically, I'd like to be able to extract a block near the beginning or end of a file, while minimizing disk access. The files to be parsed could be large, so I don't want to load the entire contents, but chunks at a time. So my parse rule should be able to detect when the input has been exhausted and ask for another chunk. (When extracting a block near the end of a file, I'll have to parse in reverse, but I'll try to implement that later.) | |
Anton: 30-Jul-2010 | Using LOAD/NEXT, I still have to use a O(n^2) algorithm. I'd now like to do my own parse, which can be O(n). | |
Anton: 30-Jul-2010 | Which is why, in that algorithm, I had to iteratively: load a chunk, append it and try LOAD/NEXT until it succeeded. Which gives the algorithm O(n^2) performance. | |
Anton: 30-Jul-2010 | I imagine it could be useful in other similar situations, so I'd like it to be pretty general. I suppose a bonus functionality is to be able to get nested blocks. (And a super bonus will be to get any datatype at any level, but I won't bother doing that until I need it.) | |
Anton: 30-Jul-2010 | Must it ? I think if I can parse single-line strings correctly, then a bracket inside won't cause a problem. This means I'll be basically ignoring datatypes which allow strings in their syntax, and just jumping to the string part. | |
Anton: 30-Jul-2010 | I don't think there's any way to make any type with a literal bracket in it (except blocks, of course). (But I am worrying about that a bit.) | |
Anton: 30-Jul-2010 | I tried to make some words with a single unmatched literal bracket, or literal string delimiter, but I failed so far. They don't load, so they won't be in well-formed rebol format files. | |
Anton: 30-Jul-2010 | One caveat: Misidentifying as a block, types like (what are they called?) "inline types"? eg. #[none] If I don't recognise it as none! (or maybe issue!) , then I might accidentally take it as a block. | |
Anton: 30-Jul-2010 | Does anyone have any advice on how I should structure this algorithm? I don't feel confident as I haven't studied parsing theory deeply. http://en.wikipedia.org/wiki/Parsing Should I do lexical analysis and syntactic analysis separately ? I think I can do it all with just one parse, but it might not be a good idea. | |
Anton: 30-Jul-2010 | I'll make a start. | |
Anton: 30-Jul-2010 | Having a look. Thanks for posting that. | |
Anton: 30-Jul-2010 | I just found something interesting. I remember Gabriele saying he thought PARSE would convert chars it encountered in its rule to strings before using them, so these are equivalent: parse "a" [#"a"] parse "a" ["a"] (Of course, the first one is a char and not a string, so consumes less memory.) But I was just thinking it might be clearer to use strings instead of chars in the parse rule. Then I discovered you can use issues: parse "a" [#a] and the escape character is interesting as you only need to type one of them in the issue: parse "^^" [#^] | |
Anton: 30-Jul-2010 | Anyway, that's a side-issue. | |
BrianH: 30-Jul-2010 | Anton, the cost of disk reads dwarfs the cost of LOAD/next. And PARSE is much slower at loading REBOL data than LOAD. You might consider finding out the max size of the value you are loading, rounded up to multiples of 4096 (disk blocks), and just READ/part a bit more than that from the disk for each file. Then LOAD/next from the resulting string. There is no reason to do speculative reads once you have an upper bound on the size you will need to read. In a language like REBOL, minimizing disk reads sometimes means minimizing the number of calls to READ, not just the amount read. |
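The single-read approach suggested above could be sketched as follows. This assumes R2 semantics, where LOAD/NEXT returns a block holding the loaded value and the remaining string; the file `%demo.r`, the bound `max-size` and the names `chunk`, `bytes`, `value` and `rest` are all hypothetical.

```rebol
; R2-style sketch; %demo.r and max-size are made up for illustration.
write %demo.r "[first block] [second block] more data"

chunk: 4096        ; assumed disk block size
max-size: 10000    ; assumed upper bound on the size of the value to load

; Round the bound up to a whole number of disk blocks.
bytes: chunk * to integer! ((max-size + chunk - 1) / chunk)

data: read/part %demo.r bytes      ; one READ call instead of many small ones
set [value rest] load/next data    ; in R2, LOAD/NEXT yields [value remaining-string]
probe value
```

Rounding up to a multiple of the disk block size costs little, since the disk delivers whole blocks in any case, and it removes the need for speculative follow-up reads.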