AltME groups: search

world-name: r3wp

Group: Parse ... Discussion of PARSE dialect [web-public]
BrianH: 16-Dec-2009	You might be better off translating a C grammar for a PEG or TDPL parser generator into PARSE - less topological shifts needed.
Maxim: 16-Dec-2009	there is all in all only two or three rules that I'm unsure of the transformation, as some aspects of the C syntax are a bit obscure to represent.
BrianH: 16-Dec-2009	No, really. The syntax of C is so complex that you would need a lot of data to test all of the common variations.
Maxim: 16-Dec-2009	the funny thing is that the C language reference on the MSDN is actually pretty well done... there are a lot of evil C examples for some of the more obscure parts of the language like pointers, structs and unions. funny thing is that some of the most complex things to express were the literal constants! integers, with octal, hex notation... not as simple as some [digits] ;-)
Henrik: 24-Dec-2009	Looking at the new WHILE keyword and I was quite baffled by Carl's use of it in his latest blog example. Then I read the docs and it didn't get much better: - WHILE is a variant of ANY - ANY stops, if input does not change - WHILE doesn't stop, even if input does not change What does "input does not change" mean? Is it about changing the parse series length during parse? Is it actively moving the parse index back or forth using special commands? Is it normal progression of parse index with each cycle of WHILE or ANY? Is it alteration of the parse series content while maintaining length during parse?
Pekr: 24-Dec-2009	Henrik - according to docs explanation, 'parse contains some internal protection for the case, when input stream does not advance its position. In R2, following code causes infinite loop, in R3, it returns false: parse str [some [to "abc"]] (I am not sure I like that it returns false - normally I expect it to cause infinite loop. This is imo overprotecting programmer, and you have to think, why your code returns false anyway, which for me is the same, as if it would cause an infinite loop) Further from docs: To avoid infinite looping, a special internal rule is triggered based on the fact that the rule did not change the input position. However, this shows a problem with this rule: parse str [some [to "a" remove thru "b"]] Here the input did not appear to advance, but something useful happened. In such cases, the some word should not be used, and the while word is better: parse str [while [to "a" remove thru "b"]]
Pekr: 24-Dec-2009	Running the above examples, my opinion is that adding 'while was probably not a good decision. I can understand that now we have more power - our code will not easily cause infinite loops, but otoh you now have to think about whether it can happen or not, and 'some becomes your enemy ...
Ladislav: 25-Dec-2009	The WHILE keyword is the simplest possible cycle. The rule: a: [while b] is equivalent to recursive: a: [b a]
Ladislav: 25-Dec-2009	sorry, I meant a: [b a |]
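Editor's sketch of the equivalence Ladislav describes, assuming a hypothetical rule b that matches a single "x". The recursive form should run in REBOL 2 as well as R3; WHILE itself exists only in R3, so it appears here only in a comment.

```rebol
REBOL [Title: "Loop written as recursion (sketch)"]

b: "x"          ; hypothetical rule, chosen only for illustration
a: [b a |]      ; match b and recurse, otherwise match nothing

print parse "xxx" a          ; recursive form
print parse "xxx" [any b]    ; built-in loop, same result for this input
; R3 only: parse "xxx" [while b]
```

Both PRINT lines should show true, since each rule consumes the three "x" characters and then stops at the end of the input.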
Fork: 28-Dec-2009	?? not initialized after first match? And secondly, how do I match thru a series of things (e.g. integer! integer!, but just wondering about the thte. ?? problem before the first match?)
Pekr: 28-Dec-2009	what do you mean by "match thru a series of things"?
Fork: 28-Dec-2009	Is a sequence of things one of the complex rules that you can't use in a thru?
BrianH: 28-Dec-2009	Yes. You can express a sequence of characters in a string as a string literal, but not a sequence of types in a block. You are going to need first sets and the other LL tricks for that.
Fork: 28-Dec-2009	>> parse [a b c] [(value: none) copy value to 3 skip to end (probe value)] [a b] == true >> parse [a b c] [(value: none) copy value thru 3 skip to end (probe value)] [a b] == true
Fork: 28-Dec-2009	Should the latter be [a b c] ?
Pekr: 28-Dec-2009	>> parse [a b c][?? 3 skip ??] 3: [a b c] end!: [] == true
Pekr: 28-Dec-2009	to/thru were reimplemented to allow multiple options. There are cases, where they are not supposed to work, but in above case I would regard it being a bug .... unless some guru finds a theory showing us why it should be regarded being a correct result :-)
BrianH: 28-Dec-2009	Fork, the fact that both of those examples work incorrectly instead of throwing an error is a bug in PARSE. It should be CureCoded.
Fork: 28-Dec-2009	>> parse [a b c] [?? copy value thru 1 skip to end] co? : [a b c] == true
BrianH: 28-Dec-2009	Seems like a Unicode to ANSI translation error.
Fork: 28-Dec-2009	>> parse [a b c] [?? copy value thru 1 skip to end] coo:: [a b c] == true
Fork: 28-Dec-2009	Well, I should find a way to reproduce it before doing that. Left a note about how getting a CureCode account didn't work the other day.
kcollins: 29-Dec-2009	Fork, are you seeing these outputs "coo", "thte", etc. on a Linux build of R3? I have seen similar corrupted output with Linux R3 when testing TCP client code, as documented in Curecode #1322.
Fork: 29-Dec-2009	kcollins: I'm using OS/X, I still haven't found a way to reproduce it. Comes and goes.
Ladislav: 29-Dec-2009	e.g. parse [a b c] [?? copy value thru 1 skip to end] should have preferably been parse [a b c] [?? copy value 1 skip to end]
Ladislav: 30-Dec-2009	Carl made a distinction in R3 blog, but they currently work the same, as far as I can tell, so, the only difference I see is, that ACCEPT is more self-explanatory.
Carl: 31-Dec-2009	In the rewrite of DECODE-CGI, that behavior of ANY forces me to write: parse "" [any [end break | copy tmp to end]] This seems wrong to me if we define ANY as a MATCHing function, not as a LOOP function. This topic has been debated a bit between a few of us, but I think it deserves more attention.
Carl: 31-Dec-2009	In other words, is ANY smart about the input? If there is no input, why should it even try? Of course, in the past we've used ANY a bit like WHILE -- as a LOOPing method, not really as a MATCHing method.
Carl: 31-Dec-2009	It's a small thing, and maybe too late to change. I wanted to point it out.
Steeve: 31-Dec-2009	We have so many alternatives that I don't see this as a burden
Carl: 31-Dec-2009	There are a few ways to do it, but that is not my point.
BrianH: 6-Jan-2010	BenBran: Not sure where to put this so asking here: I downloaded a web script and it has a snippet I don't understand: buffer: make string! 1024 ;; contains the browser request file: "index.html" parse buffer ["get" ["http" | "/ " | copy file to " " ]] what does: copy file to " " mean or do? tia
BrianH: 6-Jan-2010	Sort of. The actual code is a little more complex, more like this: either tmp: find data " " [file: if 0 < offset? data tmp [copy/part data tmp]] [break]
BrianH: 6-Jan-2010	The break being a parse match fail, and file being set to none for a zero-length match.
BrianH: 6-Jan-2010	That would return the file instead of setting a variable and not return false because of leftover input.
Graham: 14-Jan-2010	>> parse [ <tag> ] [ copy t tag! ] == true >> t == [<tag>] never noticed it made a block! before
ChristianE: 14-Jan-2010	There's a difference between COPY and SET in block parsing mode.
ChristianE: 14-Jan-2010	From the docs: SET - set the next value to a variable COPY - copy the next match sequence to a variable
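Editor's sketch of the difference ChristianE quotes from the docs, reusing Graham's input block. The variable name t comes from Graham's message; nothing else is assumed beyond block parsing mode.

```rebol
REBOL [Title: "SET vs COPY in block parsing (sketch)"]

; SET stores the matched value itself
parse [<tag>] [set t tag!]
probe t     ; <tag>

; COPY stores the matched sequence, so the result is a block
parse [<tag>] [copy t tag!]
probe t     ; [<tag>]
```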
Graham: 29-Jan-2010	<?xml version="1.0"?> <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"><SOAP-ENV:Body><SelectResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/"><SelectResult><Item><Name>2010-01-29T09:54:48.000ZI3s3NjIxRjZERDI1MUY0QzQyMDk4M0JDMzkwMERGOEQxQTVDRDY5MzEwfQ==</Name><Attribute><Name>Subject</Name><Value>hello?</Value></Attribute><Attribute><Name>Userid</Name><Value>Guest</Value></Attribute><Attribute><Name>UTCDate</Name><Value>2010-01-29T09:54:48.000Z</Value></Attribute></Item><Item><Name>2010-01-29T09:58:36.000ZI3swMTZBODg3QjAxNDQ2NEU5OENCNTA3OTc5OTg0Mjc1MTJGQzkxQTc0fQ==</Name><Attribute><Name>Subject</Name><Value>First Message</Value></Attribute><Attribute><Name>Userid</Name><Value>Graham</Value></Attribute><Attribute><Name>UTCDate</Name><Value>2010-01-29T09:58:36.000Z</Value></Attribute></Item><Item><Name>2010-01-29T11:06:18.000ZI3tFREFCRUYwNTY4OTdBMzcwODM2NzJGQUE5MzAwRUE3NjYwMTMwMTY5fQ==</Name><Attribute><Name>Subject</Name><Value>Index working</Value></Attribute><Attribute><Name>Userid</Name><Value>Graham</Value></Attribute><Attribute><Name>UTCDate</Name><Value>2010-01-29T11:06:18.000Z</Value></Attribute></Item></SelectResult><ResponseMetadata><RequestId>14873461-626a-44bf-2d7d-c1b23694b2e0</RequestId><BoxUsage>0.0000411449</BoxUsage></ResponseMetadata></SelectResponse></SOAP-ENV:Body></SOAP-ENV:Envelope>
Steeve: 29-Jan-2010	Is that result a block or string ?
Steeve: 29-Jan-2010	because in a string you can't find tag! values
Graham: 29-Jan-2010	It's a string ...
Graham: 29-Jan-2010	Yes, tags are a type of string ...
Steeve: 29-Jan-2010	>> parse "<a><item>" [thru <a> ??] end!: "item>" == false
Steeve: 29-Jan-2010	a bug
Steeve: 29-Jan-2010	It should say: >> parse "<a><item>" [thru <a> ??] end!: "<item>" == false
Steeve: 29-Jan-2010	parsing thru a tag eats one more char
Graham: 29-Jan-2010	Ah .. ?? is a new debugging function
Steeve: 29-Jan-2010	you can, just replace <tag> by a real string "<tag>"
BrianH: 29-Jan-2010	And there is a great likelihood of the bugs being fixed in R3. And there aren't many in PARSE, just that tag bug afaik.
BrianH: 29-Jan-2010	Partially - it used to be worse. That's why it's marked a "problem".
Graham: 29-Jan-2010	only eats one char instead of two ... so that's a 50% improvement
BrianH: 29-Jan-2010	The worst was when someone "fixed" #10 to make it compatible with R2's buggy behavior. Bad fixes get marked as a problem.
Graham: 29-Jan-2010	I looked for a previous report on this bug but couldn't find it .. 4 pages of bugs with parse in them. I wonder if they can be filtered to only show active bugs
BrianH: 7-Feb-2010	TO and THRU have limited argument syntax, and don't support full rules. Both R2 and R3 support literal value arguments (that don't count as rules). R3 also supports a block of literal values delimited by |, and those values are less limited.
Steeve: 7-Feb-2010	Something weird! Using a simple charset with TO or THRU should work, but it fails here with R3. >> digits: charset "134567890" >> parse "azaz 34" [to digits ??] end!: "azaz 34"
BrianH: 7-Feb-2010	Steeve, that's a bug that I reported yesterday.
BrianH: 7-Feb-2010	Oh crap. Well, it was reported as a bug, and it's staying that way until Carl says otherwise :)
Gabriele: 7-Feb-2010	given that to and thru do "more" in R3, it probably is not bad to consider it a bug. (maybe it should be considered a bug in R2 as well, given that FIND does work with charsets...)
Graham: 8-Feb-2010	and finally a parse rule that works under r2 and r3 parse/all txt [ some [ [ end | any nondigits ] [ date-rule | some digits ] ] ]
Sunanda: 13-Apr-2010	He does ask a lot of simpler questions :)
Ladislav: 13-Apr-2010	Yes, "it's faster than anything else, until it's not" is a perfect statement, and you got my agreement :-p
Henrik: 13-Apr-2010	a short string is one that is not long. :-)
Ladislav: 13-Apr-2010	Now, I can make a bold statement: for any method distinct from the one using PARSE and CHANGE/PART combo holds, that it is faster than the above method, until it's not :-p
Maxim: 13-Apr-2010	it's not a single change/part which is the issue, it's managing the stack, allocating all those blocks over and over... the sheer speed of the parse loop blows away all the other looped/recursive algorithms in my usage so far.
Gregg: 15-Apr-2010	Petr, it may be more than fast enough for small cases, or where you don't need maximum performance (which is most of the time). The inefficiency comes from REBOL having to move things around when you insert things into a series (list! being a possible exception).
Ladislav: 16-Apr-2010	Please, if somebody finds a good refinement name, let us know.
ChristianE: 16-Apr-2010	Not being a native speaker I think you "change something in something", so that gives >> CHANGE/TO "ABC" "123" == 123
ChristianE: 16-Apr-2010	But it doesn't communicate very well the idea of changing to only a part of the second argument.
Maxim: 17-Apr-2010	/take is a new very useful function in R3, it's a good idea to use it as a refinement too... IMHO
Maxim: 17-Apr-2010	Gab YESSS!!! it would also be nice if we could actually just set a soft-range to ANY series, removing the need for a specific datatype.
Maxim: 17-Apr-2010	and extra speed consideration of having to allocate/copy/destroy a series
ChristianE: 17-Apr-2010	That's said too much; I think it's more that CHANGE/PART behaves as advertised and the /PART refinement just happens to have a different meaning for INSERT or APPEND. None of /WITH, /TO, /SPAN and /RANGE communicates very well that it refers to the second argument though, and /TAKE has the drawback of suggesting that it's taking away from the second argument like TAKE instead of leaving the second argument untouched. CHANGE/FROM, however, seems to work: >> head change/from #abcdef #123456 3 == #123def >> head change/part/from #abcdef #12345 1 3 == #123bcdef All that under the assumption that for compatibility, /PART in its current meaning will stay as it is.
Steeve: 19-Apr-2010	Gregg, I used to use append/part to avoid the memory overhead of copy/part in many cases. Instead of doing as in Ladislav's example. >> change/part something copy/part something-else range part. I used to do. >> change/part something append/part clear #{} something-else range part. It's not faster, but saves memory. So, I don't know if it's a good idea to discard this use case from append and insert.
Ladislav: 19-Apr-2010	It does not matter that it is rare: if you can find any unexpected behavior of the GC, you should put it to CureCode as a major bug
Steeve: 19-Apr-2010	It's not a bug to my mind, the GC never acted smoothly.
Ladislav: 19-Apr-2010	maybe I just misunderstood, then. If it is not a bug, then you are actually saying, that the GC collects everything as expected? If that is the case, then why the trouble to "save memory"?
florin: 24-May-2010	Is there a place for the newbie questions on parsing?
florin: 24-May-2010	I've created my very first script. The script loops through a list of email (Kerio) log files, extracts the IP addresses, compiles them in a list and adds them to a (Peerblock) list in order to limit incoming spam. I find rebol perfect for this.
florin: 24-May-2010	A rule can be: "=," etc. How do I "escape" the space character so that I can include in my rule?
florin: 24-May-2010	And the IP addresses are separated by a space?
florin: 24-May-2010	Yes, parse/all is great, and this is why I want to include the space not as a delimiter but as a character in the rule. As if, sometimes I want to find two strings separated by a character.
PeterWood: 24-May-2010	>> a: "a b" == "a b" >> parse/all a ["a" " " "b"] == true
florin: 24-May-2010	My script works, but you know how it goes. Once a question creeps in the brain, it needs an answer. Thank you.
Pekr: 24-May-2010	I would use #" ", or define a space rule first: spaces: charset " ^-" (eventually include tab)
florin: 24-May-2010	Then, I said, read only from the last read, and parse the date/time. I wanted to parse date AND time at the same time: "[15/May/2010 17:59:56]". But I hit a snag because of the space in between. I don't want date and time separated because rebol can parse the string into a date-time easily. The space gave me trouble, and the brackets too.
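Editor's sketch of the two suggestions in this exchange, assuming REBOL 2 (where PARSE/ALL stops whitespace from being skipped). The word stamp is a hypothetical name; the sample string is the one florin quotes. Turning the extracted text into a date! value is left out, since the thread does not show how that was done.

```rebol
REBOL [Title: "Matching spaces explicitly (sketch)"]

; Pekr's charset: space and tab
spaces: charset " ^-"

; with /ALL, the space must be matched by the rule itself
print parse/all "a b" ["a" some spaces "b"]

; take everything between the brackets, inner space included
stamp: none
parse/all "[15/May/2010 17:59:56]" ["[" copy stamp to "]" skip]
probe stamp
```

The first PRINT should show true, and stamp should hold the date and time as one string without the brackets.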
Anton: 30-Jul-2010	Ok, continuing the discussion from "Performance" group, I'd like to ask for some help with parsing rebol format files. Basically, I'd like to be able to extract a block near the beginning or end of a file, while minimizing disk access. The files to be parsed could be large, so I don't want to load the entire contents, but chunks at a time. So my parse rule should be able to detect when the input has been exhausted and ask for another chunk. (When extracting a block near the end of a file, I'll have to parse in reverse, but I'll try to implement that later.)
Anton: 30-Jul-2010	Using LOAD/NEXT, I still have to use a O(n^2) algorithm. I'd now like to do my own parse, which can be O(n).
Anton: 30-Jul-2010	Which is why, in that algorithm, I had to iteratively: load a chunk, append it and try LOAD/NEXT until it succeeded. Which gives the algorithm O(n^2) performance.
Anton: 30-Jul-2010	I imagine it could be useful in other similar situations, so I'd like it to be pretty general. I suppose a bonus functionality is to be able to get nested blocks. (And a super bonus will be to get any datatype at any level, but I won't bother doing that until I need it.)
Anton: 30-Jul-2010	Must it ? I think if I can parse single-line strings correctly, then a bracket inside won't cause a problem. This means I'll be basically ignoring datatypes which allow strings in their syntax, and just jumping to the string part.
Anton: 30-Jul-2010	I don't think there's any way to make any type with a literal bracket in it (except blocks, of course). (But I am worrying about that a bit.)
Anton: 30-Jul-2010	I tried to make some words with a single unmatched literal bracket, or literal string delimiter, but I failed so far. They don't load, so they won't be in well-formed rebol format files.
Anton: 30-Jul-2010	One caveat: Misidentifying as a block, types like (what are they called?) "inline types"? eg. #[none] If I don't recognise it as none! (or maybe issue!) , then I might accidentally take it as a block.
Anton: 30-Jul-2010	Does anyone have any advice on how I should structure this algorithm? I don't feel confident as I haven't studied parsing theory deeply. http://en.wikipedia.org/wiki/Parsing Should I do lexical analysis and syntactic analysis separately ? I think I can do it all with just one parse, but it might not be a good idea.
Anton: 30-Jul-2010	I'll make a start.
Anton: 30-Jul-2010	Having a look. Thanks for posting that.
Anton: 30-Jul-2010	I just found something interesting. I remember Gabriele saying he thought PARSE would convert chars it encountered in its rule with strings before using, so these are equivalent: parse "a" [#"a"] parse "a" ["a"] (Of course, the first one is a char and not a string, so consumes less memory.) But I was just thinking it might be clearer to use strings instead of chars in the parse rule. Then I discovered you can use issues: parse "a" [#a] and the escape characters is interesting as you only need to type one of them in the issue: parse "^^" [#^]
Anton: 30-Jul-2010	Anyway, that's a side-issue.
BrianH: 30-Jul-2010	Anton, the cost of disk reads dwarfs the cost of LOAD/next. And PARSE is much slower at loading REBOL data than LOAD. You might consider finding out the max size of the value you are loading, rounded up to multiples of 4096 (disk blocks), and just READ/part a bit more than that from the disk for each file. Then LOAD/next from the resulting string. There is no reason to do speculative reads once you have an upper bound on the size you will need to read. In a language like REBOL, minimizing disk reads sometimes means minimizing the number of calls to READ, not just the amount read.
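Editor's sketch of the approach BrianH outlines, assuming REBOL 2, where LOAD/NEXT returns a block holding the loaded value and the remaining string. The file name %sample.r, the bound max-size, and the other words are hypothetical and chosen only for illustration.

```rebol
REBOL [Title: "Bounded read, then LOAD/NEXT (sketch)"]

; hypothetical sample file so the sketch is self-contained
write %sample.r "[a b c] more data here"

max-size: 100                                    ; assumed upper bound on the value's size
blocks: to integer! ((max-size + 4095) / 4096)   ; round up to whole 4096-byte disk blocks
chunk-size: 4096 * blocks

data: read/part %sample.r chunk-size             ; a single bounded read
set [value rest] load/next data                  ; load just the first value
probe value
```

The point of the design is a single READ call per file, sized from a known bound, instead of repeated speculative reads followed by retries of LOAD/NEXT.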
