AltME groups: search


world-name: r3wp

Group: Parse ... Discussion of PARSE dialect [web-public]
BrianH:
16-Dec-2009
You might be better off translating a C grammar for a PEG or TDPL 
parser generator into PARSE - fewer topological shifts needed.
Maxim:
16-Dec-2009
there are, all in all, only two or three rules whose transformation 
I'm unsure of, as some aspects of the C syntax are a bit obscure 
to represent.
BrianH:
16-Dec-2009
No, really. The syntax of C is so complex that you would need a lot 
of data to test all of the common variations.
Maxim:
16-Dec-2009
the funny thing is that the C language reference on MSDN is actually 
pretty well done... there are a lot of evil C examples for some of 
the more obscure parts of the language, like pointers, structs and 
unions.


funny thing is that some of the most complex things to express were 
the literal constants! Integers, with octal and hex notation... not 
as simple as some [digits] ;-)
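For comparison, here is a sketch of the C integer-constant grammar (decimal, octal, hex, plus the u/l/ll suffixes) written as a Python regular expression rather than a PARSE rule. It follows the C99-style grammar and deliberately ignores newer additions such as binary literals and digit separators; the names `HEX`, `OCT`, `DEC` and `is_c_int_literal` are illustrative only.

```python
import re

# Listed in the order a PARSE-style ordered choice would need them:
# with octal first, an ordered choice would match only the "0" of "0x1F".
HEX = r"0[xX][0-9a-fA-F]+"
OCT = r"0[0-7]*"          # a lone "0" is an octal constant in C
DEC = r"[1-9][0-9]*"
# Optional suffix: u/U with optional l/L/ll/LL, or l/L/ll/LL with optional u/U.
SUFFIX = r"(?:[uU](?:ll|LL|[lL])?|(?:ll|LL|[lL])[uU]?)?"
INT_LITERAL = re.compile(f"(?:{HEX}|{OCT}|{DEC}){SUFFIX}")


def is_c_int_literal(text: str) -> bool:
    """True if the whole of `text` is a C integer constant (no sign, no separators)."""
    return INT_LITERAL.fullmatch(text) is not None
```

For example, "0x1F", "017" and "42ul" are accepted, while "019" (9 is not an octal digit) and "10lL" (mixed-case long suffix) are rejected.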
Henrik:
24-Dec-2009
Looking at the new WHILE keyword and I was quite baffled by Carl's 
use of it in his latest blog example. Then I read the docs and it 
didn't get much better:

- WHILE is a variant of ANY
- ANY stops, if input does not change
- WHILE doesn't stop, even if input does not change

What does "input does not change" mean?

Is it about changing the parse series length during parse?

Is it actively moving the parse index back or forth using special 
commands?

Is it normal progression of parse index with each cycle of WHILE 
or ANY?

Is it alteration of the parse series content while maintaining length 
during parse?
Pekr:
24-Dec-2009
Henrik - according to the docs' explanation, 'parse contains some internal 
protection for the case when the input stream does not advance its position. 
In R2, the following code causes an infinite loop; in R3, it returns false:

parse str [some [to "abc"]]


(I am not sure I like that it returns false - normally I expect it 
to cause an infinite loop. This is imo overprotecting the programmer, and 
you have to think about why your code returns false anyway, which for 
me is the same as if it caused an infinite loop)

Further from docs:


To avoid infinite looping, a special internal rule is triggered based 
on the fact that the rule did not change the input position.

However, this shows a problem with this rule:

parse str [some [to "a" remove thru "b"]]


Here the input did not appear to advance, but something useful happened. 
In such cases, the some word should not be used, and the while word 
is better:

parse str [while [to "a" remove thru "b"]]
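To make the quoted docs concrete, here is a small Python model (not REBOL's actual implementation) of the two loop kinds. It assumes, as the docs describe, that the guard only checks whether the rule left the input position where it started. `seq`, `to_char`, `remove_thru`, `any_`, `while_` and `parse` are hypothetical stand-ins for the PARSE words.

```python
from typing import Callable, List, Optional

# A rule takes the mutable input (a list of characters) and a position and
# returns the new position, or None if it fails.
Rule = Callable[[List[str], int], Optional[int]]


def seq(*rules: Rule) -> Rule:
    """Match each rule in turn, like a PARSE sub-block [a b c]."""
    def run(buf, pos):
        for rule in rules:
            pos = rule(buf, pos)
            if pos is None:
                return None
        return pos
    return run


def to_char(ch: str) -> Rule:
    """Like TO: move to the next occurrence of ch without consuming it."""
    def run(buf, pos):
        try:
            return buf.index(ch, pos)
        except ValueError:
            return None
    return run


def remove_thru(ch: str) -> Rule:
    """Like REMOVE THRU: delete up to and including ch; the position stays put."""
    def run(buf, pos):
        try:
            end = buf.index(ch, pos) + 1
        except ValueError:
            return None
        del buf[pos:end]
        return pos
    return run


def any_(rule: Rule) -> Rule:
    """Guarded ANY: stop on failure OR when the position did not move."""
    def run(buf, pos):
        while True:
            new = rule(buf, pos)
            if new is None or new == pos:
                return pos
            pos = new
    return run


def while_(rule: Rule) -> Rule:
    """WHILE: keep going as long as the rule matches, moved or not."""
    def run(buf, pos):
        while True:
            new = rule(buf, pos)
            if new is None:
                return pos
            pos = new
    return run


def parse(text: str, rule: Rule):
    """Return (matched the whole input?, text left after any edits)."""
    buf = list(text)
    end = rule(buf, 0)
    return end is not None and end == len(buf), "".join(buf)
```

With the docs' `[to "a" remove thru "b"]` cycle on "aybazb", the guarded ANY stops after the first removal because the position did not move, while WHILE keeps going until TO fails. Note that `while_(to_char("a"))` would never return, which is exactly the case the guard exists for.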
Pekr:
24-Dec-2009
After running the above examples, my opinion is that adding 'while 
was probably not a good decision. I understand that we now have 
more power (our code will not easily cause infinite loops), but 
otoh you now have to think about whether it can happen or not, and 'some becomes 
your enemy ...
Ladislav:
25-Dec-2009
The WHILE keyword is the simplest possible cycle. The rule:

    a: [while b]

is equivalent to recursive:

    a: [b a]
Ladislav:
25-Dec-2009
sorry, I meant a: [b a |]
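Ladislav's equivalence can be checked with a Python sketch (hypothetical helper names) that implements WHILE iteratively and `a: [b a |]` by recursion, where the trailing empty alternative always succeeds.

```python
from typing import Callable, Optional

# A rule takes (text, position) and returns the new position, or None on failure.
Rule = Callable[[str, int], Optional[int]]


def lit(s: str) -> Rule:
    """Match the literal string s at the current position."""
    def run(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return run


def while_(b: Rule) -> Rule:
    """Iterative WHILE: repeat b until it fails (no progress check)."""
    def run(text, pos):
        while (new := b(text, pos)) is not None:
            pos = new
        return pos
    return run


def recursive(b: Rule) -> Rule:
    """a: [b a |] - either b followed by a again, or the empty alternative."""
    def a(text, pos):
        after_b = b(text, pos)
        if after_b is not None:
            after_a = a(text, after_b)
            if after_a is not None:
                return after_a
        return pos  # the empty alternative after "|" always matches
    return a
```

Both versions return the same position for any `b` that eventually fails. The recursive one also inherits WHILE's lack of a progress guard: a `b` that matches without advancing recurses until Python raises `RecursionError`.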
Fork:
28-Dec-2009
?? not initialized after first match?  And secondly, how do I match 
thru a series of things (e.g. integer! integer!, but just wondering 
about the thte.  ?? problem before the first match?)
Pekr:
28-Dec-2009
what do you mean by "match thru a series of things"?
Fork:
28-Dec-2009
Is a sequence of things one of the complex rules that you can't use 
in a thru?
BrianH:
28-Dec-2009
Yes. You can express a sequence of characters in a string as a string 
literal, but not a sequence of types in a block. You are going to 
need first sets and the other LL tricks for that.
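One way to get the effect of THRU over a sequence of types is to try the whole sequence at each position and skip one value when it fails. A Python sketch of that search, with Python types standing in for REBOL datatypes (the function name is hypothetical):

```python
from typing import Any, List, Optional, Sequence


def thru_sequence(values: List[Any], types: Sequence[type], start: int = 0) -> Optional[int]:
    """Return the index just past the first run of values whose types match
    `types` in order, or None if there is no such run (the rule fails).

    Note: in Python, bool is a subclass of int, so True counts as an int here.
    """
    n = len(types)
    for i in range(start, len(values) - n + 1):
        window = values[i:i + n]
        if all(isinstance(v, t) for v, t in zip(window, types)):
            return i + n
    return None
```

For the `integer! integer!` case from the question, `thru_sequence(["a", 1, "b", 2, 3, "c"], [int, int])` lands just after the 2 and 3.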
Fork:
28-Dec-2009
>> parse [a b c] [(value: none) copy value to 3 skip to end (probe value)]
[a b]
== true


>> parse [a b c] [(value: none) copy value thru 3 skip to end (probe value)]
[a b]
== true
Fork:
28-Dec-2009
Should the latter be [a b c] ?
Pekr:
28-Dec-2009
>> parse [a b c][?? 3 skip ??]
3: [a b c]
end!: []
== true
Pekr:
28-Dec-2009
TO/THRU were reimplemented to allow multiple options. There are cases 
where they are not supposed to work, but in the above case I would regard 
it as a bug... unless some guru finds a theory showing us why 
it should be regarded as a correct result :-)
BrianH:
28-Dec-2009
Fork, the fact that both of those examples work incorrectly instead 
of throwing an error is a bug in PARSE. It should be CureCoded.
Fork:
28-Dec-2009
>> parse [a b c] [?? copy value thru 1 skip to end]
co? : [a b c]
== true
BrianH:
28-Dec-2009
Seems like a Unicode to ANSI translation error.
Fork:
28-Dec-2009
>> parse [a b c] [?? copy value thru 1 skip to end]
coo:: [a b c]
== true
Fork:
28-Dec-2009
Well, I should find a way to reproduce it before doing that.  Left 
a note about how getting a CureCode account didn't work the other 
day.
kcollins:
29-Dec-2009
Fork, are you seeing these outputs "coo", "thte", etc. on a Linux 
build of R3? I have seen similar corrupted output with Linux R3 when 
testing TCP client code, as documented in Curecode #1322.
Fork:
29-Dec-2009
kcollins: I'm using OS/X, I still haven't found a way to reproduce 
it.  Comes and goes.
Ladislav:
29-Dec-2009
e.g. 

    parse [a b c] [?? copy value thru 1 skip to end]   

should have preferably been

    parse [a b c] [?? copy value 1 skip to end]
Ladislav:
30-Dec-2009
Carl made a distinction in the R3 blog, but as far as I can tell they 
currently work the same, so the only difference I see is that ACCEPT 
is more self-explanatory.
Carl:
31-Dec-2009
In the rewrite of DECODE-CGI, that behavior of ANY forces me to write:

parse "" [any [end break | copy tmp to end]]


This seems wrong to me if we define ANY as a MATCHing function, not 
as a LOOP function. This topic has been debated a bit between a few 
of us, but I think it deserves more attention.
Carl:
31-Dec-2009
In other words, is ANY smart about the input?  If there is no input, 
why should it even try?


Of course, in the past we've used ANY a bit like WHILE -- as a LOOPing 
method, not really as a MATCHing method.
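Carl's point can be shown with a toy Python model (not REBOL's implementation; all names are hypothetical). It assumes the trouble is the extra zero-length COPY: in a loop-style ANY the body still runs once at the end of input and overwrites the captured value with an empty one, while a match-style ANY that refuses to start at the end never does, which is what the `end break |` alternative achieves by hand.

```python
from typing import Callable, List, Optional

Rule = Callable[[str, int], Optional[int]]


def make_copy_to_end(sink: List[str]) -> Rule:
    """Like COPY tmp TO END: record the rest of the input, then jump to the end."""
    def body(text, pos):
        sink.append(text[pos:])  # the last entry plays the role of tmp
        return len(text)
    return body


def any_loop(body: Rule, text: str, pos: int) -> int:
    """ANY as a LOOP: always try the body; stop on failure or no progress."""
    while True:
        new = body(text, pos)
        if new is None or new == pos:
            return pos
        pos = new


def any_match(body: Rule, text: str, pos: int) -> int:
    """ANY as a MATCH: at the end of input, don't even try the body."""
    while pos < len(text):
        new = body(text, pos)
        if new is None or new == pos:
            break
        pos = new
    return pos
```

On "a=1" both versions end at position 3, but the loop version records `["a=1", ""]`, so the final "tmp" is empty; the match version records only `["a=1"]`.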
Carl:
31-Dec-2009
It's a small thing, and maybe too late to change. I wanted to point 
it out.
Steeve:
31-Dec-2009
We have so much alternatives that i don't see this as a burden
Carl:
31-Dec-2009
There are a few ways to do it, but that is not my point.
BrianH:
6-Jan-2010
BenBran:
Not sure where to put this so asking here:


I downloaded a web script and it has a  snippet I don't understand:
buffer: make string! 1024         ;; contains the browser request
file: "index.html"
parse buffer ["get" ["http"  |   "/ "  |  copy file to " " ]]

what does:

copy file to " "

mean or do?
tia
BrianH:
6-Jan-2010
Sort of. The actual code is a little more complex, more like this:

either tmp: find data " " [file: if 0 < offset? data tmp [copy/part data tmp]] [break]
BrianH:
6-Jan-2010
The break being a parse match fail, and file being set to none for 
a zero-length match.
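BrianH's expansion translates almost line for line into Python; this sketch (hypothetical function name) returns whether the rule matched, together with the value `file` would get. One liberty: on failure REBOL simply leaves `file` untouched, while the sketch returns None.

```python
from typing import Optional, Tuple


def copy_to_space(data: str) -> Tuple[bool, Optional[str]]:
    """Mirror of: either tmp: find data " " [file: if 0 < offset? data tmp
    [copy/part data tmp]] [break]

    No space means the rule fails ("break"); a zero-length match gives None,
    standing in for REBOL's none.
    """
    idx = data.find(" ")
    if idx == -1:
        return False, None
    return True, (data[:idx] if idx > 0 else None)
```

So for a request line such as "index.html HTTP/1.0" the captured file is "index.html".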
BrianH:
6-Jan-2010
That would return the file instead of setting a variable and not 
return false because of leftover input.
Graham:
14-Jan-2010
>> parse [ <tag> ] [ copy t tag! ]
== true
>> t
== [<tag>]

never noticed it made a block! before
ChristianE:
14-Jan-2010
There's a difference between COPY and SET in block parsing mode.
ChristianE:
14-Jan-2010
From the docs:

SET - set the next value to a variable
COPY - copy the next match sequence to a variable
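The difference Graham ran into, sketched in Python with a list standing in for a block and a `Tag` class standing in for tag! (both assumptions for illustration): SET binds the matched value itself, while COPY binds a new block containing the matched span.

```python
from typing import Any, List, Optional, Tuple


class Tag(str):
    """Stand-in for REBOL's tag! type."""


def set_rule(block: List[Any], pos: int, typ: type) -> Tuple[Optional[int], Any]:
    """SET word type!: bind the matched value itself."""
    if pos < len(block) and isinstance(block[pos], typ):
        return pos + 1, block[pos]
    return None, None


def copy_rule(block: List[Any], pos: int, typ: type) -> Tuple[Optional[int], Any]:
    """COPY word type!: bind a new block (list) holding the matched span."""
    if pos < len(block) and isinstance(block[pos], typ):
        return pos + 1, block[pos:pos + 1]
    return None, None
```

Matching `[<tag>]` with `copy_rule` yields a one-element list, which is the `[<tag>]` Graham saw; `set_rule` yields the tag alone.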
Graham:
29-Jan-2010
<?xml version="1.0"?>

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"><SOAP-ENV:Body><SelectResponse 
xmlns="http://sdb.amazonaws.com/doc/2009-04-15/"><SelectResult><Item><Name>2010-01-29T09:54:48.000ZI3s3NjIxRjZERDI1MUY0QzQyMDk4M0JDMzkwMERGOEQxQTVDRDY5MzEwfQ==</Name><Attribute><Name>Subject</Name><Value>hello?</Value></Attribute><Attribute><Name>Userid</Name><Value>Guest</Value></Attribute><Attribute><Name>UTCDate</Name><Value>2010-01-29T09:54:48.000Z</Value></Attribute></Item><Item><Name>2010-01-29T09:58:36.000ZI3swMTZBODg3QjAxNDQ2NEU5OENCNTA3OTc5OTg0Mjc1MTJGQzkxQTc0fQ==</Name><Attribute><Name>Subject</Name><Value>First 
Message</Value></Attribute><Attribute><Name>Userid</Name><Value>Graham</Value></Attribute><Attribute><Name>UTCDate</Name><Value>2010-01-29T09:58:36.000Z</Value></Attribute></Item><Item><Name>2010-01-29T11:06:18.000ZI3tFREFCRUYwNTY4OTdBMzcwODM2NzJGQUE5MzAwRUE3NjYwMTMwMTY5fQ==</Name><Attribute><Name>Subject</Name><Value>Index 
working</Value></Attribute><Attribute><Name>Userid</Name><Value>Graham</Value></Attribute><Attribute><Name>UTCDate</Name><Value>2010-01-29T11:06:18.000Z</Value></Attribute></Item></SelectResult><ResponseMetadata><RequestId>14873461-626a-44bf-2d7d-c1b23694b2e0</RequestId><BoxUsage>0.0000411449</BoxUsage></ResponseMetadata></SelectResponse></SOAP-ENV:Body></SOAP-ENV:Envelope>
Steeve:
29-Jan-2010
Is that result a block or string ?
Steeve:
29-Jan-2010
because in a string you can't find tag! values
Graham:
29-Jan-2010
It's a string ...
Graham:
29-Jan-2010
Yes, tags are a type of string ...
Steeve:
29-Jan-2010
>> parse "<a><item>" [thru <a> ??]
end!: "item>"
== false
Steeve:
29-Jan-2010
a bug
Steeve:
29-Jan-2010
It should say:
>> parse "<a><item>" [thru <a> ??]
end!: "<item>"
== false
Steeve:
29-Jan-2010
parsing thru a tag eats one more char
Graham:
29-Jan-2010
Ah .. ?? is a new debugging function
Steeve:
29-Jan-2010
you can, just replace <tag> by a real string "<tag>"
BrianH:
29-Jan-2010
And there is a great likelihood of the bugs being fixed in R3. And 
there aren't many in PARSE, just that tag bug afaik.
BrianH:
29-Jan-2010
Partially - it used to be worse. That's why it's marked a "problem".
Graham:
29-Jan-2010
only eats one char instead of two ... so that's a 50% improvement
BrianH:
29-Jan-2010
The worst was when someone "fixed" #10 to make it compatible with 
R2's buggy behavior. Bad fixes get marked as a problem.
Graham:
29-Jan-2010
I looked for a previous report on this bug but couldn't find it .. 
4 pages of bugs with parse in them.  I wonder if they can be filtered 
to only show active bugs
BrianH:
7-Feb-2010
TO and THRU have limited argument syntax, and don't support full 
rules. Both R2 and R3 support literal value arguments (that don't 
count as rules). R3 also supports a block of literal values delimited 
by |, and those values are less limited.
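A Python sketch of TO/THRU with a block of alternatives. It assumes the scan stops at the earliest position where any alternative matches, trying them in order at each position; it also accepts a charset, modeled as a Python frozenset, since that is the case Steeve expected to work. TO leaves the position at the match, THRU just past it. Names are illustrative.

```python
from typing import FrozenSet, Optional, Sequence, Union

# An alternative is a literal string or a charset (a set of characters).
Alt = Union[str, FrozenSet[str]]


def match_at(text: str, pos: int, alt: Alt) -> Optional[int]:
    """Return where `alt` ends if it matches at `pos`, else None."""
    if isinstance(alt, str):
        return pos + len(alt) if text.startswith(alt, pos) else None
    # a charset matches exactly one character that is in the set
    return pos + 1 if pos < len(text) and text[pos] in alt else None


def to_alts(text: str, pos: int, alts: Sequence[Alt], thru: bool = False) -> Optional[int]:
    """Scan forward for the first position where any alternative matches.

    TO leaves the position at the start of the match, THRU just past it.
    Returns None if nothing matches (the rule fails).
    """
    for i in range(pos, len(text) + 1):
        for alt in alts:
            end = match_at(text, i, alt)
            if end is not None:
                return end if thru else i
    return None
```

Under these assumptions, TO a digit charset on "azaz 34" stops at the "3", and THRU "<a>" on "<a><item>" leaves "<item>", which is the result Steeve expected from the tag case.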
Steeve:
7-Feb-2010
Something weird!
Using a simple charset with TO or THRU should work,
but it fails here with R3.

>> digits: charset "134567890"
>> parse "azaz 34" [to digits ??]
end!: "azaz 34"
BrianH:
7-Feb-2010
Steeve, that's a bug that I reported yesterday.
BrianH:
7-Feb-2010
Oh crap. Well, it was reported as a bug, and it's staying that way 
until Carl says otherwise :)
Gabriele:
7-Feb-2010
given that to and thru do "more" in R3, it probably is not bad to 
consider it a bug. (maybe it should be considered a bug in R2 as 
well, given that FIND does work with charsets...)
Graham:
8-Feb-2010
and finally a parse rule that works under r2 and r3

	parse/all txt [
		some [
			[ end | any nondigits ] [ date-rule | some digits  ] 
		]
	]
Sunanda:
13-Apr-2010
He does ask a lot of simpler questions :)
Ladislav:
13-Apr-2010
Yes, "it's faster than anything else, until it's not" is a perfect 
statement, and you got my agreement :-p
Henrik:
13-Apr-2010
a short string is one that is not long. :-)
Ladislav:
13-Apr-2010
Now, I can make a bold statement: any method other than the 
one using the PARSE and CHANGE/PART combo is faster than 
the above method, until it's not :-p
Maxim:
13-Apr-2010
it's not a single change/part which is the issue, it's managing the 
stack, allocating all those blocks over and over... the sheer speed 
of the parse loop blows away all the other looped/recursive algorithms 
in my usage so far.
Gregg:
15-Apr-2010
Petr, it may be more than fast enough for small cases, or where you 
don't need maximum performance (which is most of the time). The inefficiency 
comes from REBOL having to move things around when you insert things 
into a series (list! being a possible exception).
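A rough Python illustration of Gregg's point (hypothetical helper names): Python lists, like REBOL strings and blocks, keep their items contiguous, so growing the series in place means shifting everything after each edit, whereas building a fresh result touches each item once. `expand_in_place` counts the items it had to shift.

```python
from typing import List


def expand_in_place(buf: List[str], old: str, new: List[str]) -> int:
    """Replace every `old` item with the items in `new` by splicing into buf.

    Returns how many trailing items had to be shifted along the way.
    """
    shifted = 0
    i = 0
    while i < len(buf):
        if buf[i] == old:
            tail = len(buf) - (i + 1)      # items after the edit point
            buf[i:i + 1] = new             # size changes, so the tail moves
            if len(new) != 1:
                shifted += tail
            i += len(new)                  # skip over what was just inserted
        else:
            i += 1
    return shifted


def expand_copy(buf: List[str], old: str, new: List[str]) -> List[str]:
    """Same result built in one pass into a fresh list: nothing is shifted."""
    out: List[str] = []
    for item in buf:
        out.extend(new if item == old else [item])
    return out
```

Replacing each "&" in "a&b&c" with "&amp;" gives the same text either way, but the in-place version shifts 4 trailing items to do it; with many edits on a long series those shifts can add up to quadratic cost.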
Ladislav:
16-Apr-2010
Please, if somebody finds a good refinement name, let us know.
ChristianE:
16-Apr-2010
Not being a native speaker, I think you "change something in something", 
so that gives >> CHANGE/TO "ABC" "123" == 123
ChristianE:
16-Apr-2010
But it doesn't communicate very well the idea of changing to only 
a part of the second argument.
Maxim:
17-Apr-2010
/take: TAKE is a very useful new function in R3, and it's a good idea to use 
it as a refinement name too... IMHO
Maxim:
17-Apr-2010
Gab  YESSS!!!


it would also be nice if we could actually just set a soft-range 
to ANY series, removing the need for a specific datatype.
Maxim:
17-Apr-2010
and extra speed consideration of having to allocate/copy/destroy 
a series
ChristianE:
17-Apr-2010
That's said too much; I think it's more that CHANGE/PART behaves 
as advertised and the /PART refinement just happens to have a different 
meaning for INSERT or APPEND. 

Neither one of /WITH, /TO, /SPAN and /RANGE communicate very well 
that they refer to the second argument though, and /TAKE has the 
drawback of suggesting that it's taking away from the second argument 
like TAKE instead of leaving the second argument untouched. 

CHANGE/FROM, however, seems to work:

>> head change/from #abcdef #123456 3
== #123def
>> head change/part/from #abcdef #12345 1 3
== #123bcdef 


All that under the assumption that, for compatibility, /PART in its 
current meaning will stay as it is.
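ChristianE's CHANGE/FROM is a proposal, not an existing refinement. Here is a Python model of the semantics his two examples imply, applied at the head of the series (function and parameter names are hypothetical): `from_` limits how much of the value is used, `part` limits how much of the series is replaced.

```python
from typing import Optional


def change_at_head(series: str, value: str,
                   part: Optional[int] = None,
                   from_: Optional[int] = None) -> str:
    """Model of the proposed CHANGE/FROM (plus existing /PART) at the series head.

    from_: how many items of `value` to use (the proposed /FROM count).
    part:  how many items of `series` to replace; by default as many as are
           being inserted, like plain CHANGE.
    Returns the whole result, i.e. what HEAD CHANGE ... would show.
    """
    src = value if from_ is None else value[:from_]
    replaced = len(src) if part is None else part
    return src + series[replaced:]
```

This reproduces both of his results: "123def" for the /FROM-only case and "123bcdef" when /PART 1 is added.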
Steeve:
19-Apr-2010
Gregg, I used to use append/part to avoid the memory overhead of 
copy/part in many cases.
Instead of doing it as in Ladislav's example:
>> change/part something copy/part something-else range part
I used to do:

>> change/part something append/part clear #{} something-else range part
It's not faster, but saves memory.


So, I don't know if it's a good idea to discard this use case from 
append and insert.
Ladislav:
19-Apr-2010
It does not matter that it is rare: if you can find any unexpected behavior 
of the GC, you should put it in CureCode as a major bug
Steeve:
19-Apr-2010
It's not a bug to my mind, the GC never acted smoothly.
Ladislav:
19-Apr-2010
maybe I just misunderstood, then. If it is not a bug, then you are 
actually saying, that the GC collects everything as expected? If 
that is the case, then why the trouble to "save memory"?
florin:
24-May-2010
Is there a place for the newbie questions on parsing?
florin:
24-May-2010
I've created my very first script. The script loops through a list 
of email (Kerio) log files, extracts the IP addresses, compiles them 
in a list and adds them to a (Peerblock) list in order to limit incoming 
spam. I find rebol perfect for this.
florin:
24-May-2010
A rule can be: "=," etc. How do I "escape" the space character so 
that I can include it in my rule?
florin:
24-May-2010
And the IP addresses are separated by a space?
florin:
24-May-2010
Yes, parse/all is great, and this is why I want to include the space 
not as a delimiter but as a character in the rule. As in, sometimes 
I want to find two strings separated by a character.
PeterWood:
24-May-2010
>> a: "a b"

== "a b"

>> parse/all a ["a" " " "b"]

== true
florin:
24-May-2010
My script works, but you know how it goes. Once a question creeps 
in the brain, it needs an answer. Thank you.
Pekr:
24-May-2010
I would use #" ", or define a space rule first: spaces: charset 
" ^-" (optionally including tab)
florin:
24-May-2010
Then I said: read only from the last read, and parse the date/time. 
I wanted to parse date AND time at the same time: [15/May/2010 17:59:56] 
But I hit a snag because of the space in between. I don't want date 
and time separated, because REBOL can easily parse the string into a date-time. 
The space gave me trouble, and the brackets too.
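For comparison, Python's `datetime.strptime` can read the whole bracketed stamp in one step: in the format string the brackets match literally and the space matches a run of whitespace. This is a Python analogue, not REBOL; `%b` assumes English month abbreviations (the default C locale), and the function name is hypothetical.

```python
from datetime import datetime


def parse_log_stamp(field: str) -> datetime:
    """Parse a bracketed stamp like "[15/May/2010 17:59:56]" in one step."""
    return datetime.strptime(field, "[%d/%b/%Y %H:%M:%S]")
```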
Anton:
30-Jul-2010
Ok, continuing the discussion from "Performance" group, I'd like 
to ask for some help with parsing rebol format files.

Basically, I'd like to be able to extract a block near the beginning 
or end of a file, while minimizing disk access.

The files to be parsed could be large, so I don't want to load the 
entire contents, but chunks at a time.

So my parse rule should be able to detect when the input has been 
exhausted and ask for another chunk.

(When extracting a block near the end of a file, I'll have to parse 
in reverse, but I'll try to implement that later.)
Anton:
30-Jul-2010
Using LOAD/NEXT, I still have to use an O(n^2) algorithm. I'd now 
like to do my own parse, which can be O(n).
Anton:
30-Jul-2010
Which is why, in that algorithm, I had to iteratively: load a chunk, 
append it and try LOAD/NEXT until it succeeded.
Which gives the algorithm O(n^2) performance.
Anton:
30-Jul-2010
I imagine it could be useful in other similar situations, so I'd 
like it to be pretty general.

I suppose a bonus functionality is to be able to get nested blocks.

(And a super bonus will be to get any datatype at any level, but 
I won't bother doing that until I need it.)
Anton:
30-Jul-2010
Must it ?

I think if I can parse single-line strings correctly, then a bracket 
inside won't cause a problem.

This means I'll be basically ignoring datatypes which allow strings 
in their syntax, and just jumping to the string part.
Anton:
30-Jul-2010
I don't think there's any way to make any type with a literal bracket 
in it (except blocks, of course). (But I am worrying about that a 
bit.)
Anton:
30-Jul-2010
I tried to make some words with a single unmatched literal bracket, 
or literal string delimiter, but I failed so far. They don't load, 
so they won't be in well-formed rebol format files.
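A minimal Python sketch of the single-pass approach Anton describes: a scanner that keeps its state (bracket depth, whether it is inside a "..." or {...} string or a ; comment) between chunks, so the input is examined once, O(n), and a bracket inside a string is ignored. It is an assumption-laden sketch, not a REBOL lexer: a construct such as #[none] would still be counted as a block, and the class name is hypothetical.

```python
from typing import Optional


class BlockScanner:
    """Find where the first top-level [...] block ends in REBOL-like source
    that arrives in chunks. All state survives between feed() calls, so every
    character is examined once, with no re-scanning."""

    def __init__(self) -> None:
        self.depth = 0         # current [ ] nesting depth
        self.mode = "code"     # "code", "quote", "brace" or "comment"
        self.brace_depth = 0   # nesting of { } inside a multi-line string
        self.escape = False    # previous character was ^ inside a string
        self.offset = 0        # characters consumed so far

    def feed(self, chunk: str) -> Optional[int]:
        """Return the absolute offset just past the closing ], or None if
        more input is needed."""
        for ch in chunk:
            self.offset += 1
            if self.mode == "comment":
                if ch == "\n":
                    self.mode = "code"
            elif self.mode in ("quote", "brace"):
                if self.escape:
                    self.escape = False        # ^x: skip the escaped char
                elif ch == "^":
                    self.escape = True
                elif self.mode == "quote":
                    if ch == '"':
                        self.mode = "code"
                elif ch == "{":
                    self.brace_depth += 1
                elif ch == "}":
                    self.brace_depth -= 1
                    if self.brace_depth == 0:
                        self.mode = "code"
            elif ch == ";":
                self.mode = "comment"
            elif ch == '"':
                self.mode = "quote"
            elif ch == "{":
                self.mode = "brace"
                self.brace_depth = 1
            elif ch == "[":
                self.depth += 1
            elif ch == "]":
                self.depth -= 1
                if self.depth == 0:
                    return self.offset
        return None
```

In practice, `feed` would be called with successive reads (for example 4096 characters at a time) until it returns an offset or the file runs out; a char literal such as #"[" is handled only because it contains a quoted string.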
Anton:
30-Jul-2010
One caveat:

Misidentifying as a block, types like (what are they called?) "inline 
types"?
eg.  #[none]

If I don't recognise it as none! (or maybe issue!) , then I might 
accidentally take it as a block.
Anton:
30-Jul-2010
Does anyone have any advice on how I should structure this algorithm?

I don't feel confident as I haven't studied parsing theory deeply.
http://en.wikipedia.org/wiki/Parsing

Should I do lexical analysis and syntactic analysis separately ?

I think I can do it all with just one parse, but it might not be 
a good idea.
Anton:
30-Jul-2010
I'll make a start.
Anton:
30-Jul-2010
Having a look. Thanks for posting that.
Anton:
30-Jul-2010
I just found something interesting.

I remember Gabriele saying he thought PARSE would convert chars it 
encountered in its rule to strings before using them, so these are equivalent:
	parse "a" [#"a"]
	parse "a" ["a"]

(Of course, the first one is a char and not a string, so consumes 
less memory.)

But I was just thinking it might be clearer to use strings instead 
of chars in the parse rule.
Then I discovered you can use issues:
	parse "a" [#a]

and the escape character is interesting, as you only need to type 
one of them in the issue:
	parse "^^" [#^]
Anton:
30-Jul-2010
Anyway, that's a side-issue.
BrianH:
30-Jul-2010
Anton, the cost of disk reads dwarfs the cost of LOAD/next. And PARSE 
is much slower at loading REBOL data than LOAD. You might consider 
finding out the max size of the value you are loading, rounded up 
to multiples of 4096 (disk blocks), and just READ/part a bit more 
than that from the disk for each file. Then LOAD/next from the resulting 
string. There is no reason to do speculative reads once you have 
an upper bound on the size you will need to read. In a language like 
REBOL, minimizing disk reads sometimes means minimizing the number 
of calls to READ, not just the amount read.
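BrianH's sizing rule is simple arithmetic; a Python sketch (4096-byte blocks assumed, names hypothetical) that rounds the upper bound up and does one bounded read:

```python
BLOCK_SIZE = 4096  # assumed disk block size, per BrianH's suggestion


def read_size(max_value_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Round an upper bound on the value's size up to whole disk blocks."""
    return -(-max_value_size // block_size) * block_size


def read_head(path: str, max_value_size: int) -> bytes:
    """One bounded read from the start of the file, instead of chunked retries."""
    with open(path, "rb") as f:
        return f.read(read_size(max_value_size))
```

The returned bytes would then be handed to the loader in one go, as BrianH suggests with LOAD/next.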