r3wp [groups: 83 posts: 189283]

World: r3wp

[Parse] Discussion of PARSE dialect

Geomol
27-Apr-2011
[5648]
Think of the end of a series as an internal marker, which the user 
shouldn't see as an element in the series, if you ask me.
Ladislav
27-Apr-2011
[5649]
You are missing the point. Why don't you read the THRU documentation 
instead of speculating?
Geomol
27-Apr-2011
[5650]
Ok, I will. I did a long time ago, but maybe it changed, or I missed 
something the first time, or I forgot how it works!? :-)
Ladislav
27-Apr-2011
[5651]
See the above reference to the doc article
Geomol
27-Apr-2011
[5652x3]
Oh, I got my understanding from http://www.rebol.com/docs/core23/rebolcore-15.html#section-4
The little example there is a good way to understand it:

page: read http://www.rebol.com/
parse page [thru <title> copy text to </title>]
print text
REBOL Technologies


I was thinking R2, maybe you guys talked about R3 only? Has this 
changed?
I think the docs match pretty well:
thru: "advance input thru a value or datatype"


You're right, taking this strictly, we should be able to advance 
thru the end. But this doesn't make much sense, so my guess is that most 
people wouldn't take this strictly when talking about the end of a series.
"The end specifies that nothing follows in the input stream. The entire 
input has been parsed."


I read it as: there isn't any more to parse. So is it possible to 
parse past the end? I would say no.
Ladislav
27-Apr-2011
[5655x2]
This is just a speculative interpretation of a text of one example. 
See the documentation.
http://www.rebol.com/r3/docs/concepts/parsing-summary.html
Geomol
27-Apr-2011
[5657]
Keywords that accept a repeat count are:
...
end


So the advancing must stop somehow, as we can parse end multiple 
times.

>> parse [a b c] [to end]      
== true
>> parse [a b c] [to end 5 end]
== true


Anyway, this may be a pointless discussion, if the language isn't 
clearly defined in such detail.
Ladislav
27-Apr-2011
[5658x8]
Why don't you read the documentation?
It is explained in there, and it is the *only* place where it is 
explained in general, not just using one example.
The fact is, that when a rule matches, the cursor may (optionally) 
advance, but it does not need to.
In case the cursor advances, the head of the rule match is distinct 
from the tail of the rule match. As opposed to that, when the cursor 
does not advance, the head of the match is identical with the tail 
of the match.
Which is the case of e.g. the above NONE rule.
But, many rules can have this property, even in R2
And, surely, one of the rules having this property is the END rule.
So, while the TO rule advances to the head of the subrule match, 
the THRU rule advances to the tail of the subrule match, which happen 
to be identical in case the subrule match does not advance the cursor.
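The head/tail distinction described above can be modelled outside REBOL. The following Python sketch is only an illustration, not how PARSE is implemented: the matcher signature (input and position in, tail position or None out) and the names lit, end, to and thru are assumptions made for this example. It shows that TO and THRU land on different positions when the subrule consumes input, and on the same position when the subrule is a zero-width match such as END.

```python
from typing import Callable, Optional, Sequence

# Assumed convention for this sketch: a matcher takes (input, position) and
# returns the tail position of its match, or None when it fails.
Matcher = Callable[[Sequence, int], Optional[int]]

def lit(value) -> Matcher:
    """Match one element equal to value; the cursor advances by one."""
    def m(data, pos):
        return pos + 1 if pos < len(data) and data[pos] == value else None
    return m

def end(data, pos):
    """Match only at the tail of the input; the cursor does not advance."""
    return pos if pos == len(data) else None

def to(sub: Matcher) -> Matcher:
    """Scan forward and stop at the HEAD of the first subrule match."""
    def m(data, pos):
        for head in range(pos, len(data) + 1):
            if sub(data, head) is not None:
                return head
        return None
    return m

def thru(sub: Matcher) -> Matcher:
    """Scan forward and stop at the TAIL of the first subrule match."""
    def m(data, pos):
        for head in range(pos, len(data) + 1):
            tail = sub(data, head)
            if tail is not None:
                return tail
        return None
    return m

data = ["a", "b", "c"]
print(to(lit("b"))(data, 0))    # 1: head of the match
print(thru(lit("b"))(data, 0))  # 2: tail of the match
print(to(end)(data, 0), thru(end)(data, 0))  # 3 3: head and tail coincide
```

In this model [thru end] succeeds for the same reason [to end] does: the END match has identical head and tail.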
Geomol
27-Apr-2011
[5666]
Argh, I was confused by sentences like
"Where do you think the cursor is after matching the [end] rule?"
:-)
Old def. of thru: advance input thru a value or datatype

New def. of thru: scan forward in input for matching rules, advance 
input to tail of the match


Then it can be argued that the tail of end (of a series) is still the 
end. Or it can be argued that thru always advances some way further 
(as in all cases except at end). I understand why [thru end] not 
failing is confusing.

(And stop saying I should read the doc. I have read it ... in full 
this time.) ;-)
Ladislav
27-Apr-2011
[5667x2]
"it can be argued, that thru always advances some way further"

 - actually it cannot be argued, given that the behaviour has 
 been documented
"I was confused by sentences like
Where do you think the cursor is after matching the [end] rule?"


Interesting, so you do not know where the cursor is after matching 
the [end] rule? Otherwise, such a question cannot confuse anybody 
who knows where the cursor is.
Geomol
27-Apr-2011
[5669]
I don't agree with that. I don't see it say anywhere in the doc 
that thru does not advance the input.
Ladislav
27-Apr-2011
[5670x2]
That is not what I said.
I said that, in general, PARSE may or may not advance the input 
after successfully matching a rule. Which is true.
Geomol
27-Apr-2011
[5672x3]
"Interesting, so, you do not know where the cursor is after matching 
the [end] rule?"

I assume I know it when it comes to parse in R2. I'm not sure with 
R3, as I don't have much experience with that.
Yes, true about parse may or may not.
I'm quite impressed (or surprised at least) by how much parse has grown 
from R2 to R3. I haven't studied it closely, but what's your opinion? 
Are all those rules necessary? I feel it might be too complex to 
use.
Ladislav
27-Apr-2011
[5675x4]
And I guess that the word "past" does not express clearly what 
Carl had in mind, and that he wanted to explain the general case, 
explaining all possible outcomes of subrule matching.
(in one sentence)
"...how PARSE has grown from R2 to R3"

 - actually, not at all. That is only a superficial difference. As 
 can be seen in the article

http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse/Parse_expressions

(it is especially obvious when the "Idioms" section is examined), 
all the constructs from R3 are possible in R2 as well.
See also

http://www.rebol.org/view-script.r?script=parseen.r
Geomol
27-Apr-2011
[5679]
Oh, ok.
Ladislav
27-Apr-2011
[5680x3]
The "Idioms" section actually suggests, that even some R2 constructs 
are "superfluous" in the sense, that they can be derived from more 
elementary constructs like sequence and choice.
For example,

    a: [opt b]

is actually the same as

    a: [b |]
etc.
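The equivalence of [opt b] and [b |] can be checked in a small model. This Python sketch is an illustration under assumptions, not REBOL's implementation: rules are functions from (input, position) to a tail position or None, and the names lit, empty, choice and opt are made up for the example. The empty alternative after | is modelled as a rule that always succeeds without advancing.

```python
def lit(value):
    """Match one element equal to value; the cursor advances by one."""
    def m(data, pos):
        return pos + 1 if pos < len(data) and data[pos] == value else None
    return m

def empty(data, pos):
    """The empty rule: always matches and never advances the cursor."""
    return pos

def choice(*rules):
    """Ordered choice: return the result of the first alternative that matches."""
    def m(data, pos):
        for rule in rules:
            tail = rule(data, pos)
            if tail is not None:
                return tail
        return None
    return m

def opt(rule):
    """Optional rule: match the rule if possible, otherwise match nothing."""
    def m(data, pos):
        tail = rule(data, pos)
        return pos if tail is None else tail
    return m

b = lit("b")
a_opt = opt(b)               # models a: [opt b]
a_choice = choice(b, empty)  # models a: [b |]

for data in (["b", "x"], ["x"], []):
    # Both formulations report the same tail position on every input.
    print(data, a_opt(data, 0), a_choice(data, 0))
```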
Geomol
27-Apr-2011
[5683]
Interesting. Could it be an idea to 'create' a minimum parse? Maybe 
just the specification.
Ladislav
27-Apr-2011
[5684]
You should examine the "Idioms" section to get the idea.
Geomol
27-Apr-2011
[5685x3]
I will.
Has anyone made PARSE as a function? It should be possible, right?
Found a trick to parse integers in blocks. Let's say I want to parse 
this block: [year 2011]

The rule can't be ['year 2011], because 2011 in this case is a repeat 
count for the next element (none here). So normally I would do something 
like ['year set y integer! ( ... )], check the y variable, and 
create a fail rule in case it's not 2011. But this is the trick:

>> parse [year 2011] ['year 1 1 2011]
== true


Two numbers mean repeat the next pattern between that minimum and 
maximum number of times, and in this case the pattern can be an 
integer itself.
onetom
27-Apr-2011
[5688]
:) nice
Gregg
27-Apr-2011
[5689]
I wouldn't call it a trick John, just a non-obvious syntax. I haven't 
used it much, but I wrote a func a long time ago when I needed it 
for something.

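literalize-int-rules: func [template /local rule mark] [
	; Turn a single integer value into a quantity-of-one integer
	; rule for parse (e.g. 1 becomes 1 1 1, 4 becomes 1 1 4).
	; RULE is declared local so it does not leak into the global context.
	rule: [
		any [
			into rule
			; Insert [1 1] before the integer, then skip the
			; second inserted 1 and the integer itself.
			| mark: integer! (insert mark [1 1]) 2 skip
			| skip
		]
	]
	parse template rule
	template
]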
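The transformation this function performs can be shown on plain nested lists. The Python sketch below is only an analogue for illustration: the name literalize_int_rules mirrors the REBOL function, strings stand in for words, and unlike the REBOL version (which edits the template in place) it returns a new list.

```python
def literalize_int_rules(template):
    """Return a copy of template where every integer n becomes 1, 1, n."""
    out = []
    for item in template:
        if isinstance(item, list):
            # Nested block: recurse, like INTO in the parse rule.
            out.append(literalize_int_rules(item))
        elif isinstance(item, int) and not isinstance(item, bool):
            # A bare integer becomes a quantity-of-one rule.
            out.extend([1, 1, item])
        else:
            out.append(item)
    return out

print(literalize_int_rules(["year", 2011]))         # ['year', 1, 1, 2011]
print(literalize_int_rules(["date", ["day", 27]]))  # ['date', ['day', 1, 1, 27]]
```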
Ladislav
27-Apr-2011
[5690]
Yes, John, handling of such values was discussed a while ago. 
That is why the QUOTE directive has been defined in R3.
Geomol
28-Apr-2011
[5691x2]
Nice!
In http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse/Parse_expressions#Parse_idioms

the idiom

Description: "Range of times operator"
Operation:   a: [m n b]
Idiom:       a: [m b (k: n - m) [k [b | c: fail] | :c]]

only seems to hold when n >= m. When n < m, parse works as if 
the rule was
a: [n b]
Ladislav
28-Apr-2011
[5693x4]
That is somewhat surprising, do you see any difference?
(I don't)
aha, sorry, you are right
Corrected, should be better now.
Sunanda
29-Apr-2011
[5697]
Can an R2 parse expert help me with an efficient parse, please?

I've got a set of bbcode-type tags, eg:
    tags: [ "[a]" "[b]" "[cc]" ] 
    

And I've got a data string that includes those (and other) tags, 
eg:

    data: "xxxx[a]aa aa[b]xxxx[a] yyyy[d]yyy[cc]dd[e]ddd[b][A]zz[zz"


What I'd like is the data string split at the designated tags, eg:

    [ "[a]" "aa aa" "[b]" "xxxx" "[a]" " yyyy[d]yyy" "[cc]" "dd[e]ddd" 
    "[b]" "" "[A]" "zz[zz" ]
    
Thanks!
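The requested result can be pinned down with a reference implementation outside PARSE. This Python sketch is not an R2 parse solution, only a check of the expected output: the function name split_at_tags is made up, the match is assumed to be case-insensitive (the wanted output keeps "[A]" as a tag), and text before the first tag is assumed to be dropped, as in the example.

```python
import re

def split_at_tags(data, tags):
    """Split data at the given tags, keeping the tags in the result."""
    # Try longer tags first so a longer tag wins over a shorter one
    # that starts the same way.
    ordered = sorted(tags, key=len, reverse=True)
    pattern = "(" + "|".join(re.escape(t) for t in ordered) + ")"
    # With a capturing group, re.split keeps the separators in the list.
    parts = re.split(pattern, data, flags=re.IGNORECASE)
    # parts[0] is the text before the first tag, which the example omits.
    return parts[1:]

tags = ["[a]", "[b]", "[cc]"]
data = "xxxx[a]aa aa[b]xxxx[a] yyyy[d]yyy[cc]dd[e]ddd[b][A]zz[zz"
print(split_at_tags(data, tags))
```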