[REBOL] Re: 'Parse is peculiar!
From: brett:codeconscious at: 14-Dec-2000 20:25
Howdy,
I'll address your immediate questions first then make a stab at explaining
what is happening.
> >> digits: charset "0123456789"
> == make bitset! #{000000000000FF0300000... [snip] ...000}
> >> line1: {Lets find "Julie<1234>"}
> >> parse line1 [thru {"} copy name [thru {<} 4 digits {>} (print name)] to end]
> Julie<1234>
> == true
>
> but what if I don't want to include the <xxxx> in 'name?
>
> >> parse line1 [thru {"} copy name [to {<} 4 digits {>} (print name)] to end]
> -------------------------------------| Note the 'thru changed to 'to
> == false
>
> That doesn't make sense. Now the second problem:
It actually does make sense. Imagine a cursor moving along your line.
to {<} positions the cursor just before the {<}, without consuming it.
Your rule then says the next thing must be 4 digits, but the cursor is
sitting on the <, which is not a digit, so your rule fails.
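A minimal sketch of one way to keep the <xxxx> out of 'name, assuming only the text before the < is wanted: let copy apply to the to {<} rule alone, then match the {<} explicitly before the digits. The rest of the rule is unchanged from the question.

```rebol
digits: charset "0123456789"
line1: {Lets find "Julie<1234>"}

; copy covers only the 'to rule, so name stops just before the <.
; The explicit {<} then consumes the bracket before the digits are checked.
parse line1 [thru {"} copy name to {<} {<} 4 digits {>} (print name) to end]
; prints Julie, and parse returns true
```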
> >> line2: {Lets find "J<o>hn<1234>"}
> >> parse line2 [thru {"} copy name [thru {<} 4 digits {>} (print name)] to end]
> == false
>
> So there's the second problem, parse seems to get stuck on '<o>'. I would assume
> the sub rule should only match a string containing '<xxxx>' and x is a digit.
That's right. Your sub-rule began by matching through the first <, the one
in <o>. Parse then expected a digit, but you disappointed it by giving it
an o. thru does not go looking for a later < on its own, so the rule fails.
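One possible way to make parse keep looking, sketched under assumptions: the rule name find-tag and the position words start and mark are hypothetical, not anything from the manual. The idea is to try the bracketed digits at the current position and, if that fails, skip one character and try again. With this input the intent is that name ends up holding the text J<o>hn.

```rebol
digits: charset "0123456789"
line2: {Lets find "J<o>hn<1234>"}

; Hypothetical helper rule: remember the current position in 'mark and try
; to match <dddd> here; on failure, skip one character and recurse.
find-tag: [mark: {<} 4 digits {>} | skip find-tag]

parse line2 [
    thru {"} start: find-tag
    ; 'start is where the name began, 'mark is where the <dddd> began
    (name: copy/part start mark  print name)
    to end
]
```

Note that parse is still used only once and the data is untouched; the alternative after the | is what gives parse something else to try when the digits do not match.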
> Please don't ask me to use 'find or to change the data structure, or to parse the
> results twice. I want to understand why 'parse doesn't return the results I
> expect.
Best way to learn.
> Finally the follow line causes the rebol console to hang:
>
> >>parse line2 [thru {"} copy name [some [to {<}]] to end]
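You put parse into an infinite loop. to {<} succeeds without moving past the
<, so some keeps repeating it at the same position and never finishes.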
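For comparison, a small sketch of a variant that does terminate, assuming it is acceptable for the copied text to include the last <. Because thru moves past each < it finds, every repetition consumes some input.

```rebol
line2: {Lets find "J<o>hn<1234>"}

; Each thru {<} advances beyond a <, so 'some stops as soon as
; no further < remains in the input.
parse line2 [thru {"} copy name [some [thru {<}]] to end]
; parse returns true; name holds the text J<o>hn<
```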
> and once again I can't get my head around it. Clearly the manual needs more
> detail on 'parse.
After many times reading it and finally getting my head around parse I
realise the manual is accurate. It is maybe deficient in not getting people
to think in the "right" way from the start. It takes longer to understand
parse because early on you can create rules that work 90% of the time, and
then all of a sudden after a small change don't work at all. The problem for
me was not parse, it was how I was thinking.
If you allow me a little licence, here is how I understand parse works.
The rules that you give parse are like hypotheses. Imagine you develop a
theory that you hope will explain the input. You give the "theory" (rules)
to the Parse function to check to see if you were right.
To check your rules, parse conducts experiments. It moves through the input
matching what it sees with your explanation. Each rule you give parse has
to complete to be successful. If it completes then the input that was
explained by that rule is left behind as having been dealt with. Parse
ticks off the successful rule and gets the next appropriate rule. In order
to tick off compound rules - those enclosed with a "[" and "]" - parse will
have to tick off each nested rule recursively.
If Parse finds that the rule fails to explain the input, it will discard the
rule and backtrack in the input to the point where it started trying to
match the rule that failed. Then it sees if you have anything left in your
theory to describe what it is seeing. If not, parse returns a value to you
indicating that your theory was "false".
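A tiny illustration of that backtracking, using made-up input. The | separates two alternative explanations of the same input.

```rebol
; The first alternative matches "ab" but then fails on "c", so parse goes
; back to where that alternative started and tries the second one.
parse "abd" ["ab" "c" | "abd"]   ; true
```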
If parse runs out of input, but your theory hasn't finished (you proposed
that there should be more there than there is), parse will again return a
value of "false".
If after processing everything, parse finds your theory was accurate you get
the value "true" returned to you.
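The possible outcomes can be seen with a few made-up one-liners. The third case, where the theory finishes but input is left over, also counts as "false".

```rebol
parse "ab"  ["a" "b"]       ; true:  the rules explain all of the input
parse "ab"  ["a" "b" "c"]   ; false: the rules expect more than there is
parse "abc" ["a" "b"]       ; false: the rules finish but input is left over
```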
Some of the valid keywords that you can put in a parse rule do not have any
effect on your "theory". They exist to allow side effects to occur while
parse is working through the "experiment".
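As a small sketch of such a side effect (the word count is just an illustration), code in parentheses runs each time parse passes it, without matching or consuming any input:

```rebol
count: 0
; The paren is evaluated after every successful "a"; it does not
; change what the rule matches.
parse "aaa" [some ["a" (count: count + 1)]]
print count   ; 3
```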
Ok, some example rules. Each of these is a single rule and parse will need
to tick each off as being successful.
Rule          Description
------------  ----------------------------------------------------------
{<}           Expect the string consisting of a single less-than
              character.
thru {"}      Expect zero or more characters up to and including the
              double-quote character.
to {<}        Expect zero or more characters finishing with, but not
              including, the less-than character.
4 digits      Expect 4 occurrences of the pattern matched by the rule
              named "digits".
copy name     Set the word "name" to a copy of the input sequence that
              is matched by the very next rule.
(print name)  On encountering this, execute it.
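A few of these rules can be tried on their own with made-up input, which is a quick way to check what each one expects:

```rebol
digits: charset "0123456789"

parse "1234" [4 digits]     ; true:  exactly four digits
parse "12a4" [4 digits]     ; false: the third character is not a digit
parse "ab<"  [to {<} {<}]   ; true:  'to stops before the <, {<} consumes it
parse "<"    [to {<} {<}]   ; true:  'to succeeds even when already at the <
```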
Hopefully this line of thinking clears it up a bit.
Brett.