'Parse is peculiar!
[1/18] from: shannon:ains:au at: 14-Dec-2000 17:05
Hi REBOL Community,
The REBOL philosophy goes "Simple things should be simple". Well I have to say
that the 'parse function is an exception! I've had to use it extensively for
parsing log files but it has literally taken me months to do simple things -
compared to a few weeks for the rest of the core set. Normally I go over the
manual entry extensively looking for clues as to what is going wrong, but this time
I'm really stuck. Look at this:
REBOL/View 0.10.38.3.1 28-Nov-2000
Copyright 2000 REBOL Technologies. All rights reserved.
>> digits: charset "0123456789"
== make bitset! #{000000000000FF0300000... [snip] ...000}
>> line1: {Lets find "Julie<1234>"}
>> parse line1 [thru {"} copy name [thru {<} 4 digits {>} (print name)] to end]
Julie<1234>
== true
but what if I don't want to include the <xxxx> in 'name?
>> parse line1 [thru {"} copy name [to {<} 4 digits {>} (print name)] to end]
-------------------------------------| Note the 'thru changed to 'to
== false
That doesn't make sense. Now the second problem:
>> line2: {Lets find "J<o>hn<1234>"}
>> parse line2 [thru {"} copy name [thru {<} 4 digits {>} (print name)] to end]
== false
So there's the second problem, parse seems to get stuck on '<o>'. I would assume
the sub rule should only match a string containing '<xxxx>' and x is a digit.
Please don't ask me to use 'find or to change the data structure, or to parse the
results twice. I want to understand why 'parse doesn't return the results I
expect.
Finally the following line causes the REBOL console to hang:
>>parse line2 [thru {"} copy name [some [to {<}]] to end]
and once again I can't get my head around it. Clearly the manual needs more
detail on 'parse.
SpliFF
[2/18] from: petr:krenzelok:trz:cz at: 14-Dec-2000 8:33
Shannon Baker wrote:
> Hi REBOL Community,
> The REBOL philosophy goes "Simple things should be simple". Well I have to say
<<quoted lines omitted: 11>>
> Julie<1234>
> == true
huh, what? tried above code and it fails ;-) (print name) has to be after the closing
bracket, or you get:
->> parse line1 [thru {"} copy name [thru {<} 4 digits {>} (print name)] to end]
** Script Error: name has no value
** Near: print name
> but what if I don't want to include the <xxxx> in 'name?
>
> >> parse line1 [thru {"} copy name [to {<} 4 digits {>} (print name)] to end]
> -------------------------------------| Note the 'thru changed to 'to
> == false
>
Hmm, why is the result false, you ask? Well, because "to {<}" stops the parser
just in front of the < char, so to turn it into 'true you would have to do it the
following way:
->> parse line1 [thru {"} copy name [to {<} {<} 4 digits {>}] (print name) to end]
Julie<1234>
== true
> That doesn't make sense. Now the second problem:
Does it make more sense? Well, you wanted just the name, so why use the strange
syntax above? What about the following code:
->> parse line1 [thru {"} copy name to {<} (print name) to end]
Julie
== true
> >> line2: {Lets find "J<o>hn<1234>"}
> >> parse line2 [thru {"} copy name [thru {<} 4 digits {>} (print name)] to end]
> == false
>
> So there's the second problem, parse seems to get stuck on '<o>'. I would assume
> the sub rule should only match a string containing '<xxxx>' and x is a digit.
Well, and you are right. It doesn't match and that's why you got 'false, no? What's
the problem here?
->> alpha: charset [#"a" - #"z" #"A" - #"Z"]
== make bitset! #{
0000000000000000FEFFFF07FEFFFF0700000000000000000000000000000000
}
->> parse line2 [thru {"} copy name [any [some alpha | 4 digits | {>} | {<}]] (print
name) to end]
J<o>hn<1234>
== true
I know it's not a very precise expression, as it will also match any combination of
sequences of alphas, 4 digits, and separate < and > characters, so it would even match
something like
<>Jo1234><1234>
To better suit your needs:
>> parse line2 [thru {"} copy name [some alpha {<} some [some alpha {>} some alpha
{<} 4 digits {>} | 4 digits {>}]] (print name) to end]
J<o>hn<1234>
== true
> Please don't ask me to use 'find or to change the data structure, or to parse the
> results twice. I want to understand why 'parse doesn't return the results I
> expect.
>
> Finally the follow line causes the rebol console to hang:
>
> >>parse line2 [thru {"} copy name [some [to {<}]] to end]
:-) Well, just because by using the "to" statement you will be put in front of "<", and
then again and again and again ... until you finally skip the damned "<" ;-)
Just change it to "thru" or try to match "<".
Cheers,
-pekr-
[3/18] from: al:bri:xtra at: 14-Dec-2000 21:35
> but what if I don't want to include the <xxxx> in 'name?
>
> >> parse line1 [thru {"} copy name [to {<} 4 digits {>} (print name)] to end]
> -------------------------------------| Note the 'thru changed to 'to
> == false
>
> That doesn't make sense.
'to recognises the "<" and stops just before the "<" in the string. You'll
need to use:
thru "<"
to get "thru" it.
> Now the second problem:
>
> >> line2: {Lets find "J<o>hn<1234>"}
> >> parse line2 [thru {"} copy name [thru {<} 4 digits {>} (print name)] to end]
> == false
>
> So there's the second problem, parse seems to get stuck on '<o>'. I would
> assume the sub rule should only match a string containing '<xxxx>' and x is
> a digit.
No, the rule hasn't yet got there.
{Lets find "J<o>hn<1234>"} ; Before
[thru {"}
{J<o>hn<1234>"} ; After [thru {"}
thru {<}
{o>hn<1234>"} ; After [thru {<}
4 digits
Parse fails here because "o" (lower case letter "O")
isn't a character that matches the 'digits bitset.
> Finally the follow line causes the rebol console to hang:
>
> >>parse line2 [thru {"} copy name [some [to {<}]] to end]
>
> and once again I can't get my head around it.
{Lets find "J<o>hn<1234>"} ; Before
[some [to {<}]]
{<o>hn<1234>"} ; After
Note that 'to doesn't go "thru" its match. 'some specifies one or more
matches of the thing to the right. Because 'to doesn't go through its
match, 'some keeps repeating the 'to "<" as it keeps matching (and not going
through the input). So you have an infinite loop, so the rebol console
hangs. I believe this is called "greedy parsing".
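A minimal sketch of one way out of the loop described above (not from the original post), assuming the REBOL/View build quoted earlier in the thread. Because to {<} succeeds without moving once the input already starts at "<", the repeated rule needs something that consumes input; swapping in 'thru is one option. The word 'count is hypothetical and only there to show how many times the rule matched.

```rebol
line2: {Lets find "J<o>hn<1234>"}
count: 0
; 'thru consumes the "<" on every pass, so 'some advances through the
; input and stops once no further "<" can be found
result: parse line2 [thru {"} some [thru {<} (count: count + 1)] to end]
print result  ; true
print count   ; 2, one for each "<" after the opening quote
```

Matching the "<" explicitly after the 'to, as in some [to {<} {<}], should behave the same way.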
> Clearly the manual needs more detail on 'parse.
I agree. Rebol is still in a state of flux, though.
I hope that helps!
Andrew Martin
No longer so greedy...
ICQ: 26227169 http://members.nbci.com/AndrewMartin/
[4/18] from: al:bri:xtra at: 14-Dec-2000 21:51
SpliFF wrote:
> >> line1: {Lets find "Julie<1234>"}
> >> parse line1 [thru {"} copy name [thru {<} 4 digits {>} (print name)] to end]
> Julie<1234>
> == true
>
> but what if I don't want to include the <xxxx> in 'name?
>> parse line1 [thru {"} copy Name to "<" "<" 4 digits {>"} end (print Name)]
Julie
== true
A nicer way would be:
>> parse line1 [thru {"} copy Name to ["<" 4 digits {>"}] end (print Name)]
** Script Error: Invalid argument: < 4 digits >"
** Near: parse line1 [thru {"} copy Name to ["<" 4 digits {>"}] end (print Name)]
but unfortunately, 'to isn't yet smart enough to understand a block of
rules.
Andrew Martin
ICQ: 26227169 http://members.nbci.com/AndrewMartin/
[5/18] from: al:bri:xtra at: 14-Dec-2000 22:05
> A nicer way would be:
> >> parse line1 [thru {"} copy Name to ["<" 4 digits {>"}] end (print Name)]
> ** Script Error: Invalid argument: < 4 digits >"
> ** Near: parse line1 [thru {"} copy Name to ["<" 4 digits {>"}] end (print Name)]
And better still might be:
parse line1 [thru {"} copy Name to ["<" 4 digits {>"}] to end (print Name)]
Note the 'to before 'end.
Andrew Martin
My excuse is I forgot...
ICQ: 26227169 http://members.nbci.com/AndrewMartin/
[6/18] from: brett:codeconscious at: 14-Dec-2000 20:25
Howdy,
I'll address your immediate questions first then make a stab at explaining
what is happening.
> >> digits: charset "0123456789"
> == make bitset! #{000000000000FF0300000... [snip] ...000}
> >> line1: {Lets find "Julie<1234>"}
> >> parse line1 [thru {"} copy name [thru {<} 4 digits {>} (print name)] to end]
> Julie<1234>
> == true
>
> but what if I don't want to include the <xxxx> in 'name?
>
> >> parse line1 [thru {"} copy name [to {<} 4 digits {>} (print name)] to end]
> -------------------------------------| Note the 'thru changed to 'to
> == false
>
> That doesn't make sense. Now the second problem:
It actually does make sense. Imagine a cursor on your line
to {<}
positions your cursor just before the {<}
Then your rule says it must be followed by 4 digits, but < is not a digit so
your rule fails.
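The cursor picture can be checked directly. This is a small sketch, not part of the original reply, and the words 'after-to and 'after-thru are hypothetical names chosen for illustration. A set-word inside a parse rule stores the current input position, so printing it shows where each keyword leaves the cursor.

```rebol
line1: {Lets find "Julie<1234>"}
; a set-word inside a rule records the current input position
parse line1 [to {<} after-to: to end]
parse line1 [thru {<} after-thru: to end]
print after-to    ; <1234>"   (cursor stopped just before the "<")
print after-thru  ; 1234>"    (cursor moved past the "<")
```

After 'to, the next thing in the input is still "<", which is why a following 4 digits cannot match.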
> >> line2: {Lets find "J<o>hn<1234>"}
> >> parse line2 [thru {"} copy name [thru {<} 4 digits {>} (print name)] to end]
> == false
>
> So there's the second problem, parse seems to get stuck on '<o>'. I would assume
> the sub rule should only match a string containing '<xxxx>' and x is a digit.
That's right. You had a rule that began with matching a <. Parse now expects a
digit but you disappointed it by giving it an o.
> Please don't ask me to use 'find or to change the data structure, or to parse the
> results twice. I want to understand why 'parse doesn't return the results I
> expect.
Best way to learn.
> Finally the follow line causes the rebol console to hang:
>
> >>parse line2 [thru {"} copy name [some [to {<}]] to end]
You put parse into an infinite loop.
> and once again I can't get my head around it. Clearly the manual needs more
> detail on 'parse.
After many times reading it and finally getting my head around parse I
realise the manual is accurate. It is maybe deficient in not getting people
to think in the "right" way from the start. It takes longer to understand
parse because early on you can create rules that work 90% of the time, and
then all of a sudden after a small change don't work at all. The problem for
me was not parse, it was how I was thinking.
If you allow me a little licence, here is how I understand parse works.
The rules that you give parse are like hypotheses. Imagine you develop a
theory that you hope will explain the input. You give the "theory" (rules)
to the Parse function to check to see if you were right.
To check your rules, parse conducts experiments. It moves through the input
matching what it sees with your explanation. Each rule you give parse has
to complete to be successful. If it completes then the input that was
explained by that rule is left behind as having been dealt with. Parse
ticks off the successful rule and gets the next appropriate rule. In order
to tick off compound rules - those enclosed with a "[" and "]" - parse will
have to tick off each nested rule recursively.
If Parse finds that the rule fails to explain the input, it will discard the
rule and backtrack in the input to the point where it started trying to
match the rule that failed. Then it sees if you have anything left in your
theory to describe what it is seeing. If not parse returns a value to you
indicating that your theory was "false".
If parse runs out of input, but your theory hasn't finished (you proposed
that there should be more there than there is), parse will again return a
value of "false"
If after processing everything, parse finds your theory was accurate you get
the value "true" returned to you.
Some of the valid keywords that you can put in a parse rule do not have any
effect on your "theory". They exist to allow side effects to occur while
parse is working through the "experiment".
Ok, some example rules. Each of these is a single rule and parse will need
to tick each off as being successful.
Rule           Description
---------------------------------------------------------------------
<              Expect the string consisting of a single less-than
               character.
thru {"}       Expect 1 or more characters up to and including the
               double-quote character.
to {<}         Expect 0 or more characters followed by, but not
               including, the less-than character.
4 digits       Expect 4 occurrences of the pattern matched by the rule
               named "digits".
copy name      Set the word "name" to a copy of the input sequence that
               is matched by the very next rule.
(print name)   On encountering this, execute it.
Hopefully this line of thinking clears it up a bit.
Brett.
[7/18] from: shannon:ains:au at: 14-Dec-2000 21:25
Re: 'Parse is peculiar! - more details
Thanks Petr and Andrew, you both know your 'parse. Unfortunately your answers didn't
help me with the first issue. Perhaps I need to explain the problem more clearly. I have
a large collection of log files generated by a Counter-Strike games server. When a user
connects a line is generated that looks like these:
L 09/22/2000 - 15:37:25: "*`Ultimate_Master`*<4718>" <WON:26073391>" connected, ip 202.54.232.2
L 09/22/2000 - 15:43:10: "[IMPREA]Smart-Gun!!<4723><WON:23014199>" connected, ip 220.34.24.3
L 09/22/2000 - 15:47:30: "{[FrAg]}-MaN-<4727><WON:19220729>" connected, ip "132.76.43.24"
L 09/22/2000 - 15:49:22: "<Usyd> H4XX0R<124><WON:20007739>" connected, ip "160.34.64.112"
As you can see the game appears to impose few restrictions on the range of characters and
spaces allowed in a name. It's also a stupid log format. This means I can't use:
parse line [thru {"} copy name to {<}]
...because names like <Usyd> H4XX0R would cause parse 'copy to return 'none or a partial
name. To make matters worse I just realised that the number following the name (the user
id) is not restricted to 4 digits like I originally thought. Therefore I need the
following behavior from 'parse:
go thru {"} then copy all text to name until the UID pattern "< some digits >" is found.
I don't want the UID included in 'name though. In the lines above that would mean 'name
is set to "*`Ultimate_Master`*", "[IMPREA]Smart-Gun!!", "{[FrAg]}-MaN-" and "<Usyd>
H4XX0R" respectively.
I don't want to go thru the UID I want to go 'to it, but still get a 'true result.
Also Petr replied:
>> >> line1: {Lets find "Julie<1234>"}
>> >> parse line1 [thru {"} copy name [thru {<} 4 digits {>} (print name)] to end]
<<quoted lines omitted: 5>>
>** Script Error: name has no value
>** Near: print name
I guarantee that this line works in the latest rebol/view (experimental) for Win9X. The
version is 'REBOL/View 0.10.38.3.1 28-Nov-2000'. Which version do you have? This looks
like inconsistent behavior. (print name) should execute after the closing '>' is matched.
Once again I realise it may be easier to do a find/last or some other trick but I'm more
interested in understanding how parse works. It would also make my code cleaner since I
use the select convention to choose rules, as in:
REBOL ["CS Log Parser"]
; lines omitted
digits: charset "0123456789"
search: func ["Returns matches for search item"
    line [string!] "Line to search"
    item [string!] {Valid types are "date", "time", "user", "won-id", "ip"}
][
    if not value? 'item [item: ask {Search for? ("date", "time", "user", "won-id", "ip"): }]
    rules: [
        "date" [thru {L } copy match to { -} to end (print match)]
        "time" [thru { - } copy match to {: "} to end (print match)]
        "user" [thru {: "} copy match to {"} to end (print match)] ; this line is wrong
        "ip" [thru {ip "} copy match to {"} to end]
        "won-id" [thru {<WON:} copy match to {>} to end]
    ]
    parse line select rules item
]
; __________example_______________________________________________________
log-line: {L 09/22/2000 - 15:49:22: "<Usyd> H4XX0R<124><WON:20007739>" connected, ip 160.34.64.112}
search log-line "user"
[8/18] from: shannon:ains:au at: 14-Dec-2000 21:38
NOTE: I think my last reply was blocked due to having one 'Re:' too many
in the subject line. If you already read this please ignore.
<<duplicate of message 7/18 omitted>>
SpliFF
[9/18] from: brett:codeconscious at: 14-Dec-2000 22:36
> L 09/22/2000 - 15:37:25: "*`Ultimate_Master`*<4718>" <WON:26073391>" connected, ip "202.54.232.2"
> L 09/22/2000 - 15:43:10: "[IMPREA]Smart-Gun!!<4723><WON:23014199>" connected, ip "220.34.24.3"
> L 09/22/2000 - 15:47:30: "{[FrAg]}-MaN-<4727><WON:19220729>" connected, ip 132.76.43.24
> L 09/22/2000 - 15:49:22: "<Usyd> H4XX0R<124><WON:20007739>" connected, ip 160.34.64.112
> As you can see the game appears to impose few restrictions on the range of characters and
> spaces allowed in a name. It's also a stupid log format.
Hmm. I agree really ugly log format...
It really isn't the best example for getting confident on parse.
But I can see it would be really good to get it to work :)
> Therefore I need the
> following behavior from 'parse:
>
> go thru {"} then copy all text to name until the UID pattern "< some digits >" is found.
> I don't want the UID included in 'name though. In the lines above that would mean 'name
> is set to "*`Ultimate_Master`*", "[IMPREA]Smart-Gun!!", "{[FrAg]}-MaN-" and "<Usyd>
> H4XX0R" respectively.
>
> I don't want to go thru the UID I want to go 'to it, but still get a 'true result.
Is that log line you gave for Ultimate_Master accurate? There seems to be a
double-quote after the UID.
The problem that I can see is that somebody might have a name like <4444>.
They might get really weird and do something like <4444><4444> as a name as
well.
The only consistency I can see for determining what is really after the name
is the string that starts <WON:
So I reckon on the basis of what I've seen, the only way to get the name
accurately is to find the UID accurately and to do that requires searching
backwards from before the <WON: for a <UID>.
The problem here is parse won't do this neatly in one go.
One idea then is to get parse to capture from the beginning of the name
through to the <WON and then call itself to process the internal bit - my
worry with this is to know if parse is reentrant or not.
Or better use parse to do as before but instead of calling itself, use find
instead. I know you are trying to avoid find, but it may actually be the
simplest to code and understand.
Alternatively you could probably get parse to backtrack in a fashion by
creating a truly evil parse rule. I managed to do this once but as I said -
it is truly evil (modifies the input stream and other horrors). :)
Anyway I'll wait to hear your thoughts and to find out if that double-quote
was real or not.
Brett.
[10/18] from: emptyhead:home:nl at: 14-Dec-2000 12:32
Because the game doesn't check the characters being inserted in the name, the grammar of
the logfile is not correct and not parseable by a left-to-right parser (not with this
'parse function).
This works fine if the player does not have a name containing the string {>"}:
r-tag: [ ">" thru "<" ]
parse line [
thru {"} copy name thru {>"}
(
; parse the name in reverse.
reverse name
parse name [
{"}
copy won r-tag
copy num r-tag
copy name to end
(reverse name reverse won reverse num)
]
) to end
]
You can add more rules to parse the IP number and date.
Daan Oosterveld
Shannon Baker wrote:
[11/18] from: ingo:2b1 at: 14-Dec-2000 13:53
Hi Brett,
here's my take, assuming that "<WON:" is sure not to be in the name-string
(and it is the only thing you can be sure about ...)
line1: {Lets find "Ju<l>ie<1234><WON:90966776>"}
main-rule: [ (next-part: "") thru {"} copy name to {<} some sub-rule to end ]
sub-rule: [ "<WON:" (print name) to end
| (if next-part <> "" [append name join "<" next-part] ) skip copy next-part to "<" ]
>> parse line1 main-rule
Ju<l>i<97868769>
OK, what do I do?
In main-rule, first create an empty string next-part (no copy needed, as the
string is not changed; the word gets assigned to new strings later on).
Up to copy name it's the same as before, but then we go into the sub-rule.
When entering sub-rule we are right before the "<", so we can test whether it is
"<WON:" by chance. If it is, we're done: print the name and go to the end.
If it's not "<WON:" we first test if next-part is still "" (first pass) and
we either do nothing, or append a "<" and next-part to the name. Then we
skip over the "<" (remember? that's where we entered the sub-rule!) and then
we copy the next part of the name up to the next "<". The rule ends here, but
we've told parse to check this rule multiple times (some), so we just re-enter
the sub-rule ...
I hope this helps,
Ingo
Once upon a time Brett Handley spoketh thus:
> > L 09/22/2000 - 15:37:25: "*`Ultimate_Master`*<4718>" <WON:26073391>"
> connected, ip
<<quoted lines omitted: 54>>
--
do http://www.2b1.de/
[12/18] from: arolls::bigpond::net::au at: 15-Dec-2000 1:18
Shannon,
If name has a value before the parse, then there is no
error and it returns true.
Can you check from a fresh start?
digits: charset "0123456789"
line1: {Lets find "Julie<1234>"}
parse line1 [thru {"} copy name [thru {<} 4 digits {>} (print name)] to end]
Anton.
[13/18] from: vodalee::gte::net at: 14-Dec-2000 5:06
Re: 'Parse is peculiar!
I hate to butt in but I just stumbled across REBOL while surfing thru an ice storm
here in Texas. If you would like an example of 'A PARSE' function take a look at
OBJREXX. www2.hursley.ibm.com
For string handling, it's hard to beat REXX on any box.
The first thing that impressed me about REBOL -- they evidently have the Gregorian
Calendar right -- something Micro$oft, H_P et al. never have done. Date functions
are the first thing I check in any Computer Language. Also, REBOL's e-mail
handling techniques appear useful.
Bob Hamilton
Richardson, Texas
[mail--bobh--to]
[14/18] from: allenk:powerup:au at: 15-Dec-2000 7:00
Re: 'Parse is peculiar! - more details
Why not read/lines and then use
entry: parse line {"}
== ["L" "09/22/2000" "-" "15:49:22:" "<Usyd> H4XX0R<124><WON:20007739>" "connected," "ip" "160.34.64.112"]
This will give you a consistent 8 part format. Can then use
entry/1 entry/2 etc for simple access to the results.
Cheers,
Allen K
[15/18] from: brett:codeconscious at: 15-Dec-2000 12:00
Hi Ingo,
Good one! I stand corrected, you don't need something evil to achieve it,
just another way of looking at it.
Your code fails on the fourth example though, because name has a value of
none. Just making a small change, it works
sub-rule: [
"<WON:" (print name) to end |
(
if next-part <> "" [if not name [name: copy ""]
append name join "<" next-part]
)
skip copy next-part to "<"
]
Brett
[16/18] from: ingo:2b1 at: 15-Dec-2000 8:19
Hi Brett,
found an error in my last post, it didn't work for "<<"
line1: {Lets find "Ju<<<l>ie<1234><WON:90966776>"}
main-rule: [ (next-part: "") thru {"} copy name to {<} some sub-rule to end ]
sub-rule: [ "<WON:" (print name) to end
| (if next-part <> "" [if none? next-part [next-part: ""] append name join "<" next-part]
)
skip copy next-part to "<" ]
>> parse line1 main-rule
Ju<<<l>i
(if "<" characters directly follow each other, next-part is set to none,
so I have to change it to "")
kind regards,
Ingo
Once upon a time Ingo Hohmann spoketh thus:
> Hi Brett,
> here's my take, assuming that "<WON:" is sure not to be in the name-string
<<quoted lines omitted: 19>>
> I hope this helps,
> Ingo
--
YES! That's just me, just being! http://www.2b1.de/
We ARE all ONE --- [ingo--2b1--de] --- We ARE all FREE
[17/18] from: shannon::ains::net::au at: 16-Dec-2000 20:16
'Parse is peculiar! - fresh start
REBOL/View 0.10.38.3.1 28-Nov-2000
Copyright 2000 REBOL Technologies. All rights reserved.
>> do {
{ digits: charset "0123456789"
{ line1: {Lets find "Julie<1234>"}
{ parse line1 [thru {"} copy name [thru {<} 4 digits {>} (print name)] to end]}
** Script Error: name has no value.
** Where: print name
>> parse line1 [thru {"} copy name [thru {<} 4 digits {>}] (print name) to end]
Julie<1234>
You're right Anton. 'name is assigned outside the sub-block and can't be
referenced in it. Either the usual rebol word-within-context system doesn't
apply to parse or it doesn't assign name until it reaches the last ']' in the
sub-block. I assumed it would work similar to this:
>>name: "Anton" do [print name]
Anton
Anton wrote:
[18/18] from: arolls:bigpond:au at: 16-Dec-2000 23:16
Shannon,
> You're right Anton. 'name is assigned outside the sub-block and can't be
> referenced in it. Either the usual rebol word-within-context
> system doesn't
> apply to parse or it doesn't assign name until it reachs the last
> ']' in the
> sub-block.
That's not why! :) There is no such limitation.
It's the mistake pointed out earlier by Pekr.
The (print name) *should* be after the final end-bracket ]
in the sub-rule [thru {<} 4 digits {>}].
The copy expects a successful sub-rule after 'name,
before it assigns 'name a value.
But the sub-rule tries to print out 'name first.
This is like trying to say:
>> name: rejoin ["<1234>" name]
** Script Error: name has no value.
** Where: name
You get this error because rejoin ["<1234>" name]
happens first. 'name has no value and therefore
can't be evaluated.
You probably didn't catch this problem with
your advanced parse because 'name had been set to a
value in an earlier test.
Try this out:
line1: "Ju<li>e<><><<1234>"
id: [thru "<" 3 4 digits ">"]
rule: [a: some [id | [skip b:]] (print copy/part a b)]
parse line1 rule
'a is set to the beginning of the input.
'some keeps trying to match the 'id.
If it can't, which is true for the first 12 characters,
it just 'skips over a character and sets 'b to that position.
After it has got through the guts of 'rule, it prints out
all the stuff between 'a and 'b.
Anton.
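The same position-marking idea can be adapted to the log lines from earlier in the thread, where the UID may have any number of digits. What follows is only a sketch under stated assumptions, not something posted by any of the authors above: the function name 'find-name and the words 'start and 'mark are hypothetical, and the name is assumed to begin right after {: "}. It remembers the position before each candidate UID and copies up to that position on the first match.

```rebol
digits: charset "0123456789"
uid: ["<" some digits ">"]

; hypothetical helper: returns the text between the opening quote and
; the first "<digits>" pattern, or none if no such pattern is found
find-name: func [line [string!] /local start mark name] [
    name: none
    parse line [
        thru {: "} start:
        any [
            ; 'mark is set before the UID is tried, so on success it
            ; still points just in front of the "<"
            mark: uid (name: copy/part start mark) to end
            | skip
        ]
    ]
    name
]

print find-name {L 09/22/2000 - 15:49:22: "<Usyd> H4XX0R<124><WON:20007739>" connected, ip "160.34.64.112"}
; prints: <Usyd> H4XX0R
```

As Brett pointed out, a player name that itself contains something like <4444> would be cut short by this rule, so anchoring on "<WON:" remains the safer choice for real logs.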
Notes
- Quoted lines have been omitted from some messages.