Mailing List Archive: 49091 messages

Strange parsing behavior

 [1/13] from: robert:muench:robertmuench at: 2-Aug-2002 19:27


Hi, please have a look at this code and try it out:

---START
rebol []

rule1: [start: copy string1 [to " - " (?? string1) | to newline] copy string2 to end]
rule2: [start: copy string1 [to " - " (?? string1) 3 skip | to newline] copy string2 to end]

text: "this is a - parsing test"

parse/all text rule1
?? string1
?? string2
print "-----"
parse/all text rule2
?? string1
?? String2
---END

>> do %parsing-error.r
string1: "this is a - "
string1: "this is a"
string2: " - parsing test"
-----
string1: "this is a"
string1: "this is a - "
string2: "parsing test"
== "parsing test"

I find some things very strange here:

Rule1:
- Why does string1 contain " - " when printed from inside the rule? The rule states 'to and not 'thru? IMO string1 should be "this is a"
- After the parsing string1 holds what I expected in the first place. This indicates that the variables the input gets copied to are changed after the parsing. But when and how?

Rule2:
- This time string1 is as expected inside the rule, but changes to include the " - " sequence after the parsing

Any idea what's up here? Is this a bug?

Robert

 [2/13] from: brett:codeconscious at: 3-Aug-2002 10:03


Hi Robert,

I ran your code and I get an error straight away:

** Script Error: string1 has no value
** Where: rejoin
** Near: mold name: get name

So you probably had string1 set before you ran your test code.

> rule1: [start: copy string1 [to " - " (?? string1) | to newline] copy
> string2 to end]

To start, let's just look at the first COPY in your Rule1. The variable is string1, the pattern is

    [to " - " (?? string1) | to newline]

Rule1 looks bad to me because you are asking parse to COPY the input stream matched by the pattern into string1, but then from inside the pattern (and therefore before the pattern completes) you try to print string1.

So the short answer to your first question is: nothing has been copied to string1 by the time you want to print it from inside the rule.

> - After the parsing string1 holds what I did expect in the
> first place too. This indicates that the variables input
> gets copied to will be changed after the parsing. But when and how?

When the pattern completes (after the first ] ). Compare with this:
>> parse/all "this is a - parsing test" [copy string1 to " - " (?? string1)]
string1: "this is a"
== false

In this case the pattern is just to " - " and so copy has finished by the time string1 is printed.

> Rule2:
> - This time the string1 is as expected inside the rule but
> changes to include the to " - " sequence after the parsing
>
> Any idea what's up here? Is this a bug? Robert

Same explanation as for Rule1. Not a bug.

Regards,
Brett.
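
A minimal sketch of Brett's explanation, assuming REBOL 2 (the dialect used in this thread). string1 is reset to none first so that no value from an earlier run can leak into the output, which is most likely what produced the confusing results in the original post. The "inside:" and "after:" labels are only there for illustration.

```rebol
rebol []

; Clear any value left over from an earlier run so stale data cannot mislead us.
string1: none

; COPY wraps the whole bracketed sub-rule, so while the paren runs
; nothing has been stored in string1 yet.
parse/all "this is a - parsing test" [
    copy string1 [to " - " (print ["inside:" mold string1])]
    to end
]

; The sub-rule has completed by now, so string1 holds the matched text.
print ["after:" mold string1]

; Expected in a fresh REBOL 2 console:
;   inside: none
;   after: "this is a"
```

Moving the paren after the closing bracket of the sub-rule, as in Brett's own one-line example, prints the copied text instead of none.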

 [3/13] from: tomc::darkwing::uoregon::edu at: 2-Aug-2002 21:01


maybe this is what you were after

rule1: [start: copy string1 [ [to " - " (?? string1)] | to newline] copy string2 to end]
== [start: copy string1 [[to " - " (?? string1)] | to newline] copy string2 to end]
>> string1
== "this is a - "
>> string2
== "parsing test"
>>
notice the first optional matching pattern is in its own block.

On Fri, 2 Aug 2002, Robert M. Muench wrote:

 [4/13] from: robert:muench:robertmuench at: 3-Aug-2002 8:39


> -----Original Message-----
> From: [rebol-bounce--rebol--com] [mailto:[rebol-bounce--rebol--com]]
<<quoted lines omitted: 8>>
> >> string1
> == "this is a - "
Shouldn't this be "this is a"? IIRC to indicates parsing up to the pattern not thru it... I have the same effect here and IMO this is strange.
> notice the first optional matching pattern is in its own block.
I played around with this a bit to see if it helps, but I don't think so.

Robert

 [5/13] from: robert:muench:robertmuench at: 3-Aug-2002 8:39


> -----Original Message-----
> From: [rebol-bounce--rebol--com] [mailto:[rebol-bounce--rebol--com]]
<<quoted lines omitted: 7>>
> ** Near: mold name: get name
> So you probably had string1 set before you ran your test code.
Hi, hmm... I used the link-client to test it, or there was a problem with line breaks?
> > rule1: [start: copy string1 [to " - " (?? string1) | to newline] copy string2 to end]
> To start, lets just look at the first COPY in your Rule1. The
> variable is string1, the pattern is
> [to " - " (?? string1) | to newline]
So far we agree ;-)
> Rule1 looks bad to me because you are asking parse to COPY
> the input stream matched by the pattern into string1, but
> then from inside the pattern (and therefore before the
> pattern completes) you try to print string1.

Well, this depends how you expect parse to work. The rule [to " - " (?? string1) | to newline] is a choice and Rebol parse uses a one-shot evaluation while parsing. This means it executes a rule as long as possible, starting with the first rule from several choices. As soon as the rule, in this case to " - ", could be parsed successfully, the rule block ends.

I expect parse to be in sync with the progress of the parsing. So, after parse did to " - " I expect the copy operation to be finished, because internally it looks like this: Copy string to " - ", which can be executed successfully. So the print should succeed.

The question is what is the trigger for the copy: You say it's the end of the rule block (late trigger) and I would expect as soon as copying makes sense (early trigger). From a debugging point of view an early trigger is much more useful; further, it would allow creating context-sensitive grammars, as you could change parsing rules on the fly.

Robert

 [6/13] from: brett:codeconscious at: 3-Aug-2002 18:11


> Well, this depends how you expect parse to work.
I agree we have differing expectations. :^)
> As
> soon as the rule, in this case to " - " could be parsed successfully,
> the rule block ends.
I doubt that. It still has to process the paren! and to check if there is something after the paren!.
> The question is what is the trigger for the copy: You say it's the end
> of the rule block (late trigger) and I would expect as soon as copying
> makes sense (early trigger).
Here are two interesting examples. In this first example the pattern given to COPY does not have a normal pattern to match on, but it does affect the parse position:

    parse [#a #b #c] [
        #a
        pp: COPY block [ (pp: next next pp) :pp ]
    ]

This example yields:

== true
>> block
== [#b #c]

In this second example the pattern given to COPY appears to match the stream initially but finishes by resetting the parse position to where it started at the beginning of COPY:

    parse [#a #b #c] [
        #a
        pp: COPY block [#b #c :pp]
        #b #c
    ]

This example yields:

== true
>> block
== none

I tend to think of Parse's COPY as equivalent to REBOL's COPY/PART, and a parse pattern (rule) as a function that returns a series. Using this metaphor I can understand how the above two examples work (in fact I used the metaphor to dream up these odd examples in the first place).

I wonder what you could expect from these two examples using the "early trigger" point of view.

> From a debugging point of view an early
> trigger is much more useful, further it would allow to create
> context-sensitive grammars, as you could change parsing rules on the
> fly. Robert

Yes, debugging some rules can be challenging but I've never found it insurmountable.

Parse as it works right now allows changing parsing rules on the fly. Here is another contrived example, this one shows a dynamic rule (perhaps the rules could be loaded from a file or database):

    stream-rules: [
        1 [#a #b #c]
        2 [#d #e #f]
        3 [#g #h #i]
    ]

    parse [1 #a #b #c 3 #g #h #i] [
        (stream-rule: none)
        any [
            copy section [
                set num integer! (dynamic-rule: select stream-rules num)
                dynamic-rule
            ]
            (print mold section)
        ]
    ]

Parse demonstrates a pretty flexible implementation! :^)

Brett.

 [7/13] from: g:santilli:tiscalinet:it at: 3-Aug-2002 11:01


Hi Robert,

On Saturday, August 3, 2002, 8:39:15 AM, you wrote:

RMM> Well, this depends how you expect parse to work. The rule
RMM> [to " - " (?? string1) | to newline] is a choice and Rebol parse uses a

Well, that is a sub-rule. The fact that it contains "choices" is just incidental. What do you expect from:

>> parse "this is a test" [copy word [to "a" to end]]
== true
>> word
== "this is a test"

RMM> The question is what is the trigger for the copy: You say it's the end
RMM> of the rule block (late trigger) and I would expect as soon as copying

It's the end of the sub-rule, since COPY copies what is matched by the next element in the rule, that is the sub-rule in this case. Place COPY *inside* the sub-rule if you want to get the result you were expecting.

Regards,
   Gabriele.
--
Gabriele Santilli <[g--santilli--tiscalinet--it]> -- REBOL Programmer
Amigan -- AGI L'Aquila -- REB: http://web.tiscali.it/rebol/index.r
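
One possible reading of Gabriele's suggestion, applied to the rule from the first message and assuming REBOL 2. This is only a sketch, not necessarily the rule he had in mind: COPY is moved inside each alternative, so it completes before the paren runs. The word ok? is introduced here just to hold the return value of parse.

```rebol
rebol []

; Start from a clean slate so no earlier value can show up in the output.
string1: string2: none

ok?: parse/all "this is a - parsing test" [
    [
        ; COPY now covers only 'to " - "', so it has finished before the paren.
        copy string1 to " - " (print ["inside:" mold string1]) 3 skip
        | copy string1 to newline skip
    ]
    copy string2 to end
]

print ["string1:" mold string1]     ; string1: "this is a"
print ["string2:" mold string2]     ; string2: "parsing test"
print ["parse returned:" ok?]       ; parse returned: true
```

Each alternative also consumes its own separator (3 skip for " - ", a single skip for the newline), so the second COPY starts right after it.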

 [8/13] from: robert:muench:robertmuench at: 3-Aug-2002 13:04


> -----Original Message-----
> From: [rebol-bounce--rebol--com] [mailto:[rebol-bounce--rebol--com]]
<<quoted lines omitted: 9>>
> matched by the next element in the rule, that is the
> sub-rule in this case.
Ok, ok but I still find it a bit strange. In your example to "a" can be deleted and the result won't change. I would expect the resulting string to grow while parsing is done:
>> parse "this is a test" [copy word [to "a" (?? word) to end (?? word)]]
> word: "this is"
> word: "this is a test"

> Place COPY *inside* the sub-rule if
> you want to get the result you were expecting.

Unfortunately this doesn't work for complex cases. Try the following, which can't be done:

Text: "this-is-a test of some parsing^/ this-is-a - test of some parsing"

The result should be

Text: ["this-is-a test of some parsing" "this-is-a" "test of some parsing"]

You have to solve the problem to trigger on " - " or 'newline whichever comes first ;-).

Robert

 [9/13] from: g:santilli:tiscalinet:it at: 3-Aug-2002 17:03


Hi Robert,

On Saturday, August 3, 2002, 1:04:43 PM, you wrote:

RMM> Ok, ok but I still find it a bit strange. In your example to "a" can be
RMM> deleted and the result won't change.

Exactly, but simply because my example did nothing useful at all. :-)

RMM> I would expect the resulting string
RMM> to grow while parsing is done:

Instead, the COPY is applied *after* the next element of the rule has been processed, and only if it succeeds. As Brett pointed out, you can think of it as:

    word: copy/part index-of-series-before-processing-next-element
                    index-of-series-after-processing-next-element

RMM> You have to solve the problem to trigger on " - " or 'newline whichever
RMM> comes first ;-). Robert

Is this a challenge? ;^)

>> Text: "this-is-a test of some parsing^/this-is-a - test of some parsing"
== {this-is-a test of some parsing this-is-a - test of some parsing}
>> result: []
== []
>> rule: [some [copy string sentence (append result string) separator]]
== [some [copy string sentence (append result string) separator]]
>> sentence: [some [some sentence-chars [tmp: separator :tmp break | some #" " | end]]]
== [some [some sentence-chars [tmp: separator :tmp break | some #" " | end]]]
>> separator: [" - " | #"^/"]
== [" - " | #"^/"]
>> sentence-chars: complement charset " ^/"
== make bitset! #{ FFFBFFFFFEFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF }
>> parse/all text rule
== false
>> result
== ["this-is-a test of some parsing" "this-is-a" "test of some parsing"]

What's the prize? :-9

Regards,
   Gabriele.
--
Gabriele Santilli <[g--santilli--tiscalinet--it]> -- REBOL Programmer
Amigan -- AGI L'Aquila -- REB: http://web.tiscali.it/rebol/index.r

 [10/13] from: robert:muench:robertmuench at: 4-Aug-2002 16:50


> -----Original Message-----
> From: [rebol-bounce--rebol--com] [mailto:[rebol-bounce--rebol--com]]
<<quoted lines omitted: 5>>
> rule has been processed, and only if it succeeds. As Brett
> pointed out, you can think of it as

Hi, as said: I know about this pattern (at least now ;-)) but I still think it's not as smart as it could be.
> Is this a challenge? ;^)
Why not.
> What's the prize? :-9
Well, you use the new 'break function, that's easy and gives a virtual cookie prize :-)). Try it without the use of 'break ;-))

Robert

 [11/13] from: g:santilli:tiscalinet:it at: 4-Aug-2002 19:43


Hi Robert,

On Sunday, August 4, 2002, 4:50:56 PM, you wrote:

RMM> Well, you use the new 'break function, that's easy and gives a virtual
RMM> cookie prize :-)). Try it without the use of 'break ;-)) Robert

I admit it is a bit trickier, but it only requires a few changes.

>> Text: "this-is-a test of some parsing^/this-is-a - test of some parsing"
== {this-is-a test of some parsing this-is-a - test of some parsing}
>> result: []
== []
>> rule: [some [copy string sentence (append result string) separator]]
== [some [copy string sentence (append result string) separator]]
>> sentence: [any [some sentence-chars (break-rule: none) [tmp: separator :tmp (break-rule: [end skip]) | some #" " | end] tmp: break-rule] :tmp]
== [any [some sentence-chars (break-rule: none) [tmp: separator :tmp (break-rule: [end skip]) | some #" " | end] tmp: break-rule] :...
>> separator: [" - " | #"^/"]
== [" - " | #"^/"]
>> sentence-chars: complement charset " ^/"
== make bitset! #{ FFFBFFFFFEFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF }
>> parse/all text rule
== false
>> result
== ["this-is-a test of some parsing" "this-is-a" "test of some parsing"]

Regards,
   Gabriele.
--
Gabriele Santilli <[g--santilli--tiscalinet--it]> -- REBOL Programmer
Amigan -- AGI L'Aquila -- REB: http://web.tiscali.it/rebol/index.r

 [12/13] from: robert:muench:robertmuench at: 5-Aug-2002 14:40


> -----Original Message-----
> From: [rebol-bounce--rebol--com] [mailto:[rebol-bounce--rebol--com]]
<<quoted lines omitted: 3>>
> Subject: [REBOL] Re: Strange parsing behavior
> I admit it is a bit trickier, but it only requires a few changes.

Hi, then let's have a look.

> rule: [some [copy string sentence (append result string) separator]]
> sentence: [
<<quoted lines omitted: 8>>
> :tmp
> ]
Ok, I see. Using "end skip" to simulate the new 'break command is a very good trick! Well done! BTW: IMO the | end part can be removed without creating any problems. Robert
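
The "end skip" trick discussed above can be isolated in a small sketch, assuming REBOL 2. The rule [end skip] can never succeed: away from the end of the input `end` fails, and at the end `skip` has nothing left to consume. Putting it behind a word lets a loop be stopped on demand. The words stop-rule, pos and rest, and the input "ab-cd", are hypothetical and chosen only for this illustration.

```rebol
rebol []

; While stop-rule is none it matches without consuming anything,
; so the loop carries on. Once it is [end skip] the iteration fails
; and ANY stops looping.
stop-rule: none

parse/all "ab-cd" [
    any [
        (stop-rule: none)
        [pos: "-" (stop-rule: [end skip]) | skip pos:]
        stop-rule
    ]
    :pos                ; return explicitly to the remembered position
    copy rest to end
]

print mold rest         ; "-cd"
```

As in Gabriele's rule, the position is saved with a set-word and restored with a get-word, so the result does not depend on where the failed iteration leaves the input.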

 [13/13] from: andreas:bolka:gmx at: 5-Aug-2002 18:30


I know Gabriele has provided a (very nice!) solution already, but I'm contributing that anyway.

Saturday, August 3, 2002, 1:04:43 PM, Robert wrote:

> Try the following, which can't be done:
> Text: "this-is-a test of some parsing^/
<<quoted lines omitted: 4>>
> You have to solve the problem to trigger on " - " or 'newline
> whichever comes first ;-). Robert

generally, everything that can be described with an (E)BNF expression, can be parsed/validated by REBOL's 'parse. so here's the pure (? :) bnf-based 'parse rule-set:

-- snip --
char: complement charset " -^/"
name: [some [char | propws | prophyph2] opt [name]]
propws: [" " [char | propws | prophyph | end]]
prophyph: ["-" [char | prophyph2 | end]]
prophyph2: ["-" [char | " " | prophyph2 | end]]
sep: [newline | " - "]
-- snap --

to try this out, use for example:

-- snip --
teststr: {this-is-a test of some parsing^/this-is-a - test of some parsing}
p_name: [ copy t_name name (probe t_name) ]
parse/all teststr [ some [ p_name opt [ sep ] ] end ]
-- snap --

--
Best regards,
 Andreas  mailto:[andreas--bolka--gmx--net]

Notes
  • Quoted lines have been omitted from some messages.
    View the message alone to see the lines that have been omitted