Problems with parsing
[1/13] from: peter:carlsson:space:se at: 29-Nov-2001 14:28
Hello!
Any suggestions on how to make a parse rule for text
NOT including a special pattern?
Crystal clear?!?
Best regards,
Peter Carlsson
----------------------------------------------------------------
Peter Carlsson Tel: +46 31 735 45 26
Saab Ericsson Space AB Fax: +46 31 735 40 00
S-405 15 Göteborg Email: [peter--carlsson--space--se]
SWEDEN URL: http://www.space.se
[2/13] from: mario:cassani:icl at: 29-Nov-2001 16:20
Hallo Peter,
> Any suggestions on how to make a parse rule for text
> NOT including a special pattern?
'parse returns 'true or 'false depending on whether the rule
matches, so if you make a rule that checks whether the pattern
exists:
not parse given-text pattern-rule
will be 'true if the pattern is not included.
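A minimal sketch of that idea, assuming REBOL 2 and string input. The name
lacks-pattern? and the sample strings are made up for illustration and do not
come from the thread:

```rebol
; Hypothetical helper: true when PATTERN does not occur anywhere in TEXT.
lacks-pattern?: func [text [string!] pattern [string!]] [
    ; `to pattern` fails if the pattern is never found, so PARSE returns
    ; false and NOT turns that into true.
    not parse/all text [to pattern to end]
]

print lacks-pattern? "the quick brown fox" "lazy"   ; true  (pattern absent)
print lacks-pattern? "over the lazy dog" "lazy"     ; false (pattern present)
```

This only answers "is the pattern somewhere in the text?"; it does not help
when a sub-rule inside a larger grammar has to refuse a pattern.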
> Crystal clear?!?
Hope this helps. If you share the piece of code that is driving
you mad, helping you may be easier, as 'parse is a beast best
tamed by playing with the REBOL console and some sample data
and rules.
Mario
[3/13] from: brett:codeconscious at: 30-Nov-2001 11:41
Hi Peter,
> Any suggestions on how to make a parse rule for text
> NOT including a special pattern?
>
>Crystal clear?!?
Somewhat. It would be better to provide an example, because now I have to
show the solution *and* design an example :)
I have come across this sort of problem a few times and I thank Ladislav for
showing me a solution.
One example where you might do this is when you have a sub rule that might
consume something needed by an enclosing rule.
For my example, I'll parse a block rather than text but the concept still
applies. I want to parse the following block and print out every word, but
if I encounter a "|" I'll print out the text "**********":
my-block: [ the quick brown fox | jumped | over the lazy]
This next bit of code will not work. If you try it you will see that there
are no "*"s printed, instead you will see the "|":
single-word: [set item word! (print mold item)]
phrase: [some single-word]
parse my-block [ phrase some ['| (print "**********") phrase] ]
The thing to note is that | is a word too. Therefore the "|" is "consumed"
by the rule called SINGLE-WORD. So one way
to solve this is to give SINGLE-WORD some indigestion (make it fail) when it
encounters a "|". To do this I will use a dynamic rule - a rule that is
modified as parse is executing.
To force a rule to fail, make sure it cannot match anything any more. A way
to ensure this is to skip past the end of the input. This next rule is
guaranteed to fail every time:
always-fails: [to end skip]
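A quick console check of that claim, as a sketch assuming REBOL 2 (the sample
block is made up): `to end` moves to the tail of the input and `skip` then has
nothing left to consume, so the rule fails, and an enclosing rule can fall
through to an alternative.

```rebol
always-fails: [to end skip]

; The rule alone never matches, whatever the input.
print parse [a b c] always-fails              ; false

; When it fails, parse backtracks and tries the next alternative.
print parse [a b c] [always-fails | to end]   ; true
```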
Using this I now wrap SINGLE-WORD with a rule I call WORD-EXCEPT-BAR. The
purpose of this new rule is to fail if it finds the "|" word; otherwise it
goes ahead and runs SINGLE-WORD. I also need to modify PHRASE to call
WORD-EXCEPT-BAR. The dynamic rule I mentioned earlier is called WEB. Here
are the rules, with the complex one split over multiple lines to improve
readability:
phrase: [some word-except-bar]
word-except-bar: [
    [
        '| (web: :always-fails) | (web: :single-word)
    ]
    web
]
To finish off I'll create a function to call parse with the correct rule and
wrap the whole lot in an object just to be tidy:
word-parsing-object: context [
    always-fails: [to end skip]
    single-word: [set item word! (print mold item)]
    word-except-bar: [
        ['| (web: :always-fails) | (web: :single-word)]
        web
    ]
    phrase: [some word-except-bar]
    set 'parse-words func [a-block [block!]] [
        ; parse the argument, not the global MY-BLOCK
        parse a-block [phrase some ['| (print "**********") phrase]]
    ]
]
Here is a test run:
>> parse-words [the quick brown fox | jumped | over the lazy]
the
quick
brown
fox
**********
jumped
**********
over
the
lazy
== true
HTH
Brett.
[4/13] from: rotenca:telvia:it at: 30-Nov-2001 2:52
Hi Brett,
why not:
rule: [any ['| (print "*******") opt rule | set item word! (print mold item)]]
parse my-block rule
This, by design, also returns true for a void block.
---
Ciao
Romano
[5/13] from: brett:codeconscious at: 30-Nov-2001 13:52
Hi Romano,
Thanks for your post. It is a good demonstration of how a problem can often
be thought about differently and therefore solved. That is something
important to keep in mind when creating parse rules.
Something to think about regarding your solution is the ways in which it is
not equivalent to mine. You already pointed out that yours returns true for
a void block by design. But yours also returns true when '| is the first
word in the block. Also by design? :) It doesn't really matter if it was or
wasn't by design, but it might be interesting to work out how you would
change your rule to ensure that '| is not the first word in the block.
However, my purpose wasn't to show how my example block could be parsed.
Peter asked for a rule that matched text NOT including a special pattern.
NOT is a useful operator in logic, I wonder why it is not in Parse as a
dialect keyword. That is, would it not be nice to have the following
statement return true?
parse [b] [not 'a]
Ladislav originally solved this problem when I asked about it before. He has
some parse enhancements on his rebsite in the script called parseen.r. Worth
a look. I could have saved some typing by responding to Peter that his
question is answered in parseen.r by Ladislav - though you may need to look
twice or thrice and learn something new to follow it - as is typical of
Ladislav's work ;-)
Regards,
Brett.
[6/13] from: rotenca:telvia:it at: 30-Nov-2001 12:32
Hi, Brett
> Something to think about regarding your solution is the ways that it is not
> equivalent to mine. You already pointed out
> that yours returns true for a void block by design. But yours also returns
> true when '| is the first word in the block. Also
> by design? :)
No, of course :-)
> It doesn't really matter if was or wasn't by design, but it
> might be interesting to work out how you would change your rule to ensure
> that '| is not the first word in the block.
rule1: [some ['| (print "*******") opt rule1 | set item word! (print mold item)]]
rule: [h: opt ['| (h: tail h)] :h rule1]
parse block rule
> However, my purpose wasn't to show how my example block could be parsed.
> Peter asked for a rule that matched text NOT including a special pattern.
I understand. My idea is that one should first match the special pattern and
then take some consequent action, such as moving the input index to the end
of the block and then asking for at least one any-type! value.
> Ladislav originally solved this problem when I asked about it before. He has
> some parse enhancements on his rebsite in the script called parseen.r. Worth
> a look. I could have saved some typing by responding to Peter that his
> question is answered
> in parseen.r by Ladislav - though you may need to look twice or thrice and
> learn something new to follow it - as is typical of Ladislav's work ;-)
But there is at least one little problem with Ladislav's not-rule:
>> nr: not-rule [1]
== [[[1] (finish: [end skip]) | (finish: [])] finish]
>> parse [1] nr
== false
>> parse [2] nr
== false
> Regards,
> Brett.
---
Ciao
Romano
[7/13] from: lmecir:mbox:vol:cz at: 30-Nov-2001 14:12
Hi Romano,
<<Romano>>
But there is at least one little problem with Ladislav's not-rule:
>> nr: not-rule [1]
== [[[1] (finish: [end skip]) | (finish: [])] finish]
>> parse [1] nr
== false
>> parse [2] nr
== false
> Regards,
> Brett.
---
Ciao
Romano
<</Romano>>
This is not a problem with my rule, it is by design. If you want to match a
block that doesn't match the Rule: [integer!] at the start, you have to use
it as follows:
rule: [integer!]
nr: not-rule rule
parse [1 a] [nr to end]
parse [a 1] [nr to end]
If you want to match a block that doesn't match the rule anywhere, you have
to write it as follows:
parse [a b c 1 d] [any [nr skip]]
parse [a b c d e] [any [nr skip]]
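To see the difference between those two calls without loading %parseen.r,
here is a sketch assuming REBOL 2. The rule NR below is written out by hand
in the shape of the not-rule result quoted above; it is a stand-in, not the
actual output of not-rule, and the name no-integer? is hypothetical.

```rebol
; Hand-written stand-in for `not-rule [integer!]`. FINISH is set while
; parsing: to a rule that must fail if an integer was matched, and to an
; empty rule (always succeeds) otherwise.
nr: [[integer! (finish: [end skip]) | (finish: [])] finish]

; Hypothetical helper: true when no integer occurs anywhere in DATA.
no-integer?: func [data [block!]] [
    parse data [any [nr skip]]
]

print no-integer? [a b c 1 d]   ; false (ANY stops at the 1, input remains)
print no-integer? [a b c d e]   ; true  (every value passes NR, input ends)
```

NR itself never consumes input on success, which is why each iteration needs
the SKIP to advance.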
HTH
Ladislav
[8/13] from: lmecir:mbox:vol:cz at: 30-Nov-2001 14:28
Hi once again,
I am adding a variation on Romano's example to show how it can be solved:
nr: not-rule [1 1 1]
parse [1] [any [nr skip]]
parse [2] [any [nr skip]]
HTH
Ladislav
[9/13] from: peter:carlsson:space:se at: 30-Nov-2001 8:33
Hello!
To all who helped me out with the parsing problem I
finally found a solution by myself.
Thanks a lot anyway!
Best regards,
Peter Carlsson
[10/13] from: mario::cassani::icl::com at: 30-Nov-2001 14:16
Hallo Peter,
> To all who helped me out with the parsing problem I
> finally found a solution by myself.
>
> Thanks a lot anyway!
can you please share it with us if it's different?
Zaijian
Mario
[11/13] from: lmecir:mbox:vol:cz at: 30-Nov-2001 15:42
Hi all,
I just uploaded a newer version of %parseen.r to my Rebsite
(Sites/Ladislav). It contains more examples. I left out the unnecessary A-B
rule and returned to the iterative version of To-rule (the recursive version
was limited by the size of the input).
Cheers
Ladislav
[12/13] from: rotenca:telvia:it at: 1-Dec-2001 17:01
Hi Ladislav,
>> But there is at least one little problem with Ladislav's not-rule:
> This is not a problem with my rule, it is by design.
I did not say it was a bug. Only that it must be adapted to the goal you want
to reach. And your examples confirm my idea. :-)
---
Ciao
Romano
[13/13] from: peter:carlsson:space:se at: 3-Dec-2001 7:31
At 16:00 2001-11-30 +0100, you wrote:
>Hallo Peter,
>
> > To all who helped me out with the parsing problem I
> > finally found a solution by myself.
> >
> > Thanks a lot anyway!
>
> can you please share it with us if it's different?
Well, I used another way to parse where I included some
more rules. That's all. But I will have a look at the
suggestions and see if I could use these instead.
Best regards,
Peter Carlsson