Parsing optional tags
[1/6] from: geza67:freestart:hu at: 4-Nov-2001 16:00
Hello REBOLers,
Beyond View-related stuff, IMHO parsing is the most underdocumented
but most powerful feature of REBOL. Now I am stuck on a rather
trivial problem:
I want to collect texts embedded in tags:
<b>some_text</b> or
<b><i>some_text</i></b>
Finding the beginning of the text is obvious, but matching the
closing tags is NOT as symmetrical as I thought :-( I tried:
parse html [any [
thru <b> [<i> | none] copy Collect to [</i> | none] </b>
(append Collects Collect)
]]
... does not work as expected: Collects is full of 'none values.
Could you help me, please?
--
Best regards,
Geza Lakner MD mailto:[geza67--freestart--hu]
[2/6] from: lmecir:mbox:vol:cz at: 4-Nov-2001 17:06
Hi Geza,
how about:
i-tag: [<i> copy collect to </i>]
not-i-tag: [copy collect to </b>]
b-tag: [[i-tag | not-i-tag] </b> (append collects collect)]
htmlrule: [[to <b> | to end] [<b> b-tag htmlrule | none]]
parse html htmlrule
Look out! not tested
Cheers
Ladislav
[3/6] from: lmecir:mbox:vol:cz at: 4-Nov-2001 18:05
Hi myself,
I should have tested it, here is an improved version:
i-tag: [copy collect to </i> </i>]
not-i-tag: [copy collect to </b>]
b-tag: [[<i> (following: i-tag) | (following: not-i-tag)]
    following (append collects collect) </b>]
htmlrule: [[thru <b> (follow: [b-tag htmlrule]) | to end (follow: [none])]
    follow]
parse html htmlrule
Cheers
Ladislav
[4/6] from: office::thousand-hills::net at: 4-Nov-2001 12:03
I have used this reading by lines:
parse [ <B><I> ] parse [ </B></> ]
John
[5/6] from: geza67:freestart:hu at: 4-Nov-2001 23:53
Hello Ladislav,
> i-tag: [copy collect to </i> </i>]
> not-i-tag: [copy collect to </b>]
<<quoted lines omitted: 3>>
> follow]
> parse html htmlrule
Thank you for the working version, but I just can't believe it has to
be so complicated - building (in a way) a complete syntax to strip off
two lousy HTML tags ... :-(
If this is more or less the only "obvious" way to do it in the REBOL
parse dialect, I'd better stick to the old-fashioned way and build a
regex and feed it through sed/awk/vim or something like that.
--
Best regards,
Geza mailto:[geza67--freestart--hu]
[6/6] from: lmecir:mbox:vol:cz at: 5-Nov-2001 0:35
Hi Geza,
it doesn't have to be so complicated; my version probably checks more things
than you needed (whether the tags are balanced, etc.). There obviously is a
simpler way to do this.
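For illustration, one such simpler rule might look like the sketch below. It is not taken from the thread and has not been tested here. It assumes REBOL 2 (hence parse/all), the sample html string is made up, and the word text is just an illustrative name. Unlike the longer version, it does not check that the closing tags are balanced.

```rebol
REBOL []

; Hypothetical sample input, not from the thread.
html: {<b>first</b> and <b><i>second</i></b>}
collects: copy []

; parse/all (REBOL 2) keeps whitespace significant.
parse/all html [
    any [
        thru <b>            ; skip ahead past the next opening <b>
        opt <i>             ; accept an optional <i> right after it
        copy text to "<"    ; the text runs up to the next tag of any kind
        (append collects text)
    ]
    to end
]

probe collects
```

The idea is to treat <i> as optional with opt and to stop copying at the next "<", so the rule never needs to know which closing tag follows.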