[REBOL] Re: parsing html : is this correct ?
From: jjmmes:yaho:o:es at: 5-Jun-2002 23:37
I've checked the HTML manually, and the sequence of tags is:

1. a proper set of <script ... </script> pairs,
2. then an orphan </script> (unnoticed by browsers),
3. and finally one more <script ... </script> pair.

Parsing stops just before the orphan </script>, which I don't
understand. The rule should get past item 2!
You can check the real html at http://www.abc.es
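
For anyone who wants to try this without the full page, here is a
minimal sketch of a reduced test case. The sample string is made up
(it is not the real HTML from the site); it only mimics the sequence
described above: a proper pair, an orphan </script>, then another
pair. The rule itself is the one from my original post.

```rebol
REBOL [Title: "Reduced test case (hypothetical sample data)"]

; Hypothetical input, NOT taken from the real page:
; proper pair, orphan close tag, proper pair.
html: copy {a<script>one</script>b</script>c<script>two</script>d}

parse/all html [
    any [
        to "<script" mark1:         ; remember where the script block starts
        thru "/script>" mark2:      ; remember where it ends
        (remove/part mark1 mark2)   ; cut the block out of the string
        :mark1                      ; resume parsing where the block was
    ]
    to end
]

print html
```

If the rule works the way I expect, both pairs are removed and only
the orphan </script> is left between "b" and "c". If the real page
behaves differently, something else in the input must be tripping
the rule.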
Thanks
--- Anton <[anton--lexicon--net]> wrote:
> Jose,
> Your parse rule looks fine to me.
> I tested out your parse rule with long
> strings of matching <script></script> pairs,
> but I didn't see any problems.
>
> I would ask you to look at your input
> more carefully. Maybe there is something in
> there that tricks this rule.
>
> Do this:
> - Save a copy of your input.
> - Cut selected pieces out of your input so that it still
>   breaks your rule. Save each time.
> - When you can't cut any more out, look at what you have
>   left, and if you can't figure it out, post the input
>   here and we can have a look.
>
> Anton.
>
> > I use the following parse code to remove scripting
> > from the html before I do other parsing. This seems to
> > work fine for all pages, but I just found a page with
> > lots of script tags and it only removes the first 86
> > and leaves the last one.
> >
> > What am I doing wrong ?
> >
> > Thanks
> > Jose
> > -----------------------------------------------
> > parse/all html [
> >     any [
> >         to "<script" mark1:
> >         thru "/script>" mark2:
> >         (remove/part mark1 mark2)
> >         :mark1
> >     ]
> >     to end
> > ]
> >
>