[REBOL] Re: Parsing HTML links (newbie question)
From: gjones05:mail:orion at: 13-Jun-2001 8:14
From: "James Carlyle"
> I've been looking through the documentation for parsing links from an HTML
> page with the structure
>
> .......<a someAttr="abc" href="def" someOtherAttr="ghi">link text goes
> here</a>........
>
> I currently have
>
> parse page [
>     any [
>         thru {href="} copy link to {"}
>         (do something)
>     ]
> ]
>
> I really want to pick up the link text as well, so that I can create
> link/linkText pairs.
>
> I know this is a trivial problem - is the answer to use blocks of rules and
> break the <a> tag down piece by piece, or to mark the positions and then
> extract subsequently, or is there a one-pass solution (there probably is).

Hi, James.

This solution collects each link's URL and text as a pair in a list. Note
that it captures everything between <a ...> and </a>, including any nested
tags, which may not ultimately be what you want.

    list: copy []
    parse page [
        any [
            thru "<a"
            thru {href="}
            copy url
            to {"}
            thru ">"
            copy link-text
            to "</a>"
            (append/only list reduce [url trim/lines link-text])
        ]
        to end
    ]
    foreach l list [print l]
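
If the nested markup is not wanted, one possible variation is to strip any
tags out of the captured text before storing it. The following is only a
sketch under a few assumptions: the sample page string and the words text,
start, finish and item are made up for illustration, and the guard with
any [...] is there because I believe copy can leave link-text set to none
when nothing lies between the tags. It matches "<a " with a trailing space
so that tags such as <abbr> are not mistaken for anchors. Like the rule
above, it still assumes every <a> tag carries a double-quoted href.

```rebol
REBOL []

; Hypothetical sample input, for illustration only
page: {<p><a class="x" href="a.html">First <b>link</b></a> and <a href="b.html">second</a></p>}

list: copy []
parse page [
    any [
        thru "<a "                      ; trailing space avoids <abbr>, <address>, ...
        thru {href="} copy url to {"}
        thru ">" copy link-text to "</a>"
        (
            ; fall back to an empty string if nothing was captured
            text: copy any [link-text ""]
            ; remove each <...> run left inside the link text
            while [all [start: find text "<" finish: find start ">"]] [
                remove/part start next finish
            ]
            append/only list reduce [url trim/lines text]
        )
    ]
    to end
]

; each item is a two-element block: the URL, then the cleaned link text
foreach item list [print [item/1 "->" item/2]]
```

Doing the clean-up inside the paren keeps the parse rule itself as a single
pass; the tag stripping is a separate small loop over each captured string.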
Hope this gives you some ideas.
--Scott Jones