Parsing HTML links (newbie question)
[1/2] from: james::calaba::com at: 13-Jun-2001 12:49
Hi
I've been looking through the documentation for parsing links from an HTML
page with the structure
.......<a someAttr="abc" href="def" someOtherAttr="ghi">link text goes
here</a>........
I currently have
    parse page [
        any [
            thru {href="} copy link to {"}
            (do something)
        ]
    ]
I really want to pick up the link text as well, so that I can create
link/linkText pairs.
I know this is a trivial problem - is the answer to use blocks of rules and
break the <a> tag down piece by piece, or to mark the positions and then
extract subsequently, or is there a one-pass solution (there probably is).
Many thanks if you can help.
James Carlyle
[2/2] from: gjones05:mail:orion at: 13-Jun-2001 8:14
From: "James Carlyle"
> I've been looking through the documentation for parsing links from an HTML
> page with the structure
<<quoted lines omitted: 12>>
> break the <a> tag down piece by piece, or to mark the positions and then
> extract subsequently, or is there a one-pass solution (there probably is).
Hi, James.
This solution collects each link's URL and text as a pair in a list. Note
that it captures everything between <a ...> and </a>, nested markup included,
which may not ultimately be what you want.
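    list: copy []
    parse page [
        any [
            thru "<a"                   ; find the next anchor tag
            thru {href="}
            copy url to {"}             ; the href value
            thru ">"                    ; skip the rest of the opening tag
            copy link-text to "</a>"    ; everything up to the closing tag
            (append/only list reduce [url trim/lines link-text])
        ]
        to end
    ]
    foreach l list [print l]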
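If the link text can contain nested tags, or can be empty, a rough variation along the following lines might help. It is only a sketch: the sample page, the strip-tags helper and the text-start/text-end marks are names made up for illustration, and it assumes REBOL 2 (parse/all). It marks the positions around the link text rather than using copy, since as far as I know copy sets the word to none when nothing is matched, and trim would then fail on it.

```rebol
REBOL [Title: "Link extraction sketch"]

; Hypothetical sample page, made up for illustration
page: {<p>See <a class="x" href="http://www.rebol.com/" target="_top">the
<b>REBOL</b> site</a> or <a href="docs.html">the docs</a>.</p>}

; Hypothetical helper: return a copy of text with anything between < and > removed
strip-tags: func [text [string!] /local result start finish] [
    result: copy ""
    parse/all text [
        any [
            ; mark the plain text before the next tag, then skip the tag
            start: to "<" finish: thru ">"
            (append result copy/part start finish)
        ]
        ; whatever follows the last tag
        start: to end finish:
        (append result copy/part start finish)
    ]
    result
]

list: copy []
parse/all page [
    any [
        thru "<a" thru {href="} copy url to {"}
        thru ">"
        ; mark both ends of the link text instead of copying it directly,
        ; so an empty link gives an empty string rather than none
        text-start: to "</a>" text-end:
        (
            link-text: copy/part text-start text-end
            append/only list reduce [url trim/lines strip-tags link-text]
        )
    ]
    to end
]

foreach item list [print [item/1 "->" item/2]]
```

Each entry in list is a two-element block, so item/1 is the URL and item/2 is the link text with nested tags and line breaks removed. Note that an unterminated tag inside the link text is left in place by this helper.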
Hope this gives you some ideas.
--Scott Jones