Mailing List Archive: 49091 messages

Parsing HTML links (newbie question)

 [1/2] from: james::calaba::com at: 13-Jun-2001 12:49


Hi I've been looking through the documentation for parsing links from an HTML page with the structure

    .......<a someAttr="abc" href="def" someOtherAttr="ghi">link text goes here</a>........

I currently have

    parse page [
        any [
            thru {href="} copy link to {"} (do something)
        ]
    ]

I really want to pick up the link text as well, so that I can create link/linkText pairs. I know this is a trivial problem - is the answer to use blocks of rules and break the <a> tag down piece by piece, or to mark the positions and then extract subsequently, or is there a one-pass solution (there probably is).

Many thanks if you can help.

James Carlyle

 [2/2] from: gjones05:mail:orion at: 13-Jun-2001 8:14


From: "James Carlyle"
> I've been looking through the documentation for parsing links from an HTML
> page with the structure
<<quoted lines omitted: 12>>
> break the <a> tag down piece by piece, or to mark the positions and then
> extract subsequently, or is there a one-pass solution (there probably is).
Hi, James.

This solution places link url and text in a list. Note that it includes all text between <a ...> and </a>, which may not be ultimately what you want.

    list: copy []
    parse page [
        any [
            thru "<a" thru {href="} copy url to {"}
            thru ">" copy link-text to "</a>"
            (append/only list reduce [url trim/lines link-text])
        ]
        to end
    ]
    foreach l list [print l]

Hope this gives you some ideas.

--Scott Jones
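
For comparison, the position-marking approach James mentions can also be done in one pass. The following is only a minimal sketch and not part of the original reply. It assumes the same simple markup as in the question (href value in double quotes, no nested anchors), and the names page, pairs, url-start, url-end, text-start and text-end are made up for illustration. A set-word inside a parse rule records the current input position, and copy/part then extracts the span between two recorded positions.

```rebol
REBOL []

; Hypothetical sample input, modelled on the structure in the question.
page: {<p><a someAttr="abc" href="def" someOtherAttr="ghi">link text goes here</a></p>}

pairs: copy []

parse page [
    any [
        ; Mark where the href value starts and ends.
        thru "<a" thru {href="} url-start: to {"} url-end:
        ; Skip the rest of the opening tag, then mark the link text.
        thru ">" text-start: to "</a>" text-end:
        ; Extract both spans from the marked positions.
        (append/only pairs reduce [
            copy/part url-start url-end
            copy/part text-start text-end
        ])
    ]
    to end
]

foreach pair pairs [print pair]
```

Like the copy-based rule, this keeps any markup nested between <a ...> and </a> in the extracted link text.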
