Mailing List Archive: 49091 messages

[REBOL] Re: Parsing HTML links (newbie question)

From: gjones05:mail:orion at: 13-Jun-2001 8:14

From: "James Carlyle"
> I've been looking through the documentation for parsing links from an HTML
> page with the structure
>
> .......<a someAttr="abc" href="def" someOtherAttr="ghi">link text goes
> here</a>........
>
> I currently have
>
>     parse page [
>         any [
>             thru {href="} copy link to {"}
>             (do something)
>         ]
>     ]
>
> I really want to pick up the link text as well, so that I can create
> link/linkText pairs.
>
> I know this is a trivial problem - is the answer to use blocks of rules and
> break the <a> tag down piece by piece, or to mark the positions and then
> extract subsequently, or is there a one-pass solution (there probably is).
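Hi, James.

This solution collects each link's URL and text as a pair in a block called list. Note that the text is everything between the opening <a ...> tag and </a>, including any nested markup such as <b> or <img> tags, which may not ultimately be what you want.

    list: copy []
    parse page [
        any [
            thru "<a" thru {href="}
            copy url to {"}           ; the href value
            thru ">"                  ; skip the rest of the opening tag
            copy link-text to "</a>"  ; everything up to the closing tag
            ; store [url text] as one nested block; trim/lines puts the
            ; text on a single line and removes extra spaces
            (append/only list reduce [url trim/lines link-text])
        ]
        to end
    ]
    foreach l list [print l]

Hope this gives you some ideas.

--Scott Jones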