
[REBOL] Unix REBOL/Command trick

From: jeff::rebol::com at: 18-May-2001 11:23

This example needs COMMAND features of REBOL and unix.

Okay, we all know that REBOL does a great job parsing html, but the fact is there are some other good utilities for this out there that do particular tasks exclusively well and we might as well let them do their jobs and work with REBOL. Besides, by offloading some of the work to a separate app we keep the work out of the controlling script's memory space.

Enough preamble.. here's a silly trick for getting all the links from a web page very fast using the unix text browser 'lynx':

    urls-from-url: func [url [url!] /local out lnks][
        out: copy ""
        call/wait/output rejoin ["lynx -dump '" url "'"] out
        load find/reverse/tail tail out "^/References^/"
    ]

'lynx -dump' produces a nice fully loadable output of all the urls on that page starting below the word "References". What you get back is a block containing:

    [1 http://example 2 mailto:[foo--bar] ...]

Lynx does a good job of parsing the page for links quickly for you (dealing with all the quirky sloppiness of href's) and you get to get on with more interesting tasks, like what to do with all those urls you just snarfed.. :-)

Caveat: You may need a little error checking around the LOAD because URLs, being what they are, sometimes have bogus chars and REBOL might not recognize them, but the condition is pretty rare. Depends on where you're mining...

Another trick from the "whattajoy" department. (:

  -jeff
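
The error checking mentioned in the caveat could look something like the following. This is only a minimal sketch, assuming REBOL 2 behaviour for TRY and ERROR?; the helper name safe-load is hypothetical and not part of the original post, and returning an empty block on failure is just one possible policy.

```rebol
REBOL [Title: "Guarded LOAD sketch"]

; safe-load is a hypothetical helper, not from the original post.
; It returns an empty block when LOAD raises an error on the text,
; instead of letting the error stop the script.
safe-load: func [text [string!] /local result][
    if error? try [result: load text][
        return copy []
    ]
    result
]

; Well-formed input loads normally.
probe safe-load "1 http://example 2 http://example/two"

; An unbalanced bracket is expected to make LOAD fail, so the
; wrapper hands back an empty block here.
probe safe-load "1 http://example ]"
```

In urls-from-url, the last line would then presumably become `safe-load find/reverse/tail tail out "^/References^/"`. Note that this throws away the whole list when a single URL is bad; skipping only the offending entries would need token-by-token loading.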