[REBOL] Unix REBOL/Command trick
From: jeff::rebol::com at: 18-May-2001 11:23
This example needs REBOL/Command (for CALL) and a Unix system.
Okay, we all know that REBOL does a great job parsing HTML, but the
fact is there are other good utilities out there that do particular
tasks exceptionally well, and we might as well let them do their
jobs and work alongside REBOL.
Besides, by offloading some of the work to a separate app we keep
the work out of the controlling script's memory space. Enough
preamble. Here's a silly trick for getting all the links from a
web page very fast using the Unix text browser 'lynx':
urls-from-url: func [url [url!] /local out][
    out: copy ""
    ; run lynx and capture its text dump in OUT
    call/wait/output rejoin ["lynx -dump '" url "'"] out
    ; load everything after the last "References" heading
    load find/reverse/tail tail out "^/References^/"
]
'lynx -dump' produces nicely loadable output: all the URLs on the
page are listed below the word "References". What you get back is
a block containing: [1 http://example 2 mailto:[foo--bar] ...]
Lynx does a good job of parsing the page for links quickly for you
(dealing with all the quirky sloppiness of hrefs) and you get on
with more interesting tasks, like what to do with all those
urls you just snarfed... :-)
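Since the block alternates link numbers and URLs, one of the first
things you might want is just the URLs. Here is a minimal sketch,
assuming the result really is clean number/URL pairs. The sample
block LINKS and the word names NUM and LINK are made up for
illustration; they are not from lynx or from the function above.

```rebol
; hypothetical sample of what urls-from-url might hand back
links: [1 http://example.com/ 2 http://example.com/about.html]

; walk the block two values at a time, keeping only the URL
urls: copy []
foreach [num link] links [
    append urls link
]
probe urls
```

If the pairs ever get out of step (say, a URL that did not load
as a single value), this loop will quietly pick up the wrong items,
so it is only as good as the LOAD that fed it.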
Caveat: You may need a little error checking around the LOAD because
URLs, being what they are, sometimes have bogus chars and REBOL
might not recognize them, but the condition is pretty rare. Depends
on where you're mining...
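One way that error checking might look, as a rough sketch: wrap
the LOAD in TRY and fall back to an empty block when it fails.
SAFE-LOAD is a hypothetical helper name, and the sample strings
below are invented stand-ins for lynx output, not real dumps.

```rebol
; hypothetical helper: LOAD the text, or give back an empty
; block if REBOL cannot make sense of it
safe-load: func [text [string!] /local result][
    either error? result: try [load text] [
        copy []        ; LOAD failed, nothing usable
    ][
        result         ; LOAD worked, pass the values along
    ]
]

; invented sample strings standing in for the References section
good: safe-load "1 http://example.com/ 2 http://example.com/faq.html"
bad: safe-load "1 http://example.com/ ]"    ; stray bracket breaks LOAD

probe good
probe bad
```

In urls-from-url you would swap LOAD for SAFE-LOAD on the last
line. Throwing away the whole list over one bad URL is the crude
option; it just keeps the script from stopping.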
Another trick from the "whattajoy" department. (:
-jeff