[REBOL] Re: The "It's Mine Now and I'll Do What I Want With It" Project Proposal
From: koolauscott::yahoo::com at: 10-Mar-2001 11:27
Yes, you can do this. I'm working on a similar project
right now. It's easy to do for any given page, but it
is difficult to write generic code. My approach is to
parse a web page and then run each result through a
series of simple functions, each of which tests for
something desired or not desired. After I have taken
what I want, I run it through an HTML preprocessing
function, and the result can then be used to generate
a web page.
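A minimal Python sketch of that parse-then-filter pipeline. This is not the
REBOL code described above: the parser (Python's html.parser), the filter
names has_enough_words, not_navigation and not_javascript, and the
preprocess step are all hypothetical stand-ins. Each filter tests one thing
and returns True to keep a link, and a link survives only if every filter
passes.

```python
from html import escape
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, text))
            self._href = None


# Hypothetical filters: each tests one thing and returns True to keep a link.
def has_enough_words(link, minimum=3):
    return len(link[1].split()) >= minimum


def not_navigation(link, skip=("home", "contact", "about", "login")):
    return link[1].lower() not in skip


def not_javascript(link):
    return not link[0].lower().startswith("javascript:")


def preprocess(link):
    """Hypothetical HTML preprocessing: escape, then wrap as a list item."""
    href, text = link
    return f'<li><a href="{escape(href)}">{escape(text)}</a></li>'


def extract(html, filters):
    """Parse a page, keep links that pass every filter, then preprocess them."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    kept = [link for link in parser.links if all(f(link) for f in filters)]
    return [preprocess(link) for link in kept]
```

With this shape, handling a new site mostly means choosing which filters to
pass to extract rather than writing new parsing code.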
To make dead links live (that is, to turn relative
links into absolute URLs), I generally use split-path on
the main URL, but there are some exceptions. Some sites
require a function just to find the correct URL for
the day.
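REBOL's split-path splits a URL into its directory and file parts, so a
relative link can be appended to the directory. As a Python sketch (an
assumption, not the code described above), the standard
urllib.parse.urljoin does the same resolution and also covers some cases a
plain directory split misses, such as root-relative /path links and ../
segments. dated_base_url is a hypothetical helper for sites whose URL
changes daily, assuming the date appears in the URL in a fixed pattern.

```python
from datetime import date
from urllib.parse import urljoin


def make_live(links, page_url):
    """Resolve each (href, text) link against the URL of the page it came from.
    Absolute hrefs pass through unchanged."""
    return [(urljoin(page_url, href), text) for href, text in links]


def dated_base_url(pattern, day=None):
    """Hypothetical helper: fill strftime codes in a URL pattern with a date,
    e.g. 'http://example.com/%Y/%m/%d/' becomes that day's directory."""
    if day is None:
        day = date.today()
    return day.strftime(pattern)
```

For example, a link "story1.html" found on
http://www.example.com/news/index.html becomes
http://www.example.com/news/story1.html.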
I tie all these functions together with a master
function, so I only need one function call to
process any given web page.
Because this function call can be complicated, I'm
working on a tool that makes it easy to examine any
web page and then create the parameters needed to
build the function call.
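A rough Python sketch of such a master function; it is not the REBOL code
described above, and every name in it is hypothetical. process_page builds
a list of small single-purpose steps from its keyword arguments and runs the
page through them in order, so each site needs only one call. The keyword
arguments stand in for the per-site parameters, and the regex link matcher
is a crude assumption that a real HTML parser would replace.

```python
import re
from urllib.parse import urljoin

# Crude <a ... href="..."> matcher: fine for a sketch, but a real HTML parser
# copes with unquoted attributes and messy markup far better.
LINK_RE = re.compile(r'<a\s[^>]*?href="([^"]*)"[^>]*>(.*?)</a>', re.I | re.S)
TAG_RE = re.compile(r"<[^>]+>")


def extract_links(html):
    """Step: (href, text) pairs with inner tags removed and whitespace collapsed."""
    return [(href, " ".join(TAG_RE.sub("", inner).split()))
            for href, inner in LINK_RE.findall(html)]


def keep_if(test):
    """Turn a one-link yes/no test into a step that filters a whole list."""
    return lambda links: [link for link in links if test(link)]


def make_live(page_url):
    """Step: resolve every href against the page's own URL."""
    return lambda links: [(urljoin(page_url, h), t) for h, t in links]


def process_page(html, page_url, min_words=3, href_contains=""):
    """Hypothetical master function: one call per page, configured by the
    per-site keyword arguments."""
    steps = [
        extract_links,
        keep_if(lambda link: len(link[1].split()) >= min_words),
        keep_if(lambda link: href_contains in link[0]),
        make_live(page_url),
    ]
    value = html
    for step in steps:
        value = step(value)
    return value
```

For example, process_page(html, "http://www.example.com/news/",
href_contains="story") keeps only links whose href contains "story" and
whose text has at least three words, returned with absolute URLs. A setup
tool would then only need to produce those keyword arguments for each site.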
A good example of this is the website
http://moreover.com. They extract headlines from news
pages and produce excellent results. To cover
thousands of sites, they must have good generic code.
My first website doing this is at
http://www.geocities.com/tamarind_climb
It has a couple of bugs, but it has been working fairly
well. The problem is that I have to write a new
page of code for each web page I want to extract
headlines from, and that's time-consuming. That's why
I'm in the process of rewriting the code to be more
generic.
I think the key idea is to write simple functions that
each do only one thing but, when combined, have the
power to extract and reformat just about anything from
a web page. If and when I get further along with this, I
will make my scripts available.
--- Terry Brownell <[depotcity--home--com]> wrote: