Mailing List Archive: 49091 messages

[REBOL] Re: The "It's Mine Now and I'll Do What I Want With It" Project Proposal

From: koolauscott::yahoo::com at: 10-Mar-2001 11:27

Yes, you can do this. I'm working on a similar project right now. It's easy to do for any given page, but it is difficult to write generic code.

My approach is to parse a web page and then run each result through a series of simple functions, each of which tests for something desired or not desired. After I have taken what I want, I run it through an HTML preprocess function, and the result can then be used to generate a web page. To make dead links live I generally use split-path on the main URL, but there are some exceptions: some sites require a function just to find the correct URL for the day. I tie all these functions together with a master function, so I only need one function call to process any given web page. Because this function call can be complicated, I'm working on a tool that makes it easy to examine any web page and then create the parameters needed to build the function call.

A good example of this is the website http://moreover.com. They extract headlines from news pages and they produce excellent results. To cover thousands of sites they must have good generic code.

My first website doing this is at http://www.geocities.com/tamarind_climb. It has a couple of bugs but it has been working fairly well. The problem is that I have to write a new page of code for each web page I want to extract headlines from, and that's time-consuming. That's why I'm in the process of rewriting the code to be more generic. I think the key idea is to write simple functions that each do only one thing but, when combined, have the power to extract and reformat just about anything from a web page.

When and if I get further along in this I will make my scripts available.

--- Terry Brownell <[depotcity--home--com]> wrote:
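
[Illustrative sketch, not part of the original message.] The pipeline described in the reply above (parse the page, run each result through small single-purpose tests, make the links live against the page URL, then emit HTML) might look roughly like the following. This is only a sketch under assumptions: it is written in Python rather than REBOL, every name in it (LinkParser, has_text, min_words, not_matching, extract_headlines, to_html) is hypothetical, "results" are assumed to be links with their text, and urljoin stands in for the split-path step mentioned in the message. It is not the author's code, which was never posted here.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Simple tests: each one checks a single thing about a (href, text) pair.
def has_text(link):
    return bool(link[1])


def min_words(n):
    def test(link):
        return len(link[1].split()) >= n
    return test


def not_matching(*words):
    def test(link):
        return not any(w in link[1].lower() for w in words)
    return test


def extract_headlines(page_html, base_url, tests):
    """Hypothetical 'master function': parse, filter, make links absolute."""
    parser = LinkParser()
    parser.feed(page_html)
    kept = [link for link in parser.links if all(t(link) for t in tests)]
    return [(urljoin(base_url, href), text) for href, text in kept]


def to_html(headlines):
    """Turn the kept (url, text) pairs into list items for a new page."""
    return "\n".join(
        '<li><a href="{}">{}</a></li>'.format(escape(url), escape(text))
        for url, text in headlines
    )
```

A call such as extract_headlines(page, "http://example.com/index.html", [has_text, min_words(3), not_matching("advertisement")]) is the kind of single function call the message describes. The per-site work then reduces to choosing the list of tests, which is where a parameter-building tool like the one mentioned would help.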