Mailing List Archive: 49091 messages

Searching For Data On Websites

 [1/10] from: louisaturk:eudoramail at: 27-Dec-2001 0:16


Fellow Rebols,

Let's say you want to search for certain data on a list of web sites. How do you make sure that you search every page of each web site on your list, without going to any pages on the sites those web sites link to?

Louis
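Searching "every page" of a site starts with finding the links on each page and turning relative links into full URLs. The sketch below is in Python rather than REBOL (a translation of the idea, not code from this thread). It uses only the standard library's `html.parser` and `urllib.parse`. `LinkCollector` and `extract_links` are hypothetical names. It looks only at `<a href>` tags, so links generated by scripts or forms are missed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the <a href="..."> tags of one page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL and drop
                # any #fragment, so the same page is not counted twice.
                absolute, _fragment = urldefrag(urljoin(self.page_url, value))
                self.links.append(absolute)


def extract_links(page_url, html):
    """Return the absolute link targets found in html, in document order."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    sample = ('<a href="about.html">About</a> '
              '<a href="../top.html#x">Top</a> '
              '<a href="http://www.abb.de/">Other site</a>')
    for link in extract_links("http://www.abc.de/docs/index.html", sample):
        print(link)
```

Off-site links such as the `www.abb.de` one are still returned here. Deciding which of them to follow is a separate step.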

 [2/10] from: tomc:darkwing:uoregon at: 26-Dec-2001 22:37


Hi Louis,

See http://www.reboltech.com/library/scripts/site-check.r for how to keep from straying outside the site you specify.

On Thu, 27 Dec 2001, Dr. Louis A. Turk wrote:

 [3/10] from: louisaturk:eudoramail at: 27-Dec-2001 1:28


Hi Tom,

Thanks for the link. But site-check.r doesn't work on the web sites I checked. On one it entered an infinite loop. On a second it exited immediately without gathering any data. On a third it gave very wrong data.

Louis

At 10:37 PM 12/26/2001 -0800, you wrote:

 [4/10] from: carl:cybercraft at: 27-Dec-2001 22:54


On 27-Dec-01, Dr. Louis A. Turk wrote:
> Fellow Rebols,
> Let's say you want to search for certain data on a list of websites.
> How do you assure that you search every page on each web site on
> your list, but not go to any pages on sites linked to from those web
> sites?
Off the top of my head, I'd say just comparing the start of each URL with the site's URL (up to the length of the site's URL) should do it...
>> site: http://www.abc.de/
== http://www.abc.de/
>> url1: http://www.abc.de/home.html
== http://www.abc.de/home.html
>> url2: http://www.abb.de/home.html
== http://www.abb.de/home.html
>> site = copy/part url1 length? site
== true
>> site = copy/part url2 length? site
== false

-- Carl Read
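Carl's console test can be written as a reusable predicate. Here is a Python sketch of the same prefix comparison, plus an alternative that compares only host names via the standard library's `urlsplit`. Both function names are hypothetical. The prefix version has a useful side effect: a site URL such as `http://www.abc.de/docs/` limits the search to that subtree. It also has a pitfall: without the trailing slash, look-alike host names match too. Neither version treats `abc.de` and `www.abc.de` as the same site.

```python
from urllib.parse import urlsplit


def same_site_prefix(site, url):
    """Carl's check: take the first len(site) characters of url and compare."""
    # Python's == is case-sensitive, so 'HTTP://WWW.ABC.DE/' would not match.
    return url[:len(site)] == site


def same_host(site, url):
    """Alternative check: compare only the host names (urlsplit lowercases them)."""
    return urlsplit(url).hostname == urlsplit(site).hostname


if __name__ == "__main__":
    site = "http://www.abc.de/"
    print(same_site_prefix(site, "http://www.abc.de/home.html"))
    print(same_site_prefix(site, "http://www.abb.de/home.html"))
    # Pitfall: without the trailing slash, a look-alike host also matches.
    print(same_site_prefix("http://www.abc.de", "http://www.abc.de.example.com/"))
    print(same_host("http://www.abc.de", "http://www.abc.de.example.com/"))
```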

 [5/10] from: bpaddock:csonline at: 27-Dec-2001 7:17


> Thanks for the link. But site-check.r doesn't work on the web sites I
> checked. On one it entered an infinite loop. On a second it exited
> immediately without gathering any data. On a third it gave very wrong
> data.
Maybe someone can code a version of WGET in REBOL. WGET saves web pages locally. You can instruct it to stay on the same server and set the link depth.

http://www.wget.org/
http://www.ccp14.ac.uk/mirror/wget.htm

Something to do in VID?

http://www.jensroesner.de/wgetgui/
wGetGUI: the easy-to-use graphical user interface (GUI) for the powerful web grabber wGet
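Bob's WGET suggestion names the two controls a site crawler needs: stay on the same server, and limit the link depth. Below is a minimal breadth-first sketch of both in Python (not REBOL, and not WGET's actual implementation). It assumes a hypothetical caller-supplied `get_links(url)` that fetches a page and returns its absolute link URLs. Here `get_links` is a dictionary lookup, so the example runs offline. "Same server" is taken to mean an identical host name.

```python
from collections import deque
from urllib.parse import urlsplit


def crawl_site(start_url, get_links, max_depth=3):
    """Breadth-first crawl that never leaves start_url's host.

    get_links(url) is a hypothetical hook returning the absolute URLs linked
    from url. Returns the URLs in the order they were visited.
    """
    host = urlsplit(start_url).hostname
    seen = {start_url}              # every URL ever queued, so none is queued twice
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:      # depth limit reached: don't expand this page
            continue
        for link in get_links(url):
            if urlsplit(link).hostname != host:
                continue            # off-site link: never follow it
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


if __name__ == "__main__":
    # A tiny fake site: a.html links back to the home page, forming a cycle.
    fake_site = {
        "http://www.abc.de/": ["http://www.abc.de/a.html", "http://www.abb.de/"],
        "http://www.abc.de/a.html": ["http://www.abc.de/", "http://www.abc.de/b.html"],
        "http://www.abc.de/b.html": [],
    }
    print(crawl_site("http://www.abc.de/", lambda u: fake_site.get(u, [])))
```

Because `seen` is checked before a link is queued, a page that links back to the home page cannot send the crawl round in circles. That is one plausible cause of the infinite loop Louis reported, though the thread does not confirm what actually caused it. The check only works if equivalent URLs are spelled identically, which is what Tom's normalization idea addresses.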

 [6/10] from: tomc:darkwing:uoregon at: 27-Dec-2001 11:26


Ahh,

I had trouble (seg faults) running it on Unix, but it seemed to work from Windows. There was an improvement I thought it needed that might help with the _long_ loop: basically, force each local URL into an absolute form, so that http://a.b.c/d/e/../foo becomes http://a.b.c/d/foo. I'll look into this again unless someone already has a fix.

On Thu, 27 Dec 2001, Dr. Louis A. Turk wrote:
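Tom's fix amounts to resolving `.` and `..` segments in each URL's path before comparing URLs, so that one page is not revisited under several spellings. Here is a minimal Python sketch of that step. It is a simplified take on the "remove dot segments" procedure of RFC 3986 (section 5.2.4), not code from site-check.r. The helper names are hypothetical. The function assumes the path is absolute (empty or starting with `/`), which is always true for the path part of an `http://` URL. It also drops any `#fragment`. Python's `urljoin` already does this when resolving a relative link, so this step matters mainly for URLs that arrive with `..` already in them.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path):
    """Resolve '.' and '..' in an absolute URL path, e.g. '/d/e/../foo' -> '/d/foo'."""
    out = []
    segments = path.split("/")
    for i, seg in enumerate(segments):
        last = i == len(segments) - 1
        if seg == "..":
            if len(out) > 1:        # never pop the leading '' that stands for the root
                out.pop()
            if last:
                out.append("")      # '/d/e/..' becomes '/d/' (keep the trailing slash)
        elif seg == ".":
            if last:
                out.append("")      # '/d/.' becomes '/d/'
        else:
            out.append(seg)
    return "/".join(out)


def normalize_url(url):
    """Rebuild url with its path's dot segments resolved and any #fragment dropped."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc,
                       remove_dot_segments(parts.path), parts.query, ""))


if __name__ == "__main__":
    print(normalize_url("http://a.b.c/d/e/../foo"))
```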

 [7/10] from: louisaturk:eudoramail at: 27-Dec-2001 13:26


Carl and Bob,

Thanks for your responses. I believe you guys have steered me in the right direction, and a solution is just about at hand. However, I'm going to take a break before I work on it any more.

Louis

At 10:54 PM 12/27/2001 +1200, you wrote:

 [8/10] from: louisaturk:eudoramail at: 27-Dec-2001 13:45


Hi Tom,

At 11:26 AM 12/27/2001 -0800, you wrote:
>Ahh,
>I had trouble (seg faults) running it on Unix, but it seemed to work
>from Windows. There was an improvement I thought it needed that might
>help with the _long_ loop: basically, force each local URL into an
>absolute form, so that http://a.b.c/d/e/../foo becomes http://a.b.c/d/foo

I'm running W2K. Perhaps %site-check.r was written specifically to check www.rebol.com, and not for general use on any web site.

Louis

 [9/10] from: ddalley:idirect at: 27-Dec-2001 15:48


----- Original Message -----
From: "Dr. Louis A. Turk" <[louisaturk--eudoramail--com]>
To: <[rebol-list--rebol--com]>
Sent: Thursday, December 27, 2001 2:26 PM
Subject: [REBOL] Re: Searching For Data On Websites

Dr. Turk:

What problem are you having collecting the correct web page? I collect hundreds per day and never have a problem, as long as the URL is correct. Presuming you know the correct URL, you should be getting your data, no?

Donald Dalley

 [10/10] from: louisaturk:eudoramail at: 27-Dec-2001 15:18


Donald,

I think I pretty much have my problem solved now. I am learning HTML and web site design along with REBOL, and my lack of knowledge of HTML was part of the problem. But I'm learning fast, and things are starting to fall into place.

Thanks for responding,
Louis

At 03:48 PM 12/27/2001 -0500, you wrote: