Mailing List Archive: 49091 messages

Searching news articles for a string

 [1/5] from: David_A_Brown::vanguard::com at: 28-May-2003 15:07


Does anyone have a sample of a job that does the following:

1) Searches a "news headline" type web page for the occurrence of a string.
2) If the string is found, determines which URL is associated with the detailed news article.
3) Uses that URL to do another search on the web page with the detailed news article.

Thanks

 [2/5] from: mat:plothatching at: 29-May-2003 14:21


Hello David,

Dvc> Does anyone have a sample of a job that does the following:

Plenty. Do you have a particular news site in mind?

Regards,
Mat Bettinson - +44-(0)20-83401514.

 [3/5] from: David_A_Brown:vanguard at: 29-May-2003 12:59


Hi Mat,

In order, my preferences would be:

New York Times
Boston Globe
CNN Money

Thanks.

 [4/5] from: mat:plothatching at: 3-Jun-2003 16:06


Hello David,

Dvc> New York Times
Dvc> Boston Globe
Dvc> CNN Money

OK, I'm not sure exactly what sort of thing you wanted to see, but here I've included a version of a kind of 'Google News watching' function from my IRC bot. It searches all the sub-sections of Google News. It looks at the pages themselves rather than using Google's news search, so the results are very recent.

You give 'GoogleNewsSearch' two strings. The first is the type of search: match "all" or "any" of the words in the second string. The words are space delimited. Here are some examples:

>> GoogleNewsSearch "any" "france nuclear"
1. A conflict of views sharpens in Korea - http://www.iht.com/articles/98268.htm
2. Euro states let France off deficit hook - http://www.expatica.com/francemain.asp?pad=278,313,&item_id=31706
3. Putin Wants IAEA Checks on Iran Nuclear Program - http://reuters.com/newsArticle.jhtml?type=worldNews&storyID=2868319

>> GoogleNewsSearch "all" "france nuclear"
1. Putin Wants IAEA Checks on Iran Nuclear Program - http://reuters.com/newsArticle.jhtml?type=worldNews&storyID=2868319

Maybe this is the sort of thing that would be useful for what you wanted to do?

GoogleNewsSearch: func [
    typehit [string!]
    searchterms [string!]
    /local Searchwords GoogleNewsURLs fgurl fgtitle fgstory i
][
    ; True if any of the search words occurs in the target string
    AnyHits: func [
        targetdata [string!]
        searchblock [block!]
    ][
        foreach sbit searchblock [
            if found? find targetdata sbit [return true]
        ]
        return false
    ]

    ; True only if every search word occurs in the target string
    AllHits: func [
        targetdata [string!]
        searchblock [block!]
    ][
        foreach sbit searchblock [
            if not found? find targetdata sbit [return false]
        ]
        return true
    ]

    ; Split the space-delimited search terms into a block of words
    Searchwords: parse searchterms none

    GoogleNewsURLs: [
        http://news.google.com/news/gnworldleftnav.html
        http://news.google.com/news/gnusaleftnav.html
        http://news.google.com/news/gnbusinessleftnav.html
        http://news.google.com/news/gntechnologyleftnav.html
        http://news.google.com/news/gnsportsleftnav.html
        http://news.google.com/news/gnenterleftnav.html
        http://news.google.com/news/gnhealthleftnav.html
    ]

    ; Not declared /local, so the hits remain set after the call
    GoogleHits: make block! []

    Foreach GoogleURL GoogleNewsURLs [
        either error? try [Googlepage: read GoogleURL][
            ; Give up on the whole search if a section cannot be read
            return false
        ][
            ; Pull the link, title and story text out of each headline cell
            parse Googlepage [
                any [
                    thru "<td width=80 align=center valign=top>"
                    thru {<a class=y href="} copy fgurl to {"}
                    thru {>} copy fgtitle to {</a>}
                    thru "<font size=-1>" thru "<br>" copy fgstory to "<br>"
                    (
                        if typehit = "all" [
                            if ((AllHits fgstory Searchwords) or (AllHits fgtitle Searchwords)) [
                                append GoogleHits reduce [fgurl fgtitle fgstory]
                            ]
                        ]
                        if typehit = "any" [
                            if ((AnyHits fgstory Searchwords) or (AnyHits fgtitle Searchwords)) [
                                append GoogleHits reduce [fgurl fgtitle fgstory]
                            ]
                        ]
                    )
                ]
            ]
        ]
    ]

    ; Print the hits as a numbered list of "title - url"
    i: 0
    foreach [fgurl fgtitle fgstory] GoogleHits [
        i: i + 1
        Print rejoin [i ". " fgtitle " - " fgurl]
    ]
]

Regards,
Mat Bettinson - +44-(0)20-83401514.
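
The function above covers the first two steps of David's question. The third step, searching the detailed article itself, could be handled along the following lines. This is only a minimal REBOL 2 sketch and not part of Mat's bot: the names all-words? and search-articles are hypothetical, and it assumes the hits are kept in the flat url/title/story layout that GoogleNewsSearch appends to GoogleHits.

```rebol
REBOL [Title: "Follow-up article search (sketch)"]

; Hypothetical helper: true when every word in WORDS occurs in TEXT.
; FIND on strings is case-insensitive unless /case is used.
all-words?: func [text [string!] words [block!]] [
    foreach word words [
        if not find text word [return false]
    ]
    true
]

; Hypothetical helper: read each hit's article and keep the hits whose
; page text contains every search word.
; HITS is assumed to be a flat block of url, title, story triples.
search-articles: func [
    hits [block!]
    searchterms [string!]
    /local words page matches
][
    words: parse searchterms none      ; split on whitespace (REBOL 2)
    matches: make block! []
    foreach [url title story] hits [
        ; Skip an article that cannot be read instead of aborting
        if not error? try [page: read to-url url] [
            if all-words? page words [
                append matches reduce [url title]
            ]
        ]
    ]
    matches
]
```

Because GoogleHits is not declared /local in GoogleNewsSearch, it should still be set after a search, so a call such as search-articles GoogleHits "IAEA" could follow a call to GoogleNewsSearch. Articles that fail to load are skipped here rather than ending the whole search.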

 [5/5] from: David_A_Brown:vanguard at: 5-Jun-2003 9:12


Mat,

Thanks a lot. That is a handy function.

Dave