Searching news articles for a string
[1/5] from: David_A_Brown::vanguard::com at: 28-May-2003 15:07
Does anyone have a sample of a job that does the following:
1) Searches a "news headline"-type web page for occurrences of a string.
2) If the string is found, determines which URL is associated with the
detailed news article.
3) Uses that URL to run another search on the detailed news article's page.
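Steps 1 and 2 can be sketched in REBOL, the language used later in this thread. This is a minimal sketch under stated assumptions, not a tested solution for any particular site: `find-link-near` is a hypothetical name, and it assumes the URL for a story is the nearest `href="..."` that appears before the matched text in the headline page's HTML. Pages that use single quotes, relative links, or put the link after the text would need a different rule.

```rebol
REBOL [Title: "Find the link for a headline (sketch)"]

; Hypothetical helper (not from the thread): locate TARGET in the HTML of
; a headline page and return the href of the nearest link that starts
; before it, or NONE if TARGET or a link cannot be found.
find-link-near: func [
    page [string!]   "HTML of the headline page"
    target [string!] "string to search for (case-insensitive)"
    /local hit start stop
][
    ; Step 1: FIND on strings ignores case by default
    if none? hit: find page target [return none]
    ; Step 2: search backwards from the hit for the nearest href="
    if none? start: find/reverse hit {href="} [return none]
    start: skip start length? {href="}
    ; The URL runs up to the closing double quote
    if none? stop: find start {"} [return none]
    copy/part start stop
]
```

Step 3 would then read the returned string as a URL and search again, for example `article: read to-url link` followed by `find article "IAEA"`, where `link` is the value returned for a real headline page. A relative link would first need the site's base URL prepended.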
Thanks
[2/5] from: mat:plothatching at: 29-May-2003 14:21
Hello David,
Dvc> Does anyone have a sample of a job that does the following:
Plenty. Do you have a particular news site in mind?
Regards,
Mat Bettinson - +44-(0)20-83401514.
[3/5] from: David_A_Brown:vanguard at: 29-May-2003 12:59
Hi Mat,
In order of preference:
New York Times
Boston Globe
CNN Money
Thanks.
[4/5] from: mat:plothatching at: 3-Jun-2003 16:06
Hello David,
Dvc> New York Times
Dvc> Boston Globe
Dvc> CNN Money
OK, I'm not sure exactly what sort of thing you wanted to see, but here
is a version of the 'Google News watching' function from my IRC bot.
It searches all the sub-sections of Google News. It reads the section
pages directly rather than using Google's news search engine, so the
results are very recent.
You give 'GoogleNewsSearch' two strings. The first is the type of
search: "all" or "any", i.e. match all or any of the words in the
second string. The words are space-delimited. Here are some examples:
>> GoogleNewsSearch "any" "france nuclear"
1. A conflict of views sharpens in Korea - http://www.iht.com/articles/98268.htm
2. Euro states let France off deficit hook - http://www.expatica.com/francemain.asp?pad=278,313,&item_id=31706
3. Putin Wants IAEA Checks on Iran Nuclear Program - http://reuters.com/newsArticle.jhtml?type=worldNews&storyID=2868319
>> GoogleNewsSearch "all" "france nuclear"
1. Putin Wants IAEA Checks on Iran Nuclear Program - http://reuters.com/newsArticle.jhtml?type=worldNews&storyID=2868319
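The "any"/"all" rule can be tried offline, without fetching any pages. The sketch below is a hypothetical `words-match?` helper, not part of the function that follows; it assumes the same word splitting (`parse ... none`) and the same case-insensitive `find` that `AnyHits` and `AllHits` use, applied to a single piece of text.

```rebol
REBOL [Title: "Any/all word matching (sketch)"]

; Hypothetical helper (not from the thread): split WORDS the way
; GoogleNewsSearch does and test TEXT for "any" or "all" of them.
; FIND is case-insensitive and matches substrings.
words-match?: func [
    mode [string!]  {"any" or "all"}
    text [string!]  "headline or story text"
    words [string!] "space-delimited search words"
    /local hits
][
    hits: 0
    words: parse words none
    ; Count how many of the search words occur somewhere in TEXT
    foreach w words [if find text w [hits: hits + 1]]
    either mode = "all" [hits = length? words] [hits > 0]
]
```

Because `find` matches substrings, "nuclear" would also match "antinuclear"; matching whole words only would need a stricter check.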
Maybe this is the sort of thing that would be useful for what you wanted to do?
GoogleNewsSearch: func [
    "Print Google News headlines matching any or all of the search words"
    typehit [string!]     {"any" or "all"}
    searchterms [string!] "space-delimited search words"
    /local
    Searchwords GoogleNewsURLs GoogleHits Googlepage
    AnyHits AllHits fgurl fgtitle fgstory i
][
    ; TRUE if at least one word in SEARCHBLOCK occurs in TARGETDATA
    AnyHits: func [
        targetdata [string!]
        searchblock [block!]
    ][
        foreach sbit searchblock [
            if found? find targetdata sbit [return true]
        ]
        return false
    ]
    ; TRUE only if every word in SEARCHBLOCK occurs in TARGETDATA
    AllHits: func [
        targetdata [string!]
        searchblock [block!]
    ][
        foreach sbit searchblock [
            if not found? find targetdata sbit [return false]
        ]
        return true
    ]
    ; Split the search string into a block of words
    Searchwords: parse searchterms none
    ; Google News section pages to scan
    GoogleNewsURLs: [
        http://news.google.com/news/gnworldleftnav.html
        http://news.google.com/news/gnusaleftnav.html
        http://news.google.com/news/gnbusinessleftnav.html
        http://news.google.com/news/gntechnologyleftnav.html
        http://news.google.com/news/gnsportsleftnav.html
        http://news.google.com/news/gnenterleftnav.html
        http://news.google.com/news/gnhealthleftnav.html
    ]
    ; Flat block of [url title story] triples
    GoogleHits: make block! []
    foreach GoogleURL GoogleNewsURLs [
        ; Give up (return FALSE) if any section page cannot be read
        either error? try [Googlepage: read GoogleURL] [
            return false
        ][
            ; Pull the URL, title and summary out of each story cell
            parse Googlepage [
                any [
                    thru "<td width=80 align=center valign=top>"
                    thru {<a class=y href="} copy fgurl to {"}
                    thru {>} copy fgtitle to {</a>}
                    thru "<font size=-1>" thru "<br>" copy fgstory to "<br>"
                    (
                        if typehit = "all" [
                            if (AllHits fgstory Searchwords) or (AllHits fgtitle Searchwords) [
                                append GoogleHits reduce [fgurl fgtitle fgstory]
                            ]
                        ]
                        if typehit = "any" [
                            if (AnyHits fgstory Searchwords) or (AnyHits fgtitle Searchwords) [
                                append GoogleHits reduce [fgurl fgtitle fgstory]
                            ]
                        ]
                    )
                ]
            ]
        ]
    ]
    ; Print the hits as a numbered list
    i: 0
    foreach [fgurl fgtitle fgstory] GoogleHits [
        i: i + 1
        print rejoin [i ". " fgtitle " - " fgurl]
    ]
]
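GoogleNewsSearch stops at listing headline URLs, so step 3 of the original question (searching inside the detailed article) is not covered. A minimal sketch of that step, assuming each hit can be fetched with `read` and searched as plain text: `articles-containing` is a hypothetical helper, and it skips sources that fail to load instead of stopping. The URLs GoogleNewsSearch prints are strings, so they would need `to-url` before being passed in.

```rebol
REBOL [Title: "Search the detailed articles (sketch)"]

; Hypothetical helper (not from the thread): READ each source in SOURCES
; (url! or file! values) and return a block of those whose text contains
; TARGET. Sources that cannot be read are skipped.
articles-containing: func [
    sources [block!] "url! or file! values to fetch"
    target [string!] "string to search for (case-insensitive)"
    /local found page
][
    found: make block! length? sources
    foreach src sources [
        ; TRY traps network or file errors for this source only
        if not error? try [page: read src] [
            if find page target [append found src]
        ]
    ]
    found
]
```

Accepting `file!` values as well as `url!` values means the helper can be tried on saved copies of pages before pointing it at live sites.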
Regards,
Mat Bettinson - +44-(0)20-83401514.
[5/5] from: David_A_Brown:vanguard at: 5-Jun-2003 9:12
Mat,
Thanks a lot. That is a handy function.
Dave