Searching news articles for a string
[1/5] from: David_A_Brown::vanguard::com at: 28-May-2003 15:07
Does anyone have a sample of a job that does the following:
1) Searches a "news headline" type web page for the occurrence of a string
2) If the string is located, determines what URL is associated with the
detailed news article
3) Uses that URL to do another search on the web page with the detailed news
article
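For readers following the thread, the three steps could be sketched in REBOL roughly as follows. This is only an illustration under stated assumptions: `find-article-url` and `article-mentions?` are hypothetical names, the `<a href="...">title</a>` layout is an assumed headline markup, and the example.com URLs are made up. A real news page will need its own parse rule.

```rebol
REBOL [Title: "Headline search sketch"]

; Steps 1 and 2: scan headline-page text for a link whose title contains
; the search string and return that link's URL (none if nothing matches).
; Assumes links look like <a href="URL">TITLE</a> (an assumption).
find-article-url: func [
    page [string!]
    needle [string!]
    /local result url title
][
    result: none
    parse/all page [
        any [
            thru {<a href="} copy url to {"}
            thru ">" copy title to "</a>"
            (
                ; keep only the first matching link
                if all [none? result url title find title needle] [
                    result: to-url url
                ]
            )
        ]
    ]
    result
]

; Step 3: fetch the article page and report whether it contains a
; second string. Returns none if the page cannot be read.
article-mentions?: func [
    url [url!]
    needle [string!]
    /local page
][
    if error? try [page: read url] [return none]
    found? find page needle
]

; Network-free check of steps 1 and 2 on a made-up headline page.
headlines: {<a href="http://example.com/a">Markets rally</a>
<a href="http://example.com/b">France debates nuclear policy</a>}
probe find-article-url headlines "nuclear"
```

The URL returned by `find-article-url` can then be passed to `article-mentions?` together with the second search string. Note that `find` on strings is case-insensitive unless the `/case` refinement is used.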
[2/5] from: mat:plothatching at: 29-May-2003 14:21
Hello David,
Dvc> Does anyone have a sample of a job that does the following:
Plenty. Do you have a particular news site in mind?
Mat Bettinson - +44-(0)20-83401514.
[3/5] from: David_A_Brown:vanguard at: 29-May-2003 12:59
Hi Mat,
In order, my preferences would be:
New York Times
Boston Globe
CNN Money
[4/5] from: mat:plothatching at: 3-Jun-2003 16:06
Hello David,
Dvc> New York Times
Dvc> Boston Globe
Dvc> CNN Money
OK, I'm not sure exactly what sort of thing you wanted to see, but here
I've included a version of a kind of 'Google News watching' function from
my IRC bot. It searches all the sub-sections of Google News. It reads
the pages itself rather than using Google's news search, so the
results are very recent.
You give 'GoogleNewsSearch' two strings. The first is the type of
search: match "all" or "any" of the words in the second string. The
words are space delimited. Here are some examples:
>> GoogleNewsSearch "any" "france nuclear"
1. A conflict of views sharpens in Korea -
2. Euro states let France off deficit hook -,313,&item_id=31706
3. Putin Wants IAEA Checks on Iran Nuclear Program -
>> GoogleNewsSearch "all" "france nuclear"
1. Putin Wants IAEA Checks on Iran Nuclear Program -
Maybe this is a useful sort of thing for what you wanted to do?
GoogleNewsSearch: func [
    typehit [string!]       ; "any" or "all"
    searchterms [string!]   ; space-delimited search words
    /local Searchwords GoogleNewsURLs fgurl fgtitle fgstory i
][
    ; True if at least one search word occurs in targetdata
    AnyHits: func [
        targetdata [string!]
        searchblock [block!]
    ][
        foreach sbit searchblock [
            if found? find targetdata sbit [return true]
        ]
        return false
    ]
    ; True only if every search word occurs in targetdata
    AllHits: func [
        targetdata [string!]
        searchblock [block!]
    ][
        foreach sbit searchblock [
            if not found? find targetdata sbit [return false]
        ]
        return true
    ]
    ; Split the search terms on whitespace
    Searchwords: parse searchterms none
    ; Block of Google News section URLs to scan
    GoogleNewsURLs: []
    GoogleHits: make block! []
    foreach GoogleURL GoogleNewsURLs [
        either error? try [Googlepage: read GoogleURL][
            return false
        ][
            ; Pull URL, title and story text out of each headline cell
            parse Googlepage [any [thru "<td width=80 align=center valign=top>"
                thru {<a class=y href="} copy fgurl to {"}
                thru {>} copy fgtitle to {</a>}
                thru "<font size=-1>" thru "<br>" copy fgstory to "<br>"
                (
                    if typehit = "all" [
                        if ((AllHits fgstory Searchwords) or (AllHits fgtitle Searchwords)) [
                            append GoogleHits reduce [fgurl fgtitle fgstory]
                        ]
                    ]
                    if typehit = "any" [
                        if ((AnyHits fgstory Searchwords) or (AnyHits fgtitle Searchwords)) [
                            append GoogleHits reduce [fgurl fgtitle fgstory]
                        ]
                    ]
                )
            ]]
        ]
    ]
    ; Print the numbered list of hits
    i: 0
    foreach [fgurl fgtitle fgstory] GoogleHits [
        i: i + 1
        print rejoin [i ". " fgtitle " - " fgurl]
    ]
]
Mat Bettinson - +44-(0)20-83401514.
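To see what the parse rule in the function above captures, here is a small network-free sketch that runs the same rule over a made-up fragment. The sample markup and the example.com URL are invented to fit the rule and are not taken from a real Google News page. `parse/all` is used here so that whitespace is matched literally, which differs from the plain `parse` call in the posted function.

```rebol
REBOL [Title: "Parse rule sketch"]

; Hypothetical fragment standing in for one fetched headline page.
sample-page: {<td width=80 align=center valign=top></td>
<a class=y href="http://example.com/story1">France backs nuclear checks</a>
<font size=-1>Example Wire<br>Talks on nuclear inspections continue.<br>}

hits: copy []
parse/all sample-page [
    any [
        thru "<td width=80 align=center valign=top>"
        thru {<a class=y href="} copy url to {"}     ; link target
        thru ">" copy title to "</a>"                ; headline text
        thru "<font size=-1>" thru "<br>"            ; skip the source name
        copy story to "<br>"                         ; story summary
        (append hits reduce [url title story])
    ]
]

; Each hit is stored as three consecutive values: url, title, story.
foreach [url title story] hits [
    print [title "-" url]
    print ["  summary:" story]
]
```

Because the rule depends on exact attribute strings such as `width=80` and `class=y`, any change in the page markup means the rule matches nothing, so it is worth re-checking against a saved copy of the page when results dry up.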
[5/5] from: David_A_Brown:vanguard at: 5-Jun-2003 9:12
Thanks a lot. That is a handy function.