Mailing List Archive: 49091 messages

[REBOL] Re: Rebol web presence statistics

From: SunandaDH:aol at: 19-Mar-2004 18:02

Hallvard:
> So Google reports 188000 pages... But clicking "next" repeatedly never gets
> you to the end. I wonder if this figure is really real...

The best you'll ever get by clicking next repeatedly is the first 1000 results. Google won't give you more than that for a single query, even if you use the SOAP API.

To get additional results, you have to get clever: use the Advanced search and limit by file type, domain, etc. Then deduplicate the various lists.

Google's numbers wobble as it builds and rolls out new indexes. If you do the same query on each of these:

    www.google.com
    www2.google.com
    www3.google.com

you'll generally see different numbers -- in fact, you'll often see different numbers if you repeat the same query on www.google.com. The same query can be answered by any of about a dozen Google data centers, and they are very rarely all in sync.

So, the short answer is that there is no easy way of knowing how many pages Google has on a single subject.

One way of discovering more sites that you don't have queued for your spider is to do this query in Google:

    link:www.rebol.com

(no spaces on either side of the colon). Repeat for other central REBOL websites. Note, though, that this only returns pages that have a Google PageRank of 4 or above. The same query format on Altavista may give you many more sites, as they are not limited in the same way.

Sunanda.
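
For anyone wanting to try the "slice the query, then deduplicate the lists" idea from the message above, here is a rough Python sketch. It assumes you have already collected the result URLs for each narrowed query (by file type, by domain, and so on) into plain lists by whatever means you like; it does not talk to Google at all. The function names normalize and merge_results are made up for this example, and the normalization rules (ignore scheme, host case, trailing slash and fragment) are just one plausible choice of what counts as "the same page".

```python
from urllib.parse import urlsplit


def normalize(url):
    """Build a comparison key for a URL (hypothetical helper).

    Assumed rules: the scheme and fragment are ignored, the host is
    lowercased, and a trailing slash on the path does not matter.
    """
    url = url.strip()
    if "://" not in url:
        # Bare "www.example.com/page" style entries: add a scheme so
        # urlsplit puts the host in netloc rather than in path.
        url = "http://" + url
    parts = urlsplit(url)
    key = parts.netloc.lower() + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(*result_lists):
    """Merge several result lists, keeping the first spelling of each URL."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


if __name__ == "__main__":
    # Made-up sample data standing in for two narrowed queries.
    by_filetype = ["http://www.rebol.com/", "http://www.rebol.com/docs.html"]
    by_domain = ["http://WWW.REBOL.COM", "www.rebol.com/docs.html",
                 "http://www.rebol.net/"]
    for url in merge_results(by_filetype, by_domain):
        print(url)
```

The order of the merged list follows the order in which the lists are passed in, so put the list you trust most first. The length of the merged list is only a lower bound on what the search engine actually holds, since each narrowed query is itself subject to the 1000-result cap.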