Mailing List Archive: 49091 messages

[ANN] (but what should I call it?)

 [1/6] from: hallvard:ystad:helpinhand at: 11-May-2003 12:18


Hi folks,

We don't really need this, since there are enough search engines for the web as it is. Still, I made a little robot that searches the web for pages containing the word "rebol". (It was fun making this, but I do believe that a Google search including +rebol will yield better results.) Pages lacking the magical word are not indexed.

You can try it at http://folk.uio.no/hallvary/rebindex/

At the moment, anything you write in the search box is treated as a single string, and the search engine looks for that exact string. I'll change this soon.

I called this a "rebindexer", but I'd very much like name suggestions! Thanks.

If the robot hasn't indexed your page but you'd like it to, just send me a note and I'll fix it.

I haven't got a /Pro licence and cannot access a database from REBOL, so this application stores all indexed pages (compressed) in a text file. It runs with the newest REBOL/Core on Mac OS X. I don't know how it will scale, so if you all try it at the same time, we will see. This will be an interesting experience...

Enjoy,
~H
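For readers curious what "compressed pages in a text file plus exact-string search" could look like, here is a minimal sketch in Python rather than REBOL. It is not the rebindexer's actual code: the one-line-per-page layout (URL, tab, base64 of the zlib-compressed text), the function names `add_page` and `search`, and the case-insensitive test for "rebol" are all assumptions made for illustration.

```python
import base64
import zlib
from pathlib import Path


def add_page(index_file, url, text):
    """Append one page to the index file; skip pages that never mention "rebol".

    Assumed layout: one line per page, "<url>\t<base64 of zlib-compressed text>".
    URLs are assumed to contain no tab characters.
    """
    if "rebol" not in text.lower():
        return False
    packed = base64.b64encode(zlib.compress(text.encode("utf-8"))).decode("ascii")
    with open(index_file, "a", encoding="utf-8") as f:
        f.write(f"{url}\t{packed}\n")
    return True


def search(index_file, query):
    """Return the URLs of stored pages that contain the exact query string."""
    hits = []
    path = Path(index_file)
    if not path.exists():
        return hits
    with open(path, encoding="utf-8") as f:
        for line in f:
            url, _, packed = line.rstrip("\n").partition("\t")
            text = zlib.decompress(base64.b64decode(packed)).decode("utf-8")
            if query in text:  # exact, case-sensitive substring match
                hits.append(url)
    return hits
```

Every search decompresses every stored page, so the cost grows linearly with the size of the index, which is one reason the scaling question raised above is a fair one.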

 [2/6] from: hallvard:ystad:helpinhand at: 11-May-2003 15:30


Dixit Andreas Bolka (14.20 11.05.2003):
>does it obey robots.txt? if yes, what is its agent id? if no, why not?

It reads robots.txt, but doesn't obey it yet. I plan to, of course, but haven't gotten that far. For the moment, it excludes anything that smells like dynamically created pages (cgi-bin, .php, .r, .asp, .jsp, .pl, .cfm ...).

I just couldn't resist posting this now, even though some work is yet to be done on it...

Agent ID? I haven't decided on its name yet. Suggestions are welcome.

~H
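As an illustration of the two checks discussed here, the sketch below (Python, not REBOL) filters out URLs that look dynamically generated and then asks a robots.txt body whether a given agent may fetch a URL. The marker and extension lists are copied from the message above, whose own list is open-ended; the function names and the agent string `rebindexer-example` are hypothetical, since the robot has no agent ID yet. The robots.txt handling uses Python's standard `urllib.robotparser` module.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Taken from the message above; the original list ends with "...".
DYNAMIC_MARKERS = ("cgi-bin",)
DYNAMIC_EXTENSIONS = (".php", ".r", ".asp", ".jsp", ".pl", ".cfm")


def looks_dynamic(url):
    """Return True if the URL path smells like a dynamically created page."""
    path = urlparse(url).path.lower()
    if any(marker in path for marker in DYNAMIC_MARKERS):
        return True
    return path.endswith(DYNAMIC_EXTENSIONS)


def allowed_by_robots(robots_txt, agent, url):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def should_fetch(robots_txt, agent, url):
    """Combine both checks: skip dynamic pages and respect robots.txt."""
    return not looks_dynamic(url) and allowed_by_robots(robots_txt, agent, url)
```

Passing the robots.txt text in as a string keeps the sketch free of network access; a real crawler would fetch it once per host and cache the parsed result.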

 [3/6] from: andreas:bolka:gmx at: 11-May-2003 13:20


Sunday, May 11, 2003, 11:18:30 AM, Hallvard wrote:
> We don't really need this - there are enough search engines for the
> web as it is. Still - I made a little robot that searches the web
> for pages containing the word "rebol".

Does it obey robots.txt? If yes, what is its agent ID? If no, why not?

(See e.g. http://www.searchengineworld.com/robots/robots_tutorial.htm or http://www.robotstxt.org/wc/norobots.html)

--
Best regards,
Andreas
mailto:[andreas--bolka--gmx--net]

 [4/6] from: ingo::2b1::de at: 11-May-2003 17:12


Hi Hallvard,

Hallvard Ystad wrote:
<..>
> For the moment, it excludes anything that smells like dynamically
> created pages (cgi-bin, .php, .r, .asp, .jsp, .pl, .cfm ...) I just
> couldn't resist posting this now, even though some work is yet to be
> done with it...

To make it really useful, try to read *.r files first, and index them if they contain a REBOL[] header. That's something we _could_ need.

Kind regards,
Ingo
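A rough sketch of that suggestion, again in Python for illustration only: fetch the .r file, then index it only if some line starts with the word REBOL followed by an opening bracket. The function name `has_rebol_header` and the exact pattern are assumptions; a stricter check would also try to parse the header block itself.

```python
import re

# Assumed pattern: a line beginning with "REBOL" (any case), optional
# whitespace, then "[". Lines before the header are allowed.
REBOL_HEADER = re.compile(r"^\s*rebol\s*\[", re.IGNORECASE | re.MULTILINE)


def has_rebol_header(text):
    """Return True if the text appears to contain a REBOL script header."""
    return REBOL_HEADER.search(text) is not None
```

This is only a heuristic: a plain-text page that happens to start a line with "REBOL [" would also pass, while a .r file belonging to some other language would be rejected as intended.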

 [5/6] from: ingo:2b1 at: 11-May-2003 17:56


Hi Hallvard,

Hallvard Ystad wrote:
<..>
> I haven't got any /pro licence and cannot access a database from
> rebol,

For MySQL access you don't need /Pro. DocKimbel's /Core driver is available through Softinnov ( http://www.softinnov.com ) at http://rebol.softinnov.org/mysql/

Kind regards,
Ingo

 [6/6] from: hallvard:ystad:helpinhand at: 12-May-2003 11:20


Dixit Ingo Hohmann (17.56 11.05.2003):
>Hi Hallvard,
>For MySQL access you don't need /Pro, DocKimbels /Core driver is available through softinnov ( http://www.softinnov.com ) at
>http://rebol.softinnov.org/mysql/

Ingo, that's brilliant news! I'll look into it as soon as I can...

~H