Mailing List Archive: 49091 messages

[REBOL] Re: regex and robots.txt

From: SunandaDH::aol::com at: 3-Aug-2004 15:51

Hallvard:
> I'm thinking about implementing a sort of robots.txt version 2 with regex
> for the rix robot. I need to give certain directions of my own for a number
> of sites, and I wish to use robots.txt files to feed to the bot. Only with
> regular expressions in them.

The main problem here is that a robots.txt with REs is unlikely to validate. Given that there are any number of badly written spiders out there that already mis-read or mis-parse robots.txt, I'd be unhappy about deploying a robots.txt that did not validate -- it may confuse well-meaning bots.

(Having said that, REBOL.org's robots.txt does have the non-standard crawl-delay parameter to placate msnbot... If Rixbot were as big as Microsoft, we might look at a non-standard entry for you too :-) ).

One other approach: consider looking for a rixbot-robots.txt file that contains the specific rules for rixbot. Rixbot-friendly sites would have to create such a file; give us a compelling reason, and I'm sure we would.

One other thought: why RE? Why not parse-friendly BNF?

Sunanda.
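
For readers who want to see the rixbot-robots.txt idea concretely, here is a minimal sketch in Python (not REBOL, and not anything Rixbot actually does). It assumes the bot-specific file lives at the site root next to robots.txt, and that the bot falls back to the standard file when the specific one is missing; both are assumptions for illustration, as are the names `candidate_urls`, `load_rules`, and the `fetch` callable.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

# File name taken from the suggestion in the post; its location at the
# site root is an assumption.
BOT_SPECIFIC = "rixbot-robots.txt"
STANDARD = "robots.txt"


def candidate_urls(site: str) -> List[str]:
    """Return the rule files to try for a site, most specific first."""
    return [urljoin(site, "/" + BOT_SPECIFIC), urljoin(site, "/" + STANDARD)]


def load_rules(
    site: str, fetch: Callable[[str], Optional[str]]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (url, body) for the first rule file that exists.

    `fetch` is a hypothetical helper supplied by the caller: it returns
    the file body as text, or None if the file is not available.
    """
    for url in candidate_urls(site):
        body = fetch(url)
        if body is not None:
            return url, body
    return None, None
```

The point of the design is that the standard robots.txt stays untouched and valid for every other spider, while the extended rules (regex, BNF, or whatever syntax is chosen) live only in the file that rixbot asks for.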