[REBOL] Re: regex and robots.txt
From: SunandaDH::aol::com at: 3-Aug-2004 15:51
Hallvard:
> I'm thinking about implementing a sort of robots.txt version 2 with regex
> for the rix robot. I need to give certain directions of my own for a number
> of sites, and I wish to use robots.txt files to feed to the bot. Only with
> regular expressions in them.
The main problem here is that a robots.txt with REs is unlikely to validate.
Given that there are any number of badly written spiders out there that
mis-read or mis-parse robots.txt already, I'd be unhappy about deploying a
robots.txt that did not validate -- it may confuse well-meaning bots.
(Having said that, REBOL.org's robots.txt does have the non-standard
crawl-delay parameter to placate msnbot... If Rixbot were as big as Microsoft, we
might look at a non-standard entry for you too :-) ).
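To make the concern concrete, here is a minimal sketch in Python (not REBOL) using the standard library's urllib.robotparser. The file contents, the path and the bot name "SomeBot" are made up for illustration. It shows one way a well-meaning parser might handle a regex rule: it compares the rule as a literal path prefix, so the rule silently blocks nothing.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a regular expression in a Disallow line.
regex_lines = [
    "User-agent: *",
    "Disallow: /private/[0-9]+",
]

# The same intent written as a plain, standard prefix rule.
plain_lines = [
    "User-agent: *",
    "Disallow: /private/",
]

regex_parser = RobotFileParser()
regex_parser.parse(regex_lines)

plain_parser = RobotFileParser()
plain_parser.parse(plain_lines)

# A standard parser treats the regex text as a literal prefix, which
# "/private/123" does not start with, so the URL is still allowed.
allowed_under_regex_rule = regex_parser.can_fetch("SomeBot", "/private/123")

# The plain prefix rule is understood and the URL is refused.
allowed_under_plain_rule = plain_parser.can_fetch("SomeBot", "/private/123")

print(allowed_under_regex_rule, allowed_under_plain_rule)
```

Other spiders may react differently (ignore the line, or give up on the whole file), which is exactly the uncertainty a non-validating robots.txt introduces.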
One other approach:
Consider looking for a rixbot-robots.txt file that contains the specific
rules for rixbot. Rixbot-friendly sites would have to create such a file; give us
a compelling reason, and I'm sure we would.
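On the bot side, that approach could look something like the sketch below (Python, for illustration only). The assumptions here are mine: that rixbot-robots.txt would live at the site root next to robots.txt, that the bot falls back to the ordinary robots.txt when it is missing, and the helper name pick_rules and the example hosts are hypothetical. The fetch function is passed in so the logic can be tried without touching the network.

```python
from typing import Callable, Optional, Tuple

# Candidate rule files, most specific first. The root location of
# rixbot-robots.txt is an assumption, not an agreed convention.
CANDIDATES = ("/rixbot-robots.txt", "/robots.txt")


def pick_rules(fetch: Callable[[str], Optional[str]],
               site: str) -> Tuple[str, Optional[str]]:
    """Return (path, body) of the first rules file found for a site.

    `fetch` takes a URL and returns the file body, or None if the
    file does not exist. Returns ("", None) when neither file exists.
    """
    for path in CANDIDATES:
        body = fetch(site + path)
        if body is not None:
            return path, body
    return "", None


# Stand-in for real HTTP fetching: a dict of made-up pages.
pages = {
    "http://friendly.example/rixbot-robots.txt": "rixbot-specific rules",
    "http://friendly.example/robots.txt": "User-agent: *\nDisallow:\n",
    "http://plain.example/robots.txt": "User-agent: *\nDisallow: /cgi-bin/\n",
}

friendly = pick_rules(pages.get, "http://friendly.example")
plain = pick_rules(pages.get, "http://plain.example")
missing = pick_rules(pages.get, "http://nothing.example")
```

The standard robots.txt stays untouched and valid for every other spider; only sites that opt in need to publish the extra file.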
One other thought: why RE? Why not parse-friendly BNF?
Sunanda.