[REBOL] Re: ANN: SiteCrawl

From: arolls:bigpond:au at: 25-Jul-2001 4:34


> rebol-pages: copy []
> SiteCrawl http://www.rebol.com rebol-pages
>
> I need feedback on this. Do you have a small site where you can test
> 'SiteCrawl for me?

> -Ryan

My site is fairly small, so you can check it out easily:
http://users.bigpond.net.au/datababies/anton/index.html

Not all links are written with surrounding quotes,
as your pageLinks function assumes. It may not be
the official way, but IE accepts this:
<a href=http://antonrolls.net>mysite</a>
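To illustrate the point, here is a rough sketch in Python (not REBOL, and not based on Ryan's actual pageLinks code) of a link extractor that accepts double-quoted, single-quoted and unquoted href values. The name page_links is hypothetical, and a regular expression is a simplification; a real crawler would be better off with a proper HTML parser.

```python
import re

# Matches href=value where value is double-quoted, single-quoted,
# or bare (unquoted, ending at whitespace, a quote, or '>').
HREF_RE = re.compile(
    r"""href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>"']+))""",
    re.IGNORECASE,
)

def page_links(html):
    """Return every href value found in html, in document order."""
    links = []
    for m in HREF_RE.finditer(html):
        # Exactly one of the three groups matched; take whichever did,
        # keeping an empty string for href="".
        links.append(next(g for g in m.groups() if g is not None))
    return links

if __name__ == "__main__":
    sample = '<a href=http://antonrolls.net>mysite</a> <a href="TechSupport/">Tech Support</a>'
    print(page_links(sample))  # ['http://antonrolls.net', 'TechSupport/']
```

The unquoted branch stops at whitespace or '>', which is how a browser would end such a value.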

Also, it doesn't catch a link like this one (found on my site),
which points to a directory without specifying an index.html file:
<a href="TechSupport/">Tech Support</a><br>
I think it should look for:
 TechSupport/index.htm(l)
 TechSupport/default.htm(l)
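A minimal sketch of that idea in Python (not REBOL). The helper candidate_urls is hypothetical, and the list of default document names is only the guess suggested above. Which file a server actually serves for a directory depends on its configuration. Many servers return their default document when the bare directory URL is requested, so the sketch tries that first.

```python
from urllib.parse import urljoin

# Default document names suggested above; which one (if any) a server
# actually uses depends on its configuration, so this list is an assumption.
DEFAULT_DOCS = ["index.html", "index.htm", "default.html", "default.htm"]

def candidate_urls(base_url, href):
    """Resolve href against base_url; for directory links, also list
    the likely default documents inside that directory."""
    url = urljoin(base_url, href)
    if not url.endswith("/"):
        return [url]
    # Try the directory itself first, then each guessed index file.
    return [url] + [url + name for name in DEFAULT_DOCS]

if __name__ == "__main__":
    base = "http://users.bigpond.net.au/datababies/anton/index.html"
    for u in candidate_urls(base, "TechSupport/"):
        print(u)
```

A crawler would request these in order and stop at the first one that succeeds.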

Interestingly, if you run

 trace/net on
 read http://users.bigpond.net.au/datababies/anton/TechSupport
 trace/net off

you can see that it first tries to fetch the file "TechSupport",
and then tries to get the directory "TechSupport/".

In your SiteCrawl function, where you have written:
if error? try [...][
    links: next links
]
It looks as though you are relying on that error
occurring, and in fact an error occurs for every
link on my site.
And why do you write  links: next links  at all? Surely
the next link will come along in the next iteration
of the foreach loop anyway. I suggest just doing nothing: []
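For comparison, here is a sketch in Python (not REBOL, and not Ryan's code) of a loop that skips a failing link without touching the list it iterates over. The names crawl and fetch are hypothetical. fetch stands in for whatever reads a page and is assumed to raise an exception on failure.

```python
def crawl(links, fetch):
    """Fetch each link in turn; return (pages, failures).

    fetch is any callable that returns page text or raises on error.
    """
    pages = {}
    failures = []
    for link in links:
        try:
            pages[link] = fetch(link)
        except Exception as err:  # broad on purpose: any failure skips the link
            # No manual advance needed: the for loop moves on by itself.
            failures.append((link, str(err)))
    return pages, failures

if __name__ == "__main__":
    site = {"a.html": "A", "b.html": "B"}
    pages, failures = crawl(["a.html", "TechSupport/", "b.html"], site.__getitem__)
    print(sorted(pages), [link for link, _ in failures])
```

Recording the failures, instead of silently ignoring them, also makes it easy to see when every link fails, as it does on my site.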

Regards,

Anton.