error when processing multiple web pages!!
[1/3] from: akhar:videotron:ca at: 18-Aug-2000 23:43
I am currently trying to write my own version of a web crawler. It gets the list
of sites to crawl from a separate text file and attempts to visit them, but
after a few sites I get the following error and it quits to the console. Even
there, I can no longer open any other web site afterwards:
cannot find it
connecting to: www.multimania.com
connecting to: www.multimania.com
** Access Error: Cannot connect to www.multimania.com.
** Where: as: read join http:// [url]
>>
Does REBOL have a buffer? Here is my code:
REBOL [
Title: "e-mail finder"
Date: 13-May-2000
Author: "Stephane Jolicoeur"
File: %octo.r
Email: [akhar--bemail--org]
Purpose: {
To find urls within a file!!!
}
Comments: {
do not use for SPAM
}
]
urls: make block!
text: make string! 0
html-code: [
thru "http://" copy url to newline (append urls url) |
copy txt to "http:" (append text txt)
]
page: read %urls.txt
parse page [to "http://" some html-code]
foreach url urls
if exists? join http:// [url] [
as: read join http:// [url]
;print url
if find as "@" [
print ["@" "fut trouver sur" url]
newline
print " "
] print "cannot find it"
clear as
][print ["je ne peux acceder ce site:" url]]
];
as: ask "done??"
Thanks for any help.
Akhar
[2/3] from: al:bri:xtra at: 19-Aug-2000 20:16
akhar wrote:
> REBOL [
> Title: "e-mail finder"
<<quoted lines omitted: 10>>
> ]
> urls: make block!
You need an integer size after 'block!.
> text: make string! 0
> html-code: [
<<quoted lines omitted: 4>>
> parse page [to "http://" some html-code]
> foreach url urls
You need a '[ after 'urls, because the code below is repeated for each URL.
Also add the line:
url: join http:// url
as it simplifies the code.
> if exists? join http:// [url] [
Don't need '[ and '] around 'url.
> as: read join http:// [url]
Better as:
as: read url
> ;print url
> if find as "@" [
Above is better as:
either find as "@" [
BTW are you looking for URLs or email addresses in the page?
> print ["@" "fut trouver sur" url]
> newline
> print " "
Above is better as:
print ["@" "fut trouver sur" url newline
> ] print "cannot find it"
Above is better as:
] [print "cannot find it"] ; Other branch of 'either.
> clear as
No need for above line
> ][print ["je ne peux acceder ce site:" url]]
I'm fairly sure this line isn't needed.
> ];
No need for semi-colon in above line.
> as: ask "done??"
This line should be moved above the '[ above.
Fix these issues and the script should work for all sites that don't
require cookies or a specific browser.
I hope that helps! Good luck.
Andrew Martin
ICQ: 26227169
http://members.xoom.com/AndrewMartin/
[3/3] from: al::bri::xtra::co::nz at: 19-Aug-2000 20:30
> > if exists? join http:// [url] [
>
> Don't need '[ and '] around 'url.
And this is better:
if exists? url [
> > print ["@" "fut trouver sur" url]
> > newline
> > print " "
>
> Above is better as:
> print ["@" "fut trouver sur" url newline
And, of course, don't forget the '] at the end, as I did. :-)
Andrew Martin
Rebo... Nut
ICQ: 26227169
http://members.xoom.com/AndrewMartin/
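Taken together, the corrections suggested in this thread could be applied roughly as in the sketch below. It is an untested consolidation, not code posted by either author. The block size of 10, the English message strings, and the `error? try` wrapper around `read` are assumptions added here. The wrapper replaces the `exists?` check because the console output in the first message suggests the access error was raised by `read` after `exists?` had already passed, and an uncaught error stops the script.

```rebol
REBOL [
    Title: "e-mail finder (consolidated sketch)"
]

urls: make block! 10    ; integer size after block! (10 is an arbitrary choice)
text: make string! 0

; Parse rules are unchanged from the original post.
html-code: [
    thru "http://" copy url to newline (append urls url) |
    copy txt to "http:" (append text txt)
]

page: read %urls.txt
parse page [to "http://" some html-code]

foreach url urls [
    url: join http:// url    ; build the full URL once, as suggested
    ; Assumption, not from the thread: catch the access error so that
    ; one unreachable site does not stop the whole run.
    either error? try [as: read url] [
        print ["cannot access this site:" url]
    ][
        either find as "@" [
            print ["@" "found on" url newline]
        ][
            print "cannot find it"
        ]
    ]
]

ask "done??"
```

Because the rule copies each URL up to `newline`, this sketch expects every line of urls.txt, including the last one, to end with a newline.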
Notes
- Quoted lines have been omitted from some messages.