Mailing List Archive: 49091 messages

error when processing multiple web pages!!

 [1/3] from: akhar:videotron:ca at: 18-Aug-2000 23:43


I am currently trying to write my own version of a web crawler. It gets the list of sites to crawl from a separate text file and attempts to visit them, but after a few sites I get the following error and it quits to the console, and even there afterwards I can open up any other web site:

    cannot find it
    connecting to: www.multimania.com
    connecting to: www.multimania.com
    ** Access Error: Cannot connect to www.multimania.com.
    ** Where: as: read join http:// [url]
    >>

Does REBOL have a buffer?? Here is my code:

    REBOL [
        Title: "e-mail finder"
        Date: 13-May-2000
        Author: "Stephane Jolicoeur"
        File: %octo.r
        Email: [akhar--bemail--org]
        Purpose: {
            To find urls within a file!!!
        }
        Comments: {
            do not use for SPAM
        }
    ]
    urls: make block!
    text: make string! 0
    html-code: [
        thru "http://" copy url to newline (append urls url) |
        copy txt to "http:" (append text txt)
    ]
    page: read %urls.txt
    parse page [to "http://" some html-code]
    foreach url urls
        if exists? join http:// [url] [
            as: read join http:// [url]
            ;print url
            if find as "@" [
                print ["@" "fut trouver sur" url]
                newline
                print " "
            ] print "cannot find it"
            clear as
        ][print ["je ne peux acceder ce site:" url]]
    ];
    as: ask "done??"

Thanks for any help.
Akhar

 [2/3] from: al:bri:xtra at: 19-Aug-2000 20:16


akhar wrote:
> REBOL [
> Title: "e-mail finder"
<<quoted lines omitted: 10>>
> ]
> urls: make block!
Need an integer size after 'block!.
> text: make string! 0
> html-code: [
<<quoted lines omitted: 4>>
> parse page [to "http://" some html-code]
> foreach url urls
Need a '[ after 'urls, as you're repeating the code below for each URL. Then add the line:
    url: join http:// url
as it simplifies the code.
> if exists? join http:// [url] [
Don't need '[ and '] around 'url.
> as: read join http:// [url]
Better as:
    as: read url
> ;print url
> if find as "@" [
Above is better as:
    either find as "@" [
BTW, are you looking for URLs or email addresses in the page?
> print ["@" "fut trouver sur" url]
> newline
> print " "
Above is better as:
    print ["@" "fut trouver sur" url newline
> ]
> print "cannot find it"
Above is better as:
    ] [print "cannot find it"] ; Other branch of 'either.
> clear as
No need for the above line.
> ][print ["je ne peux acceder ce site:" url]]
I'm fairly sure this line isn't needed.
> ];
No need for semi-colon in above line.
> as: ask "done??"
This line should be moved above the '[ above.

Fix these problems and your problem should be solved for all sites that don't require cookies, or that don't require specific browsers.

I hope that helps! Good luck.

Andrew Martin
ICQ: 26227169
http://members.xoom.com/AndrewMartin/

 [3/3] from: al::bri::xtra::co::nz at: 19-Aug-2000 20:30


> > if exists? join http:// [url] [
> > Don't need '[ and '] around 'url.
And this is better:
    if exists? url [
> > print ["@" "fut trouver sur" url]
> > newline
> > print " "
> > Above is better as:
> print ["@" "fut trouver sur" url newline
And, of course, don't forget the '] at the end, as I did. :-)

Andrew Martin
Rebo... Nut
ICQ: 26227169
http://members.xoom.com/AndrewMartin/
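
Taken together, the corrections from the two replies might be applied roughly as in the sketch below. It is not code posted by anyone in the thread. It assumes REBOL/Core 2.x, keeps the original parse rules untouched, and uses an arbitrary initial block size of 10. One deliberate departure is labeled in the comments: the outer test is written as 'either so that the "cannot access" message is kept, whereas the reply suggested dropping that branch.

```rebol
REBOL [
    Title: "e-mail finder (revised sketch)"
    Purpose: {Report which of the sites listed in %urls.txt contain an "@".}
]

urls: make block! 10    ; 'make block! needs an integer size (10 is an arbitrary choice)
text: make string! 0

; Parse rules unchanged from the original post.
html-code: [
    thru "http://" copy url to newline (append urls url)
    | copy txt to "http:" (append text txt)
]

page: read %urls.txt
parse page [to "http://" some html-code]

foreach url urls [                  ; the loop body needs its own block
    url: join http:// url           ; build the full URL once
    either exists? url [            ; assumption: 'either keeps the second branch alive
        as: read url
        either find as "@" [
            print ["@" "fut trouver sur" url newline]
        ][
            print "cannot find it"
        ]
    ][
        print ["je ne peux acceder ce site:" url]
    ]
]
ask "done??"
```

Note that 'exists? and 'read each open their own connection, which matches the two "connecting to:" lines in the original console output.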

Notes
  • Quoted lines have been omitted from some messages.
    View the message alone to see the lines that have been omitted