[REBOL] Re: Depth first search

From: nate:securitylogics at: 4-Dec-2001 15:55

Here is my faulty script. I think the problem lies in the line that says:

    append _LINKS LinkCrawler (depth - 1) getlinks url _LINKS

Nate

REBOL [
    Title: "Web Page Crawler"
    File: %weblinks.r
    Date: "Nov. 15, 2001"
    Purpose: "Make an index of words for a group of interlinked html pages"
]

web: make object! [
    _LINKS: make block! 1000
    _WORDS: make block! 1000

    getlinks: func [
        "Gets all html links from a page and returns a block without dups."
        baseURL [url!] "the site to get links from"
        blck [block!] "list of urls not to return"
    ][
        result: make block! 100
        tags: make block! 100
        text: make string! 8000
        html-code: [
            copy tag ["<" thru ">"] (append tags tag) |
            copy txt to "<" (append text txt)
        ]
        if error? try [page: read baseURL] [print "error reading baseURL"]
        if find/last baseURL "html" [
            ; if it ends with .html then strip off the last part
            baseURL: rejoin [copy/part baseURL find/last baseURL "/"]
        ]
        if (pick baseURL length? baseURL) <> #"/" [
            baseURL: join baseURL "/"
        ]
        ;parse page [to "<" some [copy tag ["<" thru ">"] (append tags tag) | copy txt to "<" (append text txt)]]
        parse page [to "<" some html-code]
        foreach tag tags [
            if parse tag [
                "<A" thru "HREF=" [
                    {"} copy link thru ".html" to {"} |
                    copy link thru ".html" to ">"
                ] to end
            ][
                if (copy/part form link 7) <> "http://" [
                    link: rejoin [baseURL clean-path to-url link]
                ]
                if all [not (find blck link) not (find result link)] [
                    result: append result link
                ]
            ]
        ]
        return result
    ]

    LinkCrawler: func [
        "Does a breadth first search depth deep"
        depth [integer!] "how deep do you want to go"
        blck [block!] "List of urls to visit"  ; this is the line that did it!
        /local result pagelinks i
    ][
        either any [depth == 0 tail? blck] [
            ;print ["basis " depth tail? blck]
            return []
        ][
            result: make block! 100
            append _LINKS blck
            foreach url blck [
                print ["getlinks = " url]
                append _LINKS LinkCrawler (depth - 1) getlinks url _LINKS
            ]
        ]
    ]
]

d: web/linkcrawler 3 web/getlinks http://students.cs.byu.edu/~nate []
foreach i d [print i]

At 06:31 AM 12/5/2001 +1300, you wrote:
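
One possible reading of the failure: in the recursive branch LinkCrawler never returns result; the either block returns whatever foreach returns, and the last expression in the loop body is the append to _LINKS, so the outer append _LINKS LinkCrawler ... appears to splice _LINKS back into itself. Below is a minimal sketch of a recursion that returns only the links it discovered, assuming it is defined inside the same object as getlinks. The name crawl, the copy [] idiom, and the way duplicates are filtered here are illustrative assumptions, not the fix actually posted to the list:

    crawl: func [
        "Returns only the new links found, never modifying a shared block"
        depth [integer!] "levels of links still to follow"
        blck [block!] "urls to visit at this level"
        /local result links
    ][
        ; copy [] yields a fresh empty block on each call (see note below)
        if any [depth = 0 empty? blck] [return copy []]
        result: copy blck                          ; links collected in this subtree
        foreach url blck [
            links: getlinks url result             ; skip urls already collected
            append result crawl (depth - 1) links  ; splice the child's findings
        ]
        result                                     ; last value is the return value
    ]

It would be called the same way as the original: d: web/crawl 3 web/getlinks http://students.cs.byu.edu/~nate []. The copy [] in the base case also sidesteps a classic REBOL gotcha: a literal return [] hands every caller the same block stored in the function body, so any later modification of that block persists across calls.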