[REBOL] Re: Depth first search
From: nate:securitylogics at: 4-Dec-2001 15:55
Here is my faulty script. I think the problem lies in the line that says:
    append _LINKS LinkCrawler (depth - 1) getlinks url _LINKS
Nate
REBOL [
    Title: "Web Page Crawler"
    File: %weblinks.r
    Date: "Nov. 15, 2001"
    Purpose: "Make an index of words for a group of interlinked html pages"
]
web: make object! [
    _LINKS: make block! 1000
    _WORDS: make block! 1000

    getlinks: func [
        "Gets all html links from a page and returns a block without dups."
        baseURL [url!] "the site to get links from"
        blck [block!] "list of urls not to return"
        /local result tags text html-code page tag txt link
    ][
        result: make block! 100
        tags: make block! 100
        text: make string! 8000
        ; parse rule: collect each tag into tags and the text between tags into text
        html-code: [
            copy tag ["<" thru ">"] (append tags tag) |
            copy txt to "<" (append text txt)
        ]
        if error? try [page: read baseURL] [
            print ["error reading" baseURL]
            return result
        ]
        if find/last baseURL "html" [
            ; if it ends with .html then strip off the last part
            baseURL: rejoin [copy/part baseURL find/last baseURL "/"]
        ]
        if (pick baseURL length? baseURL) <> #"/" [
            baseURL: join baseURL "/"
        ]
        ;parse page [to "<" some [copy tag ["<" thru ">"] (append tags tag) |
        ;    copy txt to "<" (append text txt) ]]
        parse page [to "<" some html-code]
        ; keep only <A HREF=...> tags whose link contains .html
        foreach tag tags [
            if parse tag [
                "<A" thru "HREF="
                [{"} copy link thru ".html" to {"} | copy link thru ".html" to ">"]
                to end
            ][
                ; make relative links absolute by prefixing the base URL
                if (copy/part form link 7) <> "http://" [
                    link: rejoin [baseURL clean-path to-url link]
                ]
                ; skip links listed in blck and links already collected
                if all [not (find blck link) not (find result link)] [
                    result: append result link
                ]
            ]
        ]
        return result
    ]
    LinkCrawler: func [
        "Does a depth first search, at most depth levels deep"
        depth [integer!] "how deep do you want to go"
        blck [block!] "List of urls to visit"
        ; this is the line that did it!
        /local result pagelinks i
    ][
        either any [depth == 0 tail? blck] [
            ;print ["basis " depth tail? blck]
            return []
        ][
            result: make block! 100
            append _LINKS blck
            foreach url blck [
                print ["getlinks = " url]
                append _LINKS LinkCrawler (depth - 1) getlinks url _LINKS
            ]
        ]
    ]
]
d: web/linkcrawler 3 web/getlinks http://students.cs.byu.edu/~nate []
foreach i d [
    print i
]
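For comparison, here is a minimal sketch of a depth-limited depth-first walk. It runs against a small in-memory link graph instead of real pages, so it can be tried without network access. The names graph, links-of and dfs are hypothetical and are not part of the script above. Every node seen goes into a single visited block that the recursive calls modify in place; their return values are not appended back onto it.

```rebol
REBOL [Title: "Depth-limited DFS sketch (hypothetical example)"]

; Hypothetical link graph standing in for real pages:
; each word is a "page", followed by the block of pages it links to.
graph: [
    a [b c]
    b [d]
    c [d e]
    d [a]
    e []
]

; Return the outgoing links of a node (an empty block if it has none).
links-of: func [node [word!]] [
    any [select graph node copy []]
]

dfs: func [
    "Visit node, then recurse into its links, at most depth levels deep"
    node [word!]
    depth [integer!]
    visited [block!] "accumulates every node seen; modified in place"
][
    ; stop at the depth limit or at a node we have already seen
    if any [depth <= 0 find visited node] [return visited]
    append visited node
    foreach next-node links-of node [
        ; the recursive call fills visited itself, so its result is ignored
        dfs next-node (depth - 1) visited
    ]
    visited
]

print mold dfs 'a 3 copy []  ; prints [a b d c e]
```

Because a node is marked visited the first time it is reached, a page first found near the depth limit is not expanded again if a shorter path to it turns up later. A stricter depth-limited search would also record the depth at which each node was seen.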
At 06:31 AM 12/5/2001 +1300, you wrote: