parsing rebsites

[1/3] from: anton::lexicon::net at: 26-Jul-2002 18:59

Anyone interested in parsing rebsite index.r files (eg. to make a reb browser or desktop replacement) can find the parser function used by desktop here:

    ctx-viewtop/parse-index

Here is a chunk of code that parses an index file, then prints out all the folders.

    page: load/all file
    if block? ctx-viewtop [ctx-viewtop: context ctx-viewtop]
    obj: ctx-viewtop/parse-index page
    if obj [ ; index file parsed ok?
        foreach icon obj/icons [
            if icon/type = 'folder [print icon/item]
        ]
    ]

To see the full contents of obj:

    probe obj

Anton.

[2/3] from: anton:lexicon at: 29-Jul-2002 23:42

I should have defined file in my example:

    file: http://www.rebol.com/index.r

> Here is a chunk of code that parses an index file,
> then prints out all the folders.

<<quoted lines omitted: 6>>

> ]
> ]

which gives output like this:

    http://www.reboltech.com/index.r
    view/demos/index.r
    view/tools/index.r
    view/docs/index.r
    http://www.reboltech.com/library/index.r
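The output above mixes absolute URLs with relative paths, so anything that wants to follow the folders has to resolve the relative ones first. A minimal sketch of that step, assuming the relative items are file! values and that they are relative to the directory of the index they came from (here http://www.rebol.com/, taken from the definition of file earlier in this message). The sample items are copied from the output above; base and items are names made up for the illustration.

```rebol
rebol [title: "Resolve folder items - sketch"]

; Assumption: the index was loaded from http://www.rebol.com/index.r,
; so relative items are resolved against its directory.
base: http://www.rebol.com/

; Sample items copied from the output above. The relative ones are
; assumed to be file! values, the absolute ones url! values.
items: [
    http://www.reboltech.com/index.r
    %view/demos/index.r
    http://www.reboltech.com/library/index.r
]

foreach item items [
    ; file! items get the base prepended, url! items pass through unchanged
    print either file? item [join base item] [item]
]
```

In real use the items would come from obj/icons rather than from a literal block.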

[3/3] from: oliva:david:seznam:cz at: 4-Aug-2002 15:42

Hello Anton,

here is the first (not tested online yet) version of my simple Rebol-sites bot:

<code>
rebol [
    title: "Rebol Reb-sites-BOT"
    file: %reb-reb-bot.r
    author: 'Oldes
    email: [oliva--david--seznam--cz]
    purpose: {Simple BOT for travelling thru Reb-sites - uses read-thru so downloads these files as well:-)}
    version: 0.0.1
    date: 4-Aug-2002/15:27:23+2:00
]

path-thru: func [url /local purl][
    if file? url [return url]
    if not all [purl: decode-url url purl/host] [return none]
    rejoin [
        view-root/public slash purl/host
        either none? purl/port-id [""][join "%3A" purl/port-id]
        slash any [purl/path ""] any [purl/target ""]
    ]
]

page-url: to-url ask "Start URL (default is: http://www.rebol.com/index.r): "
if empty? page-url [page-url: http://www.rebol.com/index.r]

if block? ctx-viewtop [ctx-viewtop: context ctx-viewtop]

files-to-search: make block! 1000
files-searched: make block! 1000
insert files-to-search page-url

read-file: func[
    {Read a net file from thru the disk cache. Returns binary, else none on error.}
    url [url!]
    /update "Force update from source site"
    /check "Update only if date/size do not match."
    /local info
] either connected? [
    [;online version
        either check [
            if error? try [
                info: info? url
                return read-thru/check url reduce [info/size info/date]
            ][ return none]
        ][
            either update [
                read-thru/update url
            ][
                read-thru url
            ]
        ]
    ]
][
    [;offline version
        if error? try [return read/binary path-thru url][return none]
    ]
]

while [not empty? files-to-search ][
    insert tail files-searched page-url: first files-to-search
    if not none? page: read-file/check page-url [
        page: load/all to-string page
        obj: ctx-viewtop/parse-index page
        purl: decode-url page-url
        base-href: rejoin [
            http:// purl/host
            either none? purl/port-id [""][join ":" purl/port-id]
            #"/" any [purl/path ""]
        ]
        if obj [ ; index file parsed ok?
            print reform ["NEW INDEX:" page-url]
            print reform ["TITLE:" obj/title "ICONS:" length? obj/icons]
            foreach icon obj/icons [
                if file? icon/item [icon/item: join base-href icon/item]
                either icon/type = 'folder [
                    print reform [tab "FOLDER:" icon/item]
                    if not found? find files-searched icon/item [
                        insert tail files-to-search icon/item
                    ]
                ][
                    ;place your own code for other file types:
                    print reform [tab uppercase to-string icon/type #":" icon/item]
                    ;You will probably want to download them as well:-)
                    if url? icon/item [
                        read-file/check icon/item
                    ]
                ]
            ]
        ]
    ]
    remove files-to-search
]
</code>
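One thing to watch in the loop above: a folder is only checked against files-searched before it is queued, so a folder linked from several indexes can land in files-to-search more than once before it is visited. A minimal sketch of the same worklist idea that also checks the pending queue. It runs without network access: links is a hypothetical table standing in for parsed index files, and the example.com URLs, to-visit, visited and current are names made up for the illustration.

```rebol
rebol [title: "Worklist sketch - hypothetical data, no network access"]

; Hypothetical link table standing in for parsed index files:
; each index URL is followed by the block of folder URLs it lists.
links: [
    http://example.com/index.r   [http://example.com/a/index.r http://example.com/b/index.r]
    http://example.com/a/index.r [http://example.com/b/index.r http://example.com/index.r]
    http://example.com/b/index.r []
]

to-visit: copy [http://example.com/index.r]
visited: make block! 10

while [not empty? to-visit] [
    ; take the next index off the head of the queue
    current: first to-visit
    remove to-visit
    append visited current

    ; an index missing from the table is treated as having no folders
    foreach folder any [select links current copy []] [
        ; queue a folder only if it is neither visited nor already pending
        if not any [find visited folder find to-visit folder] [
            append to-visit folder
        ]
    ]
]

foreach page visited [print page]
```

In the bot itself the equivalent change would be to test find files-to-search icon/item alongside the existing files-searched test.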

Notes

Quoted lines have been omitted from some messages.