Mailing List Archive: 49091 messages

[REBOL] Re: Downloading patents?

From: arolls:bigpond:au at: 1-Sep-2001 17:31

Your example has 7 pages, right? If you look at the source of your example link below, you will see that there are two frames. The first one is the navigation bar, whose right-arrow button you keep clicking to get to the next page.

If you alter your url below, replacing "bnsviewer" with "bnsviewnav" (and keeping the rest of the query junk on the end), you have the link to the specific navbar for your specific patent. So, in rebol (watch out for line wrapping):

    url: http://l2.espacenet.com/dips/bnsviewnav?CY=gb&LG=en&DB=EPD&PN=US4215330&ID=US+++4215330A1+I+
    print page: read url

If you look at that html code and search for TOTPG, you can see "TOTPG=7". So now you can find out how many pages there are.

    find page "TOTPG"

You should not find it difficult to grab the first number from that string.

Now to construct the urls that point to each page. If we look back a little bit from there, we can see another nice variable, "PG":

    find page "PG="

Great. Now we can simply modify url, adding PG=x, where x is your desired page number. For example, to go to page 3 (watch wrap):

    http://l2.espacenet.com/dips/bnsviewer?CY=gb&LG=en&DB=EPD&PN=US4215330&ID=US+++4215330A1+I+&PG=3

Now to find out how to get directly to each pdf file.

    find page ".pdf"

We can see this relative link:

    /dips/bns.pdf?CY=gb&LG=en&PN=US4215330&ID=US+++4215330A1+I+&PG=1

So, here is an absolute link that returns the pdf file for page 3 (watch wrap):

    pdf-url: http://l2.espacenet.com/dips/bns.pdf?&PN=US4215330&ID=US+++4215330A1+I+&PG=3

This has (to my thinking) the essentials: PN, ID and PG. (You can put CY and LG back in if leaving them out causes problems.)

    write/binary %test.pdf read/binary pdf-url
    browse %test.pdf

Now, it's up to you to stick it all together into a nice program. :)
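
For what it's worth, here is one way the steps above might be stuck together. It is only a rough sketch for REBOL 2, not a tested program: the function names (total-pages, pdf-url-for, download-patent) and the output file names are made up for illustration, the query string is the one from the example patent, and it assumes the site still answers on these urls and that the navbar html contains "TOTPG=" followed by digits, as described above.

```rebol
REBOL [Title: "Patent page downloader (sketch)"]

; Assumed values, copied from the example patent in this message.
base: "http://l2.espacenet.com"
query: "CY=gb&LG=en&PN=US4215330&ID=US+++4215330A1+I+"

; Return the integer that follows "TOTPG=" in the navbar html,
; or none if the marker (or a number after it) is not found.
total-pages: func [page [string!] /local digits n] [
    digits: charset "0123456789"
    either parse/all page [thru "TOTPG=" copy n some digits to end] [
        to-integer n
    ] [none]
]

; Build the absolute url of the pdf for one page number.
pdf-url-for: func [pg [integer!]] [
    to-url rejoin [base "/dips/bns.pdf?" query "&PG=" pg]
]

; Read the navbar, then save each page as page-1.pdf, page-2.pdf, ...
download-patent: func [/local nav total] [
    nav: read to-url rejoin [base "/dips/bnsviewnav?DB=EPD&" query]
    if total: total-pages nav [
        repeat pg total [
            write/binary to-file rejoin ["page-" pg ".pdf"] read/binary pdf-url-for pg
        ]
    ]
]
```

Nothing is fetched until you call download-patent. Keeping total-pages and pdf-url-for separate means you can try them on a saved copy of the navbar html before hitting the server.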