[REBOL] Re: Downloading patents?
From: arolls:bigpond:au at: 1-Sep-2001 17:31
Your example has 7 pages, right?
If you look at the source of your
example link below, you will see that there
are two frames. The first one is the
navigation bar, where you keep clicking
the right-arrow button to get to
the next page.
If you alter your URL below, replacing
"bnsviewer" with "bnsviewnav" (and keeping
the rest of the query string on the end),
you get the link to the navbar
for your specific patent.
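The substitution can also be done in REBOL instead of by hand. This is a minimal sketch. The word names viewer-url and nav-url are made up for illustration, and the viewer URL is assumed to be the example patent link (the bnsviewer form without a page number).

```rebol
; Sketch: derive the navbar URL by swapping "bnsviewer" for "bnsviewnav".
; viewer-url is a hypothetical name holding the example patent link.
viewer-url: http://l2.espacenet.com/dips/bnsviewer?CY=gb&LG=en&DB=EPD&PN=US4215330&ID=US+++4215330A1+I+

; COPY first, so REPLACE changes the copy and leaves viewer-url untouched
nav-url: replace copy viewer-url "bnsviewer" "bnsviewnav"
print nav-url
```

Copying before REPLACE matters because REPLACE modifies its series in place.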
So, in REBOL (watch out for line wrapping):
url: http://l2.espacenet.com/dips/bnsviewnav?CY=gb&LG=en&DB=EPD&PN=US4215330&ID=US+++4215330A1+I+
print page: read url
If you look at that HTML and search for TOTPG,
you can see "TOTPG=7".
So now you can find out how many pages there are:
find page "TOTPG"
It should not be difficult to grab the
number that follows "TOTPG=" in that string.
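One way to grab that number is with PARSE. This is a sketch only: sample-html is a made-up stand-in so the snippet runs on its own, and in practice you would parse the page string read from the navbar URL instead.

```rebol
; Sketch: pull the page count out of the navbar HTML with PARSE.
; sample-html is a hypothetical stand-in for the PAGE string read above.
sample-html: {...TOTPG=7...}

digits: charset "0123456789"
total: none

; skip past "TOTPG=", copy the run of digits after it, convert to integer
parse/all sample-html [thru "TOTPG=" copy n some digits (total: to-integer n) to end]
print total
```

If "TOTPG=" is not in the HTML, the rule fails before the paren runs and total stays none, which is easy to check for.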
Now to construct the URLs that point to each page.
If we look back a little from there, we can see
another nice variable, "PG":
find page "PG="
Great. Now we can simply modify the original (bnsviewer) URL,
appending &PG=x, where x is the desired page number. For example,
to go to page 3 (watch wrap):
http://l2.espacenet.com/dips/bnsviewer?CY=gb&LG=en&DB=EPD&PN=US4215330&ID=US+++4215330A1+I+&PG=3
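Building the whole set of page URLs is then a loop over the page count. A minimal sketch, assuming total already holds the count found earlier (hard-coded to 7 here to match the example). The names base and page-urls are made up for illustration.

```rebol
; Sketch: one viewer URL per page, built by appending &PG=x.
; total is assumed to hold the page count found earlier (7 in this example).
base: http://l2.espacenet.com/dips/bnsviewer?CY=gb&LG=en&DB=EPD&PN=US4215330&ID=US+++4215330A1+I+
total: 7

page-urls: copy []
repeat pg total [
    ; JOIN copies base, so base itself is never changed
    append page-urls join base ["&PG=" pg]
]
print third page-urls
```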
Next, to find out how to get directly to each PDF file:
find page ".pdf"
We can see this relative link:
/dips/bns.pdf?CY=gb&LG=en&PN=US4215330&ID=US+++4215330A1+I+&PG=1
So, here is an absolute link that returns the PDF file for page 3 (watch wrap):
pdf-url: http://l2.espacenet.com/dips/bns.pdf?&PN=US4215330&ID=US+++4215330A1+I+&PG=3
To my thinking, this has the essentials: PN, ID and PG.
(You can keep CY and LG if leaving them out causes problems.)
write/binary %test.pdf read/binary pdf-url
browse %test.pdf
Now, it's up to you to stick it all together into a nice
program. :)
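For what it's worth, here is one way the pieces might be stuck together. It is a sketch, not a tested tool: the function names nav-url-for, pdf-url-for, page-count and fetch-patent are made up, the output file names %page-1.pdf, %page-2.pdf and so on are an arbitrary choice, and it assumes the server still answers the URL layout described above. The PDF URL keeps CY and LG, as in the relative link the server itself uses.

```rebol
REBOL [Title: "Patent page fetcher (sketch)"]

; URL layout as described in this post; all helper names are hypothetical.
host: http://l2.espacenet.com/dips/
digits: charset "0123456789"

; navbar frame URL for a given patent number and ID
nav-url-for: func [pn [string!] id [string!]] [
    join host ["bnsviewnav?CY=gb&LG=en&DB=EPD&PN=" pn "&ID=" id]
]

; direct PDF URL for one page (CY and LG kept, as in the server's own link)
pdf-url-for: func [pn [string!] id [string!] pg [integer!]] [
    join host ["bns.pdf?CY=gb&LG=en&PN=" pn "&ID=" id "&PG=" pg]
]

; number after "TOTPG=" in the navbar HTML, or 0 if it is missing
page-count: func [html [string!] /local n total] [
    total: 0
    parse/all html [thru "TOTPG=" copy n some digits (total: to-integer n) to end]
    total
]

; read the navbar, then save every page as %page-1.pdf, %page-2.pdf, ...
fetch-patent: func [pn [string!] id [string!] /local total] [
    total: page-count read nav-url-for pn id
    repeat pg total [
        write/binary join %page- [pg ".pdf"] read/binary pdf-url-for pn id pg
    ]
    total
]
```

Usage would then be something like fetch-patent "US4215330" "US+++4215330A1+I+" followed by browse %page-1.pdf. Keeping the URL building and the page counting in separate functions means they can be checked without touching the network.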