Mailing List Archive: 49091 messages

[REBOL] Re: Downloading patents?

From: bpaddock:csonline at: 22-Nov-2001 7:27

On Friday 31 August 2001 07:16 pm, you wrote:
> Every now and then I need to get a patent from one of the Internet Patent
> Servers.
>
> Unfortunately they make you download one page at a time. Downloading a 38
> page patent this way took over an hour. Wait for the page to finish loading,
> click the 'next page' button, repeat the cycle till there are no more pages.

At long last I got my patent downloading script into a workable state. If nothing else, it will act as an example of how to do progress bars and downloads.

-- Attached file included as plaintext by Listar --
-- File: patent.r

REBOL [
    Title: "Get Patent"
    Date: "Nov-21-2001 20:48"
    File: %patent.r
    Author: "Bob Paddock and Astrid Sindle"
    Version: 1.0.7
    Purpose: {
        Downloads various types of patents from the server.
        Normally espace forces you to download the patents one page at a time.
        This script gets all of the pages for you automatically.
        Shows various progress bars and time estimates.

        There may be patents that this does not get because I can not find any
        documentation on how l2.espacenet encodes its URL's. If you know how to
        encode a specific URL please let me know so that I can add support for it.
        - [bpaddock--csonline--net]
    }
]

GetPatent: func [
    {
        Request a Patent Number to download from the net. Show progress.
        Displays alert box then aborts script on error.
    }
    PatentServer PatentNumber
    /local url page pdf-url LastPage CurrentPage GetPageURL OutputNameFILE stop PatentDownload
][
    url: probe to-url rejoin ["" PatentServer PatentNumber "&ID=" PatentServer "+++" PatentNumber "A1+I+"]
    page: read url

    ; Uncomment the following to see what the page we just got looks like:
    ; print PatentServer
    ; print PatentNumber
    ; print page

    not-now1: "Service is temporarily unavailable"
    if find page not-now1 [alert not-now1 quit]

    not-now2: "The document request could not be processed"
    if find page not-now2 [alert not-now2 quit]

    ; Copy 10 chars starting at "TOTPG=" to find the number of pages in this patent
    LastPage: to-integer second parse copy/part find page "TOTPG=" 10 "=&"
    ;LastPage: 3 ; testing

    if LastPage < 1 [alert "Zero Pages to This Patent?" quit]

    print rejoin ["There are " LastPage " pages to this Patent, downloading now:"]

    pdf-url: rejoin ["" PatentServer PatentNumber "&ID=" PatentServer "+++" PatentNumber "A1+I+&PG="]

    ; Download all of the pages in the following loop,
    ; print out the URL and the name of the file being saved as we go:
    view/new center-face PatentDownload: layout [
        title: text 300 bold red black center
        ProgressBar: progress 300x30
        across
        toggle 90 "Cancel" "Stop" [stop: true]
        stat: text 240 bold red black middle
        return
        ElapsedTimeText: text 240 bold red black center
        return
        EstimatedTimeText: text 240 bold red black center
        return
        RemainingTimeText: text 240 bold red black center
    ]

    stop: false
    ProgressBar/data: 0

    title/text: reform ["Patent " PatentNumber " has " LastPage "pages"]
    show title

    StartTime: now/time
    ElapsedTimeText/text: reform ["Start Time: " StartTime]
    show ElapsedTimeText

    ; Do{}While CurrentPageNumber <= LastPage:
    repeat CurrentPageNumber LastPage [
        wait 1 ; Required to get the 'cancel' button to work
        if stop [break]

        stat/text: reform ["Downloading Page " CurrentPageNumber " Now"]
        show stat

        GetPageURL: probe to-url rejoin ["http://" pdf-url to-string CurrentPageNumber]
        OutputNameFILE: probe to-file rejoin [PatentServer PatentNumber "pg" CurrentPageNumber ".pdf"]

        ; Don't get pages that we already have:
        if not exists? OutputNameFILE [
            local-file: OutputNameFILE
            if not request-download/to GetPageURL local-file [
                alert "Download failed or canceled."
                quit
            ]
        ]

        ProgressBar/data: ProgressBar/data + (1 / LastPage)

        elapsed: now/time - StartTime
        estimated: elapsed * ((LastPage + 1) / CurrentPageNumber)

        ElapsedTimeText/text: reform ["Elapsed Time: " elapsed]
        EstimatedTimeText/text: reform ["Estimated Time: " estimated]
        RemainingTimeText/text: reform ["Remaining Time: " estimated - elapsed]

        show [stat ProgressBar ElapsedTimeText EstimatedTimeText RemainingTimeText]
    ] ; Repeat

    unview/only PatentDownload
    print "Leaving GetPatent"
] ; GetPatent

; Derived from emailsend.r:
view layout [
    backdrop 30.40.100 effect [grid 10x10]
    origin 40x20

    help-lbl: h2 white "Select Patent Server:"
    help-lbl-2: h3 white "" 200

    PatentServer: choice "Select" "EP" "US" "WO" [
        switch PatentServer/text [
            "Select" [
                help-lbl/text: "Select patent server:"
                help-lbl-2/text: ""
            ]
            "US" [
                help-lbl/text: "Download US Patent:"
                help-lbl-2/text: "e.g. 4215330 or 6163242"
            ]
            "WO" [
                help-lbl/text: "Download PCT Application [WO]:"
                help-lbl-2/text: "e.g. 0177456 or 9912345"
            ]
            "EP" [
                help-lbl/text: "Download EP Application:"
                help-lbl-2/text: "e.g. 0234567 (7 digit)"
            ]
        ]
        show help-lbl
        show help-lbl-2
    ]

    msg: field "Enter number here..." 210

    text white "Press button to retrieve patent:"
    across
    return
    button "Get Patent" [
        ; Only start a download once both a server and a number have been entered
        if all [
            not equal? msg/text "Enter number here..."
            not equal? PatentServer/Text "Select"
        ][
            GetPatent PatentServer/text msg/text
        ]
    ]
    return
    button "Quit" [quit]
]

do-events
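
The two calculations the script leans on, extracting the page count and estimating the remaining time, can be tried in isolation. What follows is only a minimal sketch, assuming REBOL 2, where parse with a string rule splits a string on the given delimiters. The names sample-page and estimate-times are hypothetical and not part of patent.r, and the sample text is invented rather than real server output. The estimate here scales by total / done, which differs from the script's (LastPage + 1) / CurrentPageNumber.

```rebol
REBOL [Title: "patent.r calculations (illustrative sketch)"]

; Invented sample text; the real server response is not shown in the post.
sample-page: "foo&TOTPG=38&PG=1"

; Step 1: take the 10 characters starting at "TOTPG=".
fragment: copy/part find sample-page "TOTPG=" 10    ; "TOTPG=38&P"

; Step 2: split on "=" and "&"; the second field is the page count.
fields: parse fragment "=&"
LastPage: to-integer second fields
print ["Pages:" LastPage]

; Hypothetical helper (not in patent.r): scale the elapsed time by
; total / done to estimate the total time, then subtract the elapsed time.
estimate-times: func [
    "Return [estimated-total remaining] as time! values."
    elapsed [time!]   "Time spent so far"
    done [integer!]   "Pages finished so far (must be greater than 0)"
    total [integer!]  "Total number of pages"
    /local estimated
][
    estimated: elapsed * total / done
    reduce [estimated estimated - elapsed]
]

probe estimate-times 0:01:00 3 12
```

With one minute elapsed after 3 of 12 pages, the helper estimates four minutes in total and three remaining. Note that find returns none when "TOTPG=" is absent, so code that cannot trust the page contents should check for that before calling copy/part.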