Mailing List Archive: 49091 messages

[UPDATE] web.log data mining

 [1/3] from: ryan:christiansen:intellisol at: 20-Mar-2001 9:56


Below is an updated version of the CLF format web.log file data mining script, which parses CLF format web.log files and places the data in .csv files for use in Excel and other applications. The script now calls functions instead of being simply linear.

Usage: You must begin with a 0-byte %dns-library.r file in the same directory as the script. Change username, password, domain.dom and the path to the web.log file in the ftp protocol statement. The script will write separate .csv files in the same directory as the script, each .csv file named for the date of the data it contains. If you run the script a second day, it will append to the .csv file for a given date if that .csv file already exists in the directory. The script is designed to mine a web.log file which has been rotated to a 0-byte log file on a daily basis.

    REBOL []

    log-file: read/lines ftp://username:password@domain.dom/logs/web.log
    retrieved-library: read %dns-library.r
    dns-library: parse retrieved-library none

    assemble-date: func [
        "Parse one line in CLF web.log format and return the date as a REBOL date! datatype"
        log-line [string!] "One line in CLF web.log format"
        /local date-line date-string return-date
    ][
        date-line: parse log-line none
        date-line/4: remove date-line/4 {[}
        date-line/5: remove date-line/5 {]}
        date-string: rejoin [date-line/4 " " date-line/5]
        return-date: make date! date-string
    ]

    assemble-time: func [
        "Parse one line in CLF web.log format and return the time as a REBOL time! datatype"
        log-line [string!] "One line in CLF web.log format"
        /local date-line date-string return-time
    ][
        date-line: parse log-line none
        date-line/4: remove date-line/4 {[}
        date-line/5: remove date-line/5 {]}
        date-string: rejoin [date-line/4 " " date-line/5]
        parse date-string [thru ":" copy text to end (return-time: make time! text)]
        return-time
    ]

    dns-lookup: func [
        "Convert an IP address to a domain name"
        dns-cache "A cache of IP addresses and corresponding domain names"
        IP [string!] "The IP address that needs to be converted to a domain"
        /local domain
    ][
        domain: select dns-cache IP
        if ( domain == none ) [
            domain: read join dns:// IP
            if ( domain == none ) [ domain: "unresolved" ]
            append/only dns-cache IP
            append/only dns-cache domain
        ]
        domain
    ]

    parse-log-line: func [
        "Parse one line in CLF web.log format and return the IP address, hit date, hit time, file hit, bytes used, referring page, and browser type"
        log-line [string!] "One line in CLF web.log format"
        file-to-save [file!] "The name of the target file to write returned variables"
        /local current-line
    ][
        current-line: parse log-line none
        IP-address: make string! current-line/1
        domain-address: dns-lookup dns-library IP-address
        hit-date: assemble-date log-line
        hit-time: assemble-time log-line
        hit-file: current-line/6
        hit-bytes: current-line/8
        referring-page: current-line/9
        browser-type: current-line/10
        write/append csv-file-name (rejoin [IP-address "," domain-address "," hit-date "," hit-time "," hit-file "," hit-bytes "," referring-page "," browser-type newline])
    ]

    checksum-date: assemble-date log-file/1
    csv-file-name: make file! (rejoin [checksum-date ".csv"])
    log-directory: read %.

    either find/any log-directory csv-file-name [
        foreach log-line log-file [
            current-line: parse log-line none
            hit-date: assemble-date log-line
            either not-equal? hit-date checksum-date [
                csv-file-name: make file! (rejoin [hit-date ".csv"])
                write csv-file-name {User IP Address, User Domain Address, Date Hit, Time Hit, File Hit, Bytes Transferred, Referring Page, Browser Type}
                write/append csv-file-name (newline newline)
                if error? try [parse-log-line log-line csv-file-name][next log-file]
                checksum-date: hit-date
            ][
                if error? try [parse-log-line log-line csv-file-name][next log-file]
                next log-file
            ]
        ]
    ][
        write csv-file-name {User IP Address, User Domain Address, Date Hit, Time Hit, File Hit, Bytes Transferred, Referring Page, Browser Type}
        write/append csv-file-name (newline newline)
        foreach log-line log-file [
            current-line: parse log-line none
            hit-date: assemble-date log-line
            either not-equal? hit-date checksum-date [
                csv-file-name: make file! (rejoin [hit-date ".csv"])
                write csv-file-name {User IP Address, User Domain Address, Date Hit, Time Hit, File Hit, Bytes Transferred, Referring Page, Browser Type}
                write/append csv-file-name (newline newline)
                if error? try [parse-log-line log-line csv-file-name][next log-file]
                checksum-date: hit-date
            ][
                if error? try [parse-log-line log-line csv-file-name][next log-file]
                next log-file
            ]
        ]
    ]

    write %dns-library.r ""
    foreach library-entry dns-library [
        write/append %dns-library.r library-entry
        write/append %dns-library.r " "
    ]

Ryan C. Christiansen
Web Developer
Intellisol International
4733 Amber Valley Parkway
Fargo, ND 58104
701-235-3390 ext. 6671
FAX: 701-235-9940
http://www.intellisol.com
Global Leader in People Performance Software
_____________________________________
Confidentiality Notice
This message may contain privileged and confidential information. If you think, for any reason, that this message may have been addressed to you in error, you must not disseminate, copy or take any action in reliance on it, and we would ask you to notify us immediately by return email to [ryan--christiansen--intellisol--com]
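
The field positions the script reads (current-line/1, /6, /8, /9, /10, and date-line/4 and /5) depend on how parse with a none rule splits a log line. The following is a minimal sketch for checking those positions against your own log. The sample line and the names sample-line and fields are hypothetical and not part of the posted script. It assumes REBOL 2 behaviour, where the none rule splits on whitespace (and also on commas and semicolons) while keeping each double-quoted section together as one field.

```rebol
REBOL []

; Hypothetical sample line in the layout the script expects
; (not taken from a real log).
sample-line: {192.0.2.10 - - [20/Mar/2001:09:56:45 -0600] "GET /index.html HTTP/1.0" 200 1043 "http://www.example.com/" "Mozilla/4.0"}

; Split the line the same way the script does.
fields: parse sample-line none

; Print each field with its position, so the indexes used in
; assemble-date and parse-log-line can be compared with the data.
repeat i length? fields [
    print [i ":" pick fields i]
]
```

If the positions printed for your log differ from the ones the script uses (for example, because the log has no referrer or browser fields), the path indexes in parse-log-line would need adjusting.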

 [2/3] from: gchiu::compkarori::co::nz at: 21-Mar-2001 7:41


On Tue, 20 Mar 2001 09:56:45 -0600 [ryan--christiansen--intellisol--com] wrote:
> Below is an updated version of the CLF format web.log
> file data mining
<<quoted lines omitted: 3>>
> now calls
> functions instead of being simply linear.
Have you thought about saving the dns-cache between iterations, and loading it each time? I was going to do that for mine, but never got around to it.

--
Graham Chiu

 [3/3] from: ryan:christiansen:intellisol at: 20-Mar-2001 14:14


>> Below is an updated version of the CLF format web.log
>> file data mining
<<quoted lines omitted: 6>>
>iterations, and loading it each time?
>I was going to do that for mine, but never got around to it.
Yes, the first time the script runs, it loads a 0-byte cache file called %dns-library.r, as such...

    retrieved-library: read %dns-library.r
    dns-library: parse retrieved-library none

It then uses the following function for the DNS lookups...

    dns-lookup: func [
        "Convert an IP address to a domain name"
        dns-cache "A cache of IP addresses and corresponding domain names"
        IP [string!] "The IP address that needs to be converted to a domain"
        /local domain
    ][
        domain: select dns-cache IP
        if ( domain == none ) [
            domain: read join dns:// IP
            if ( domain == none ) [ domain: "unresolved" ]
            append/only dns-cache IP
            append/only dns-cache domain
        ]
        domain
    ]

At the end of the script, it saves the dns cache back to the %dns-library.r file, as such...

    write %dns-library.r ""
    foreach library-entry dns-library [
        write/append %dns-library.r library-entry
        write/append %dns-library.r " "
    ]
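
The cache described above is a flat block of alternating IP and domain strings, stored on disk as space-separated text. The following is a small in-memory sketch of that round trip, with no network or file access. The entries and the names cache-text and reloaded are hypothetical, used only for illustration. It assumes REBOL 2 semantics for select and for parse with a none rule.

```rebol
REBOL []

; Hypothetical entries; a real run would get the domain from a dns:// read.
dns-cache: copy []
append dns-cache "192.0.2.10"
append dns-cache "host.example.com"

; select returns the value that follows a matching entry,
; or none when the IP is not in the cache yet.
print select dns-cache "192.0.2.10"
print mold select dns-cache "192.0.2.99"

; Flatten the block to space-separated text, which is the form
; the script writes to %dns-library.r ...
cache-text: copy ""
foreach entry dns-cache [append cache-text join entry " "]

; ... and split it back into a block, as the script does at startup.
reloaded: parse cache-text none
print mold reloaded
```

Because select searches every value in the block rather than only the odd positions, this layout relies on no domain string ever being identical to an IP string being looked up.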
