[REBOL] [UPDATE] web.log data mining
From: ryan:christiansen:intellisol at: 20-Mar-2001 9:56
Below is an updated version of the web.log data mining script. It parses
web.log files in Common Log Format (CLF) and writes the data to .csv files
for use in Excel and other applications. The script is now organized into
functions instead of running as one linear sequence.
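For anyone unfamiliar with the field positions the script relies on, here is
a minimal sketch of how a CLF entry splits when passed to parse with a none
rule. The sample log line is made up for illustration, and the sketch assumes
REBOL/Core 2.x, where parse with none splits on whitespace and keeps quoted
strings together.

```rebol
REBOL []

; Hypothetical CLF entry, made up for illustration (not from a real log).
sample-line: {192.0.2.10 - - [20/Mar/2001:09:56:00 -0600] "GET /index.html HTTP/1.0" 200 2326 "http://www.example.com/" "Mozilla/4.0"}

; Split the entry into a block of strings.
fields: parse sample-line none

; Positions the script reads from that block:
print ["IP address:    " fields/1]
print ["Date and zone: " fields/4 fields/5]
print ["Request:       " fields/6]
print ["Bytes:         " fields/8]
print ["Referrer:      " fields/9]
print ["Browser:       " fields/10]
```

Fields 4 and 5 still carry the square brackets at this point, which is why
the date and time functions strip characters before converting.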
Usage:
You must begin with a 0-byte %dns-library.r file in the same directory as
the script.
Change username, password, domain.dom and the path to the web.log file in
the ftp:// URL on the log-file line.
The script writes separate .csv files in the same directory as the script,
each named for the date of the data it contains. If you run the script again
on a later day and a .csv file for a given date already exists in the
directory, the script appends to it. The script is designed to mine a
web.log file that is rotated to a 0-byte log file on a daily basis.
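As a rough sketch of the file naming described above, the snippet below
builds a per-date file name and checks whether it already exists. The date
value is a stand-in chosen for illustration, and exists? is used here as a
simpler alternative to searching the directory listing; it is not what the
script itself does.

```rebol
REBOL []

; Hypothetical date, standing in for the date parsed from a log line.
hit-date: 20-Mar-2001

; Build the per-date file name the same way the script does.
csv-name: make file! rejoin [hit-date ".csv"]
print csv-name

; exists? asks the file system directly (an alternative to read %. + find).
either exists? csv-name [
    print "File found: new rows would be appended"
][
    print "No file yet: a header row would be written first"
]
```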
REBOL []
log-file: read/lines ftp://username:password@domain.dom/logs/web.log
retrieved-library: read %dns-library.r
dns-library: parse retrieved-library none
assemble-date: func [
    "Parse one line in CLF web.log format and return the date as a REBOL date! datatype"
    log-line [string!] "One line in CLF web.log format"
    /local date-line date-string return-date
][
    date-line: parse log-line none
    date-line/4: remove date-line/4 {[}
    date-line/5: remove date-line/5 {]}
    date-string: rejoin [date-line/4 " " date-line/5]
    return-date: make date! date-string
]
assemble-time: func [
    "Parse one line in CLF web.log format and return the time as a REBOL time! datatype"
    log-line [string!] "One line in CLF web.log format"
    /local date-line date-string return-time text
][
    date-line: parse log-line none
    date-line/4: remove date-line/4 {[}
    date-line/5: remove date-line/5 {]}
    date-string: rejoin [date-line/4 " " date-line/5]
    ; everything after the first colon is the time portion
    parse date-string [thru ":" copy text to end (return-time: make time! text)]
    return-time
]
dns-lookup: func [
    "Convert an IP address to a domain name"
    dns-cache "A cache of IP addresses and corresponding domain names"
    IP [string!] "The IP address that needs to be converted to a domain"
    /local domain
][
    ; try the cache first
    domain: select dns-cache IP
    if ( domain == none ) [
        ; not cached: do a reverse lookup
        domain: read join dns:// IP
        if ( domain == none ) [
            domain: "unresolved"
        ]
        ; store the IP and its result as a pair in the cache
        append/only dns-cache IP
        append/only dns-cache domain
    ]
    domain
]
parse-log-line: func [
    "Parse one line in CLF web.log format and return the IP address, hit date, hit time, file hit, bytes used, referring page, and browser type"
    log-line [string!] "One line in CLF web.log format"
    file-to-save [file!] "The name of the target file to write returned variables"
    /local current-line
][
    current-line: parse log-line none
    IP-address: make string! current-line/1
    domain-address: dns-lookup dns-library IP-address
    hit-date: assemble-date log-line
    hit-time: assemble-time log-line
    hit-file: current-line/6
    hit-bytes: current-line/8
    referring-page: current-line/9
    browser-type: current-line/10
    ; write one comma-separated row to the file passed in by the caller
    write/append file-to-save (rejoin [IP-address "," domain-address ","
        hit-date "," hit-time "," hit-file "," hit-bytes "," referring-page ","
        browser-type newline])
]
checksum-date: assemble-date log-file/1
csv-file-name: make file! (rejoin [checksum-date ".csv"])
log-directory: read %.
; header row, kept on one line so no line break ends up inside it
csv-header: {User IP Address, User Domain Address, Date Hit, Time Hit, File Hit, Bytes Transferred, Referring Page, Browser Type}
either find/any log-directory csv-file-name [
    ; a .csv file for the first date already exists: append to it
    foreach log-line log-file [
        current-line: parse log-line none
        hit-date: assemble-date log-line
        either not-equal? hit-date checksum-date [
            ; the date changed: start a new .csv file with a header
            csv-file-name: make file! (rejoin [hit-date ".csv"])
            write csv-file-name csv-header
            write/append csv-file-name (newline newline)
            if error? try [parse-log-line log-line csv-file-name][next log-file]
            checksum-date: hit-date
        ][
            if error? try [parse-log-line log-line csv-file-name][next log-file]
            next log-file
        ]
    ]
][
    ; no .csv file for the first date yet: create it with a header
    write csv-file-name csv-header
    write/append csv-file-name (newline newline)
    foreach log-line log-file [
        current-line: parse log-line none
        hit-date: assemble-date log-line
        either not-equal? hit-date checksum-date [
            ; the date changed: start a new .csv file with a header
            csv-file-name: make file! (rejoin [hit-date ".csv"])
            write csv-file-name csv-header
            write/append csv-file-name (newline newline)
            if error? try [parse-log-line log-line csv-file-name][next log-file]
            checksum-date: hit-date
        ][
            if error? try [parse-log-line log-line csv-file-name][next log-file]
            next log-file
        ]
    ]
]
; save the cache back to disk as space-separated entries
write %dns-library.r ""
foreach library-entry dns-library [
    write/append %dns-library.r library-entry
    write/append %dns-library.r " "
]
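To make the cache format concrete: the script stores the DNS library as a
flat run of whitespace-separated strings, alternating IP address and domain
name, and looks entries up with select. Here is a minimal sketch of that
round trip. The cache contents are made up for illustration.

```rebol
REBOL []

; Hypothetical cache text, as it would be read back from %dns-library.r.
cache-text: "192.0.2.10 host.example.com 192.0.2.11 unresolved"

; Split into a flat block: IP, domain, IP, domain ...
cache: parse cache-text none

; select returns the value that follows the matching entry.
print select cache "192.0.2.10"    ; prints host.example.com

; An address that is not in the cache yields none, which is what
; triggers the dns:// lookup in the script.
print select cache "192.0.2.99"    ; prints none
```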
Ryan C. Christiansen
Web Developer
Intellisol International
4733 Amber Valley Parkway
Fargo, ND 58104
701-235-3390 ext. 6671
FAX: 701-235-9940
http://www.intellisol.com
Global Leader in People Performance Software