.log to .csv conversion
[1/6] from: gchiu:compkarori at: 18-Mar-2001 17:16
On Thu, 15 Mar 2001 09:51:19 -0600
[ryan--christiansen--intellisol--com] wrote:
> I do have a question, however: the script takes a LONG
> time to run because
> of all of the DNS lookups. Is there any way to speed this
> up?
>
I wrote a script last year that just resolves ip addresses
contained within weblogs prior to further analysis. It cached
the lookup for each ip address, but it still took ages :-(
http://www.compkarori.co.nz/x.php?/zwiki/LogResolver
--
Graham Chiu
[2/6] from: ryan:christiansen:intellisol at: 15-Mar-2001 9:51
Below I am sharing a script that will parse CLF Apache web.log files and
place the data into Excel-friendly .csv files.
I do have a question, however: the script takes a LONG time to run because
of all of the DNS lookups. Is there any way to speed this up?
Thanks.
Ryan C. Christiansen
Web Developer
Intellisol International
REBOL []
log-file: read/lines ftp://username:[password--domain--dom]/logs/web.log
first-line: parse log-file/1 none
first-line/4: remove first-line/4 {[}
first-line/5: remove first-line/5 {]}
checksum-string: rejoin [first-line/4 " " first-line/5]
checksum-date: make date! checksum-string
csv-file-name: make file! (rejoin [checksum-date ".csv"])
write csv-file-name {User IP Address, User Domain Address, Date Hit, Time Hit, File Hit, Bytes Transferred, Referring Page, Browser Type}
write/append csv-file-name rejoin [newline newline]
foreach log-line log-file [
current-line: parse log-line none
current-line/4: remove current-line/4 {[}
current-line/5: remove current-line/5 {]}
date-string: rejoin [current-line/4 " " current-line/5]
hit-date: make date! date-string
either not-equal? hit-date checksum-date [
csv-file-name: make file! (rejoin [hit-date ".csv"])
write csv-file-name {User IP Address, User Domain Address, Date Hit, Time Hit, File Hit, Bytes Transferred, Referring Page, Browser Type}
write/append csv-file-name rejoin [newline newline]
current-line: parse log-line none
IP-address: make tuple! current-line/1
domain-address: read join dns:// IP-address
current-line/4: remove current-line/4 {[}
current-line/5: remove current-line/5 {]}
date-string: rejoin [current-line/4 " " current-line/5]
hit-date: make date! date-string
parse date-string [thru ":" copy text to end (hit-time: make time! text)]
hit-file: current-line/6
hit-bytes: current-line/8
referring-page: make url! current-line/9
browser-type: current-line/10
write/append csv-file-name (rejoin [IP-address "," domain-address "," hit-date "," hit-time "," hit-file "," hit-bytes "," referring-page "," browser-type newline])
checksum-date: hit-date
][
current-line: parse log-line none
IP-address: make tuple! current-line/1
domain-address: read join dns:// IP-address
current-line/4: remove current-line/4 {[}
current-line/5: remove current-line/5 {]}
date-string: rejoin [current-line/4 " " current-line/5]
hit-date: make date! date-string
parse date-string [thru ":" copy text to end (hit-time: make time! text)]
hit-file: current-line/6
hit-bytes: current-line/8
referring-page: make url! current-line/9
browser-type: current-line/10
write/append csv-file-name (rejoin [IP-address "," domain-address "," hit-date "," hit-time "," hit-file "," hit-bytes "," referring-page "," browser-type newline])
print (rejoin [IP-address "," domain-address "," hit-date "," hit-time "," hit-file "," hit-bytes "," referring-page "," browser-type newline])
]
]
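For readers following along in another language, the field-splitting this script relies on can be sketched in Python. Like REBOL's `parse line none`, Python's `shlex.split` keeps quoted strings together as single tokens, so the field positions below line up with the ones the REBOL script assumes. This is an illustrative sketch, not part of the posted script:

```python
import shlex

def parse_hit(line):
    """Split one combined-format Apache log line the way REBOL's
    `parse line none` does: on whitespace, but keeping quoted tokens
    (the request, the referrer, the user agent) intact."""
    f = shlex.split(line)
    stamp = f[3].lstrip("[")              # e.g. 15/Mar/2001:09:51:19
    hit_date, _, hit_time = stamp.partition(":")
    return {"ip": f[0], "date": hit_date, "time": hit_time,
            "file": f[5], "bytes": f[7],
            "referrer": f[8], "browser": f[9]}
```

Note one assumption worth checking against real logs: user agents containing escaped quotes will confuse any naive quote-aware split, shlex included.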
[3/6] from: tomc:darkwing:uoregon at: 15-Mar-2001 12:58
make (and save) a hash of ip -> domain-name
and only do the dns lookup once per ip
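Tom's scheme, sketched in Python rather than REBOL (the file format, function names, and fallback behavior here are illustrative assumptions, not from any posted script):

```python
import socket

def load_cache(path):
    """Load "ip<TAB>hostname" lines into a dict; a missing file means an empty cache."""
    cache = {}
    try:
        with open(path) as f:
            for line in f:
                ip, _, host = line.strip().partition("\t")
                if ip:
                    cache[ip] = host
    except FileNotFoundError:
        pass
    return cache

def resolve(ip, cache, lookup=None):
    """Reverse-resolve ip, consulting the cache first.
    Failures are cached as the ip itself, so each slow timeout is paid once."""
    if ip in cache:
        return cache[ip]
    lookup = lookup or (lambda a: socket.gethostbyaddr(a)[0])
    try:
        host = lookup(ip)
    except OSError:
        host = ip  # unresolvable: remember that too
    cache[ip] = host
    return host

def save_cache(path, cache):
    """Write the cache back out as "ip<TAB>hostname" lines."""
    with open(path, "w") as f:
        for ip, host in cache.items():
            f.write(ip + "\t" + host + "\n")
```

Caching the misses as well as the hits matters: in a typical web log the unresolvable addresses are exactly the ones that cost a full timeout per lookup.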
On Thu, 15 Mar 2001 [ryan--christiansen--intellisol--com] wrote:
[4/6] from: ryan:christiansen:intellisol at: 15-Mar-2001 15:59
I did try this, but ran into some problems. Here is the code I was using...
DNS-library: load %dns-library.r
foreach entry DNS-library [
either find/any entry/2/1 IP-address [
domain-address: entry/2/2
][
domain-address: read join dns:// IP-address
new-entry: make block! (rejoin [{DNS ["} IP-address {" "} domain-address {"]}])
append/only DNS-library new-entry
save %dns-library.r copy/deep DNS-library
]
]
Here is (an example of) the contents of the %dns-library.r file
[
DNS ["0.0.0.0" "sub.domain.dom"]
]
The above code would not find IP matches and instead just kept
appending the same DNS entries to the library over and over.
>make (and save) a hash of ip -> domain-name
>and only do the dns lookup once per ip
>> Below I am sharing a script that will parse CLF Apache web.log files and
>> place the data into Excel-friendly .csv files.
>>
>> I do have a question, however: the script takes a LONG time to run because
>> of all of the DNS lookups. Is there any way to speed this up?
Ryan C. Christiansen
Web Developer
Intellisol International
[5/6] from: brett::codeconscious::com at: 16-Mar-2001 9:26
> > I do have a question, however: the script takes a LONG time to run because
> > of all of the DNS lookups. Is there any way to speed this up?
> make (and save) a hash of ip -> domain-name
> and only do the dns lookup once per ip
If you are using an experimental version of Rebol, you could open a port to
DNS instead of reading from it each time.
As I understand it this will maintain a connection to the DNS server
allowing multiple queries without the connection overhead.
Though I don't know how much of a gain you'll get from this.
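A related, language-neutral way to cut the wall-clock cost is to overlap the lookups rather than (or as well as) reusing one connection. A hedged Python sketch of the idea; the `lookup` parameter is injectable purely so the routine can be exercised without touching real DNS:

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def reverse_lookup(ip):
    """Reverse-resolve one address; fall back to the ip itself on failure."""
    try:
        return ip, socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip, ip

def resolve_all(ips, workers=16, lookup=reverse_lookup):
    """Resolve a batch of addresses in parallel, deduplicating first.
    Returns {ip: hostname}; slow timeouts overlap instead of serializing."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lookup, set(ips)))
```

Since log files repeat the same few addresses heavily, deduplicating before resolving (the `set(ips)` above) is often a bigger win than the parallelism itself.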
Also, I've found that quite a few IPs will not resolve to a domain name. I
figured if I could use whois I could not only find out who has been assigned
the IP but also get an IP range, possibly satisfying more than the one IP I'm
looking for. Alas, as far as I can tell at the moment, whois output is not
standardized, so you would have to write a parser for each whois server you
need to query (btw, has anyone done this?).
For the purposes of statistics then, it may be better not to resolve the
names, unless you really really need to.
Brett.
[6/6] from: tomc:darkwing:uoregon at: 15-Mar-2001 21:59
I would keep it simpler.
This is not tested, but something like this should work,
given a file %dns-cache.save with the format:
0.0.0.0 sub.domain.dom
1.0.0.0 sub.domain.dom1
2.0.0.0 sub.domain.dom2
...
dns-cache: parse read %dns-cache.save none
foreach ip log [
    either none? domain-address: select dns-cache ip [
        append dns-cache ip
        append dns-cache either none? domain-address: read join dns:// ip [
            domain-address: ip
        ][
            domain-address
        ]
    ][
        domain-address
    ]
    ; use your domain-address here
]
At the end, to save, I would probably just clobber the old dns-cache file:
buffer: copy ""
foreach [a b] dns-cache [append buffer rejoin [a "^-" b "^/"]]
write %dns-cache.save buffer
On Thu, 15 Mar 2001 [ryan--christiansen--intellisol--com] wrote: