Mailing List Archive: 49091 messages
 

.log to .csv conversion

 [1/6] from: gchiu:compkarori at: 18-Mar-2001 17:16


On Thu, 15 Mar 2001 09:51:19 -0600 [ryan--christiansen--intellisol--com] wrote:
> I do have a question, however: the script takes a LONG time to run because
> of all of the DNS lookups. Is there any way to speed this up?
I wrote a script last year that just resolves IP addresses contained within weblogs prior to further analysis. It cached the lookup for each IP address, but it still took ages :-(

http://www.compkarori.co.nz/x.php?/zwiki/LogResolver

-- Graham Chiu

 [2/6] from: ryan:christiansen:intellisol at: 15-Mar-2001 9:51


Below I am sharing a script that will parse CLF Apache web.log files and place the data into Excel-friendly .csv files. I do have a question, however: the script takes a LONG time to run because of all of the DNS lookups. Is there any way to speed this up? Thanks.

Ryan C. Christiansen
Web Developer
Intellisol International

REBOL []

log-file: read/lines ftp://username:[password--domain--dom]/logs/web.log

first-line: parse log-file/1 none
first-line/4: remove first-line/4 {[}
first-line/5: remove first-line/5 {]}
checksum-string: rejoin [first-line/4 " " first-line/5]
checksum-date: make date! checksum-string
csv-file-name: make file! (rejoin [checksum-date ".csv"])
write csv-file-name {User IP Address, User Domain Address, Date Hit, Time Hit, File Hit, Bytes Transferred, Referring Page, Browser Type}
write/append csv-file-name rejoin [newline newline]

foreach log-line log-file [
    current-line: parse log-line none
    current-line/4: remove current-line/4 {[}
    current-line/5: remove current-line/5 {]}
    date-string: rejoin [current-line/4 " " current-line/5]
    hit-date: make date! date-string
    either not-equal? hit-date checksum-date [
        csv-file-name: make file! (rejoin [hit-date ".csv"])
        write csv-file-name {User IP Address, User Domain Address, Date Hit, Time Hit, File Hit, Bytes Transferred, Referring Page, Browser Type}
        write/append csv-file-name rejoin [newline newline]
        current-line: parse log-line none
        IP-address: make tuple! current-line/1
        domain-address: read join dns:// IP-address
        current-line/4: remove current-line/4 {[}
        current-line/5: remove current-line/5 {]}
        date-string: rejoin [current-line/4 " " current-line/5]
        hit-date: make date! date-string
        parse date-string [thru ":" copy text to end (hit-time: make time! text)]
        hit-file: current-line/6
        hit-bytes: current-line/8
        referring-page: make url! current-line/9
        browser-type: current-line/10
        write/append csv-file-name rejoin [
            IP-address "," domain-address "," hit-date "," hit-time ","
            hit-file "," hit-bytes "," referring-page "," browser-type newline
        ]
        checksum-date: hit-date
    ][
        current-line: parse log-line none
        IP-address: make tuple! current-line/1
        domain-address: read join dns:// IP-address
        current-line/4: remove current-line/4 {[}
        current-line/5: remove current-line/5 {]}
        date-string: rejoin [current-line/4 " " current-line/5]
        hit-date: make date! date-string
        parse date-string [thru ":" copy text to end (hit-time: make time! text)]
        hit-file: current-line/6
        hit-bytes: current-line/8
        referring-page: make url! current-line/9
        browser-type: current-line/10
        write/append csv-file-name rejoin [
            IP-address "," domain-address "," hit-date "," hit-time ","
            hit-file "," hit-bytes "," referring-page "," browser-type newline
        ]
        print rejoin [
            IP-address "," domain-address "," hit-date "," hit-time ","
            hit-file "," hit-bytes "," referring-page "," browser-type newline
        ]
        next log-file
    ]
]
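For readers following along outside REBOL, the pipeline in the script above can be sketched in Python. This is a minimal illustration, not the poster's script: it assumes standard Common Log Format, uses naive whitespace/quote splitting rather than a full CLF parser, and the field positions mirror the assumptions the REBOL code makes.

```python
import csv
import shlex

# A standard CLF line looks like:
#   host ident authuser [day/mon/year:time zone] "request" status bytes
def clf_to_row(line):
    # shlex.split keeps the quoted request string together as one token
    parts = shlex.split(line)
    host = parts[0]
    date_time = parts[3].lstrip("[")          # e.g. 18/Mar/2001:17:16:00
    date, _, time = date_time.partition(":")  # split date from time-of-day
    request = parts[5]                        # e.g. GET /index.html HTTP/1.0
    status = parts[6]
    nbytes = parts[7]
    return [host, date, time, request, status, nbytes]

def log_to_csv(log_lines, out_path):
    # Write one Excel-friendly CSV from an iterable of CLF lines
    with open(out_path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["IP", "Date", "Time", "Request", "Status", "Bytes"])
        for line in log_lines:
            w.writerow(clf_to_row(line))
```

Unlike the REBOL version, this sketch does no reverse-DNS lookup at all, which is also the quickest answer to the speed question; resolution can be layered on separately (and cached) if the report really needs hostnames.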

 [3/6] from: tomc:darkwing:uoregon at: 15-Mar-2001 12:58


Make (and save) a hash of ip -> domain-name, and only do the DNS lookup once per IP.

On Thu, 15 Mar 2001 [ryan--christiansen--intellisol--com] wrote:

 [4/6] from: ryan:christiansen:intellisol at: 15-Mar-2001 15:59


I did try this, but ran into some problems. Here is the code I was using:

DNS-library: load %dns-library.r
foreach entry DNS-library [
    either find/any entry/2/1 IP-address [
        domain-address: entry/2/2
    ][
        domain-address: read join dns:// IP-address
        new-entry: make block! (rejoin [{DNS ["} IP-address {" "} domain-address {"]}])
        append/only DNS-library new-entry
        save %dns-library.r copy/deep DNS-library
    ]
]

Here is (an example of) the contents of the %dns-library.r file:

[
    DNS ["0.0.0.0" "sub.domain.dom"]
]

The above code would not find IP matches and instead would just keep appending the same DNS entries to the library over and over.
> make (and save) a hash of ip -> domain-name
> and only do the dns lookup once per ip

>> Below I am sharing a script that will parse CLF Apache web.log files and
>> place the data into Excel-friendly .csv files.
>>
>> I do have a question, however: the script takes a LONG time to run because
>> of all of the DNS lookups. Is there any way to speed this up?
Ryan C. Christiansen
Web Developer
Intellisol International
4733 Amber Valley Parkway
Fargo, ND 58104
701-235-3390 ext. 6671
FAX: 701-235-9940
http://www.intellisol.com

Global Leader in People Performance Software

_____________________________________
Confidentiality Notice
This message may contain privileged and confidential information. If you think, for any reason, that this message may have been addressed to you in error, you must not disseminate, copy or take any action in reliance on it, and we would ask you to notify us immediately by return email to [ryan--christiansen--intellisol--com]

 [5/6] from: brett::codeconscious::com at: 16-Mar-2001 9:26


> > I do have a question, however: the script takes a LONG time to run because
> > of all of the DNS lookups. Is there any way to speed this up?

> make (and save) a hash of ip -> domain-name
> and only do the dns lookup once per ip
If you are using an experimental version of Rebol, you could open a port to DNS instead of reading from it each time. As I understand it, this will maintain a connection to the DNS server, allowing multiple queries without the connection overhead, though I don't know how much of a gain you'll get from this.

Also, I've found that quite a few IPs will not resolve to a domain name. I figured that if I could use whois, I could not only find out who has been assigned the IP but also get an IP range, possibly satisfying more than the one IP I'm looking for. Alas, as far as I can tell at the moment, whois output is not standard, so you would have to write a parser for each whois server you need to query (btw, has anyone done this?).

For the purposes of statistics, then, it may be better not to resolve the names, unless you really, really need to.

Brett.

 [6/6] from: tomc:darkwing:uoregon at: 15-Mar-2001 21:59


I would keep it simpler. This is not tested, but something like the following would work, given a file dns-cache.save with the format:

0.0.0.0 sub.domain.dom
1.0.0.0 sub.domain.dom1
2.0.0.0 sub.domain.dom2
...

dns-cache: parse read %dns-cache.save none
foreach ip log [
    either none? domain-address: select dns-cache ip [
        append dns-cache ip
        append dns-cache either none? domain-address: read join dns:// ip [
            domain-address: ip
        ][
            domain-address
        ]
    ][
        domain-address
    ]
    ; use yer domain-address
]

At the end, to save, I would probably just clobber the old dns-cache file:

buffer: copy ""
foreach [a b] dns-cache [append buffer rejoin [a "^-" b "^/"]]
write %dns-cache.save buffer

On Thu, 15 Mar 2001 [ryan--christiansen--intellisol--com] wrote:
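For comparison, the cache-then-lookup idea above can be sketched in Python. This is a hypothetical illustration, not Tom's code: `socket.gethostbyaddr` plays the role of `read dns://`, unresolvable IPs are cached as themselves so each address is only tried once, and the lookup function is injectable so the caching logic can be exercised without network access.

```python
import socket

def make_resolver(cache, lookup=None):
    """Return a resolve(ip) that consults the cache before doing a lookup.

    cache is a plain dict of ip -> name; lookup(ip) should return a
    hostname or None. By default it does a real reverse-DNS query.
    """
    if lookup is None:
        def lookup(ip):
            try:
                return socket.gethostbyaddr(ip)[0]
            except OSError:
                return None  # quite a few IPs will not resolve
    def resolve(ip):
        if ip not in cache:
            cache[ip] = lookup(ip) or ip  # fall back to the bare IP
        return cache[ip]
    return resolve

def save_cache(cache, path):
    # same tab-separated "ip<TAB>name" layout as the dns-cache.save file above
    with open(path, "w") as f:
        for ip, name in cache.items():
            f.write(f"{ip}\t{name}\n")

def load_cache(path):
    cache = {}
    try:
        with open(path) as f:
            for line in f:
                ip, _, name = line.rstrip("\n").partition("\t")
                cache[ip] = name
    except FileNotFoundError:
        pass  # first run: start with an empty cache
    return cache
```

As in the REBOL sketch, saving simply clobbers the old cache file; since the cache only ever grows, that is safe as long as only one process writes it.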