Mailing List Archive: 49091 messages

[REBOL] Small admin/report benchmark

From: joel:neely:fedex at: 17-Sep-2000 23:16

Here's a small benchmark based on a fairly typical kind of sysadmin, file
processing, or data reduction task. The timing results are given after the
description of the problem and the code I used for timing. I'm including a
sample data file and output so that anyone who wants to improve on my code
can test his/her solution with the same data.

Problem:

Read through the colon-delimited file below (a copy of an /etc/passwd file,
mangled for security purposes), and print a report tallying the distinct
values in the 7th field (in this case, the field that identifies the default
shell for each userID). Sort the results by the value of the field being
tallied, and print the results in neat columns.

========== begin sample data file ==========
ayxa:a:824:277:Zmgxoy "Uucl" Tmaam:/ibmg/rsrn:/bin/bash
ciyp:x:8:72:jksi:/zfp/qnffy/drgn:
grnfk:p:115:383:SNMKW hwxyzry:/evfk/tsmgt:/bin/bash
guqkvtwn:o:2:2:blvbnjsg:/lbld:/sbin/shutdown
kpsbwt:z:85:98:frwbqf:/zeu/bml/egtyin-mexi:
kst:y:1:3:gsi:/opj:
kvik:f:3:9:clnd:/bfje:/bin/sync
lw:b:518:941:Nxpn "VAXVAzjzw" Muhxp:/esaq/jy:/bin/bash
mmlmgxep:g:69:4:lnuytrat:/vvot:
nnd:g:46:89:WQT Pcpu:/shrs/vzq:
ospax:t:707:92:Lpojk bro Gokzfe:/qaqf/gaedx:/bin/bash
pbubex:v:7:9:eworcq:/khuc:
qpdd:l:39:19:qckl:/omg/clghd/szdr:
rkuu:p:2:1:rusj:/ltnm:/sbin/halt
sft:q:045:235:H Buya Gtdtre:/stj/W75/ot:/bin/false
srimrg:c:19:41:Ybltzu:/:
vgonz:n:86:051:oqufv:/hgo/awkxv:
vqn:n:7:0:ant:/npy/hpm:
wbci:s:2:5:pyiy:/xngl:/bin/bash
wlzr:o:7:00:rqwe:/mxy/rkchz/lsfu:
xao:c:21:50::/fgqu/orw:/bin/bash
yf:d:1:6:ay:/xzb/njwes/kvd:
=========== end sample data file ===========

========= begin sample output list =========
 12 (none)
  6 /bin/bash
  1 /bin/false
  1 /bin/sync
  1 /sbin/halt
  1 /sbin/shutdown
========== end sample output list ==========

Perl solution:

A fairly typical Perl script to perform this task is given below. I don't
claim any particular brilliance here, but it does use some fairly Perlish
idioms.
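As a side note, the same tally can also be expressed with the classic Unix
toolbox. The sketch below is hypothetical and not one of the benchmarked
versions; a trimmed three-record sample stands in for the passwords.txt file:

```shell
# Hypothetical awk/sort equivalent of the task (not one of the timed
# scripts). Three sample records stand in for the full data file.
cat > passwords.txt <<'EOF'
ayxa:a:824:277:Zmgxoy "Uucl" Tmaam:/ibmg/rsrn:/bin/bash
ciyp:x:8:72:jksi:/zfp/qnffy/drgn:
kvik:f:3:9:clnd:/bfje:/bin/sync
EOF
# Tally field 7, mapping the empty field to "(none)", then sort by shell.
awk -F: '{ n[($7 == "") ? "(none)" : $7]++ }
     END { for (s in n) printf "%3d %s\n", n[s], s }' passwords.txt |
  LC_ALL=C sort -k2
```

LC_ALL=C forces a byte-order sort, matching the codepoint ordering that
Perl's `sort keys` produces (so "(none)" sorts ahead of the /bin entries).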
============= begin Perl script =============
#!/usr/bin/perl -w
my ($line, $shell, %shells) = ("", "");
open (PW, "<passwords.txt") or die "can't read file\n";
while ($line = <PW>) {
    ($shell = (split /:/, $line, 7)[6]) =~ s/\s+$//;
    ++$shells{$shell or "(none)"};
}
close (PW);
foreach $shell (sort keys %shells) {
    printf "%3d %s\n", $shells{$shell}, $shell;
}
============== end Perl script ==============

REBOL solution:

My effort at producing a comparable REBOL script is given next. I tried to
use appropriate REBOLish idioms to accomplish equivalent results.

============ begin REBOL script ============
#!/usr/local/bin/rebol -sq
REBOL []
shells: copy make block! 5
countshell: func [sh [string!] /local shr] [
    either none? shr: find shells sh [
        append shells reduce [sh 1]
    ][
        change shr: next shr 1 + shr/1
    ]
]
foreach line read/lines %passwords.txt [
    countshell any [pick parse/all line ":" 7 "(none)"]
]
foreach [sh n] sort/skip shells 2 [
    n: to-string n
    while [3 > length? n] [insert n " "]
    print [n sh]
]
============= end REBOL script =============

Remarks:

Note in particular the need in REBOL for:

1) A function (or other hand-written code) to handle the separate cases of
   updating a counter for a key value already present versus initializing a
   counter the first time a key is encountered.

2) The slightly awkward phrase that updates the counter (the "change shr: ..."
   line). Can anyone suggest a tidier way to do this? Bear in mind that the
   keys are strings coming from a data file, so we don't know in advance what
   values may occur.

3) The explicit code to pad the numeric value of the counter with leading
   spaces (the "while ..." loop inside the last "foreach ..."). Again, any
   suggestions for improvement are welcome.

Benchmark results:

Both scripts were run from the command line using the time command to
accumulate statistics.
In order to scale the run times up to expose significant differences, I
concatenated 16k copies of the above sample data file into a single data
file of 360,448 lines (13,287,424 bytes). The output from the benchmark runs
(with lines slightly rewrapped to fit in email) follows below:

=========== begin benchmark output ===========
$ time ./
196608 (none)
98304 /bin/bash
16384 /bin/false
16384 /bin/sync
16384 /sbin/halt
16384 /sbin/shutdown
34.98user 0.18system 0:35.22elapsed 99%CPU
(0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (259major+48minor)pagefaults 0swaps

$ time ./nsh.r
include done
196608 (none)
98304 /bin/bash
16384 /bin/false
16384 /bin/sync
16384 /sbin/halt
16384 /sbin/shutdown
70.28user 3.85system 1:17.72elapsed 95%CPU
(0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (4911major+13977minor)pagefaults 3045swaps
============ end benchmark output ============

Interpretation:

1) The Perl version was approximately twice as fast as the REBOL version --
   not too bad, considering Perl's reputation for a mature, reasonably
   well-optimized interpreter. However...

2) Note the CONSIDERABLY larger number of page faults in the REBOL run. This
   led me to wonder how much of the total run time for REBOL was due to the
   fact that the entire file was slurped into memory at once, instead of
   being dealt with a line at a time. OTOH, isn't that how most of us would
   code up a small QAD task similar to this? (If anyone wants to code up a
   buffered version, I'll be glad to rerun the timings for all three
   versions.)

I reran the test with top going in another terminal window, and saw that the
Perl version ran in about 1/20th the memory of the REBOL version. Both
nearly saturated the CPU, with Perl slightly higher, in both the original
benchmark run given above and the second run (times not reported due to the
degradation imposed by running top concurrently with the processing).

Comments welcome.

-jn-
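P.S. For anyone reproducing the setup: the post above doesn't show the exact
command used to build the 16k-copy input, but a doubling loop like the
following produces an equivalent file. The file names and the two-line
stand-in sample here are assumptions for illustration; with the real 22-line
sample, 22 * 16384 = 360,448 lines, matching the figure cited above:

```shell
# Hypothetical reconstruction of the benchmark input (file names and
# the two-line stand-in sample are assumed, not from the original post).
printf 'u1:x:1:1:n:/h:/bin/bash\nu2:x:2:2:n:/h:\n' > passwords.txt
cp passwords.txt big-passwords.txt
# Double the file 14 times: 2^14 = 16384 concatenated copies of the sample.
for i in $(seq 1 14); do
  cat big-passwords.txt big-passwords.txt > tmp.txt && mv tmp.txt big-passwords.txt
done
# With the real 22-line sample this yields 22 * 16384 = 360448 lines.
wc -l big-passwords.txt
```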