[REBOL] Small admin/report benchmark
From: joel:neely:fedex at: 17-Sep-2000 23:16
Here's a small benchmark based on a fairly typical kind
of sysadmin, file processing, or data reduction task.
The timing results are given after the description of
the problem and the code I used for timing.
I'm including a sample data file and output so that
anyone who wants to improve on my code can test his/her
solution with the same data.
Problem:
Read through the colon-delimited file below (a copy of
an /etc/passwd file, mangled for security purposes),
and print a report tallying the distinct values in the
7th field (in this case, the field that identifies the
default shell for each userID). Sort the results by
the value of the field being tallied, and print the
results in neat columns.
========== begin sample data file ==========
ayxa:a:824:277:Zmgxoy "Uucl" Tmaam:/ibmg/rsrn:/bin/bash
ciyp:x:8:72:jksi:/zfp/qnffy/drgn:
grnfk:p:115:383:SNMKW hwxyzry:/evfk/tsmgt:/bin/bash
guqkvtwn:o:2:2:blvbnjsg:/lbld:/sbin/shutdown
kpsbwt:z:85:98:frwbqf:/zeu/bml/egtyin-mexi:
kst:y:1:3:gsi:/opj:
kvik:f:3:9:clnd:/bfje:/bin/sync
lw:b:518:941:Nxpn "VAXVAzjzw" Muhxp:/esaq/jy:/bin/bash
mmlmgxep:g:69:4:lnuytrat:/vvot:
nnd:g:46:89:WQT Pcpu:/shrs/vzq:
ospax:t:707:92:Lpojk bro Gokzfe:/qaqf/gaedx:/bin/bash
pbubex:v:7:9:eworcq:/khuc:
qpdd:l:39:19:qckl:/omg/clghd/szdr:
rkuu:p:2:1:rusj:/ltnm:/sbin/halt
sft:q:045:235:H Buya Gtdtre:/stj/W75/ot:/bin/false
srimrg:c:19:41:Ybltzu:/:
vgonz:n:86:051:oqufv:/hgo/awkxv:
vqn:n:7:0:ant:/npy/hpm:
wbci:s:2:5:pyiy:/xngl:/bin/bash
wlzr:o:7:00:rqwe:/mxy/rkchz/lsfu:
xao:c:21:50::/fgqu/orw:/bin/bash
yf:d:1:6:ay:/xzb/njwes/kvd:
=========== end sample data file ===========
========= begin sample output list =========
12 (none)
6 /bin/bash
1 /bin/false
1 /bin/sync
1 /sbin/halt
1 /sbin/shutdown
========== end sample output list ==========
Perl solution:
A fairly typical Perl script to perform this task is
given below. I don't claim any particular brilliance
here, but it does use some fairly Perlish idioms.
============= begin Perl script =============
#!/usr/bin/perl -w
my ($line, $shell, %shells) = ("", "");
open (PW, "<passwords.txt") or die "can't read file\n";
while ($line = <PW>) {
    # take the 7th colon-delimited field (the shell) and
    # strip trailing whitespace, including the newline
    ($shell = (split /:/, $line, 7)[6]) =~ s/\s+$//;
    # count empty shells under "(none)"
    ++$shells{$shell or "(none)"};
}
close (PW);
# report the tallies, sorted by shell name
foreach $shell (sort keys %shells) {
    printf "%3d %s\n", $shells{$shell}, $shell;
}
============== end Perl script ==============
REBOL solution:
My effort at producing a comparable REBOL script is
given next. I tried to use appropriate REBOLish
idioms to accomplish equivalent results.
============ begin REBOL script ============
#!/usr/local/bin/rebol -sq
REBOL []
shells: copy make block! 5
; tally one shell name: start a new [name count] pair the
; first time a name is seen, otherwise bump its count
countshell: func [sh [string!] /local shr] [
    either none? shr: find shells sh [
        append shells reduce [sh 1]
    ][
        change shr: next shr 1 + shr/1
    ]
]
foreach line read/lines %passwords.txt [
    countshell any [pick parse/all line ":" 7 "(none)"]
]
; pad each count to at least three characters, then print
foreach [sh n] sort/skip shells 2 [
    n: to-string n
    while [3 > length? n] [insert n " "]
    print [n sh]
]
============= end REBOL script =============
Remarks:
Note in particular the need in REBOL for:
1) A function (or other hand-written code) to handle
the separate cases of updating a counter for a
key value already present versus initializing a counter
for the first time a key is encountered.
2) The slightly awkward phrase that updates the
counter (the "change shr: ..." line). Can anyone
suggest a tidier way to do this? Bear in mind that
the keys are strings coming from a data file, so we
don't know in advance what values may occur.
3) The explicit code to pad the numeric value of the
counter with leading spaces (the "while..." inside
the last "foreach ..."). Again, any suggestions for
improvement are welcome.
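For what it's worth, here is one untested alternative for
points 2 and 3. It replaces the "change shr: ..." phrase
with poke, and moves the padding into a small helper; the
name "pad" is my own choice, not a REBOL built-in, and I
make no claim that this version is any faster:
======= begin alternative REBOL sketch (untested) =======
#!/usr/local/bin/rebol -sq
REBOL []
shells: make block! 5
; point 2: update the count in place with poke instead of
; repositioning the series reference and using change
countshell: func [sh [string!] /local shr] [
    either shr: find shells sh [
        poke shr 2 1 + second shr
    ][
        append shells reduce [sh 1]
    ]
]
; point 3: left-pad a value to a given width
pad: func [val width /local s] [
    s: form val
    head insert/dup s " " max 0 width - length? s
]
foreach line read/lines %passwords.txt [
    countshell any [pick parse/all line ":" 7 "(none)"]
]
foreach [sh n] sort/skip shells 2 [print [pad n 3 sh]]
======== end alternative REBOL sketch (untested) ========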
Benchmark results:
Both scripts were run from the command line using the
"time" command to accumulate statistics. In order to
scale the run times up to expose significant differences,
I concatenated 16k copies of the above sample data file
into a single data file of 360,448 lines (13,287,424
bytes). A sketch of how that input file could be rebuilt
is given just below, followed by the output from the
benchmark runs (with lines slightly rewrapped to fit in
email):
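For anyone who wants to reproduce the setup, here is a
minimal, untested sketch of that file-building step; the
name %sample.txt for the 22-line sample file is my own
assumption (only %passwords.txt, the file both scripts
read, appears in the code above):
========== begin REBOL sketch (build input) ==========
REBOL []
; concatenate 16384 copies of the sample data file into
; the large benchmark input read by both scripts
sample: read %sample.txt
big: make string! 16384 * length? sample
loop 16384 [append big sample]
write %passwords.txt big
=========== end REBOL sketch (build input) ===========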
=========== begin benchmark output ===========
$ time ./nsh.pl
196608 (none)
98304 /bin/bash
16384 /bin/false
16384 /bin/sync
16384 /sbin/halt
16384 /sbin/shutdown
34.98user 0.18system 0:35.22elapsed 99%CPU
(0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (259major+48minor)pagefaults 0swaps
$ time ./nsh.r
include done
196608 (none)
98304 /bin/bash
16384 /bin/false
16384 /bin/sync
16384 /sbin/halt
16384 /sbin/shutdown
70.28user 3.85system 1:17.72elapsed 95%CPU
(0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (4911major+13977minor)pagefaults 3045swaps
============ end benchmark output ============
Interpretation:
1) The Perl version was approximately twice as fast
as the REBOL version -- not too bad, considering
Perl's reputation for a mature, reasonably well-
optimized interpreter. However...
2) Note the CONSIDERABLY larger number of page faults.
This led me to wonder how much of the total run
time for REBOL was due to the fact that the entire file
was slurped into memory at once, instead of being
processed a line at a time. OTOH, isn't that how most of
us would code up a small quick-and-dirty (QAD) task
similar to this?
(If anyone wants to code up a buffered version, I'll be
glad to rerun the timings for all three versions.)
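As a starting point, here is an untested sketch of such a
buffered version. It pulls the file through a direct port
with open/direct/lines, a chunk of lines at a time, rather
than slurping everything with read/lines; the chunk size
of 1000 is arbitrary, and I haven't run or timed it:
========= begin buffered REBOL sketch (untested) =========
#!/usr/local/bin/rebol -sq
REBOL []
shells: copy make block! 5
countshell: func [sh [string!] /local shr] [
    either none? shr: find shells sh [
        append shells reduce [sh 1]
    ][
        change shr: next shr 1 + shr/1
    ]
]
; read through an unbuffered (direct) port, up to 1000
; lines per copy, instead of loading the whole file at once
pwfile: open/direct/lines %passwords.txt
while [lines: copy/part pwfile 1000] [
    foreach line lines [
        countshell any [pick parse/all line ":" 7 "(none)"]
    ]
]
close pwfile
foreach [sh n] sort/skip shells 2 [
    n: to-string n
    while [3 > length? n] [insert n " "]
    print [n sh]
]
========== end buffered REBOL sketch (untested) ==========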
I reran the test with top running in another terminal
window, and saw that the Perl version ran in about 1/20th
the memory of the REBOL version. Both nearly saturated
the CPU, with Perl slightly higher, in both the original
benchmark run given above and the second run (times not
reported due to the degradation imposed by running top
concurrently with the processing).
Comments welcome.
-jn-