[REBOL] Re: percent! - new datatype request
From: joel:neely:fedex at: 14-Jun-2002 7:05
Hi, Gregg,
Have done.
Gregg Irwin wrote:
> Excellent info! I don't want to make more work for you, but to
> be as apples-to-apples as possible, could you try using a hash!
> for tally and see what you get? I tried it here, with your last
> version, and got a pretty good increase.
>
> tally: make hash! []
>
> Also, the timings exclude display of the results, correct?
>
Right. All timings were done with a command similar to
    (/export/home/jneely/try)# time tally.pl > /dev/null
    real 0m0.29s
    user 0m0.23s
    sys 0m0.04s
which, of course, sends all script output to the bit bucket.
(Notice that finding a time when the box is lightly loaded
helps the run times...)
As before, comparable median-of-3 times are (in increasing
run-time order, the inserted "h" indicates the HASH! version
of yesterday's scripts):
    script       real      user      sys
    tally.pl     0m0.29s   0m0.23s   0m0.04s
    tallyhb.r    0m1.25s   0m1.17s   0m0.08s
    tallyhs.r    0m1.39s   0m1.37s   0m0.02s
    tallyhz.r    0m1.68s   0m1.59s   0m0.08s
    tallyb.r     0m5.63s   0m5.53s   0m0.08s
    tallyz.r     0m5.82s   0m5.67s   0m0.10s
    tallys.r     0m8.58s   0m8.48s   0m0.08s
    tallyn.r     0m9.77s   0m9.63s   0m0.10s
All of the times for the "h" versions were obtained after
replacing the initialization of TALLY with
    tally: make hash! []
(For the sake of fairness I didn't pre-allocate a large hash,
since I hadn't done so for the block versions or for the Perl
implementation.)
Notice that there's no time for TALLYHN.R, for a very good (or bad)
reason. I wasted considerable time trying to figure out whether I
had inadvertently broken it!!! I finally used top to verify
that, indeed, it was running (and eating up about 0.25 of a
CPU, i.e. 25% for the REBOL/Viewtopia users ;-). Inserting the
tracing output shown in the code below revealed that after adding
380 distinct words (760 entries in TALLY), the *hn* version gets
stuck but continues to eat CPU time. In multiple runs it gets stuck
at exactly the same place in the data, as illustrated in the
following trace:
...
added turned : now 754
found a : a 23
added corner : now 756
found oh : oh 3
found my : my 2
added ears : now 758
found and : and 30
added whiskers : now 760
found how : how 5
found late : late 2
found it : it 32
found s : s 4
found getting : getting 3
found she : she 35
I added the same tracing to *hz* and captured the output:
...
added whiskers : now 760
found how : how 5
found late : late 2
found it : it 32
found s : s 4
found getting : getting 3
found she : she 35
found was : was 31
found close : close 2
added behind : now 762
...
which shows that *hn* gets stuck while processing the word "was".
Since *hz* reports "found was", that word is already in the tally,
so *hn* is stuck on the increment path rather than while adding a
new word (if that matters to anybody or helps identify the
nature of the problem).
I am enclosing the code below in the hopes that someone
will see something silly that I've overlooked. Otherwise I'll
have to conclude that there's a bug involving path-based access
into a hash! value.
8<--------------------tallyhn.r--------------------
#!/export/home/jneely/bin/rebol -sq
REBOL []
text: read %alice.txt
tally: make hash! []
alpha: charset [#"a" - #"z"]
word: ""
parse/all lowercase text [
    any [
        copy word some alpha (
            either here: find/skip tally word 2 [
                here/2: here/2 + 1
                print ["found" word ":" copy/part here 2]
            ][
                repend tally [word 1]
                print ["added" word ": now" length? tally]
            ]
        )
        |
        skip
    ]
]
foreach [word count] sort/skip tally 2 [
    print [count tab word]
]
quit
8<-------------------------------------------------
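If path-based assignment into a HASH! really is the culprit, one way
to test that hypothesis is to bump the count with CHANGE instead of
HERE/2: HERE/2 + 1, so the hash! is only modified through a series
action. The script below is a sketch of that variant under a few
assumptions: it targets REBOL/Core 2.x (HASH! and PARSE/ALL, as in
the script above), it tallies a short inline sample string in place
of %alice.txt so it runs anywhere, and it has not been checked
against the full alice.txt run, so whether it actually avoids the
hang is an open question.
    REBOL [
        Title: "tally sketch using CHANGE instead of path assignment"
        ; Hypothetical variant, not one of the scripts timed above
    ]
    ; Assumption: small inline sample instead of read %alice.txt
    text: "the cat and the hat and the bat"
    tally: make hash! []
    alpha: charset [#"a" - #"z"]
    parse/all lowercase text [
        any [
            copy word some alpha (
                either here: find/skip tally word 2 [
                    ; Overwrite the count slot (position 2 of HERE)
                    ; with CHANGE rather than here/2: here/2 + 1.
                    ; 1 is written first so that + takes the result
                    ; of SECOND HERE as its right operand.
                    change next here 1 + second here
                ][
                    ; First occurrence: append the word with count 1
                    repend tally [word 1]
                ]
            )
            |
            skip
        ]
    ]
    foreach [word count] sort/skip tally 2 [
        print [count tab word]
    ]
If the hang disappears with this change on the real data, that would
point at the path-set on the hash!; if it persists, FIND/SKIP on the
hash! would be the next suspect.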
Incidentally, don't take the timings *too* terribly seriously, as
with all QAD (quick-and-dirty) benchmarks. These runs were made on
a 4-CPU Sun 4500, but a variety of factors (transient loads, other
tasks, stuff still in buffers, etc.) can introduce noticeable jitter
in the times for runs as short as these.
-jn-
--
; Joel Neely joeldotneelyatfedexdotcom
REBOL [] do [ do func [s] [ foreach [a b] s [prin b] ] sort/skip
do function [s] [t] [ t: "" foreach [a b] s [repend t [b a]] t ] {
| e s m!zauafBpcvekexEohthjJakwLrngohOqrlryRnsctdtiub} 2 ]