Mailing List Archive: 49091 messages

[REBOL] Re: percent! - new datatype request

From: joel:neely:fedex at: 14-Jun-2002 7:05

Hi, Gregg,

Have done.

Gregg Irwin wrote:
> Excellent info! I don't want to make more work for you, but to
> be as apples-to-apples as possible, could you try using a hash!
> for tally and see what you get? I tried it here, with your last
> version, and got a pretty good increase.
>
>     tally: make hash! []
>
> Also, the timings exclude display of the results, correct?
>
Right.  All timings were done with a command similar to

    (/export/home/jneely/try)# time > /dev/null

    real    0m0.29s
    user    0m0.23s
    sys     0m0.04s

which, of course, sends all script output to the bit bucket.  (Notice
that finding a time when the box is lightly loaded helps the run
times...)

As before, comparable median-of-3 times are listed below in increasing
run-time order; the inserted "h" indicates the HASH! version of
yesterday's scripts:

                  real      user      sys
                  0m0.29s   0m0.23s   0m0.04s
    tallyhs.r     0m1.39s   0m1.37s   0m0.02s
    tallyhb.r     0m1.25s   0m1.17s   0m0.08s
    tallyhz.r     0m1.68s   0m1.59s   0m0.08s
    tallyb.r      0m5.63s   0m5.53s   0m0.08s
    tallyz.r      0m5.82s   0m5.67s   0m0.10s
    tallys.r      0m8.58s   0m8.48s   0m0.08s
    tallyn.r      0m9.77s   0m9.63s   0m0.10s

All of the times for the "h" versions were obtained after replacing
the initialization of TALLY with

    tally: make hash! []

(I didn't pre-allocate a large hash for the sake of fairness, as I
hadn't done so for the block versions or for the Perl implementation.)

Notice that there's no time for TALLYHN.R, for a very good/bad reason.
I wasted considerable time trying to figure out whether I had
inadvertently broken it!!!  I finally used top to verify that it was
indeed running (and eating up about 0.25 of a CPU -- that's 25% for
the REBOL/Viewtopia users ;-).

Inserting tracing output as shown below revealed that after adding 380
words, the *hn* version gets stuck but continues to eat CPU time.  In
multiple runs it gets stuck at exactly the same place in the data, as
illustrated in the following trace:

    ...
    added turned : now 754
    found a : a 23
    added corner : now 756
    found oh : oh 3
    found my : my 2
    added ears : now 758
    found and : and 30
    added whiskers : now 760
    found how : how 5
    found late : late 2
    found it : it 32
    found s : s 4
    found getting : getting 3
    found she : she 35

I added the same tracing to *hz* and captured the output:

    ...
    added whiskers : now 760
    found how : how 5
    found late : late 2
    found it : it 32
    found s : s 4
    found getting : getting 3
    found she : she 35
    found was : was 31
    found close : close 2
    added behind : now 762
    ...

which shows that *hn* is getting stuck trying to increment the count
for the word "was", rather than trying to add a new word (if that
matters to anybody or helps identify the nature of the problem).

I am enclosing the code below in the hope that someone will see
something silly that I've overlooked.  Otherwise I'll have to conclude
that there's a bug involving path-based access into a hash! value.

8<--------------------tallyhn.r--------------------
#!/export/home/jneely/bin/rebol -sq
REBOL []

text: read %alice.txt
tally: make hash! []
alpha: charset [#"a" - #"z"]
word: ""

parse/all lowercase text [
    any [
        copy word some alpha (
            either here: find/skip tally word 2 [
                here/2: here/2 + 1
                print ["found" word ":" copy/part here 2]
            ][
                repend tally [word 1]
                print ["added" word ": now" length? tally]
            ]
        )
        | skip
    ]
]

foreach [word count] sort/skip tally 2 [
    print [count tab word]
]

quit
8<-------------------------------------------------

Incidentally, don't take the timings *too* terribly seriously, as with
all QAD (quick-and-dirty) benchmarks.  These timings were run on a
4-CPU Sun 4500, but a variety of factors (transient loads, other
tasks, stuff still in buffers, etc.) can introduce noticeable jitter
in the times for runs as short as these.

-jn-
-- 
; Joel Neely      joeldotneelyatfedexdotcom
REBOL [] do [ do func [s] [ foreach [a b] s [prin b] ] sort/skip
do function [s] [t] [ t: "" foreach [a b] s [repend t [b a]] t ]
{ | e s m!zauafBpcvekexEohthjJakwLrngohOqrlryRnsctdtiub} 2 ]
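The closing hypothesis (a bug involving path-based access into a hash! value) could be probed by changing only the update step of TALLYHN.R. What follows is a minimal, untested sketch of such a probe, assuming REBOL/Core 2.x. TALLY-WORDS is a hypothetical helper, not part of the original post: it applies the same parse rule to an inline sample string, but increments the count with CHANGE on the position after the found word instead of with the set-path `here/2:`. If a version like this finished on %alice.txt while TALLYHN.R hangs, that would point toward the set-path; if it hung as well, FIND/SKIP or the hash! itself would be the more likely suspect.

```rebol
REBOL [
    Title: "tally probe (hypothetical sketch)"
]

; Hypothetical probe, not from the original post: same parse rule as
; TALLYHN.R, but the count is incremented with CHANGE rather than with
; the set-path HERE/2: ...  Not tested against the reported hang.
tally-words: func [
    "Return a hash! of word/count pairs for lowercase letter runs in TEXT"
    text [string!]
    /local tally alpha word here
][
    tally: make hash! []
    alpha: charset [#"a" - #"z"]
    ; LOWERCASE modifies its argument, so work on a copy of TEXT
    parse/all lowercase copy text [
        any [
            copy word some alpha (
                either here: find/skip tally word 2 [
                    ; HERE is positioned at the word; its count follows it
                    change next here (second here) + 1
                ][
                    repend tally [word 1]
                ]
            )
            | skip
        ]
    ]
    tally
]

; Small inline sample used here instead of %alice.txt
counts: tally-words "The cat was here; the cat WAS there."
foreach [word count] sort/skip to block! counts 2 [
    print [count tab word]
]
```

To exercise the hang itself, the sample string would be replaced by `read %alice.txt`, as in the original script.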