[REBOL] Re: hash questions
From: larry:ecotope at: 8-Jan-2002 20:30
Hi Holger,
Thanks for your response. It is good that hash works for so many REBOL
datatypes. I note that with a few exceptions the hashable values are of type
any-string, any-word, or the so-called simple types in which the data is
included in the REBOL value.
I have not had a chance to test all of these possiblities. However I
question whether decimal keys in a block of consecutive key-value pairs are
hashable, or more precisely, the benefit of using a hash with decimal keys.
I ran the following test script:
REBOL []
clk: func [][now/time/precise]
;set up blocks and hashes with block values
;and either integer or decimal keys
;measure the time to find the last key in the block
b1: make block! 10000
b2: make block! 10000
repeat j 10000 [
append b1 j append/only b1 reduce [j]
]
repeat j 10000 [
append b2 to-decimal j append/only b2 reduce [j]
]
h1: to-hash b1
h2: to-hash b2
t1: clk loop 1000 [find b1 10000] t1: clk - t1
t2: clk loop 1000 [find b2 10000.0] t2: clk - t2
print ["Find integer key in block:" t1]
print ["Find decimal key in block:" t2]
t1: clk loop 1000 [find h1 10000] t1: clk - t1
t2: clk loop 1000 [find h2 10000.0] t2: clk - t2
print ["Find integer key in hash:" t1]
print ["Find decimal key in hash:" t2]
;end of code
Here are my results for the current versions of core and view (450 MHz PII):
system/version 2.5.0.3.1
Find integer key in block: 0:00:02.04
Find decimal key in block: 0:00:13.4
Find integer key in hash: 0:00
Find decimal key in hash: 0:00:14.66
system/version 1.2.1.3.1
Find integer key in block: 0:00:01.92
Find decimal key in block: 0:00:15.27
Find integer key in hash: 0:00
Find decimal key in hash: 0:00:16.59
It is interesting that the search for a decimal key takes 6 to 8 times
longer than for an integer key.
I noticed that if I change the code so that the decimal search keys are of
the form j / 3 instead of to-decimal j, and then search for 10000 / 3 the
times are much better, especially for the hash:
Find integer key in block: 0:00:02.03
Find decimal key in block: 0:00:09.23
Find integer key in hash: 0:00
Find decimal key in hash: 0:00:04.78
This suggests that comparing decimals which are equal to integers (1000 and
1000.0) requires some extra time. But this does not seem to be confirmed by
a direct test:
>> a: 10000
== 10000
>> b: 10000.0
== 10000
>> c: 10000 / 3
== 3333.33333333333
>> t: clk loop 5'000'000 [equal? a a] clk - t
== 0:00:06.26
>> t: clk loop 5'000'000 [equal? b b] clk - t
== 0:00:09.89
>> t: clk loop 5'000'000 [equal? c c] clk - t
== 0:00:10.11
>From the original example with decimal keys equal to integers, it appears
that using hash with decimal keys actually takes more time for FIND than
when using a block. But with integer keys the hash makes FIND so fast that
the loop iterations have to go up by a factor of 100 inorder to get a
measurable time.
I also see this:
>> t: clk loop 1000 [find h2 10000.0] clk - t
== 0:00:14.01
>> t: clk loop 1000 [find b2 10000.0] clk - t
== 0:00:14.83
>> t: clk loop 1000 [find h2 5000.0] clk - t
== 0:00:14.11
>> t: clk loop 1000 [find b2 5000.0] clk - t
== 0:00:07.36 ;twice as fast as hash
It does seem that the hashes with decimal keys take a time independent of
the location of the search key in the block while FIND on a block searches
linearly from the beginning and thus results in half the time to find a key
half-way through the block. Constant time is certainly a hallmark of a hash,
but the hash time for FIND is so slow for decimal keys equal to integers
that searching for a random key will take on average almost twice as long as
when using a block. For the more representative j / 3 case, the times for
block and hash will be comparable for a random search key, while the worst
case will be a factor of 2 better for the hash.
I guess I don't fully understand REBOL hashes, particularly with respect to
decimal keys. Of the datatypes you mention, which will fail to display the
large speed increase we expect from a hash (and which we see for string and
integer keys) as opposed to a block ?
Any comments are welcome.
-Larry