Revised associative data storage code
[1/4] from: joel::neely::fedex::com at: 22-Sep-2000 17:20
This is a multi-part message in MIME format.
--------------7B2D0141EEB652ADAD6A733C
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
Thanks to Gabriele and others, who re-lit my dim bulb to the fact that
find block1 block2 select block1 block2
are looking for the first SUBSEQUENCE of the first argument equal to
the second argument, whereas
find/only block1 block2 select/only block1 block2
are looking for the first ELEMENT of the first argument equal to the
second argument.
That loud "smack" you may have heard a few nights ago was my palm
smartly hitting my not-so-smart forehead as I uttered those immortal
words, "Silly me!"
Having gotten that far(with a little help from my friends), I did a
bit of testing and found that find/only and select/only seem
quite willing to work properly with non-block second arguments as
well! With that discovery firmly in hand, I put together an updated
object-oriented associative data store, attached below.
Enjoy all, and (as always) comments/feedback are welcome!
-jn-
--------------7B2D0141EEB652ADAD6A733C
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii;
name="assoco.r"
Content-Disposition: inline;
filename="assoco.r"
REBOL [
Title: "Minimal Associative Data Store"
File: %assoco.r
Date: 20-Sep-2000
Author: "Joel Neely"
Purpose: {
Revised implementation of associative data store,
using arbitrary data types for keys and values.
Updated to use /only for key processing.
}
]
assoco: make object! [
_keys: copy []
_vals: copy []
clear: func [] [_keys: copy [] _vals: copy []]
empty?: func [] [system/words/empty? _keys]
length?: func [] [system/words/length? _keys]
new: func [/local r] [r: make self [] r/clear r]
put: func [k [any-type!] v [any-type!] /local p] [
either found? p: find/only _keys k [
change/only at _vals index? p v
][
append/only _keys k
append/only _vals v
]
v
]
get: func [k [any-type!] /local p] [
either none? p: find/only _keys k [
none
][
pick _vals index? p
]
]
delete: func [k [any-type!] /local p v] [
either none? p: find/only _keys k [
none
][
k: pick _vals p: index? p
remove at _keys p
remove at _vals p
k
]
]
keys: func [] [copy _keys]
]
--------------7B2D0141EEB652ADAD6A733C--
[2/4] from: joel::neely::fedex::com at: 22-Sep-2000 17:22
Associative data storage function library
This is a multi-part message in MIME format.
--------------EFFDA6170AF07702FFAE4693
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
As a follow-up to the previous post, I also put together a function
library equivalent to the OO version. (I know this is reinventing
the wheel, but I wanted total equivalence for reasons that will be
made clear in the following post.)
-jn-
--------------EFFDA6170AF07702FFAE4693
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii;
name="assocf.r"
Content-Disposition: inline;
filename="assocf.r"
REBOL [
Title: "Minimal Associative Data Store - Function Version"
File: %assocf.r
Date: 20-Sep-2000
Author: "Joel Neely"
Purpose: {
Function-based implementation of associative data store,
using object for data only, with arbitrary data types
for keys and values.
}
]
assoc.object: make object! [
_keys: copy []
_vals: copy []
]
assoc.clear: func [ads [object!]] [
ads/_keys: copy []
ads/_vals: copy []
ads
]
assoc.empty?: func [ads [object!]] [system/words/empty? ads/_keys]
assoc.length?: func [ads [object!]] [system/words/length? ads/_keys]
assoc.new: func [/local r] [r: make assoc.object [] assoc.clear r]
assoc.put: func [ads [object!] k [any-type!] v [any-type!] /local p] [
either found? p: find/only ads/_keys k [
change/only at ads/_vals index? p v
][
append/only ads/_keys k
append/only ads/_vals v
]
v
]
assoc.get: func [ads [object!] k [any-type!] /local p] [
either none? p: find/only ads/_keys k [
none
][
pick ads/_vals index? p
]
]
assoc.delete: func [ads [object!] k [any-type!] /local p v] [
either none? p: find/only ads/_keys k [
none
][
k: pick ads/_vals p: index? p
remove at ads/_keys p
remove at ads/_vals p
k
]
]
assoc.keys: func [ads [object!]] [copy ads/_keys]
--------------EFFDA6170AF07702FFAE4693--
[3/4] from: joel::neely::fedex::com at: 22-Sep-2000 18:26
Associative Data Storage benchmark
This is a multi-part message in MIME format.
--------------C52C0FF918329894F180FE9C
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
I was interested in assessing the overhead cost of an OO version
of ADS versus a function-library version, and so wrote the
benchmarking script attached below.
It creates a block of key strings used by all of the tests (which
also all re-seed the random number generator to the same starting
point to ensure equivalent behavior). This block of keys is used
by four timed loops which follow:
1) The first loop does no useful work, but is used to give an
indication of the cost of the test harness itself. It cycles
a specified number of times, drawing a key at random from the key
block created before the timing began.
2) The second loop does the same thing, but also updates a count
of the number of times that key has been drawn. This loop
uses a simple block to store key-value pairs. It utilizes plain
vanilla append...reduce , find , and change as the working
mechanisms to manage the ADS.
3) The third loop does the same thing, but uses the OO approach
coded in %assoco.r (previously posted), with /new , /put ,
and /get method calls to manage the counts.
4) The fourth loop does the same, but uses the function library
coded in %assocf.r (previously posted), etc. etc. etc.
(The reason I coded %assocf.r myself, instead of using one of
the previously-posted submissions from others, was to try to
focus on the OO-vs-function issue as the only difference between
the two tested versions. Thus I wanted all the innards to be as
much alike as possible, to make a fair test of that single issue.)
The testassocs/run function takes two arguments, the number of
keys to manufacture for the key block (from which all subsequent
testing is drawn), and the number of cycles (keys drawn) in each
of the subsequent loops.
The outputs are:
1) The number of keys actually used. Remember that we're drawing
at random, so unless the number of cycles is substantially
larger than the key pool, there's a good chance that some keys
won't be drawn.
2) The elapsed time (in seconds -- I'm calling now at the edges
of all loops) of each of the four loops above.
3) The time per cycle for each of the four loops (seconds divided
by cycles). If I REALLY cared about the real-time values, I'd
scale them to microseconds, but that's obviously dependent on the
system running the benchmark (and its load). I was much more
interested in...
4) The ratios of the times, where the simple-block-based version
is the normalization base. All other figures are relative to
the block version, as 1.0.
RESULTS:
After that long wind-up...
Relative
Loop time Comments
==== ======== ====================================================
1 0.66% We can "do nothing" really fast! ;-) This means
that the test harness didn't distort the comparisons.
2 100.00% The block version.
3 174.17% So the cost of key-value separation, keeping all the
methods out of the global namespace, etc. is an
overhead of about 75% -- not intolerable for most
purposes.
4 178.15% This result makes me suspect that accessing the
components of an object is the operation that's
driving the runtime cost.
==== ======== ====================================================
More testing (a functional version that handles key-value separation
without using an object for data storage) is clearly in order, but I
thought I'd pass on those preliminary results.
-jn-
--------------C52C0FF918329894F180FE9C
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii;
name="testassocs.r"
Content-Disposition: inline;
filename="testassocs.r"
REBOL []
do %assoco.r
do %assocf.r
testassocs: make object! [
vowels: "aeiouwy"
aspirants: ["f" "v" "th" "dh" "ch" "kh"]
liquids: ["l" "r" ""]
continuants: "lmnrsz"
stops: "bptdkg"
consonants: "bcdfghjklmnpqrstvxz"
randomseed: 0
randomitem: func [s [series!]] [
pick s random length? s
]
run: function [
nblk [integer!] ncyc [integer!]
][
t0 t1 t2 t3 t4
key kblk p
a0 a1 a2
][
randomseed: random now
kblk: copy []
while [nblk > length? kblk] [
key: copy ""
append key randomitem aspirants
append key randomitem liquids
append key randomitem vowels
append key randomitem continuants
append key randomitem vowels
append key randomitem stops
if none? find kblk key [append kblk key]
]
t0: now/time
random/seed randomseed
loop ncyc [
key: randomitem kblk
]
t1: now/time
a0: copy []
random/seed randomseed
loop ncyc [
key: randomitem kblk
either none? p: find a0 key [
append a0 reduce [key 1]
][
p: next p
change p 1 + first p
]
]
t2: now/time
a1: assoco/new
random/seed randomseed
loop ncyc [
key: randomitem kblk
a1/put key 1 + any [a1/get key 0]
]
t3: now/time
a2: assoc.new
random/seed randomseed
loop ncyc [
key: randomitem kblk
assoc.put a2 key 1 + any [assoc.get a2 key 0]
]
t4: now/time
comment {
foreach key a1/keys [
print [key ": " select a0 key "," a1/get key]
]
}
print [length? a1/keys "keys used^/"]
t0: t1 - t0
t0: t0/hour * 60 + t0/minute * 60 + t0/second
t1: t2 - t1
t1: t1/hour * 60 + t1/minute * 60 + t1/second
t2: t3 - t2
t2: t2/hour * 60 + t2/minute * 60 + t2/second
t3: t4 - t3
t3: t3/hour * 60 + t3/minute * 60 + t3/second
print ["Totals: " t0 ":" t1 ":" t2 ":" t3 "^/"]
t0: t0 / ncyc
t1: t1 / ncyc
t2: t2 / ncyc
t3: t3 / ncyc
print ["Times: " t0 ":" t1 ":" t2 ":" t3 "^/"]
print ["Ratio: " t0 / t1 ":" t1 / t1 ":" t2 / t1 ":" t3 / t1 "^/"]
]
]
--------------C52C0FF918329894F180FE9C--
[4/4] from: joel:neely:fedex at: 23-Sep-2000 15:56
This is a multi-part message in MIME format.
--------------F3AFD1FB66DE3A994B5A800F
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
By way of followup, I coded a routine for this benchmark that uses
functions only, with the data stored in a block initialized to
[ _keys [] _vals [] ]
That block is passed as an argument to the functions that access or
modify keys and values; the functions access the two sections of the
block via paths. (Source code attached, if anyone cares to critique.)
The ratios for a small collection of tests are:
Key Nbr of Test In-line Object Functns Functns
Space Testing Harness Block Method w/ Obj w/ Path
Size Cycles Loop Version Version Storage Storage
------ ------- ------- ------- ------- ------- -------
10,000 5,000 0.00% 100.00% 177.50% 175.00% 177.50%
1,000 50,000 1.92% 100.00% 183.65% 190.38% 191.35%
10,000 50,000 0.22% 100.00% 187.96% 192.34% 192.56%
100 500,000 10.31% 100.00% 214.95% 234.54% 243.81%
Another surprise! It now appears to me that function entry and
exit (whether object methods or "bare" functions) are the driving
factor in the timings.
My current conclusion is that REBOL objects are essentially tied
with plain function calls, in terms of the cost/benefit tradeoff
of runtime vs. readability and economy of expression. I think
(unless one is using such a large number of objects that the
duplication overhead is a factor) that the benefits of namespace
management and potential improvements in reusability give objects
the edge.
-jn-
[joel--neely--fedex--com] wrote:
[snip]
> Relative
> Loop time Comments
<<quoted lines omitted: 12>>
> without using an object for data storage) is clearly in order, but I
> thought I'd pass on those preliminary results.
--------------F3AFD1FB66DE3A994B5A800F
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii;
name="adsf2.r"
Content-Disposition: inline;
filename="adsf2.r"
REBOL [
Title: "Minimal Associative Data Store - Function Version 2"
File: %adsf2.r
Date: 20-Sep-2000
Author: "Joel Neely"
Purpose: {
Function-based implementation of associative data store,
using object for data only, with arbitrary data types
for keys and values.
}
]
adsf2.block: func [] [
copy/deep [
_keys []
_vals []
]
]
adsf2.clear: func [ads [block!]] [
ads/_keys: copy []
ads/_vals: copy []
ads
]
adsf2.empty?: func [ads [block!]] [system/words/empty? ads/_keys]
adsf2.length?: func [ads [block!]] [system/words/length? ads/_keys]
adsf2.new: func [] [adsf2.block]
adsf2.put: func [ads [block!] k [any-type!] v [any-type!] /local p] [
either found? p: find/only ads/_keys k [
change/only at ads/_vals index? p v
][
append/only ads/_keys k
append/only ads/_vals v
]
v
]
adsf2.get: func [ads [block!] k [any-type!] /local p] [
either none? p: find/only ads/_keys k [
none
][
pick ads/_vals index? p
]
]
adsf2.delete: func [ads [block!] k [any-type!] /local p v] [
either none? p: find/only ads/_keys k [
none
][
k: pick ads/_vals p: index? p
remove at ads/_keys p
remove at ads/_vals p
k
]
]
adsf2.keys: func [ads [block!]] [copy ads/_keys]
--------------F3AFD1FB66DE3A994B5A800F--
Notes
- Quoted lines have been omitted from some messages.
View the message alone to see the lines that have been omitted