Mailing List Archive: Associative Data Storage benchmark

[REBOL] Associative Data Storage benchmark

From: joel::neely::fedex::com at: 22-Sep-2000 18:26


This is a multi-part message in MIME format.
--------------C52C0FF918329894F180FE9C
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii

I was interested in assessing the overhead cost of an OO version
of ADS versus a function-library version, and so wrote the
benchmarking script attached below.

It creates a block of key strings used by all of the tests (which
also all re-seed the random number generator to the same starting
point to ensure equivalent behavior).  This block of keys is used
by four timed loops which follow:

1)  The first loop does no useful work, but is used to give an
    indication of the cost of the test harness itself.  It cycles
a specified number of times, drawing a key at random from the key
block created before the timing began.

2)  The second loop does the same thing, but also updates a count
    of the number of times that key has been drawn.  This loop
uses a simple block to store key-value pairs.  It utilizes plain
vanilla  append...reduce ,  find , and  change  as the working
mechanisms to manage the ADS.

3)  The third loop does the same thing, but uses the OO approach
    coded in  %assoco.r  (previously posted), with  /new ,  /put ,
and  /get  method calls to manage the counts.

4)  The fourth loop does the same, but uses the function library
    coded in  %assocf.r  (previously posted), etc. etc. etc.

(The reason I coded  %assocf.r  myself, instead of using one of
the previously-posted submissions from others, was to try to
focus on the OO-vs-function issue as the only difference between
the two tested versions.  Thus I wanted all the innards to be as
much alike as possible, to make a fair test of that single issue.)

The  testassocs/run  function takes two arguments, the number of
keys to manufacture for the key block (from which all subsequent
testing is drawn), and the number of cycles (keys drawn) in each
of the subsequent loops.

The outputs are:

1)  The number of keys actually used.  Remember that we're drawing
    at random, so unless the number of cycles is substantially
larger than the key pool, there's a good chance that some keys
won't be drawn.

2)  The elapsed time (in seconds -- I'm calling  now  at the edges
    of all loops) of each of the four loops above.

3)  The time per cycle for each of the four loops (seconds divided
    by cycles).  If I REALLY cared about the real-time values, I'd
scale them to microseconds, but that's obviously dependent on the
system running the benchmark (and its load).  I was much more
interested in...

4)  The ratios of the times, where the simple-block-based version
    is the normalization base.  All other figures are relative to
the block version, as 1.0.

RESULTS:

After that long wind-up...

      Relative
Loop    time    Comments
====  ========  ====================================================
  1      0.66%  We can "do nothing" really fast! ;-)  This means
                that the test harness didn't distort the comparisons.

  2    100.00%  The block version.

  3    174.17%  So the cost of key-value separation, keeping all the
                methods out of the global namespace, etc. is an
                overhead of about 75% -- not intolerable for most
                purposes.

  4    178.15%  This result makes me suspect that accessing the
                components of an object is the operation that's
                driving the runtime cost.
====  ========  ====================================================

More testing (a functional version that handles key-value separation
without using an object for data storage) is clearly in order, but I
thought I'd pass on those preliminary results.

-jn-
--------------C52C0FF918329894F180FE9C
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii;
 name="testassocs.r"
Content-Disposition: inline;
 filename="testassocs.r"

REBOL []

do %assoco.r
do %assocf.r

testassocs: make object! [
    vowels:      "aeiouwy"
    aspirants:   ["f" "v" "th" "dh" "ch" "kh"]
    liquids:     ["l" "r" ""]
    continuants: "lmnrsz"
    stops:       "bptdkg"
    consonants:  "bcdfghjklmnpqrstvxz"

    randomseed: 0
    randomitem: func [s [series!]] [
        pick s random length? s
    ]

    run: function [
        nblk [integer!]   ncyc [integer!]
    ][
        t0 t1 t2 t3 t4
	key kblk p
	a0 a1 a2
    ][
        randomseed: random now
	kblk: copy []
	while [nblk > length? kblk] [
	    key: copy ""
	    append key randomitem aspirants
	    append key randomitem liquids
	    append key randomitem vowels
	    append key randomitem continuants
	    append key randomitem vowels
	    append key randomitem stops
	    if none? find kblk key [append kblk key]
	]
	t0: now/time
        random/seed randomseed
	loop ncyc [
	    key: randomitem kblk
	]
	t1: now/time
	a0: copy []
        random/seed randomseed
	loop ncyc [
	    key: randomitem kblk
	    either none? p: find a0 key [
	        append a0 reduce [key 1]
	    ][
	        p: next p
		change p 1 + first p
	    ]
	]
	t2: now/time
	a1: assoco/new
	random/seed randomseed
	loop ncyc [
	    key: randomitem kblk
	    a1/put key 1 + any [a1/get key  0]
	]
	t3: now/time
	a2: assoc.new
	random/seed randomseed
	loop ncyc [
	    key: randomitem kblk
	    assoc.put a2 key 1 + any [assoc.get a2 key  0]
	]
	t4: now/time
	comment {
	foreach key a1/keys [
	    print [key ": " select a0 key "," a1/get key]
	]
	}
	print [length? a1/keys "keys used^/"]
	t0: t1 - t0
	t0: t0/hour * 60 + t0/minute * 60 + t0/second
	t1: t2 - t1
	t1: t1/hour * 60 + t1/minute * 60 + t1/second
	t2: t3 - t2
	t2: t2/hour * 60 + t2/minute * 60 + t2/second
	t3: t4 - t3
	t3: t3/hour * 60 + t3/minute * 60 + t3/second
	print ["Totals: " t0 ":" t1 ":" t2 ":" t3 "^/"]
	t0: t0 / ncyc
	t1: t1 / ncyc
	t2: t2 / ncyc
	t3: t3 / ncyc
	print ["Times: " t0 ":" t1 ":" t2 ":" t3 "^/"]
	print ["Ratio: " t0 / t1 ":" t1 / t1 ":" t2 / t1 ":" t3 / t1 "^/"]
    ]
]

--------------C52C0FF918329894F180FE9C--