r3wp [groups: 83 posts: 189283]
World: r3wp

[Profiling] Rebol code optimisation and algorithm comparisons.

Maxim
18-May-2010
[101]
in the code... 


rsize: 3   is the record size...  like the /skip value in most series 
handlers.

my two funcs will adapt, since you provide the record size.


but... ehehe, I just realized that FIND has a /skip refinement... 
doh!!! the above can be made MUCH faster still, especially by removing 
the need for the keys (which take a bit of time to generate on large 
lists).
Terry
18-May-2010
[102x2]
my ultimate goal is storing triples..  [a1 a2 a3 b1 b2 b3 ...] preferably 
strings, if that's fast enough

where I can search against any part of the triple, and return the 
whole thing (the triple)
I'll only be generating the initial series rarely, but appending/changing/removing 
often
Maxim
18-May-2010
[104]
ok. doable, with almost the same speed as the find-very-fast   func, 
but without the need for its keys argument.
Terry
18-May-2010
[105]
here's a better example  

dataset: [ "subject" "predicate" "value" "bob" "fname" "Bob" "bob" 
"age" "42" "Maxim" "age" "unknown" ... ]
so the triple is subject, predicate, value  

with the ability to quickly return every triple where the predicate 
(the second part of the triple) = "age"


I'm doing this now with MySQL as a triple store, but a native series 
should be much faster
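[A brute-force version of that predicate query can be sketched as below — a minimal Rebol sketch using forskip over the sample data above; the field values are illustrative:]

```rebol
; minimal sketch: scan the flat block three items at a time and
; collect every triple whose second field (the predicate) is "age"
dataset: ["bob" "fname" "Bob" "bob" "age" "42" "maxim" "age" "unknown"]
matches: copy []
forskip dataset 3 [
    if dataset/2 = "age" [append matches copy/part dataset 3]
]
probe matches  ; expected: ["bob" "age" "42" "maxim" "age" "unknown"]
```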
Maxim
18-May-2010
[106]
I've got it done.... just testing it right now.
Terry
18-May-2010
[107]
if strings are quick enough and not much overhead, it's possible 
to use key/value   but the key is made up of subject:predicate ie: 
"maxim:age" "unknown"  so that select/part key ":age" would return 
everything including "unknown"
Maxim
18-May-2010
[108]
using find/skip and an index for which item to search for in the record... 
we get by FAR the fastest search, if you count the fact that generating 
the keys for find-very-fast often takes as long as the search itself 
in dense searches.

ultimate-find below gives the results of this new function


preparing sparse data
6000005
sparse tests:
feach(): 5.4.3.2.1. -> 0:00:04.89   4 matches found
----------------
find-fast(): 5.4.3.2.1. -> 0:00:00.719   4 matches found
----------------
find-really-fast(): 5.4.3.2.1. -> 0:00:00.234   4 matches found
----------------
ultimate find(): 5.4.3.2.1. -> 0:00:00.594   4 matches found
----------------

dense match with key/value collisions
6000000
feach(): 5.4.3.2.1. -> 0:00:13.343   221606 matches found
----------------
find-fast(): 5.4.3.2.1. -> 0:00:14.813   221606 matches found
----------------
find-really-fast(): 5.4.3.2.1. -> 0:00:07.718   221606 matches found
----------------
ultimate find(): 5.4.3.2.1. -> 0:00:06.735   221606 matches found
----------------

dense match without key/value collisions
6000000
feach(): 5.4.3.2.1. -> 0:00:13.969   222405 matches found
----------------
find-fast(): 5.4.3.2.1. -> 0:00:09.812   222405 matches found
----------------
find-really-fast(): 5.4.3.2.1. -> 0:00:07.672   222405 matches found
----------------
ultimate find(): 5.4.3.2.1. -> 0:00:06.531   222405 matches found
----------------
done
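[For reference, the find/skip idea behind these numbers can be sketched like this — a minimal sketch; the data is illustrative:]

```rebol
; find/skip treats the block as fixed-size records, so a search
; started AT a given field offset only ever matches that field
data: [1 "a" "B" 4 "h" "V" 1 "z" "Z"]
find/skip at data 2 "z" 3  ; checks only the second field of each triple
; returns the series positioned at "z", i.e. inside record [1 "z" "Z"]
```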
Terry
18-May-2010
[109]
next will be ultra-ultimate XD
Maxim
18-May-2010
[110x2]
ultimate-find: func [
	series
	value
	index "field you want to search on, should be (1 <= index <= record-length)"
	record-length
	i "iterations"
	/local s st result
][
	prin "ultimate find(): "
	st: now/precise
	while [i > 0][
		prin i
		prin "."
		result: clear []  ; note: a static block, reused across calls
		s: at series index
		until [
			not all [
				s: find/skip s value record-length
				; step back to the start of the record, copy the whole record
				insert tail result copy/part skip s (-1 * index + 1) record-length
				s: skip s record-length  ; was: skip s 3 (hard-coded record size)
			]
		]
		i: i - 1
	]
	
	prin join " -> " difference now/precise st
	print ["  " (length? result) / record-length "matches found"]
	
	head result
]
searching for strings will be slower... probably much slower... just 
try it out with some data  :-D
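[One way to try it out with some data — a sketch; the sizes and names are just for illustration:]

```rebol
; build a large flat block of integer triples, then append a
; known string record to search for
dataset: make block! 600000
loop 200000 [repend dataset [random 100000 random 100000 random 100000]]
repend dataset ["terry" "age" "42"]
probe ultimate-find dataset "age" 2 3 1  ; should find the one string record
```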
Terry
18-May-2010
[112]
what does it mean "field you want to search on"?
Maxim
18-May-2010
[113]
what item of your record you want to match against... basically what 
you meant by searching  subject, predicate or value
Terry
18-May-2010
[114]
yeah, so if i say 1 then that's all subjects?
Maxim
18-May-2010
[115]
yep it will match the value against subjects only.
Terry
18-May-2010
[116x4]
no go then
>> dataset

== [6 27744 92191 1 61175 9905 9 62225 72852 7 31935 71556 4 59248

>> ultimate-find dataset 1 1 zz 1
ultimate find(): 1. -> 0:00:00.031   0 matches found
== []
should have picked up that fourth index (1)
this worked.. 
>> ultimate-find dataset 6 1 zz 1
ultimate find(): 1. -> 0:00:00.219   1 matches found
Maxim
18-May-2010
[120]
zz should be 3.
Terry
18-May-2010
[121x2]
oh.. i thought zz was the length? of dataset
ah.. that works GREAT
Maxim
18-May-2010
[123]
>> dataset: [1 "a" "B" 4 "h" "V" 1 "z" "Z" 4 "p" "d" 4 "k" "i" 4 
"y" "o"]
== [1 "a" "B" 4 "h" "V" 1 "z" "Z" 4 "p" "d" 4 "k" "i" 4 "y" "o"]
>> ultimate-find dataset 4 1 3 1
ultimate find(): 1. -> 0:00   4 matches found
== [4 "h" "V" 4 "p" "d" 4 "k" "i" 4 "y" "o"]
Terry
18-May-2010
[124]
very nice
Maxim
18-May-2010
[125x2]
ultimate find(): 1. -> 0:00   1 matches found
== [1 "a" "B"]

:-)
oops missing cmd line...
Terry
18-May-2010
[127]
so if the dataset is key/value just use 2 as the record-length
Maxim
18-May-2010
[128x2]
>> ultimate-find dataset "a" 2 3 1
ultimate find(): 1. -> 0:00   1 matches found
== [1 "a" "B"]
yep
Terry
18-May-2010
[130x3]
cool
i inserted "maxim" "age" "unknown" and appended "terry" "age" "42" 
into the dataset containing 6 million records.. 

>> ultimate-find dataset "age"  2 3 1
ultimate find(): 1. -> 0:00:00.093   2 matches found
== ["maxim" "age" "unknown" "terry" "age" "42"]
I'll say that's a respectable time... and the leading contestant 
:)
Maxim
18-May-2010
[133]
:-)
Terry
18-May-2010
[134x4]
now if only i was 42 again...
But wait, there's more.... 

convert dataset to hash! and run ultimate-find again!
>> ultimate-find dataset "age"  2 3 100
ultimate find():  -> 0:00   2 matches found
== ["maxim" "age" "unknown" "terry" "age" "42"]

100 iterations not even registering
1000 iterations: 0.40 seconds
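[For anyone following along, the conversion step is just this — a sketch, assuming a Rebol 2 dataset block like the one above:]

```rebol
; hash! keeps the same series interface (find/skip included) but
; adds hashed lookup, which is where the speedup comes from
dataset: to-hash dataset
; to get a plain block back: dataset: to-block dataset
```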
Maxim
18-May-2010
[138]
OMG !
Terry
18-May-2010
[139]
exactly
Maxim
18-May-2010
[140x2]
but I'm getting an odd deadlock here on some tests... hum...
I'm getting extremely slow results on dense tests...
Terry
18-May-2010
[142x2]
interesting... I'm not too worried, as density isn't a big issue with 
triple stores
I'm off.. good luck with your optimizations
Maxim
18-May-2010
[144]
I'm talking like 100 times worse!   the larger the list the worse 
it gets... seems like an exponential issue.
Terry
18-May-2010
[145]
that seems like an anomaly
Maxim
18-May-2010
[146]
both dense tests perform pretty much the same; the moment I convert 
it to a hash, it gets reallllly slow.
Terry
18-May-2010
[147x2]
yeah, i see that too
mind you, that's pretty dense data
Maxim
18-May-2010
[149]
the strange thing is I did tests using a record size of 2, which 
wouldn't trigger strange misaligned key/value issues.  I even removed 
the copy to make sure that wasn't the issue, and one test with only 
400000 records took more than 4 minutes to complete vs 0.297 seconds 
for the feach test!
Terry
18-May-2010
[150]
I'm looking for the integer 6.. it's still cranking and I can hear 
my system struggling..