r3wp [groups: 83 posts: 189283]
World: r3wp

[Profiling] Rebol code optimisation and algorithm comparisons.

Maxim
18-May-2010
[101]
in the code... 


rsize: 3   is the record size...  like the /skip value in most series 
handlers.

my two funcs will adapt, since you provide the record size.


but... ehehe, I just realized that FIND has a /skip refinement... 
doh!!! the above can be made MUCH faster still, especially by removing 
the need for the keys (which take a bit of time to generate on large 
lists).
Terry
18-May-2010
[102x2]
my ultimate goal is storing triples..  [a1 a2 a3 b1 b2 b3 ...] preferably 
strings, if that's fast enough

where I can search against any part of the triple, and return the 
whole thing (the triple)
I'll only be generating the initial series rarely, but appending/changing/removing 
often
Maxim
18-May-2010
[104]
ok. doable, with almost the same speed as the find-very-fast   func, 
but without the need for its keys argument.
Terry
18-May-2010
[105]
here's a better example  

dataset: [ "subject" "predicate" "value" "bob" "fname" "Bob" "bob" 
"age" "42" "Maxim" "age" "unknown" ... ]
so the triple is subject, predicate, value  

with the ability to quickly return every triple where the predicate 
(the second part of the triple) = "age"


I'm doing this now with MySQL as a triple store, but a native series 
should be much faster
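[A brute-force version of that predicate query can be sketched as below — a minimal Rebol sketch using forskip over the sample data above; the field values are illustrative:]

```rebol
; minimal sketch: scan the flat block three items at a time and
; collect every triple whose second field (the predicate) is "age"
dataset: ["bob" "fname" "Bob" "bob" "age" "42" "maxim" "age" "unknown"]
matches: copy []
forskip dataset 3 [
    if dataset/2 = "age" [append matches copy/part dataset 3]
]
probe matches  ; expected: ["bob" "age" "42" "maxim" "age" "unknown"]
```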
Maxim
18-May-2010
[106]
I've got it done.... just testing it right now.
Terry
18-May-2010
[107]
if strings are quick enough and not much overhead, it's possible 
to use key/value   but the key is made up of subject:predicate ie: 
"maxim:age" "unknown"  so that select/part key ":age" would return 
everything including "unknown"
Maxim
18-May-2010
[108]
using find/skip and an index for which item to search for in the record... 
we get by FAR the fastest search, if you count the fact that generating 
the keys for find-very-fast often takes as long as the search itself 
in dense searches.

ultimate-find below gives the results of this new function


preparing sparse data
6000005
sparse tests:
feach(): 5.4.3.2.1. -> 0:00:04.89   4 matches found
----------------
find-fast(): 5.4.3.2.1. -> 0:00:00.719   4 matches found
----------------
find-really-fast(): 5.4.3.2.1. -> 0:00:00.234   4 matches found
----------------
ultimate find(): 5.4.3.2.1. -> 0:00:00.594   4 matches found
----------------

dense match with key/value collisions
6000000
feach(): 5.4.3.2.1. -> 0:00:13.343   221606 matches found
----------------
find-fast(): 5.4.3.2.1. -> 0:00:14.813   221606 matches found
----------------
find-really-fast(): 5.4.3.2.1. -> 0:00:07.718   221606 matches found
----------------
ultimate find(): 5.4.3.2.1. -> 0:00:06.735   221606 matches found
----------------

dense match without key/value collisions
6000000
feach(): 5.4.3.2.1. -> 0:00:13.969   222405 matches found
----------------
find-fast(): 5.4.3.2.1. -> 0:00:09.812   222405 matches found
----------------
find-really-fast(): 5.4.3.2.1. -> 0:00:07.672   222405 matches found
----------------
ultimate find(): 5.4.3.2.1. -> 0:00:06.531   222405 matches found
----------------
done
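[For reference, the find/skip idea behind these numbers can be sketched like this — a minimal sketch; the data is illustrative:]

```rebol
; find/skip treats the block as fixed-size records, so a search
; started AT a given field offset only ever matches that field
data: [1 "a" "B" 4 "h" "V" 1 "z" "Z"]
find/skip at data 2 "z" 3  ; checks only the second field of each triple
; returns the series positioned at "z", i.e. inside record [1 "z" "Z"]
```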
Terry
18-May-2010
[109]
next will be ultra-ultimate XD
Maxim
18-May-2010
[110x2]
ultimate-find: func [
	series
	value
	index "field you want to search on, should be (1 <= index <= record-length)"
	record-length
	i "iterations"
	/local s st result
][
	prin "ultimate find(): "
	st: now/precise
	while [i > 0][
		prin i
		prin "."
		result: clear []  ; note: a static block, reused across calls
		s: at series index
		until [
			not all [
				s: find/skip s value record-length
				; step back to the start of the record, copy the whole record
				insert tail result copy/part skip s (-1 * index + 1) record-length
				s: skip s record-length  ; was: skip s 3 (hard-coded record size)
			]
		]
		i: i - 1
	]
	
	prin join " -> " difference now/precise st
	print ["  " (length? result) / record-length "matches found"]
	
	head result
]
searching for strings will be slower... probably much slower... just 
try it out with some data  :-D
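[One way to try it out with some data — a sketch; the sizes and names are just for illustration:]

```rebol
; build a large flat block of integer triples, then append a
; known string record to search for
dataset: make block! 600000
loop 200000 [repend dataset [random 100000 random 100000 random 100000]]
repend dataset ["terry" "age" "42"]
probe ultimate-find dataset "age" 2 3 1  ; should find the one string record
```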
Terry
18-May-2010
[112]
what does it mean "field you want to search on"?
Maxim
18-May-2010
[113]
what item of your record you want to match against... basically what 
you meant by searching  subject, predicate or value
Terry
18-May-2010
[114]
yeah, so if i say 1 then that's all subjects?
Maxim
18-May-2010
[115]
yep it will match the value against subjects only.
Terry
18-May-2010
[116x4]
no go then
>> dataset

== [6 27744 92191 1 61175 9905 9 62225 72852 7 31935 71556 4 59248

>> ultimate-find dataset 1 1 zz 1
ultimate find(): 1. -> 0:00:00.031   0 matches found
== []
should have picked up that fourth index (1)
this worked.. 
>> ultimate-find dataset 6 1 zz 1
ultimate find(): 1. -> 0:00:00.219   1 matches found
Maxim
18-May-2010
[120]
zz should be 3.
Terry
18-May-2010
[121x2]
oh.. i thought zz was the length? of dataset
ah.. that works GREAT
Maxim
18-May-2010
[123]
>> dataset: [1 "a" "B" 4 "h" "V" 1 "z" "Z" 4 "p" "d" 4 "k" "i" 4 
"y" "o"]
== [1 "a" "B" 4 "h" "V" 1 "z" "Z" 4 "p" "d" 4 "k" "i" 4 "y" "o"]
>> ultimate-find dataset 4 1 3 1
ultimate find(): 1. -> 0:00   4 matches found
== [4 "h" "V" 4 "p" "d" 4 "k" "i" 4 "y" "o"]
Terry
18-May-2010
[124]
very nice
Maxim
18-May-2010
[125x2]
ultimate find(): 1. -> 0:00   1 matches found
== [1 "a" "B"]

:-)
oops missing cmd line...
Terry
18-May-2010
[127]
so if the dataset is key/value just use 2 as the record-length
Maxim
18-May-2010
[128x2]
>> ultimate-find dataset "a" 2 3 1
ultimate find(): 1. -> 0:00   1 matches found
== [1 "a" "B"]
yep
Terry
18-May-2010
[130x3]
cool
i inserted "maxim" "age" "unknown" and appended "terry" "age" "42" 
into the dataset containing 6 million records.. 

>> ultimate-find dataset "age"  2 3 1
ultimate find(): 1. -> 0:00:00.093   2 matches found
== ["maxim" "age" "unknown" "terry" "age" "42"]
I'll say that's a respectable time... and the leading contestant 
:)
Maxim
18-May-2010
[133]
:-)
Terry
18-May-2010
[134x4]
now if only i was 42 again...
But wait, there's more.... 

convert dataset to hash! and run ultimate-find again!
>> ultimate-find dataset "age"  2 3 100
ultimate find():  -> 0:00   2 matches found
== ["maxim" "age" "unknown" "terry" "age" "42"]

100 iterations not even registering
1000 iterations: 0.40 seconds
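[For anyone following along, the conversion step is just this — a sketch, assuming a Rebol 2 dataset block like the one above:]

```rebol
; hash! keeps the same series interface (find/skip included) but
; adds hashed lookup, which is where the speedup comes from
dataset: to-hash dataset
; to get a plain block back: dataset: to-block dataset
```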
Maxim
18-May-2010
[138]
OMG !
Terry
18-May-2010
[139]
exactly
Maxim
18-May-2010
[140x2]
but I'm getting an odd deadlock here on some tests... hum...
I'm getting extremely slow results on dense tests...
Terry
18-May-2010
[142x2]
interesting... I'm not too worried, as density isn't a big issue with 
triple stores
I'm off.. good luck with your optimizations
Maxim
18-May-2010
[144]
I'm talking like 100 times worse!   the larger the list the worse 
it gets... seems like an exponential issue.
Terry
18-May-2010
[145]
that seems like an anomaly
Maxim
18-May-2010
[146]
both dense tests perform pretty much the same; the moment I convert 
it to a hash, it gets reallllly slow.
Terry
18-May-2010
[147x2]
yeah, i see that too
mind you, that's pretty dense data
Maxim
18-May-2010
[149]
the strange thing is I did tests using a record size of 2, which 
wouldn't trigger strange misaligned key/value issues.  I even removed 
the copy to make sure that wasn't the issue, and one test with only 
400000 records took more than 4 minutes to complete vs 0.297 seconds 
for the feach test!
Terry
18-May-2010
[150]
I'm looking for the integer 6.. it's still cranking and I can hear 
my system struggling..