World: r3wp
[Profiling] Rebol code optimisation and algorithm comparisons.
older newer | first last |
BrianH 30-Oct-2009 [83] | The plus side is that what you would just do, before you try to optimize, tends to be faster in R3 than it is in R2. |
Maxim 30-Oct-2009 [84] | yeah amd with the faster fuunction calling in R3 liquid should get an immediate boost in speed... it does A LOT of tiny function calls to manage the graph as if they where first level types with accessors instead of using a controler which loops over the graphs and limits flexibility. |
Maxim 17-May-2010 [85x9] | . |
just for Terry..... rebol [] feach: func[ series value i "iterations" /local result st ][ prin "feach(): " st: now/precise x: length? series while [i > 0][ prin i prin "." result: make series 0 foreach[s p v] series [if s = value [append result reduce [s p v]]] i: i - 1 ] prin join " -> " difference now/precise st print [" " (length? result) / 3 "matches found"] result ] find-fast: func [ series value record-length i "iterations" /local result st s ][ prin "find-fast(): " st: now/precise while [i > 0][ prin i prin "." result: clear [] ;length? series s: head series until [ not all[ s: find/tail s value either 0 = ( (-2 + index? s ) // record-length) [ insert tail result copy/part series record-length not tail? s: skip s record-length ][ not tail? s: next s ] ] ] i: i - 1 ] prin join " -> " difference now/precise st print [" " (length? result) / record-length "matches found"] head result ] find-really-fast: func [ series keys value record-length i "iterations" /local result st s ][ prin "find-really-fast(): " st: now/precise while [i > 0][ prin i prin "." result: clear [] ;length? series s: head keys until [ not all [ s: find/tail s value insert tail result copy/part at series (record-length * ( -1 + index? s )) record-length ] ] i: i - 1 ] prin join " -> " difference now/precise st print [" " (length? result) / record-length "matches found"] head result ] test-size: 2000000 ; number of records iterations: 5 rsize: 3 print "preparing sparse data" dataset: make block! test-size loop test-size [ ; we exclude 9 from dataset loop rsize [ dataset: insert dataset random 8 ] ] dataset: head dataset insert at dataset (100 * rsize + 1) 9 insert at dataset (2000 * rsize + 1) 9 insert at dataset (50000 * rsize + 1) 9 insert at dataset (10000 * rsize + 1) 9 insert at dataset (20000 * rsize + 1) 9 probe length? dataset print "sparse tests:" feach dataset 9 iterations print "----------------" find-fast dataset 9 rsize iterations print "----------------" keys: extract dataset rsize find-really-fast dataset keys 9 rsize iterations print "----------------" print "^/dense match with key/value collisions" dataset: make block! test-size loop test-size [ ; we include 9 in the dataset loop rsize [ dataset: insert dataset random 9 ] ] dataset: head dataset probe length? dataset feach dataset 9 iterations print "----------------" find-fast dataset 9 rsize iterations print "----------------" keys: extract dataset rsize find-really-fast dataset keys 9 rsize iterations print "----------------" print "^/dense match without key/value collisions" dataset: make block! test-size loop test-size [ ; we include 9 in the dataset dataset: insert dataset random 9 loop (rsize - 1) [ dataset: insert dataset 10 + random 100000 ] ] dataset: head dataset probe length? dataset feach dataset 9 iterations print "----------------" find-fast dataset 9 rsize iterations print "----------------" keys: extract dataset rsize find-really-fast dataset keys 9 rsize iterations print "----------------" ask "done" | |
the results speak for themselves. preparing sparse data 6000005 sparse tests: feach(): 5.4.3.2.1. -> 0:00:04.968 4 matches found ---------------- find-fast(): 5.4.3.2.1. -> 0:00:00.735 4 matches found ---------------- find-really-fast(): 5.4.3.2.1. -> 0:00:00.234 4 matches found ---------------- dense match with key/value collisions 6000000 feach(): 5.4.3.2.1. -> 0:00:13.328 221606 matches found ---------------- find-fast(): 5.4.3.2.1. -> 0:00:14.375 181756 matches found ---------------- find-really-fast(): 5.4.3.2.1. -> 0:00:06.453 221606 matches found ---------------- dense match without key/value collisions 6000000 feach(): 5.4.3.2.1. -> 0:00:13.734 222405 matches found ---------------- find-fast(): 5.4.3.2.1. -> 0:00:08.469 200322 matches found ---------------- find-really-fast(): 5.4.3.2.1. -> 0:00:07.516 222405 matches found ---------------- done | |
if you are willing to keep a version of the keys accessible at run-time, then the keyed search algorithm is MUCH faster, but incurs memory overhead, and the creation of the keys, on big blocks, is not negligible (time not included in tests, but will generally put the keyed algorythm slower than the fast-find) | |
also note that I am REUSING the result block at each call, so run times on first calls are actullaly slower than on successive calls. | |
noticed a little discrepancy in results... strange... probably cause by a last minute change... | |
ah found it.... | |
the series was being skipped one too many (since I'm using /tail on the find, as suggested by andreas) find-fast: func [ series value record-length i "iterations" /local result st s ][ prin "find-fast(): " st: now/precise while [i > 0][ prin i prin "." result: clear [] ;length? series s: head series until [ not all[ s: find/tail s value either 0 = ( (-2 + index? s ) // record-length) [ insert tail result copy/part series record-length not tail? s: skip s record-length ][ not tail? s: next s ] ] ] i: i - 1 ] prin join " -> " difference now/precise st print [" " (length? result) / record-length "matches found"] head result ] | |
results with this function fixed are similar... preparing sparse data 6000005 sparse tests: feach(): 5.4.3.2.1. -> 0:00:04.89 4 matches found ---------------- find-fast(): 5.4.3.2.1. -> 0:00:00.719 4 matches found ---------------- find-really-fast(): 5.4.3.2.1. -> 0:00:00.25 4 matches found ---------------- dense match with key/value collisions 6000000 feach(): 5.4.3.2.1. -> 0:00:13.328 221606 matches found ---------------- find-fast(): 5.4.3.2.1. -> 0:00:14.843 221606 matches found ---------------- find-really-fast(): 5.4.3.2.1. -> 0:00:06.5 221606 matches found ---------------- dense match without key/value collisions 6000000 feach(): 5.4.3.2.1. -> 0:00:13.843 222405 matches found ---------------- find-fast(): 5.4.3.2.1. -> 0:00:09.672 222405 matches found ---------------- find-really-fast(): 5.4.3.2.1. -> 0:00:06.375 222405 matches found ---------------- done | |
Terry 18-May-2010 [94x2] | Thanks Maxim |
just trying to grok it :) so 2,000,000 keys against 6, 000,000 values? | |
Maxim 18-May-2010 [96x2] | setup as records of 3 items, 1 key, 2 values. |
3 items (1 key, 2 values). | |
Terry 18-May-2010 [98x3] | 2 values? |
not following ( as usual) | |
a typical hash (for example is [key1 value1 key2 value2] .. one key, one value | |
Maxim 18-May-2010 [101] | in the code... rsize: 3 is the record size... like the /skip value in most series handlers my two funcs will adapt, since you provide it the record size but ... ehehe, I just realized that find HAS the /skip attribute... doh!!! the above can be made MUCH faster still, especially by removing the need for the keys (which take a bit of time to generate on large lists). |
Terry 18-May-2010 [102x2] | my ultimate goal is storing triples.. [a1 a2 a3 b1 b2 b3...] preferably strings if that's fast enough where i can search against any part of the triple , and return the whole thing (triple) |
i'll only be generating the initial series rarely, but appending /changing/ removing often | |
Maxim 18-May-2010 [104] | ok. doable, with almost the same speed as the find-very-fast func, but without the need for its keys argument. |
Terry 18-May-2010 [105] | here's a better example dataset: [ "subject" "predicate" "value" "bob" "fname" "Bob" "bob" "age" "42" "Maxim" "age" "unknown" ... ] so the triple is subject, predicate value with the ability to say quickly return every triple where "predicate" (the second part of the triple) = "age" i'm doing this now with MySQL as a triple store, but series should be much faster |
Maxim 18-May-2010 [106] | I've got it done.... just testing it right now. |
Terry 18-May-2010 [107] | if strings are quick enough and not much overhead, it's possible to use key/value but the key is made up of subject:predicate ie: "maxim:age" "unknown" so that select/part key ":age" would return everything including "unknown" |
Maxim 18-May-2010 [108] | using find/skip and an index for what item to search for in the record... we get by FAR the fastest search... if you count the fact that generating the keys for find-very-fast often takes as long as the search itself, in dense searches. ultimate-find below gives the results of this new function preparing sparse data 6000005 sparse tests: feach(): 5.4.3.2.1. -> 0:00:04.89 4 matches found ---------------- find-fast(): 5.4.3.2.1. -> 0:00:00.719 4 matches found ---------------- find-really-fast(): 5.4.3.2.1. -> 0:00:00.234 4 matches found ---------------- ultimate find(): 5.4.3.2.1. -> 0:00:00.594 4 matches found ---------------- dense match with key/value collisions 6000000 feach(): 5.4.3.2.1. -> 0:00:13.343 221606 matches found ---------------- find-fast(): 5.4.3.2.1. -> 0:00:14.813 221606 matches found ---------------- find-really-fast(): 5.4.3.2.1. -> 0:00:07.718 221606 matches found ---------------- ultimate find(): 5.4.3.2.1. -> 0:00:06.735 221606 matches found ---------------- dense match without key/value collisions 6000000 feach(): 5.4.3.2.1. -> 0:00:13.969 222405 matches found ---------------- find-fast(): 5.4.3.2.1. -> 0:00:09.812 222405 matches found ---------------- find-really-fast(): 5.4.3.2.1. -> 0:00:07.672 222405 matches found ---------------- ultimate find(): 5.4.3.2.1. -> 0:00:06.531 222405 matches found ---------------- done |
Terry 18-May-2010 [109] | next will be ultra-ultimate XD |
Maxim 18-May-2010 [110x2] | ultimate-find: func [ series value index "field you want to search on, should be (1 <= index <= record-length)" record-length i "iterations" /local s st result ][ prin "ultimate find(): " st: now/precise while [i > 0][ prin i prin "." result: clear [] ;length? series s: at series index until [ not all [ s: find/skip s value record-length insert tail result copy/part skip s (-1 * index + 1) record-length s: skip s 3 ] ] i: i - 1 ] prin join " -> " difference now/precise st print [" " (length? result) / record-length "matches found"] head result ] |
searching for strings will be slower... probably much slower... just try it out with some data :-D | |
Terry 18-May-2010 [112] | what does it mean "field you want to search on"? |
Maxim 18-May-2010 [113] | what item of your record you want to match against... basically what you meant by searching subject, predicate or value |
Terry 18-May-2010 [114] | yeah, so if i say 1 then that's all subjects? |
Maxim 18-May-2010 [115] | yep it will match the value against subjects only. |
Terry 18-May-2010 [116x4] | no go then |
>> dataset == [6 27744 92191 1 61175 9905 9 62225 72852 7 31935 71556 4 59248 >> ultimate-find dataset 1 1 zz 1 ultimate find(): 1. -> 0:00:00.031 0 matches found == [] | |
should have picked up that fourth index (1) | |
this worked.. >> ultimate-find dataset 6 1 zz 1 ultimate find(): 1. -> 0:00:00.219 1 matches found | |
Maxim 18-May-2010 [120] | zz should be 3. |
Terry 18-May-2010 [121x2] | oh.. i thought zz was the length? of dataset |
ah.. that works GREAT | |
Maxim 18-May-2010 [123] | >> dataset: [1 "a" "B" 4 "h" "V" 1 "z" "Z" 4 "p" "d" 4 "k" "i" 4 "y" "o"] == [1 "a" "B" 4 "h" "V" 1 "z" "Z" 4 "p" "d" 4 "k" "i" 4 "y" "o"] >> ultimate-find dataset 4 1 3 1 ultimate find(): 1. -> 0:00 4 matches found == [4 "h" "V" 4 "p" "d" 4 "k" "i" 4 "y" "o"] |
Terry 18-May-2010 [124] | very nice |
Maxim 18-May-2010 [125x2] | ultimate find(): 1. -> 0:00 1 matches found == [1 "a" "B"] :-) |
oops missing cmd line... | |
Terry 18-May-2010 [127] | so if the dataset is key/value just use 2 as the record-length |
Maxim 18-May-2010 [128x2] | >> ultimate-find dataset "a" 2 3 1 ultimate find(): 1. -> 0:00 1 matches found == [1 "a" "B"] |
yep | |
Terry 18-May-2010 [130x3] | cool |
i inserted "maxim" "age" "unknown" and appended "terry" "age" "42" into the dataset containing 6 million records.. >> ultimate-find dataset "age" 2 3 1 ultimate find(): 1. -> 0:00:00.093 2 matches found == ["maximn" "age" "unknown" "terry" "age" "42"] | |
I'll say that's a respectable time... and the leading contestant :) | |
older newer | first last |