World: r3wp
[Profiling] Rebol code optimisation and algorithm comparisons.
older newer | first last |
Maxim 30-Oct-2009 [57x3] | I did note, that there is a HUGE memory leak which occured probably in the actual benchmark procedure itself. although I keep no reference to any of the data or transient test blocks and funcs, they are kept somewhere, and my rebol.exe process keeps growing and growing.... I caught it at 500MB !! but it didn't do any difference in actual speeds... after a few tests.... cause i was a bit scared. |
this will also have to be investigated further (the leak) | |
I tried manually recycling... but it didn't do anything. | |
Steeve 30-Oct-2009 [60] | what do you mean ?, it does it here: >> recycle s: stats loop 1000000 [foreach a [1 2 3][a: a]] print stats - s recycle print stats - s 1569504 ;memory allocated by the loop -320 ; after the recycle |
Maxim 30-Oct-2009 [61] | >> stats == 541502965 >> recycle >> stats == 272784493 but that's just for about 10 % of the tests... the more tests I do the more ram stays "stuck" somewhere inside the interpreter. |
Steeve 30-Oct-2009 [62x3] | yes, i noticed that too, it's a probem with R2 |
R3 is better with that | |
and if you activate recycle/on, does that make any difference ? | |
Maxim 30-Oct-2009 [65] | I think R2 GC can't determine co-dependent unused references... in some situations. ex: blk: reduce [ a: context [b: none] b: context [c: a] a/b: b ] blk: none in this case both a and b point to each other, and clearing blk doesn't tell a or b that they aren't used anymore... that is my guess. |
Steeve 30-Oct-2009 [66] | yep, but your tests seem not having such cases |
Maxim 30-Oct-2009 [67x3] | I reduce a block which is the test... and since foreach copy/deep, and there is NO word ever refering to the content of the refered block, I think the contents of the blocks prevent the blocks and the data they contain from being collected... the block contains words which are not GC counted as zero reference, so nothing gets de-allocated... that's just my guess. |
not sure I'm making sense... in how I explain it. | |
in any case I want to build a single script which does all the tests, statistics, and eventually graphics and html pages of all results in one (VERY) long process. so I can better control how the tests are done and prevent automated test creation as I am doing now. | |
sqlab 30-Oct-2009 [70] | M: looks more like 50 thousands than 50 millions for repeat. So take care of your powers of 10 |
Maxim 30-Oct-2009 [71x3] | as indicated in the document introductions... the repeat test is: loop ops [repeat i 1000 []] so with 100000 ops taking near 2 seconds. we end up with: 100,000 * 1000 / 2 = (50 million loops / second) |
but I did miscalculate the remove-each MB/second create/scan/erase cycle... its not 100MB... its 10MB. | |
updated document. | |
BrianH 30-Oct-2009 [74] | Thanks for the info, Maxim. We can do a little deduction from that data to guess how REBOL is implemented. The scientific method :) |
Maxim 30-Oct-2009 [75] | for my 3D engine, this base line test was neccessary. I need to squeze every hz out of rebol... its nice to see how some exit conditions are 10-15% faster in some equivalebt tests... who would have tought that 'PICK was faster than NOT TAIL ? :-/ |
BrianH 30-Oct-2009 [76] | For instance, 1000 > i would be faster than i < 1000 because ops redirect to actions, and actions dispatch based on the type of their first argument. If the first argument is a literal value, the action of that type can be called directly. If it is a referenced value, it woulkd have to be dereferenced first, which is apparently slower. As for PICK being faster than NOT TAIL?, that is one native compared to two, with interpreter overhead between the two. Low-level natives like PICK, NOT and TAIL? don't do much in comparison with the interpreter overhead. Large numbers of small operations tend to be slower than small numbers of large operations, if the amount of work done is comparable. This is why structure manipulation is faster in REBOL than simple math. |
Maxim 30-Oct-2009 [77x2] | better described than I would put it, but those where my assumptions... I will include this info in my next revision of the document. |
I intend to devote a whole site to these tests eventually. with a very extensive and comprehensive set of test functions and statistics. | |
BrianH 30-Oct-2009 [79] | With the typos fixed I hope :) |
Maxim 30-Oct-2009 [80x2] | you learn a lot by doing it. |
with the tests I did, I think I can probably optimise liquid by at least 20% just by changing the loops and changing none of the algorithms or features. I am about to create a second reference liquid node type. which will be completely compatible from the outside, but with less features inside. I expect to DOUBLE its processing capacity. | |
BrianH 30-Oct-2009 [82x2] | Code converted from R2 to R3 tends to need reoptimizing - the balance of what is native vs what is mezz has changed, usually for the better. All of the loop functions are native now, for instance. Also, some natives have been converted to actions, and vice versa. |
The plus side is that what you would just do, before you try to optimize, tends to be faster in R3 than it is in R2. | |
Maxim 30-Oct-2009 [84] | yeah amd with the faster fuunction calling in R3 liquid should get an immediate boost in speed... it does A LOT of tiny function calls to manage the graph as if they where first level types with accessors instead of using a controler which loops over the graphs and limits flexibility. |
Maxim 17-May-2010 [85x9] | . |
just for Terry..... rebol [] feach: func[ series value i "iterations" /local result st ][ prin "feach(): " st: now/precise x: length? series while [i > 0][ prin i prin "." result: make series 0 foreach[s p v] series [if s = value [append result reduce [s p v]]] i: i - 1 ] prin join " -> " difference now/precise st print [" " (length? result) / 3 "matches found"] result ] find-fast: func [ series value record-length i "iterations" /local result st s ][ prin "find-fast(): " st: now/precise while [i > 0][ prin i prin "." result: clear [] ;length? series s: head series until [ not all[ s: find/tail s value either 0 = ( (-2 + index? s ) // record-length) [ insert tail result copy/part series record-length not tail? s: skip s record-length ][ not tail? s: next s ] ] ] i: i - 1 ] prin join " -> " difference now/precise st print [" " (length? result) / record-length "matches found"] head result ] find-really-fast: func [ series keys value record-length i "iterations" /local result st s ][ prin "find-really-fast(): " st: now/precise while [i > 0][ prin i prin "." result: clear [] ;length? series s: head keys until [ not all [ s: find/tail s value insert tail result copy/part at series (record-length * ( -1 + index? s )) record-length ] ] i: i - 1 ] prin join " -> " difference now/precise st print [" " (length? result) / record-length "matches found"] head result ] test-size: 2000000 ; number of records iterations: 5 rsize: 3 print "preparing sparse data" dataset: make block! test-size loop test-size [ ; we exclude 9 from dataset loop rsize [ dataset: insert dataset random 8 ] ] dataset: head dataset insert at dataset (100 * rsize + 1) 9 insert at dataset (2000 * rsize + 1) 9 insert at dataset (50000 * rsize + 1) 9 insert at dataset (10000 * rsize + 1) 9 insert at dataset (20000 * rsize + 1) 9 probe length? dataset print "sparse tests:" feach dataset 9 iterations print "----------------" find-fast dataset 9 rsize iterations print "----------------" keys: extract dataset rsize find-really-fast dataset keys 9 rsize iterations print "----------------" print "^/dense match with key/value collisions" dataset: make block! test-size loop test-size [ ; we include 9 in the dataset loop rsize [ dataset: insert dataset random 9 ] ] dataset: head dataset probe length? dataset feach dataset 9 iterations print "----------------" find-fast dataset 9 rsize iterations print "----------------" keys: extract dataset rsize find-really-fast dataset keys 9 rsize iterations print "----------------" print "^/dense match without key/value collisions" dataset: make block! test-size loop test-size [ ; we include 9 in the dataset dataset: insert dataset random 9 loop (rsize - 1) [ dataset: insert dataset 10 + random 100000 ] ] dataset: head dataset probe length? dataset feach dataset 9 iterations print "----------------" find-fast dataset 9 rsize iterations print "----------------" keys: extract dataset rsize find-really-fast dataset keys 9 rsize iterations print "----------------" ask "done" | |
the results speak for themselves. preparing sparse data 6000005 sparse tests: feach(): 5.4.3.2.1. -> 0:00:04.968 4 matches found ---------------- find-fast(): 5.4.3.2.1. -> 0:00:00.735 4 matches found ---------------- find-really-fast(): 5.4.3.2.1. -> 0:00:00.234 4 matches found ---------------- dense match with key/value collisions 6000000 feach(): 5.4.3.2.1. -> 0:00:13.328 221606 matches found ---------------- find-fast(): 5.4.3.2.1. -> 0:00:14.375 181756 matches found ---------------- find-really-fast(): 5.4.3.2.1. -> 0:00:06.453 221606 matches found ---------------- dense match without key/value collisions 6000000 feach(): 5.4.3.2.1. -> 0:00:13.734 222405 matches found ---------------- find-fast(): 5.4.3.2.1. -> 0:00:08.469 200322 matches found ---------------- find-really-fast(): 5.4.3.2.1. -> 0:00:07.516 222405 matches found ---------------- done | |
if you are willing to keep a version of the keys accessible at run-time, then the keyed search algorithm is MUCH faster, but incurs memory overhead, and the creation of the keys, on big blocks, is not negligible (time not included in tests, but will generally put the keyed algorythm slower than the fast-find) | |
also note that I am REUSING the result block at each call, so run times on first calls are actullaly slower than on successive calls. | |
noticed a little discrepancy in results... strange... probably cause by a last minute change... | |
ah found it.... | |
the series was being skipped one too many (since I'm using /tail on the find, as suggested by andreas) find-fast: func [ series value record-length i "iterations" /local result st s ][ prin "find-fast(): " st: now/precise while [i > 0][ prin i prin "." result: clear [] ;length? series s: head series until [ not all[ s: find/tail s value either 0 = ( (-2 + index? s ) // record-length) [ insert tail result copy/part series record-length not tail? s: skip s record-length ][ not tail? s: next s ] ] ] i: i - 1 ] prin join " -> " difference now/precise st print [" " (length? result) / record-length "matches found"] head result ] | |
results with this function fixed are similar... preparing sparse data 6000005 sparse tests: feach(): 5.4.3.2.1. -> 0:00:04.89 4 matches found ---------------- find-fast(): 5.4.3.2.1. -> 0:00:00.719 4 matches found ---------------- find-really-fast(): 5.4.3.2.1. -> 0:00:00.25 4 matches found ---------------- dense match with key/value collisions 6000000 feach(): 5.4.3.2.1. -> 0:00:13.328 221606 matches found ---------------- find-fast(): 5.4.3.2.1. -> 0:00:14.843 221606 matches found ---------------- find-really-fast(): 5.4.3.2.1. -> 0:00:06.5 221606 matches found ---------------- dense match without key/value collisions 6000000 feach(): 5.4.3.2.1. -> 0:00:13.843 222405 matches found ---------------- find-fast(): 5.4.3.2.1. -> 0:00:09.672 222405 matches found ---------------- find-really-fast(): 5.4.3.2.1. -> 0:00:06.375 222405 matches found ---------------- done | |
Terry 18-May-2010 [94x2] | Thanks Maxim |
just trying to grok it :) so 2,000,000 keys against 6, 000,000 values? | |
Maxim 18-May-2010 [96x2] | setup as records of 3 items, 1 key, 2 values. |
3 items (1 key, 2 values). | |
Terry 18-May-2010 [98x3] | 2 values? |
not following ( as usual) | |
a typical hash (for example is [key1 value1 key2 value2] .. one key, one value | |
Maxim 18-May-2010 [101] | in the code... rsize: 3 is the record size... like the /skip value in most series handlers my two funcs will adapt, since you provide it the record size but ... ehehe, I just realized that find HAS the /skip attribute... doh!!! the above can be made MUCH faster still, especially by removing the need for the keys (which take a bit of time to generate on large lists). |
Terry 18-May-2010 [102x2] | my ultimate goal is storing triples.. [a1 a2 a3 b1 b2 b3...] preferably strings if that's fast enough where i can search against any part of the triple , and return the whole thing (triple) |
i'll only be generating the initial series rarely, but appending /changing/ removing often | |
Maxim 18-May-2010 [104] | ok. doable, with almost the same speed as the find-very-fast func, but without the need for its keys argument. |
Terry 18-May-2010 [105] | here's a better example dataset: [ "subject" "predicate" "value" "bob" "fname" "Bob" "bob" "age" "42" "Maxim" "age" "unknown" ... ] so the triple is subject, predicate value with the ability to say quickly return every triple where "predicate" (the second part of the triple) = "age" i'm doing this now with MySQL as a triple store, but series should be much faster |
Maxim 18-May-2010 [106] | I've got it done.... just testing it right now. |
older newer | first last |