World: r3wp
[Core] Discuss core issues
older newer | first last |
Terry 17-May-2010 [16706x6] | There must be a way. An index is a symbol that represents some value. What I need is to add some metadata to that index, and make that searchable. |
the goal is a blazing key/value store that's as fast as pulling by index :) | |
I thought i had it with find/any .. but it doesn't work on blocks | |
I haven't tried it, but my guess is storing or converting values to string to use find/any on will be slower than foreach. as in foreach [key value] n [... | |
and i don't think you can search keys in hash or map! without using foreach? | |
I suppose the while loop with counter, is adding extra burden in my examples as well so real world would be faster | |
Maxim 17-May-2010 [16712] | nope... since that is 10 ops within hundreds of millions. |
Terry 17-May-2010 [16713x2] | The other thing i considered was using pairs! to as a pair of symbols, but can't search those either without foreach ie: [ 23x54 "value 1" 984x2093 "value 2"] |
Maxim, the foreach results against strings in your doc is an interesting result. | |
Maxim 17-May-2010 [16715] | right now, for sparse data, I've got an algorithm that traverses 5 times faster than your foreach example. but if its really dense, then it will be slower. |
Terry 17-May-2010 [16716] | I'm thinking 10,000,000 values min, and preferably to max memory |
Maxim 17-May-2010 [16717x3] | 10 x 10 million items, with a single value within the block... 0:00:00.234 using a 1.5GHz laptop. |
vs: 0:00:01.594 for your foreach example. | |
plus record-size is variable, and is supplied as a parameter to the function. | |
Terry 17-May-2010 [16720] | not bad.. with index? i was getting around 2.5million/sec against 100,000 |
Maxim 17-May-2010 [16721] | find-fast: func [ series value record-length i "iterations" /local result st s ][ st: now/precise while [i > 0][ result: make series 0 ;length? series s: head series until [ not if s: find s value [ either 0 = mod -1 + (index? s) record-length [ insert tail result copy/part series record-length not tail? s: next s ][ s: next s ] ] ] i: i - 1 ] print difference now/precise st print length? result result ] |
Terry 17-May-2010 [16722x2] | nice name .. |
we can call the winner find-fastest | |
Maxim 17-May-2010 [16724x2] | this can probably be optimised further... but note that the algorithm scales with density of your block |
density = ratio of matches vs size of block | |
Terry 17-May-2010 [16726x4] | can it be modified for searching keys? |
and especially pattern matching of some sort? | |
I gues find/part would work if the data were string? no? | |
On your doc, regarding foreach you mentioned.. blocks get faster as you grab more at a time, while strings slow down, go figure! but the stats look the opposite? | |
Maxim 17-May-2010 [16730x2] | if you look at the speeds, cycling through two items at a time should give roughly twice the ops/second. when you look at the results... blocks end up being faster than this, and strings are less than this. |
btw, in my find-fast above... when search matchs aprox 1/10 times, it ends up being twice as slow as the foreach... so speed will depend on the dataset, as always. | |
Andreas 17-May-2010 [16732] | maxim, your find-fast returns the values, not the indices (not sure if that was intended) |
Maxim 17-May-2010 [16733] | so does terry's |
Andreas 17-May-2010 [16734] | his "feach" yes, but not his "ind" adaptation |
Maxim 17-May-2010 [16735x2] | I'm comparing to his feach, which was faster... but if I just returned the index, it would be faster still (no copy/part). |
I'm trying out something else, which might be even faster... | |
Andreas 17-May-2010 [16737x2] | indices?-maxim-1: func [series value /local result][ result: make block! length? series until [ not if series: find series value [ append result index? series series: next series ] ] result ] |
that's your above find, stripped of the record-length stuff, and returning indices instead of the actual values | |
Maxim 17-May-2010 [16739] | btw... I discovered // instead of mod, which is twice as fast..... turns out MOD is a mezz... never noticed that. |
Andreas 17-May-2010 [16740] | prin "Initialising ... " random/seed now/precise dim1: 10'000'000 dim2: dim1 / 2 needle: random dim2 haystack: make block! dim1 loop dim1 [append haystack random dim2] print "done" |
Maxim 17-May-2010 [16741] | (the loop is twice as fast, meaning // is MUCH faster than using mod) |
Andreas 17-May-2010 [16742] | here's a tiny bit of setup code, adjust dim1/dim2 as you wish. then find the needles in the haystack: indices? haystack needle |
Maxim 17-May-2010 [16743] | I'm working on a (bigger) script which has similar setup code... specifically meant to compare different dataset scenarios |
Andreas 17-May-2010 [16744x3] | using find/tail instead of just find speeds up things slightly: |
loop kernel becomes: | |
not if series: find/tail series value [ append result (index? series) - 1 ] | |
Maxim 17-May-2010 [16747] | ah good idea! |
Andreas 17-May-2010 [16748x2] | now, dropping the if for an all, speeds up things minimally, but it clean up the code |
indices?-3: func [series value /local result][ result: make block! length? series until [ not all [ series: find/tail series value append result (index? series) - 1 ] ] result ] | |
Maxim 17-May-2010 [16750] | hehe... just what I am doing ;-) |
Andreas 17-May-2010 [16751x5] | ladislav's original parse example is much faster for me, than pekr's "quote"-based one |
interestingly enough, a naive c extension is only negligibly faster | |
indices?-null 0:00:00.184183 indices?-p1 0:00:01.082892 indices?-p2 0:00:01.523015 indices?-u1 0:00:00.347117 indices?-u2 0:00:00.345846 indices?-u3 0:00:00.346959 indices?-ext 0:00:00.329520 | |
null just creates a result block, to demonstrate that 50% of runtime is mem allocation for the 10m result array (so that's where one should really spend time optimising). p1 is ladislav's `1 1 value` parse, p2 is pekrs `quote (value)` parse, u1/2/3 are the until-based versions shown above, ext is a naive C extension. | |
the kernel of the C extension: result = RXI_MAKE_BLOCK(series_n - series_i); result_n = 0; for (; series_i < series_n; ++series_i) { elem_type = RXI_GET_VALUE(series, series_i, &elem); if (elem.int64 == value.int64 && elem_type == value_type) { result_v.int64 = series_i + 1; RXI_SET_VALUE(result, result_n++, result_v, RXT_INTEGER); } } | |
older newer | first last |