World: r3wp

Join the discussions in the REBOL3 world...

[Core] Discuss core issues

older newer	first last
Terry 17-May-2010 [16706x6]	There must be a way. An index is a symbol that represents some value. What I need is to add some metadata to that index, and make that searchable.
	the goal is a blazing key/value store that's as fast as pulling by index :)
	I thought i had it with find/any .. but it doesn't work on blocks
	I haven't tried it, but my guess is storing or converting values to string to use find/any on will be slower than foreach. as in foreach [key value] n [...
	and i don't think you can search keys in hash or map! without using foreach?
	I suppose the while loop with counter, is adding extra burden in my examples as well so real world would be faster
Maxim 17-May-2010 [16712]	nope... since that is 10 ops within hundreds of millions.
Terry 17-May-2010 [16713x2]	The other thing i considered was using pairs! to as a pair of symbols, but can't search those either without foreach ie: [ 23x54 "value 1" 984x2093 "value 2"]
Terry 17-May-2010 [16713x2]	Maxim, the foreach results against strings in your doc is an interesting result.
Maxim 17-May-2010 [16715]	right now, for sparse data, I've got an algorithm that traverses 5 times faster than your foreach example. but if its really dense, then it will be slower.
Terry 17-May-2010 [16716]	I'm thinking 10,000,000 values min, and preferably to max memory
Maxim 17-May-2010 [16717x3]	10 x 10 million items, with a single value within the block... 0:00:00.234 using a 1.5GHz laptop.
	vs: 0:00:01.594 for your foreach example.
	plus record-size is variable, and is supplied as a parameter to the function.
Terry 17-May-2010 [16720]	not bad.. with index? i was getting around 2.5million/sec against 100,000
Maxim 17-May-2010 [16721]	find-fast: func [ series value record-length i "iterations" /local result st s ][ st: now/precise while [i > 0][ result: make series 0 ;length? series s: head series until [ not if s: find s value [ either 0 = mod -1 + (index? s) record-length [ insert tail result copy/part series record-length not tail? s: next s ][ s: next s ] ] ] i: i - 1 ] print difference now/precise st print length? result result ]
Terry 17-May-2010 [16722x2]	nice name ..
Terry 17-May-2010 [16722x2]	we can call the winner find-fastest
Maxim 17-May-2010 [16724x2]	this can probably be optimised further... but note that the algorithm scales with density of your block
Maxim 17-May-2010 [16724x2]	density = ratio of matches vs size of block
Terry 17-May-2010 [16726x4]	can it be modified for searching keys?
	and especially pattern matching of some sort?
	I gues find/part would work if the data were string? no?
	On your doc, regarding foreach you mentioned.. blocks get faster as you grab more at a time, while strings slow down, go figure! but the stats look the opposite?
Maxim 17-May-2010 [16730x2]	if you look at the speeds, cycling through two items at a time should give roughly twice the ops/second. when you look at the results... blocks end up being faster than this, and strings are less than this.
Maxim 17-May-2010 [16730x2]	btw, in my find-fast above... when search matchs aprox 1/10 times, it ends up being twice as slow as the foreach... so speed will depend on the dataset, as always.
Andreas 17-May-2010 [16732]	maxim, your find-fast returns the values, not the indices (not sure if that was intended)
Maxim 17-May-2010 [16733]	so does terry's
Andreas 17-May-2010 [16734]	his "feach" yes, but not his "ind" adaptation
Maxim 17-May-2010 [16735x2]	I'm comparing to his feach, which was faster... but if I just returned the index, it would be faster still (no copy/part).
Maxim 17-May-2010 [16735x2]	I'm trying out something else, which might be even faster...
Andreas 17-May-2010 [16737x2]	indices?-maxim-1: func [series value /local result][ result: make block! length? series until [ not if series: find series value [ append result index? series series: next series ] ] result ]
Andreas 17-May-2010 [16737x2]	that's your above find, stripped of the record-length stuff, and returning indices instead of the actual values
Maxim 17-May-2010 [16739]	btw... I discovered // instead of mod, which is twice as fast..... turns out MOD is a mezz... never noticed that.
Andreas 17-May-2010 [16740]	prin "Initialising ... " random/seed now/precise dim1: 10'000'000 dim2: dim1 / 2 needle: random dim2 haystack: make block! dim1 loop dim1 [append haystack random dim2] print "done"
Maxim 17-May-2010 [16741]	(the loop is twice as fast, meaning // is MUCH faster than using mod)
Andreas 17-May-2010 [16742]	here's a tiny bit of setup code, adjust dim1/dim2 as you wish. then find the needles in the haystack: indices? haystack needle
Maxim 17-May-2010 [16743]	I'm working on a (bigger) script which has similar setup code... specifically meant to compare different dataset scenarios
Andreas 17-May-2010 [16744x3]	using find/tail instead of just find speeds up things slightly:
	loop kernel becomes:
	not if series: find/tail series value [ append result (index? series) - 1 ]
Maxim 17-May-2010 [16747]	ah good idea!
Andreas 17-May-2010 [16748x2]	now, dropping the if for an all, speeds up things minimally, but it clean up the code
Andreas 17-May-2010 [16748x2]	indices?-3: func [series value /local result][ result: make block! length? series until [ not all [ series: find/tail series value append result (index? series) - 1 ] ] result ]
Maxim 17-May-2010 [16750]	hehe... just what I am doing ;-)
Andreas 17-May-2010 [16751x5]	ladislav's original parse example is much faster for me, than pekr's "quote"-based one
	interestingly enough, a naive c extension is only negligibly faster
	indices?-null 0:00:00.184183 indices?-p1 0:00:01.082892 indices?-p2 0:00:01.523015 indices?-u1 0:00:00.347117 indices?-u2 0:00:00.345846 indices?-u3 0:00:00.346959 indices?-ext 0:00:00.329520
	null just creates a result block, to demonstrate that 50% of runtime is mem allocation for the 10m result array (so that's where one should really spend time optimising). p1 is ladislav's `1 1 value` parse, p2 is pekrs `quote (value)` parse, u1/2/3 are the until-based versions shown above, ext is a naive C extension.
	the kernel of the C extension: result = RXI_MAKE_BLOCK(series_n - series_i); result_n = 0; for (; series_i < series_n; ++series_i) { elem_type = RXI_GET_VALUE(series, series_i, &elem); if (elem.int64 == value.int64 && elem_type == value_type) { result_v.int64 = series_i + 1; RXI_SET_VALUE(result, result_n++, result_v, RXT_INTEGER); } }
older newer	first last