r3wp [groups: 83 posts: 189283]
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

World: r3wp

[Profiling] Rebol code optimisation and algorithm comparisons.

Maxim
18-May-2010
[110x2]
ultimate-find: func [
	series
	value

 index "field you want to search on, should be (1 <= index <= record-length)"
	record-length
	i "iterations"
	/local s st result
][
	prin "ultimate find(): "
	st: now/precise
	while [i > 0][
		prin i
		prin "."
		result: clear [] ;length? series
		s: at series index
		until [
			not all [
				s: find/skip s value record-length

    insert tail result copy/part skip s (-1 * index + 1) record-length
				s: skip s 3
			]
		]
		i: i - 1
	]
	
	prin join " -> " difference now/precise st
	print ["  " (length? result) / record-length "matches found"]
	
	head result
]
searching for strings will be slower... probably much slower... just 
try it out with some data  :-D
Terry
18-May-2010
[112]
what does it mean "field you want to search on"?
Maxim
18-May-2010
[113]
what item of your record you want to match against... basically what 
you meant by searching  subject, predicate or value
Terry
18-May-2010
[114]
yeah, so if i say 1 then that's all subjects?
Maxim
18-May-2010
[115]
yep it will match the value against subjects only.
Terry
18-May-2010
[116x4]
no go then
>> dataset

== [6 27744 92191 1 61175 9905 9 62225 72852 7 31935 71556 4 59248

>> ultimate-find dataset 1 1 zz 1
ultimate find(): 1. -> 0:00:00.031   0 matches found
== []
should have picked up that fourth index (1)
this worked.. 
>> ultimate-find dataset 6 1 zz 1
ultimate find(): 1. -> 0:00:00.219   1 matches found
Maxim
18-May-2010
[120]
zz should be 3.
Terry
18-May-2010
[121x2]
oh.. i thought zz was the length? of dataset
ah.. that works GREAT
Maxim
18-May-2010
[123]
>> dataset: [1 "a" "B" 4 "h" "V" 1 "z" "Z" 4 "p" "d" 4 "k" "i" 4 
"y" "o"]
== [1 "a" "B" 4 "h" "V" 1 "z" "Z" 4 "p" "d" 4 "k" "i" 4 "y" "o"]
>> ultimate-find dataset 4 1 3 1
ultimate find(): 1. -> 0:00   4 matches found
== [4 "h" "V" 4 "p" "d" 4 "k" "i" 4 "y" "o"]
Terry
18-May-2010
[124]
very nice
Maxim
18-May-2010
[125x2]
ultimate find(): 1. -> 0:00   1 matches found
== [1 "a" "B"]

:-)
oops missing cmd line...
Terry
18-May-2010
[127]
so if the dataset is key/value just use 2 as the record-length
Maxim
18-May-2010
[128x2]
>> ultimate-find dataset "a" 2 3 1
ultimate find(): 1. -> 0:00   1 matches found
== [1 "a" "B"]
yep
Terry
18-May-2010
[130x3]
cool
i inserted "maxim" "age" "unknown" and appended "terry" "age" "42" 
into the dataset containing 6 million records.. 

>> ultimate-find dataset "age"  2 3 1
ultimate find(): 1. -> 0:00:00.093   2 matches found
== ["maximn" "age" "unknown" "terry" "age" "42"]
I'll say that's a respectable time... and the leading contestant 
:)
Maxim
18-May-2010
[133]
:-)
Terry
18-May-2010
[134x4]
now if only i was 42 again...
But wait, there's more.... 

convert dataset to hash! and run ultimate-find again!
>> ultimate-find dataset "age"  2 3 100
ultimate find():  -> 0:00   2 matches found
== ["maximn" "age" "unknown" "terry" "age" "42"]

100 iterations not even registering
1000 iterations 0.40
Maxim
18-May-2010
[138]
OMG !
Terry
18-May-2010
[139]
exactly
Maxim
18-May-2010
[140x2]
but I'm getting an odd deadlock here on some tests... hum...
I'm getting extremely slow results on dense tests...
Terry
18-May-2010
[142x2]
interesting... im not too worried as density isn't a big issue with 
triple stores
im off.. good luck with your optimizations
Maxim
18-May-2010
[144]
I'm talking like 100 times worse!   the larger the list the worse 
it gets... seems like an exponential issue.
Terry
18-May-2010
[145]
that seems like an anomaly
Maxim
18-May-2010
[146]
both dense tests perform pretty much the same, the moment I convert 
it to a hash, it gets reallllly slow.
Terry
18-May-2010
[147x2]
yeah, i see that too
mind you, that's pretty dense data
Maxim
18-May-2010
[149]
the strange thing is i did tests using a record size of 2, which 
wouldn't trigger strange mis aligned key/value issues.  I even removed 
the copy to make sure that wasn't the issue and one test with only 
400000 records took more than 4 minutes to complete vs .297 for the 
feach test!
Terry
18-May-2010
[150x2]
I'm looking for the 6 integer.. it's still cranking and i can hear 
my system struggling..
must be a loop error
Maxim
18-May-2010
[152]
well, the results where the same at the end... pretty weird... maybe 
someone has encountered this before and can explain why this happens....
Pekr
18-May-2010
[153]
Max - just a question - wouldn't using parse be faster than find/skip?
Ladislav
18-May-2010
[154]
my advice would be:

1) to test Parse as Pekr noted (traversing only the respective field)
2) to use a hash to index the respective field
Maxim
18-May-2010
[155]
I didn't do any parse test tweaks... but find/skip is very fast so 
far, we can skip over 100 million records within a millisecond.  
not sure parse can beat that
Terry
18-May-2010
[156]
Did you find a solution to the density issue Max?
Maxim
18-May-2010
[157]
nope... I'm working on urgent stuff won't have time for a few days 
to put more time on this.
Steeve
18-May-2010
[158]
didn't tested since a while in R2, but in R3, parse is faster in 
most of the cases (if you write correctly the rules)
Terry
18-May-2010
[159]
I'm wondering if it has something to do with recreating the hash 
each time a value is found?