World: r3wp
[Profiling] Rebol code optimisation and algorithm comparisons.
Terry 18-May-2010 [132] | I'll say that's a respectable time... and the leading contestant :) |
Maxim 18-May-2010 [133] | :-) |
Terry 18-May-2010 [134x4] | now if only I was 42 again... |
But wait, there's more.... convert dataset to hash! and run ultimate-find again! | |
>> ultimate-find dataset "age" 2 3 100
ultimate-find(): -> 0:00
2 matches found
== ["maximn" "age" "unknown" "terry" "age" "42"]
100 iterations not even registering | |
1000 iterations 0.40 | |
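A minimal sketch of the kind of find/skip lookup being timed above, assuming a flat fixed-size record layout; the real ultimate-find script appears earlier in the thread, so the name and argument order here are illustrative only:

    ; hypothetical reconstruction, not the original ultimate-find
    ; series holds flat fixed-size records, e.g. [subject key value ...]
    ; find/skip only matches at record boundaries, so offsetting the
    ; series to the target column keeps every match on that column
    find-triples: func [
        series [block! hash!] value
        column [integer!] size [integer!]
        /local result here
    ][
        result: make block! 32
        here: at series column
        while [here: find/skip here value size] [
            ; copy the whole record back out from the matched column
            append result copy/part skip here (1 - column) size
            here: skip here size
        ]
        result
    ]

    ; e.g.  find-triples dataset "age" 2 3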
Maxim 18-May-2010 [138] | OMG ! |
Terry 18-May-2010 [139] | exactly |
Maxim 18-May-2010 [140x2] | but I'm getting an odd deadlock here on some tests... hum... |
I'm getting extremely slow results on dense tests... | |
Terry 18-May-2010 [142x2] | interesting... I'm not too worried as density isn't a big issue with triple stores |
I'm off... good luck with your optimizations | |
Maxim 18-May-2010 [144] | I'm talking like 100 times worse! The larger the list, the worse it gets... seems like an exponential issue. |
Terry 18-May-2010 [145] | that seems like an anomaly |
Maxim 18-May-2010 [146] | both dense tests perform pretty much the same, the moment I convert it to a hash, it gets reallllly slow. |
Terry 18-May-2010 [147x2] | yeah, i see that too |
mind you, that's pretty dense data | |
Maxim 18-May-2010 [149] | the strange thing is I did tests using a record size of 2, which wouldn't trigger strange misaligned key/value issues. I even removed the copy to make sure that wasn't the issue, and one test with only 400000 records took more than 4 minutes to complete vs .297 s for the foreach test! |
Terry 18-May-2010 [150x2] | I'm looking for the 6 integer... it's still cranking and I can hear my system struggling... |
must be a loop error | |
Maxim 18-May-2010 [152] | well, the results were the same at the end... pretty weird... maybe someone has encountered this before and can explain why this happens... |
Pekr 18-May-2010 [153] | Max - just a question - wouldn't using parse be faster than find/skip? |
Ladislav 18-May-2010 [154] | my advice would be: 1) to test Parse as Pekr noted (traversing only the respective field) 2) to use a hash to index the respective field |
Maxim 18-May-2010 [155] | I didn't do any parse test tweaks... but find/skip is very fast so far, we can skip over 100 million records within a millisecond. not sure parse can beat that |
Terry 18-May-2010 [156] | Did you find a solution to the density issue Max? |
Maxim 18-May-2010 [157] | nope... I'm working on urgent stuff won't have time for a few days to put more time on this. |
Steeve 18-May-2010 [158] | I haven't tested R2 in a while, but in R3 parse is faster in most cases (if you write the rules correctly) |
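For comparison, one way a parse-based scan over the same flat triples might look in R2; this is a sketch under the same assumed record layout, not a tested replacement, and whether it actually beats find/skip would need measuring:

    ; hypothetical parse scan: consume one 3-element record per pass
    ; and test the key column inside a paren
    find-triples-parse: func [
        series [block!] key
        /local result subj k val
    ][
        result: make block! 32
        parse series [
            some [
                set subj skip set k skip set val skip
                (if k = key [append result reduce [subj k val]])
            ]
        ]
        result
    ]

Note that this visits every record, so unlike hash! lookups its cost does not depend on how many matches there are.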
Terry 18-May-2010 [159] | I'm wondering if it has something to do with recreating the hash each time a value is found? |
Terry 19-May-2010 [160] | Looking at Maxim's ultimate-find (above, Monday 11:32), does anyone have an idea why, when dealing with hash!, the more matches it finds, the slower it gets? |
Ladislav 19-May-2010 [161] | I think that it is quite natural. You should probably generate some random data having (approximately) similar properties to what you intend to process and try some variant approaches to really find out which one is best for the task. Do you know that it is possible to index just a specific record field, i.e. you don't need to make a hash containing all the data from the database? |
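One hedged reading of Ladislav's point about indexing a single field: keep the full data in a plain block and build a small hash! of value/position pairs for just one column, so no hash of the whole dataset is needed. Names and layout below are assumptions; this pays off most when the indexed column is close to unique, which ties into the duplicate-key problem discussed below:

    ; hypothetical single-field index: pairs of [value record-start]
    make-field-index: func [
        data [block!] column [integer!] size [integer!]
        /local idx pos
    ][
        idx: make hash! []
        pos: column
        while [pos <= length? data] [
            ; store the column value and where its record begins
            append idx reduce [pick data pos  pos - column + 1]
            pos: pos + size
        ]
        idx
    ]

    ; lookup: find the value in the small index, then read the
    ; record out of the original block at the stored position
    ; spot: find/skip idx "42" 2
    ; if spot [probe copy/part at data second spot 3]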
Terry 19-May-2010 [162] | Yeah, I've tried some actual data, finding 3270 matches out of a hash that is 732981 in length... when it's a block the search takes .033 s, and the same run against a hash is 0.6, but if the matches are just a few, hash is 1000x faster |
Ladislav 19-May-2010 [164] | .033 s, and same run against a hash is 0.6 - do you mean 0.6 s, i.e. roughly 18 times slower? |
Terry 19-May-2010 [165] | yeah |
Ladislav 19-May-2010 [166x2] | that is interesting, can you post your data generator? |
or do you use real-world data? | |
Maxim 19-May-2010 [168] | the only thing that I'm thinking is that when the hash index changes, it's rehashing its content... which is strange. |
Terry 19-May-2010 [169] | it's Maxim's ultimate-find above (and I'm using real-world data) |
Maxim 19-May-2010 [170] | ladislav, there is a script earlier in this discussion which has a complete working example. |
Ladislav 19-May-2010 [171] | aha, you are using real-world data. OK, then, you should tell me how many matches you see |
Maxim 19-May-2010 [172x2] | (and a revision to ultimate-find, just after it) |
the example shows the problem very well. | |
Terry 19-May-2010 [174] | 3270 matches |
Maxim 19-May-2010 [175] | the example creates several sets of data with different organizations and it compares all of them amongst each other. so with that script, you should be able to do all the analysis you need. |
Terry 19-May-2010 [176] | 495 matches against the same 732981 hash takes only .003 s |
Maxim 19-May-2010 [177] | above I said "it's rehashing its content... which is strange." That is a guess... it should say: it *might* be rehashing its content... which *would be* strange. |
Ladislav 19-May-2010 [178x2] | hmm, what if the hash is optimized for unique elements? |
...then you are most probably out of luck trying to use hash! for indexing purposes | |
Maxim 19-May-2010 [180] | ah yes... usually a hash will have to skip over elements which return the same hash key. So if your table has a few thousand similar items, you aren't benefiting from the hashing... and it's quite possible that looking up a hash itself actually takes longer when it has to skip over and over (comparing data on top of the hash). Though one could argue that the speed should be a bit slower than using a block, not this much slower... possibly related to the implementation itself. |
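The skip-over effect Maxim describes is easy to provoke: fill a series with one key repeated many times and time the same find/skip scan against the block and against its hash! form. A rough R2 benchmark sketch with illustrative sizes; absolute timings will vary:

    ; count matches for a 2-element record layout
    count-matches: func [series /local here n] [
        n: 0
        here: series
        while [here: find/skip here "age" 2] [
            n: n + 1
            here: skip here 2
        ]
        n
    ]

    blk: make block! 200000
    repeat i 100000 [append blk reduce ["age" i]]   ; one key, repeated

    t: now/precise
    print count-matches blk
    print difference now/precise t                  ; block scan time

    hsh: make hash! blk                             ; same data as hash!
    t: now/precise
    print count-matches hsh
    print difference now/precise t                  ; hash scan time

If the hash run comes out dramatically slower, that is consistent with every lookup having to walk a long chain of equal-keyed entries before falling back to plain comparisons.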
Terry 19-May-2010 [181] | my dilemma is indexing triples in a key/value world |