World: r3wp
[Profiling] Rebol code optimisation and algorithm comparisons.
older newer | first last |
Terry 18-May-2010 [112] | what does it mean "field you want to search on"? |
Maxim 18-May-2010 [113] | what item of your record you want to match against... basically what you meant by searching subject, predicate or value |
Terry 18-May-2010 [114] | yeah, so if i say 1 then that's all subjects? |
Maxim 18-May-2010 [115] | yep it will match the value against subjects only. |
Terry 18-May-2010 [116x4] | no go then |
>> dataset == [6 27744 92191 1 61175 9905 9 62225 72852 7 31935 71556 4 59248 >> ultimate-find dataset 1 1 zz 1 ultimate find(): 1. -> 0:00:00.031 0 matches found == [] | |
should have picked up that fourth index (1) | |
this worked.. >> ultimate-find dataset 6 1 zz 1 ultimate find(): 1. -> 0:00:00.219 1 matches found | |
Maxim 18-May-2010 [120] | zz should be 3. |
Terry 18-May-2010 [121x2] | oh.. i thought zz was the length? of dataset |
ah.. that works GREAT | |
Maxim 18-May-2010 [123] | >> dataset: [1 "a" "B" 4 "h" "V" 1 "z" "Z" 4 "p" "d" 4 "k" "i" 4 "y" "o"] == [1 "a" "B" 4 "h" "V" 1 "z" "Z" 4 "p" "d" 4 "k" "i" 4 "y" "o"] >> ultimate-find dataset 4 1 3 1 ultimate find(): 1. -> 0:00 4 matches found == [4 "h" "V" 4 "p" "d" 4 "k" "i" 4 "y" "o"] |
Terry 18-May-2010 [124] | very nice |
Maxim 18-May-2010 [125x2] | ultimate find(): 1. -> 0:00 1 matches found == [1 "a" "B"] :-) |
oops missing cmd line... | |
Terry 18-May-2010 [127] | so if the dataset is key/value just use 2 as the record-length |
Maxim 18-May-2010 [128x2] | >> ultimate-find dataset "a" 2 3 1 ultimate find(): 1. -> 0:00 1 matches found == [1 "a" "B"] |
yep | |
Terry 18-May-2010 [130x3] | cool |
i inserted "maxim" "age" "unknown" and appended "terry" "age" "42" into the dataset containing 6 million records.. >> ultimate-find dataset "age" 2 3 1 ultimate find(): 1. -> 0:00:00.093 2 matches found == ["maximn" "age" "unknown" "terry" "age" "42"] | |
I'll say that's a respectable time... and the leading contestant :) | |
Maxim 18-May-2010 [133] | :-) |
Terry 18-May-2010 [134x4] | now if only i was 42 again... |
But wait, there's more.... convert dataset to hash! and run ultimate-find again! | |
>> ultimate-find dataset "age" 2 3 100 ultimate find(): -> 0:00 2 matches found == ["maximn" "age" "unknown" "terry" "age" "42"] 100 iterations not even registering | |
1000 iterations 0.40 | |
Maxim 18-May-2010 [138] | OMG ! |
Terry 18-May-2010 [139] | exactly |
Maxim 18-May-2010 [140x2] | but I'm getting an odd deadlock here on some tests... hum... |
I'm getting extremely slow results on dense tests... | |
Terry 18-May-2010 [142x2] | interesting... im not too worried as density isn't a big issue with triple stores |
im off.. good luck with your optimizations | |
Maxim 18-May-2010 [144] | I'm talking like 100 times worse! the larger the list the worse it gets... seems like an exponential issue. |
Terry 18-May-2010 [145] | that seems like an anomaly |
Maxim 18-May-2010 [146] | both dense tests perform pretty much the same, the moment I convert it to a hash, it gets reallllly slow. |
Terry 18-May-2010 [147x2] | yeah, i see that too |
mind you, that's pretty dense data | |
Maxim 18-May-2010 [149] | the strange thing is i did tests using a record size of 2, which wouldn't trigger strange mis aligned key/value issues. I even removed the copy to make sure that wasn't the issue and one test with only 400000 records took more than 4 minutes to complete vs .297 for the feach test! |
Terry 18-May-2010 [150x2] | I'm looking for the 6 integer.. it's still cranking and i can hear my system struggling.. |
must be a loop error | |
Maxim 18-May-2010 [152] | well, the results where the same at the end... pretty weird... maybe someone has encountered this before and can explain why this happens.... |
Pekr 18-May-2010 [153] | Max - just a question - wouldn't using parse be faster than find/skip? |
Ladislav 18-May-2010 [154] | my advice would be: 1) to test Parse as Pekr noted (traversing only the respective field) 2) to use a hash to index the respective field |
Maxim 18-May-2010 [155] | I didn't do any parse test tweaks... but find/skip is very fast so far, we can skip over 100 million records within a millisecond. not sure parse can beat that |
Terry 18-May-2010 [156] | Did you find a solution to the density issue Max? |
Maxim 18-May-2010 [157] | nope... I'm working on urgent stuff won't have time for a few days to put more time on this. |
Steeve 18-May-2010 [158] | didn't tested since a while in R2, but in R3, parse is faster in most of the cases (if you write correctly the rules) |
Terry 18-May-2010 [159] | I'm wondering if it has something to do with recreating the hash each time a value is found? |
Terry 19-May-2010 [160] | Looking at Maxim's ulitmate-find (above, monday 11:32) , does anyone have an idea why when dealing with hash! , the more matches it finds, the slower it gets? |
Ladislav 19-May-2010 [161] | I think, that it is quite natural. You should probably generate some random data having (approximately) similar properties as what you intend to process and try some variant approaches to really find out, which one is best for the task. Do you know, that it is possible to index just a specific record field, i.e. you don't need to make a hash containing all the data from the database? |
older newer | first last |