World: r3wp
[Profiling] Rebol code optimisation and algorithm comparisons.
Andreas 19-May-2010 [184] | or have a look at cassandra and/or monetdb (w/o knowing anything about your intended usage) |
Terry 19-May-2010 [185x3] | yeah, I've looked at a few |
rdf is to xml what War and Peace is to Cat in the Hat -- Triples are working even with Maxim's code above (just not in hashes for more than a query with a single value).. but i crave the speed of index? against large datasets. | |
I WILL NOT STOP TILL I HAVE A FAST AND SIMPLE TRIPLE STORE! (sleep is my enemy) | |
Maxim 19-May-2010 [188] | Terry, index? is not a procedure within Rebol .. it's the same as length? .. it's a stored value which is simply looked up when you call index? .. nothing will be as fast as index? .. it's the "getting to" the index which consumes cycles |
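Maxim's distinction can be seen in a small console-style sketch. The block contents are taken from the triples used later in the thread, and the variable names are only illustrative: `find` has to walk the block to reach a position, while `index?` just reports the position already carried by the series reference.

```rebol
REBOL []

data: ["Tweety" "isa" "Canary" "Tweety" "age" "75"]

; FIND scans the block until it matches: this is the "getting to" cost.
pos: find data "age"

; INDEX? only reads the position stored in the series reference.
print index? pos         ; 5
print index? data        ; 1
print index? tail data   ; 7
```

On a plain block the scan is linear in the block length; a hash! makes the `find` step much cheaper, but the `index?` call itself is the same either way.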
Steeve 19-May-2010 [189] | Where's the dilemma? You just have to maintain 3 indexes at the same time (for triples), there isn't any other choice if you're looking for speed on reads. |
Terry 19-May-2010 [190x4] | i know .. keys can be integers that are indexes of values in map! or hash. |
yeah Steeve, im scratching out notes on that now.. it's not quite as simple as it sounds | |
ie: a value might be a large binary .. | |
1 GB values as keys don't work very well. | |
Steeve 19-May-2010 [194] | I already told you to compute a checksum to build keys from large data, it's built into Rebol |
Terry 19-May-2010 [195] | yeah, but then you risk collisions |
Steeve 19-May-2010 [196] | with an md5 checksum ??? don't be silly :-) |
Maxim 19-May-2010 [197] | you can negate collisions by building two checksums out of different properties of your data and merging them. |
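A minimal sketch of the checksum-as-key idea, assuming Rebol 2's built-in `checksum/method`. The value is a small stand-in for a large binary, and reading Maxim's "two checksums" as an MD5 digest joined with a SHA1 digest is an assumption of this example, not something spelled out in the thread. A wider key makes an accidental collision even less likely; it does not make one impossible.

```rebol
REBOL []

; Hypothetical stand-in for a very large binary value.
big-value: to-binary "pretend this is a very large binary value"

; CHECKSUM/method with 'md5 returns a 16-byte binary! digest,
; which is small enough to use as an index key.
md5-key: checksum/method big-value 'md5

; Assumed reading of "two checksums merged": append a second,
; different digest (SHA1, 20 bytes) to the first.
wide-key: join md5-key checksum/method big-value 'sha1

print length? md5-key    ; 16
print length? wide-key   ; 36
```

The digest still has to be computed once per value, which is the build-time cost Terry raises next; the payoff comes on later lookups.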
Terry 19-May-2010 [198x2] | fair enough.. not running bank accounts with this thing |
the other issue is the time it takes to build the checksum vs brute force | |
Steeve 19-May-2010 [200x2] | but after that, accessing the data through an index will be 100 or 1000 times faster. |
your actual trial to make a lookup with foreach or find+loop is insanely slow by comparison | |
Sunanda 19-May-2010 [202] | Got to decide what is more important:
-- time to build data structure
-- time to update it (add/remove on the fly)
-- time to search it
And build data structures optimized to your priorities. There is no one true solution, just the best match for the situation at hand. |
Steeve 19-May-2010 [203] | *current trial |
Terry 19-May-2010 [204] | ok.. here's an example.. take this simple rdf triple: "Tweety" "isa" "Canary". How would you create 3 indexes to manage it and 10,000,000 like it? |
Steeve 19-May-2010 [205x2] | It's the problem, I think Terry can't decide :-) |
Ok, I give it to you... | |
Terry 19-May-2010 [207x2] | Tweety "age" "75"
Steeve "isa" "Rebol"
Steeve "age" "unknown" |
I have a system working now that's fast enough.. but I'm a speed junkie.. there must be a BEST way (not better... BEST) | |
Steeve 19-May-2010 [209x4] | First I add the triple to the triple store (a simple block) |
Then I recover its index from the block (actually it's the last one) | |
And I use this index as the value in the 3 indexes, creating the key+value pairs:
Tweety : index
Isa : index
Canary : index | |
that's all... | |
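Steeve's three steps could be sketched roughly as follows in Rebol 2. This is one possible reading, not his actual code: the names `triples`, `index1` to `index3`, `add-key` and `add-triple` are hypothetical, each triple is stored as a sub-block so that its position in the store serves as its id, and each index maps a key to a block of ids because the same subject, verb or object appears in many triples.

```rebol
REBOL []

triples: copy []          ; the store: one sub-block per triple
index1: make hash! []     ; subject -> block of triple ids
index2: make hash! []     ; verb    -> block of triple ids
index3: make hash! []     ; object  -> block of triple ids

; Record ID under KEY, creating the id block on first use.
add-key: func [index key id /local ids] [
    either ids: select index key [
        append ids id
    ][
        append index key
        append/only index reduce [id]
    ]
]

; Append the triple, take its position as the id, index all 3 parts.
add-triple: func [s v o /local id] [
    append/only triples reduce [s v o]
    id: length? triples   ; the new triple is the last one
    add-key index1 s id
    add-key index2 v id
    add-key index3 o id
    id
]

add-triple "Tweety" "isa" "Canary"
add-triple "Tweety" "age" "75"
add-triple "Steeve" "isa" "Rebol"
add-triple "Steeve" "age" "unknown"

probe select index1 "Tweety"   ; [1 2]
probe select index2 "isa"      ; [1 3]
```

Note that `select` on strings is case-insensitive by default, so "Isa" and "isa" land under the same key in this sketch.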
Terry 19-May-2010 [213] | yeah, but ... |
Steeve 19-May-2010 [214] | Perhaps the explanations are not clear, but it's pretty clear in my head ;-) |
Terry 19-May-2010 [215x2] | >> ie: ["Tweety" "isa" "canary"]
== ["Tweety" "isa" "canary"]
>> index? find ie "Tweety"
== 1
Great.. now what |
( only in Rebol does ie: actually mean ie: ) | |
Steeve 19-May-2010 [217x2] | (added in index1) "Tweety" = index
(added in index2) "Isa" = index
(added in index3) "Canary" = index |
Terry 19-May-2010 [219] | so now i want to query, say , everything that is "age" "75" (like above) |
Steeve 19-May-2010 [220x4] | by doing index2/"age", you'll get all the triples having the verb "age" |
with index3/"75", you'll get those with 75 as the target | |
then intersect the 2 blocks | |
I just think you don't get it, but I know my explanation sucks, sorry. | |
Terry 19-May-2010 [224] | i get it. |
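The "age" "75" query Steeve describes can be sketched like this. The two indexes are written out by hand for the four example triples, and `select` stands in for his informal index2/"age" notation; both choices are assumptions made to keep the example self-contained.

```rebol
REBOL []

; Triples 1..4 from the thread, one sub-block per triple.
triples: [
    ["Tweety" "isa" "Canary"]
    ["Tweety" "age" "75"]
    ["Steeve" "isa" "Rebol"]
    ["Steeve" "age" "unknown"]
]

; Verb and object indexes: key -> block of triple ids.
index2: make hash! ["isa" [1 3] "age" [2 4]]
index3: make hash! ["Canary" [1] "75" [2] "Rebol" [3] "unknown" [4]]

; Ids present in both lookups are the triples matching "age" "75".
hits: intersect select index2 "age" select index3 "75"

foreach id hits [probe pick triples id]   ; ["Tweety" "age" "75"]
```

If either key is missing, `select` returns none and `intersect` raises an error, so a real query would need to check for that first.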
Sunanda 19-May-2010 [225] | Ron Everett presented a database that did much of what you want at DevCon2007. The live discussion is here: http://www.rebol.org/aga-display-posts.r?offset=0&post=r3wp500x1919 The video of the presentation may be on qtask. |
Pekr 19-May-2010 [226x2] | yeah, associative stuff :-) |
Sentences from Lazysoft was another product of such kind ... | |
Terry 19-May-2010 [228] | i remember that |
Steeve 19-May-2010 [229x2] | http://www.rebol.net/cgi-bin/r3blog.r?view=0161 |
And my comment... it reminds me of the IDE Plex (Obsydian) in the nineties. It widely used the concept of triples (tuples) to model applications and databases. | |
Terry 19-May-2010 [231x3] | hmm, i thought i got it. :( now I'm lost in block hell |
nope.. i got it again :) | |
Should have listened to my mother and become a lawyer. | |