World: r3wp
[Profiling] Rebol code optimisation and algorithm comparisons.
older newer | first last |
Terry 19-May-2010 [235x3] | Shouldn't Intersect take an block of blocks? |
ie: intersect [ blk1 blk2 blk3.. ] | |
Works great Steeve. I haven't benchmarked it yet, but given the whole thing can sit in hashes or maps! , i would say the performance should be spectacular | |
Gregg 19-May-2010 [238] | Terry, I think INTERSECT is fine the way it is, and it's easy to wrap if you want. fold: func [ series [series!] fn [any-function!] /local value ][ value: pick series 1 foreach item next series [value: fn value item] ] intersect-all: func [ series [block!] "A block of series values to intersect" ][ fold series :intersect ] |
Terry 19-May-2010 [239] | I knew someone would comback with a function.. might as well have been you Gregg. I like PHP's array_merge() |
Maxim 19-May-2010 [240] | steeve, REBOL doesn't support path with strings, and furthermore, it would only return the first index, if you used it within a paren. so I'd really like you to give a small snippet of code with your idea, I am curious about your method... cause I don't see *how* what you say works. |
Steeve 20-May-2010 [241] | I would use map! (or hash!) as indexes and a key would contain one or several triples (inside a block) verb/"age": [ index indexn ...] |
Ladislav 20-May-2010 [242] | Max, Steeve proposed the datastructure so, that the above feach operations become elementary ("instantaneous"). |
Steeve 20-May-2010 [243] | Yeah !!! :) |
Terry 20-May-2010 [244x2] | Here's what I did.. |
ie: [["Steeve" "isa" "person"]["Steeve" "age" "42"]["Tweety" "isa" "Canary"]["Tweety" "age" "75"]["Chirp" "isa" "Canary"]] i1: ["Steeve" [1 2] "Tweety" [3 4] "Chirp" [5]] i2: ["isa" [1 3 5] "age" [2 4]] i3: ["person" [1] "42" [2] "Canary" [3 5]] rocket: func [it][ st: now/precise while[it > 0][ s: i1/("Steeve") p: i2/("isa") res1: intersect s p it: it - 1 ] prin join " -> " difference now/precise st ] | |
Steeve 20-May-2010 [246x2] | Yeah !!! You got it :) |
and i1,i2,i3 should be of type hash! or map! | |
Terry 20-May-2010 [248x4] | This finds all "Steve isa... " If i wanted to find all "Canaries".. p: i2/("isa") v: i3/("Canary") result intersect p v |
yeah.. im just building some benchmarks with real data now | |
my rebol is so rusty.. how do i turn ["2c3cukne98" "femaleend" "E1016 Camlok" "2c3cukne98" "isa" "cable" "2c3cukne98" "uid" "8558"... ] into [["2c3cukne98" "femaleend" "E1016 Camlok"][ "2c3cukne98" "isa" "cable" ]["2c3cukne98" "uid" "8558"]... | |
remold make-block reduce enlarge.. or somethin.. ? | |
Steeve 20-May-2010 [252] | map-each |
Ladislav 20-May-2010 [253] | To make it practical for other operations too, the triple store should be defined so, that: 1) every triple is stored as a fourtuple, the additional field being a unique triple ID 2) create the main index using the triple ID, 3) create three field indices searchable by by the field contents, and "inside" also by the main ID This way, the triple storage will be moderately fast for triple removal as well |
Izkata 20-May-2010 [254] | For R2, I wrote a function called Stagger for that: Stagger: func [L I /local][ forall L [ L/1: copy/part L I remove/part next L (I - 1) ] return L ] >> Stagger ["2c3cukne98" "femaleend" "E1016 Camlok" "2c3cukne98" "isa" "cable" "2c3cukne98" "uid" "8558"] 3 == [["2c3cukne98" "femaleend" "E1016 Camlok"] ["2c3cukne98" "isa" "cable"] ["2c3cukne98" "uid" "8558"]] |
Terry 20-May-2010 [255] | cool, thanks |
Izkata 20-May-2010 [256] | AFAIK, R2 doesn't have a built in function for it (at least, I never ran across it) |
Ladislav 20-May-2010 [257] | hmm, the Stagger function defined this way is O(N ** 2), i.e. not advisable for large data |
Terry 20-May-2010 [258x2] | just realizing that now :) |
I can hear my HD revving up | |
Izkata 20-May-2010 [260] | It is? I've only used it for relatively small data, so never had a problem.. |
Steeve 20-May-2010 [261x2] | Besides you don't need to store triples as blocks (they can be flattened) in the triple store, then you'll just need to do some math to calculate the index of a triple. The advantage is that you could still use the triple store do some slow searchings. |
... with find | |
Terry 20-May-2010 [263x8] | yeah, i thought of that, but once the inference is there, not sure there's a need |
with maybe the exception of pattern matching? | |
i1 indexes / 3 is always decimal .33333 i2 indexes / 3 is always decimal .6666 i3 indexes / 3 is integer | |
another advantage of a flat triple store is for simple key/value lookups ie: if you have a html element <div id="20984"> where the contents of the div is "pick ie 20984" doesn't get any faster than that | |
And, this data is tight.. the 250,000 real world triples (manages a multi-million dollar electrical equipment rental biz, including 1500 contacts, invoicing, and 10,000s of items) is only 9mb... zips to 1.3mb | |
that's all the data a logic for a medium sized company stored on a floppy. | |
To make it practical for other operations too, the triple store should be defined so, that: 1) every triple is stored as a fourtuple, the additional field being a unique triple ID... Why not just use the index of the first part of the triple? ie: ["tweety" "isa" "canary" "Sylvester" "isa" "cat"...] so 1 and 4 are the two triples( i1) and the next two are i2 and i3 i1: [ "Tweety" [1] "Sylvester" [4][ i2: ["isa" [1 4] i3: ["canary" [ 1] "cat" [4]] | |
so to determine what Sylvester is.. s: i1/("Sylvester") >> [4] p: i2/("isa") >> [1 4] do an intesect to get "4" then the value = index ( 4) + 2 | |
Ladislav 20-May-2010 [271] | Why not just use the index of the first part of the triple? - yes, can be done that way, but if you need to remove the triples, you find out, that it may be practical |
Terry 20-May-2010 [272] | how does the 4th make it more practical? |
Ladislav 20-May-2010 [273] | how... - this way, you will have a triple ID that does not change even if other triples are removed. That is not true for your index. |
Terry 20-May-2010 [274] | If you remove a triple, you'd need to reindex everything, no? |
Ladislav 20-May-2010 [275] | actually, that would be costly |
Terry 20-May-2010 [276x2] | I'd tend to null it out instead, and reuse it later |
I have thought of a fourth ID for a triple before.. but it represents the triple itself. .. rarely needed it though. .there's always a way around | |
Ladislav 20-May-2010 [278] | Yes, sure, as I said, it is practical mainly in case, you want to have simple and fast triple delete operation |
Terry 20-May-2010 [279x2] | yeah.. that could remove some ambiguation |
er.. wait, there's no ambiguation.. http://en.wikipedia.org/wiki/Single_Source_of_Truth | |
Ladislav 20-May-2010 [281] | ...but you are right, that there is always a way around |
Terry 20-May-2010 [282x2] | In my old AtomDB i made in Rebol some time ago, I had an index for each value.. ie: the number "42" was stored only once, and used the index with the inference.. values: [ "42" "November" "Baseball"..] You could have a million birthdays recorded in "November", with a single element billbdaymonth: 2 frankbdaymonth: 2 frankage: 1 the answer to life the universe and everything: 1 |
It makes more sense for larger values.. nice when you can use a single symbol to represent a 1GB binary Anyway, thanks everyone for the help. | |
Ladislav 20-May-2010 [284] | Just BTW, I doubt, that INTERSECT is the fastest way how to intersect two hashes, might need some profiling too. |
older newer | first last |