r3wp [groups: 83 posts: 189283]

World: r3wp

[Core] Discuss core issues

btiffin
12-May-2007
[7860x2]
During a Sunanda ML contest, someone came up with a parse solution to compacting... it was fairly blazing in speed, but I got busy after phase one of the contest and didn't follow as closely as I would have liked to.

Check http://www.rebol.org/cgi-bin/cgiwrap/rebol/ml-display-thread.r?m=rmlPCJC for some details... watch the entries by Romano, Peter and Christian.  The winning code ended up in

http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=rse-ids.r

It's not the same problem set, but parse seems to be a bit of a magic bullet, speed-wise.
I'm completely new to parse... haven't timed it... it probably breaks...

out: copy []
key: 'one
parse n [to key any [thru key (insert tail out first n  insert tail out second n) skip]]
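A working variant of the same collect-by-key idea, for what it's worth (a sketch, untested on large data; the marker word mk is mine, and n is assumed to be a flat [key value ...] block or hash):

out: copy []
key: 'one
parse n [any [to key mk: (insert tail out copy/part mk 2) 2 skip]]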
Terry
12-May-2007
[7862]
I'm thinking parse is not the way to go... (I'm referring to very large hash tables)
Chris
12-May-2007
[7863x3]
; This probably isn't efficient enough either:
remove-dups: func [block [any-block!] /local this-key][
    forskip block 2 [
        this-key: pick block 1
        remove-each [key val] next next block [this-key = key]
    ]
    block
]
; Parse?
remove-dups: func [block [any-block!] /local this mk mk2][
    parse block [
        any [
            set this word! skip mk:
            any [to this mk2: (remove/part mk2 2)]
            :mk
        ]
    ]
    block
]
Sorry, read the problem first :)
btiffin
12-May-2007
[7866]
Chris; :)  I was using the remove-each for its nativeness.

Terry;  Parse has proven itself to be very fast.  It may be faster than
remove-each [key item] copy n [not key = 'one]
for large sets.

Otherwise:
foreach [key item] n [if key = 'one [insert tail out reduce [key item]]]

But I'll bet the parse is faster (assuming it works... that is one of my first parses)
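A rough way to compare the two, if anyone cares to time them (a sketch; time-it is my own throwaway helper, and n/out are assumed to be set up as above):

time-it: func [body [block!] /local t][
    t: now/precise
    do body
    difference now/precise t
]

time-it [remove-each [key item] copy n [not key = 'one]]
time-it [out: copy []  foreach [key item] n [if key = 'one [insert tail out reduce [key item]]]]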
Chris
12-May-2007
[7867]
; Parse, this time addressing the spec:
consolidate: func [block [any-block!] /local key val mk dup][
	parse block [
		any [
			set key word! val: skip mk:
			opt [
				to key dup: (change/only val compose [(val/1)])
				any [to key dup: (append val/1 dup/2 remove/part dup 2)]
			]
			:mk
		]
	]
	block
]
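If I read that right, a small test block with a repeated key (my example data, not from the thread) should come out consolidated like this:

>> consolidate [one "test1" two "test2" one "test3"]
== [one ["test1" "test3"] two "test2"]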
btiffin
12-May-2007
[7868]
Terry; go with Chris...He'll not lead you wrong.  :)
Chris
12-May-2007
[7869]
B: I wonder if 'parse or other loops work faster with hashes?
btiffin
12-May-2007
[7870]
Hmm.  Good point.  R3 is going to have other native! '-each' words, right?
Terry
12-May-2007
[7871]
um... not trying to remove dupes... trying to collect them all (you know, like Pokémon cards)
btiffin
12-May-2007
[7872]
Terry;  Sunanda's contest was using the parse technique on some very large indexes... for skimp.  A foreach solution was actually the fastest for a teeny window of time, until Romano posted the parse winner.  But... it was a different problem set.

It was a very informative contest in terms of efficient REBOL coding.
Chris
12-May-2007
[7873]
T: Yes, that's my last function (after I reread your post)
Terry
12-May-2007
[7874x2]
These don't feel right... looking for the equivalent of SQL's "select value where key = 'one'"... Isn't rifling through a 100 MB hash table using parse similar to rifling through an un-indexed SQL table?
The rse-ids.r file seems to be what I'm looking for... need to have a play.
btiffin
12-May-2007
[7876]
Yep.  You could sort, find first, find last and copy the range?  But that introduces a sort...

There is a blog entry about hash! but we have to wait till R3.  RSN.

Yeah, there was some high-level optimizing going on for that rse-ids little beauty.  :)
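Roughly that idea, sketched over a flat [key value ...] block (select-sorted is my own illustrative name, untested; note sort/skip reorders the block in place, and a value equal to the key can still be mistaken for one):

select-sorted: func [block key /local start stop result][
    sort/skip block 2                       ; order the key/value records by key
    result: copy []
    if start: find block key [
        stop: find/last block key
        foreach [k v] copy/part start (2 + offset? start stop) [
            append result v
        ]
    ]
    result
]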
Chris
12-May-2007
[7877]
; one more :)
select-all: func [block [any-block!] key /local result val][
    result: copy []
    parse block [any [to key skip set val skip (append result val)]]
    result
]
btiffin
12-May-2007
[7878]
That I like... :)  And Terry; you may be surprised at the timings of that 'to sequence.
Chris
12-May-2007
[7879]
(thru key) would work as well, if not better, than (to key skip)  :)
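So the rule inside select-all would presumably read:

parse block [any [thru key set val skip (append result val)]]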
btiffin
12-May-2007
[7880]
Or  thru  sequence  :)
Terry
12-May-2007
[7881]
What do you use for the key? A word value... 'one?
Chris
12-May-2007
[7882]
Yes -- select-all data 'one
Terry
12-May-2007
[7883x2]
doesn't work
sorry, does work..
Chris
12-May-2007
[7885]
>> n: make hash! [one "test1"  two "test2"  one "test3"]
== make hash! [one "test1" two "test2" one "test3"]
>> select-all n 'one
== ["test1" "test3"]
Terry
12-May-2007
[7886]
(I've been doing so much JavaScript and PHP lately, I'm really starting to lose whatever REBOL understanding I once had)
Chris
12-May-2007
[7887]
Ack, JS is passable, but PHP? :)
Terry
12-May-2007
[7888]
This sort of hash table won't work anyway due to the 'word limitations of the current core
Chris
12-May-2007
[7889]
It will work with other keys, but has the same issue as 'select in that values can be mistaken for keys.
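For instance (my illustration, using the select-all above): a value that happens to equal the key gets picked up as a key hit and throws the collection off:

>> n: make hash! [one "test1" two one one "test3"]
>> select-all n 'one
== ["test1" one]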
Terry
12-May-2007
[7890x2]
exactly  (or as the French say... exact)
I'm trying real hard to get my simple data into a REBOL hash table, or blocks... whatever... but it seems like a traditional relational DB is the way to go... even used only as a flat-file DB :(
Chris
12-May-2007
[7892]
; This may slow things down a little:
select-all: func [block [any-block!] key /local result val fork][
    result: copy []
    parse block [
        any [
            thru key fork:
            (fork: pick [[set val skip (append result val)][]] even? index? fork)
            fork
        ]
    ]
    result
]
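With the same collision case as before (my example data), the even-index check presumably filters out the false hit:

>> select-all make hash! [one "test1" two one one "test3"] 'one
== ["test1" "test3"]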
btiffin
12-May-2007
[7893]
I'm a little confused... what word limit are you bumping into?  I thought the limit was only for set-words?  Can't blocks have any number of symbols inside?  Until RAM is gone...
Chris
12-May-2007
[7894]
I guess the alternative is *sigh* waiting for R3 where these issues will be addressed...
Tomc
12-May-2007
[7895]
Terry, is ordering your data when you insert it prohibitive?
Terry
12-May-2007
[7896x2]
sigh alright
somewhat, Tom
Tomc
12-May-2007
[7898]
knowing more about how it will be accessed (insert, delete, move, select) would help choose a strategy
Terry
12-May-2007
[7899x2]
if every time you do a 'write' you need to sort a 400 MB file... I would say yeah
Ok.. it's like this..
Tomc
12-May-2007
[7901]
with sorted keys you could binary search to avoid a linear scan
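A sketch of that over a flat [key value ...] block (find-sorted is my own illustrative name, untested; assumes the block was already sorted with sort/skip block 2 and that the keys order consistently when formed as strings):

find-sorted: func [block key /local lo hi mid k][
    lo: 1
    hi: to integer! divide length? block 2
    while [lo <= hi][
        mid: to integer! divide lo + hi 2
        k: pick block (mid * 2) - 1
        if k = key [return at block (mid * 2) - 1]
        either lesser? form k form key [lo: mid + 1][hi: mid - 1]
    ]
    none
]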
btiffin
12-May-2007
[7902x2]
Terry;  Go back to the QRAM days, segment everything. :)  It only adds an order of magnitude in complexity. :)  Love Wintel...
Sorry.  I'll shut up now.
Terry
12-May-2007
[7904x2]
My data is structured using a semantic network model... entities... i.e.:

tweety   isa      canary
canary   isa      bird
bird     haspart  feathers
bird     haspart  beak
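That maps naturally onto a flat three-column block (a sketch; facts and query-triples are my own illustrative names, not Terry's actual code):

facts: [
    tweety isa canary
    canary isa bird
    bird haspart feathers
    bird haspart beak
]

query-triples: func [data entity attribute /local result][
    result: copy []
    foreach [e a v] data [
        if all [e = entity  a = attribute] [append result v]
    ]
    result
]

; query-triples facts 'bird 'haspart
; == [feathers beak]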
Tomc
12-May-2007
[7906]
so you have separate hashes for the different relationship types
Terry
12-May-2007
[7907]
so currently it's stored as entity - attribute - value in a single DB table using 3 columns
Tomc
12-May-2007
[7908]
isa hash and partof hash
Terry
12-May-2007
[7909]
(lots of arguments out there regarding EAV as a DB model... most say it can't be done... I say rubbish... works beautifully on a university project currently at 3 million rows of data)