r3wp [groups: 83 posts: 189283]
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

World: r3wp

[Core] Discuss core issues

Graham
18-Aug-2010
[17913x2]
find [ "ab" "bc" "bd" ] "b*"
looks like I should try and get a hash for each first letter, first 
and second letter etc so I can at least narrow it down to the first 
3 letters and then do search on everyone after that.
Gregg
18-Aug-2010
[17915]
How many strings and what kind of performance do you need?
Graham
18-Aug-2010
[17916x2]
18,000 strings....  real time.... as each time they type a character 
I am presenting a list of choices
I am emulating this http://rxterms.nlm.nih.gov:8080/
AdrianS
18-Aug-2010
[17918x2]
have you thought about using SQLite?
I wonder how that would compare with doing REBOL lookups
Graham
18-Aug-2010
[17920]
I was just thinking that!
Gregg
18-Aug-2010
[17921]
The example helps enormously Graham, because you said "first string" 
but now I see you want all matches for a given prefix.
Graham
18-Aug-2010
[17922]
once I get the first string .. then I can easily get the others as 
I sort the 18k strings first
AdrianS
18-Aug-2010
[17923]
is the existing sqlite driver work over the wire or does it do in 
process calls?
Graham
18-Aug-2010
[17924x4]
Maybe I need RIF
I've never used sqlite
I wonder if Ashley's rebdb would work ...
He doesn't have a like clause ... but I guess it could be faked
Gregg
19-Aug-2010
[17928x2]
Graham, did you try REFORMing into a single string and using FIND/ANY? 
I did a quick test here and it's instant for me with 18K random "words" 
making a 200K string finding the last word (non-random for testing).
Hmmm, some of the terms aren't single words though.
Graham
19-Aug-2010
[17930]
In effect you've created an index
Gregg
19-Aug-2010
[17931]
No, just a list of single-value records. :-)
Graham
19-Aug-2010
[17932]
Actually I wonder why find/any shouldn't work on series like these
Gregg
19-Aug-2010
[17933x2]
Or you can say I created an index and the query system with two function 
calls.
You mean a block containing string values?
Graham
19-Aug-2010
[17935x3]
sure ..
>> help find
USAGE:

    FIND series value /part range /only /case /any /with wild /skip size 
    /match /tail /last /reverse
Doesn't say blocks are not included
Gregg
19-Aug-2010
[17938]
But it's not find/deep, which is what you want.
Graham
19-Aug-2010
[17939x2]
for the /match refinement
Since it's a read only db, perhaps I could use Cyphre's compiler 
:)
Gregg
19-Aug-2010
[17941]
Not sure how /match is going to help you here. It only matches the 
start of the series.

>> find/match ["Aa" "Bb" "Cc"] ["Bb"]
== none
>> find/match ["Aa" "Bb" "Cc"] ["Aa" "Bb"]
== ["Cc"]
Graham
19-Aug-2010
[17942x2]
ok... find/deep then :)
for 18k strings, it seems overkill to use sqlite though
Gregg
19-Aug-2010
[17944]
REFORM + FIND/ANY

Just sayin'. ;-)
Graham
19-Aug-2010
[17945]
I'm sure that it is fast .. but I want real fast so I'm going to 
see if I can create a hash
PeterWood
19-Aug-2010
[17946]
Perhaps Sunanda's skimp.r may be hiding some gems you could adopt.
Sunanda
19-Aug-2010
[17947]
Graham -- consider using a Trie

[that is more-or-less what skimp.r does, but the skimp data structures 
are badly disorted as I was struggling to find a deeply-nested block 
structure that did not trigger garbage collection bugs in the then-current 
REBOL core. (those bugs have been fixed)]
   http://en.wikipedia.org/wiki/Trie
Graham
19-Aug-2010
[17948]
is there a port scheme for trie:// ?
Henrik
19-Aug-2010
[17949]
Graham, look at the filter function in LIST-VIEW.
Graham
19-Aug-2010
[17950]
filter-list ?
Henrik
19-Aug-2010
[17951x3]
no, FILTER.
sorry, FILTER-ROWS.
I guess it's been split in several parts now, so it may be too hard 
to read.
Gabriele
19-Aug-2010
[17954]
Graham, I did that recently, where you get choices as you type. I 
had ~1000 strings though, so not sure how much it scales. I just 
indexed by the first two characters.
Graham
19-Aug-2010
[17955x2]
It seems to be working not too bad for me without a hash .. but I'll 
need to test it on a slower PC ( test inside a VM I guess ) ...
video of my current implementation http://screencast.com/t/ZGI5ZTJkOTct
Gregg
19-Aug-2010
[17957]
Very nice Graham! Thanks for posting. 

How did you end up doing the matching?
Graham
19-Aug-2010
[17958x2]
search each one from the start :)
rebol seems fast enough
Gregg
19-Aug-2010
[17960]
Cool. Yeah, there are times we need to use advanced algorithms, and 
times we don't.
Brock
19-Aug-2010
[17961]
Carl has a character by character search in Rebodex.  Not sure if 
that is any better than what you have done.
Tomc
20-Aug-2010
[17962]
G raham a suffix tree may be what tou want for trem compleation