r3wp [groups: 83 posts: 189283]

World: r3wp

[!REBOL3-OLD1]

Anton
24-Nov-2006
[1625x3]
Functions like these are very useful to have. I could have used them 
recently while doing file searching.
However, I wouldn't like to see these functions included as is.

- Not very efficient. That's OK for searching small strings or the 
contents of short files, but bad when searching large files for many 
strings. 

- Not generic. The name suggests many datatypes are supported. Better 
names might be find-any-string, find-all-strings
- The above FINDALL does not keep FINDIT as a local.

- The argument names are too short, so they are not distinct or descriptive 
enough.

- The return values are not defined clearly in the function doc strings. 
The above issues are fixable, but it will take some time.
(Actually, the efficiency issue will take the most time to resolve.)
(... but, most important is defining the user interface and functionality 
clearly, as well as eliminating undesirable side-effects.)
Louis
24-Nov-2006
[1628]
Who can make these functions the most efficient, and display them 
in a benchmark program to prove it? And correct all the other problems 
mentioned by Anton.
Anton
24-Nov-2006
[1629x2]
Yes.... "who ?".... :-)
The above find-any suffers from this problem, which needs at least 
to be documented in the function description.
	>> pos: findany "hello cat dog license" ["dog" "cat"]
	== "dog license"

("cat" appears before "dog" in the input string, but because "dog" 
was searched first, it was returned first.)
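
One way around this ordering problem, as a minimal sketch only: run FIND once per substring and keep the match with the lowest index. FIND-EARLIEST and its argument names are hypothetical, not part of the functions discussed above, and this still makes one pass over the string per substring, so it does nothing for the efficiency issue.

```rebol
find-earliest: func [
    "Returns the position of whichever substring occurs first in STRING, or none."
    string [string!] "String to search"
    substrings [block!] "Block of substrings to look for"
    /local best pos
][
    best: none
    foreach substring substrings [
        pos: find string substring
        ; keep this match only if it starts before the best one found so far
        if all [
            pos
            any [none? best  (index? pos) < (index? best)]
        ][
            best: pos
        ]
    ]
    best
]

probe find-earliest "hello cat dog license" ["dog" "cat"]   ; prints "cat dog license"
```

With this version the order of the substrings in the block no longer decides which match is returned.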
Maxim
24-Nov-2006
[1631x2]
this is the same limitation as in parse which is not optimal, IMHO.
the 'ANY in the name implies any of the options are equivalent, so 
it's the first in the input which is the desired return value, as 
anton points out.
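
The parse limitation mentioned above can be reproduced with a small sketch (REBOL 2 parse assumed; the word POS is just an illustrative marker). Alternatives of TO are tried in the order written, so the rule jumps to "dog" even though "cat" occurs earlier in the input:

```rebol
; each TO alternative scans ahead on its own; the first one that succeeds wins
parse/all "hello cat dog license" [[to "dog" | to "cat"] pos: to end]
probe pos   ; prints "dog license"
```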
[unknown: 5]
24-Nov-2006
[1633]
Louis, the way I benchmark in REBOL is to do a trace count.  In other 
words if the execution of the trace generates more output than another 
method then I assume that method is less efficient.
Anton
25-Nov-2006
[1634x4]
Stayed up all night, and succeeded in making a parse rule generator, 
so if we want to search a string for any substrings:

string: {Hello there Anton. Arrow in the box. What nice antlers you 
have.}
substrings: ["ant" "antler" "anton" "arrow" "bar" "box"]

rule: [start: [["a" [["nt" action ["ler" action | "on" action]] | 
"rrow" action]] | ["b" ["ar" action | "ox" action]]] | skip]
Found at: 13 Substring: "Ant"
Found at: 13 Substring: "Anton"
Found at: 20 Substring: "Arrow"
Found at: 33 Substring: "box"
Found at: 48 Substring: "ant"
Found at: 48 Substring: "antler"
true
So you can see the rule is built from the substrings.
Thus we are able to search large files for any number of substrings 
in a single pass parse. :)
(very happy about this..) I'll clean it up and publish it probably 
later tonight.
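
Until that is published, here is a much simpler sketch of the same idea, building a parse rule from the substrings and scanning the string once. It is not the generator described above: it builds a flat list of alternatives instead of factoring out shared prefixes, the first alternative that matches wins, and matched text is consumed, so overlapping matches such as "ant" inside "anton" are not reported separately. The names REPORT, ALTERNATIVES and START are made up for the illustration.

```rebol
string: {Hello there Anton. Arrow in the box.}
substrings: ["anton" "arrow" "box"]

; hypothetical action, called from inside the parse rule on every match
report: func [substring [string!]] [
    print ["Found at:" index? start "Substring:" substring]
]

; build: "anton" (report "anton") | "arrow" (report "arrow") | "box" (report "box") | skip
alternatives: copy []
foreach substring substrings [
    append alternatives substring
    append/only alternatives to-paren compose [report (substring)]
    append alternatives first [|]
]
append alternatives 'skip

; START marks where each attempt begins; SKIP advances one character on a miss
parse/all string [any [start: alternatives]]
```

Listing longer substrings before their prefixes gives the longer match priority, since alternatives are tried in order.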
Jerry
25-Nov-2006
[1638]
Nice job, Anton.
Anton
25-Nov-2006
[1639]
Thank you, Jerry. I wonder if anyone else made a parse generator like 
that ?
Louis
25-Nov-2006
[1640x3]
Maxim and Anton, what difference does it make which value is returned? 
It is the true or false that I am looking for. If any of the strings 
are found, why look any farther? I'm sure you guys have a reason, 
but I want to know what it is.
is returned = is returned first
If you keep looking after already having an answer, how can that 
be more efficient?
Anton
25-Nov-2006
[1643x4]
Well, that functionality works perfectly for your case, but there 
are many other cases where the position of the match(es) is also 
wanted.
.. and a name like FINDALL suggests that it returns those matches.
Your functions might better be named:  ANY-SUBSTRINGS? and ALL-SUBSTRINGS?

FINDANY and FINDALL might be fine for personal use, but to get acceptance 
out in the community, the names should be more accurate.
For the single-pass parse, the action can be defined by the user 
to either continue or break the parse. (So FINDANY would break, whereas 
FINDALL would continue.)
Louis
25-Nov-2006
[1647x2]
OK, it really doesn't make any difference to me what the functions 
are named, as long as the names are easy to remember.
But I would really like to see functions that find-any-substring and 
that find-all-substrings included in REBOL3, as they make programming 
a lot easier---at least for me.
Maxim
25-Nov-2006
[1649x3]
Louis, hehe you'll eventually realize that semantics in rebol are 
pretty important... simply because Carl puts soooo much effort I 
guess it all makes us anal about it  ;-)
in parse, a good example of where the index creates many problems 
is when you use the 'TO or 'THRU words.
they jump exactly like the above... and well it makes them much less 
useful within the context of trying to get to the next value of 
"equivalent" values.
Louis
26-Nov-2006
[1652x2]
Maxim, I see. Thanks for the explanation. It will be interesting 
to see Anton's function.
You guys are way more advanced than me. That is why I hang out here---I 
get help when I get stuck. And by the way, thanks to all of you guys 
for the help.
Gregg
26-Nov-2006
[1654]
Anton, my LIKE? function generates parse rules, but I doubt it's 
as advanced as yours, since it's just meant for simple pattern matching 
and doesn't deal with multiple search targets.
Anton
26-Nov-2006
[1655]
Gregg, I think I recently made a function probably similar to your 
LIKE?.  Have you published that somewhere ?

But yes, multiple search terms are the next level up. To get the 
full range of matching rules with multiple search terms will take 
some work, however the basis is there.
Anton
27-Nov-2006
[1656]
Ok, here it is:

http://anton.wildit.net.au/rebol/library/string-search-functions.r


do http://anton.wildit.net.au/rebol/library/demo-string-search-functions.r
Gregg
27-Nov-2006
[1657]
It's on REBOL.org. I finally decided to publish it (it's old) when 
I published my file-list script, which uses it.
Louis
27-Nov-2006
[1658x2]
Anton, I think line 437 should be:

	find-every-string: func [

"func [" must have been accidentally deleted right before you sent the file.
Anton
27-Nov-2006
[1660x2]
Ah.. thank you, Louis.
Fixed and republished (with same version number).
Henrik
28-Nov-2006
[1662]
http://www.rebol.net/r3blogs/0052.html <--- Change the Hash Datatype 
in 3.0?
Henrik
29-Nov-2006
[1663]
http://www.rebol.net/r3blogs/0053.html <-- Vector datatype
CharlesS
5-Dec-2006
[1664]
So anyone know of a rough date for 3.0 ?
BrianH
5-Dec-2006
[1665]
No, not even RT.
Izkata
5-Dec-2006
[1666]
Let's assume December 2025, then we shouldn't be disappointed.... 
  =^_^=
(I hope....)
Rebolek
6-Dec-2006
[1667]
I hope it will be out before Duke Nukem Forever >;-)
Pekr
6-Dec-2006
[1668x3]
My estimation is, that R3 (in some form) will be released for DevCon 
2007
... including View, without some parts as Unicode etc., just new 
architecture ....
hmm, pages for DevCon are down, so actually it is difficult to say, 
when it will happen :-)
Pekr
11-Dec-2006
[1671]
So now we know the target date for R3 release :-) DevCon 2007 holds 
first conference session - By Carl Sassenrath - "Introducing REBOL 
3.0"
Rebolek
11-Dec-2006
[1672]
Pekr: "introducing R3" may mean just technology overview ;)
CharlesS
11-Dec-2006
[1673]
who here is going to devcon2007 ?
Pekr
11-Dec-2006
[1674]
probably me. It depends if I opt for a new job or not, and if I am 
successful :-)