'extract (proposal)
[1/5] from: shannon:ains:au at: 16-Dec-2000 21:58
I discovered a common need that the rebol 'parse, 'find and 'load
functions don't easily solve. That is to search a string! or block! for
a value of a particular datatype!. I think rebol needs a new native (or
mezzanine) function which I like to call 'extract:
USAGE:
EXTRACT series type /part range /all /tail /last /reverse /index
/custom rule
DESCRIPTION:
Finds a datatype in a series and returns the value(s) in a block.
Otherwise returns an empty block.
EXTRACT is an action value.
ARGUMENTS:
series -- (Type: series block port)
type -- (Type: datatype string block)
REFINEMENTS:
/part -- Limits the search to a given length or position.
range -- (Type: number series port)
/all -- Returns all matches in the series or block
/deep -- Searches within sub-strings and sub-blocks in the source
/last -- Backwards from end of series.
/reverse -- Backwards from the current position.
/index -- Returns a block containing the start and end index of the
match
/custom -- Allows custom datatypes to be matched
rule -- Specifies a rule for the custom datatype
EXAMPLES:
>> extract "I have $10 in the bank!" money!
== [$10]
>> extract/all {<HTML><BODY>Some Text</BODY></HTML>} tag!
== [<HTML> <BODY> </BODY> </HTML>]
>> extract/all "String" char!
== [#"S" #"t" #"r" #"i" #"n" #"g"]
>> indexes: extract/index search-string: {Here is a "string" within a string} string!
== [11 18]
>> foreach [start stop] indexes [prin search-string/:start prin search-string/:stop]
>> extract/index ["string" 123.123.123.123 10x10] pair!
== [3 3]
>> digits: charset "0123456789"
>> won-id: ["<WON:" some digits ">"]
>> extract/custom {Killa<fred><123><WON:726372>} won-id
== ["<WON:726372>"]
Advanced example:
>> alpha: charset [#"A" - #"Z" #"a" - #"z"]
>> digits: charset "0123456789"
>> name: [some alpha " " some alpha]
>> phone-number: [3 digits "-" 3 digits "-" 4 digits]
>> extract/custom/all phone-book [name phone-number]
== ["John Aalane" "333-245-2145" "Mary Absenabil" "435-245-5732" .....]
I would like to see some courageous rebol list-members attempt to write
source for this beast. I have written some myself that performs most of
the basic tasks outlined above but I don't want to contaminate the fresh
thinking of others by posting it now. I will post it soon after some
discussion on this topic.
Here are some issues for discussion:
Should 'extract return [], false or none! when it fails to find a match?
Are the refinements useful, do any clash, should more be added?
Should the functionality of 'extract be split between several
complementary functions to reduce complexity?
Should the syntax for custom rules be the same as for 'parse?
SpliFF
[2/5] from: al:bri:xtra at: 17-Dec-2000 9:09
Spliff wrote:
> I discovered a common need that the rebol 'parse, 'find and 'load
> functions don't easily solve. That is to search a string! or block! for a
> value of a particular datatype!. I think rebol needs a new native (or
> mezzanine) function which I like to call 'extract:
I think that you might find:
parse load "your example string here" Rules
might be more versatile.
Andrew Martin
ICQ: 26227169 http://members.nbci.com/AndrewMartin/
[3/5] from: shannon:ains:au at: 17-Dec-2000 10:18
Re: 'extract - reply to Andrew
Andrew Martin wrote:
> I think that you might find:
> parse load "your example string here" Rules
> might be more versatile.
I disagree. Your example makes too many assumptions about the input string.
Particularly it assumes that the author of the original source was kind
enough to use elegant spacing and rebol conventions. For example:
this is a string with $10 1234 %a-file.txt etc.
Sometimes this isn't the case, such as:
source: "User1234<WON:387463>202.76.345.2" ;This string contains several discrete integers!
extract source integer!
== [1234 387463 202 76 345 2]
and could be made more useful with a refinement like
extract/ignore source [integer!] [tuple!]
== [1234 387463]
All of this can be done with parse. In fact my 'extract code relies on it
extensively. The point is that "Simple things should be simple to do".
Obviously anybody experienced with 'parse would be able to write a function
to match datatypes but rebol is an evolving language. It is not supposed to
be a collection of natives only. Rebol allows the same task to be solved in
multiple ways. That's why we have 'import-email, 'to-integer, 'maximum etc.
etc. etc.
SpliFF
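Archive note: the parse-based approach Shannon mentions can be sketched roughly as follows. This is not his unposted code. It assumes REBOL 2 (for parse/all), and EXTRACT-RULE is a hypothetical helper name chosen for illustration. It only collects substrings matching a rule; it makes no attempt to decide which datatype a run of characters "really" is.

```rebol
REBOL [Title: "Parse-based extraction sketch (hypothetical helper)"]

; Hypothetical helper, not part of REBOL: return a block of every
; substring of TEXT that matches RULE. RULE must consume at least
; one character when it matches.
extract-rule: func [text [string!] rule [block!] /local result match] [
    result: copy []
    parse/all text [
        any [
            copy match rule (append result match)  ; keep the matched text
            | skip                                 ; otherwise move on one character
        ]
    ]
    result
]

digits: charset "0123456789"
source: "User1234<WON:387463>202.76.345.2"

; Matches come back as strings.
probe extract-rule source [some digits]
; ["1234" "387463" "202" "76" "345" "2"]

; Convert them when integers are wanted.
numbers: copy []
foreach item extract-rule source [some digits] [append numbers to-integer item]
probe numbers
; [1234 387463 202 76 345 2]
```

The same helper accepts a custom rule such as the won-id rule from the first post, e.g. extract-rule text ["<WON:" some digits ">"].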
[4/5] from: g:santilli:tiscalinet:it at: 17-Dec-2000 14:19
Hello Shannon!
On 17-Dec-00, you wrote:
SB> source: "User1234<WON:387463>202.76.345.2" ;This string contains several discrete integers!
SB> extract source integer!
SB> == [1234 387463 202 76 345 2]
How could EXTRACT decide if User1234 is a word or if User is a
word and 1234 is an integer? Should it treat <WON:387463> as a tag
or as the set-word WON: followed by the integer 387463?
I don't think this is "a simple thing".
Anyway, if you want to search for a certain datatype in a block,
FIND works:
>> find [word 1234 12.23.34] integer!
== [1234 12.23.34]
>> find [word 1234 12.23.34] word!
== [word 1234 12.23.34]
>> find [word 1234 12.23.34] tuple!
== [12.23.34]
Regards,
Gabriele.
--
Gabriele Santilli <[giesse--writeme--com]> - Amigan - REBOL programmer
Amiga Group Italia sez. L'Aquila -- http://www.amyresource.it/AGI/
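Archive note: FIND, as shown above, returns the block positioned at the first value of the given datatype. Collecting every such value could be sketched as a loop over FIND. COLLECT-TYPE is a hypothetical name used only for illustration, REBOL 2 behaviour is assumed, and only the top level of the block is searched.

```rebol
REBOL [Title: "Collect values by datatype (hypothetical helper)"]

; Hypothetical helper, not part of REBOL: gather every top-level
; value in DATA whose datatype is TYPE.
collect-type: func [data [block!] type [datatype!] /local result pos] [
    result: copy []
    pos: data
    ; FIND returns none when no further match exists, which ends the loop
    while [pos: find pos type] [
        append/only result first pos   ; keep the matched value
        pos: next pos                  ; resume after the match
    ]
    result
]

probe collect-type [word 1234 12.23.34 5678] integer!
; [1234 5678]
probe collect-type [word 1234 12.23.34 5678] tuple!
; [12.23.34]
```

This only works once the data is already a block of REBOL values; it does not address the ambiguity of raw strings discussed in this thread.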
[5/5] from: al::bri::xtra::co::nz at: 18-Dec-2000 7:42
Spliff wrote:
> I disagree. Your example makes too many assumptions about the input
> string. Particularly it assumes that the author of the original source was
> kind enough to use elegant spacing and rebol conventions.
> source: "User1234<WON:387463>202.76.345.2" ;This string contains several discrete integers!
I could suggest that:
202.76.345.2
is a tuple, not an integer. Or it could be:
202.76°C
and:
345.2°C
with a mistakenly typed "." instead of a comma. Or it could be:
202.76°F
to:
345.2°F
and intended to be a range of temperatures, but the keyboard stuck on
the second ".".
This could be a tag!:
<WON:387463>
in HTML or XML.
This could be a formula (less than or '<), written without spaces:
1234<WON:387463
with a set-word in between.
The point is that using 'parse with suitable rules, and 'load if necessary,
is a better solution to the larger problem you're trying to solve. The
larger problem is understanding the special dialect that the human being
has used or intended to use.
I hope that helps!
Andrew Martin
ICQ: 26227169 http://members.nbci.com/AndrewMartin/
Librarian comment
Later versions of REBOL have a function called extract, but its purpose is different from this proposal. The built-in extract creates a block from an existing block by extracting every nth entry.
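A minimal illustration of the built-in function, assuming a REBOL 2 release that ships extract (the sample blocks are made up for this note):

```rebol
REBOL [Title: "Built-in EXTRACT illustration"]

; Take every 2nd value, starting with the first.
probe extract [1 2 3 4 5 6] 2
; [1 3 5]

; Handy for pulling the keys out of a flat key/value block.
probe extract [a 1 b 2 c 3] 2
; [a c]
```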