[REBOL] Re: REBOL Newbie tries to convert C source to REBOL (long posting) Part
From: AJMartin:orcon at: 24-Dec-2003 22:49
Mike wrote:
> REBOL []
> line: "datalinex HFD 111.11 N012.11.099 E034.31.114"; example line
> digits: charset "0123456789"; all numbers that are allowed
> alpha: charset "abcdefghijklmnopqrstuvwxyz"; all characters that are
> allowed (not case specific)
>
> dot: #"."; a dot
> whitespace: #" "; a whitespace
>
> rule: [any alpha 1 whitespace 3 digits 1 dot 2 digits]; an abbreviated rule
> to match the first three elements of data
>
> print parse line [rule]; to check whether the line contains data in the
> format specified by rule
> ;--------------------------------------
Very close! Unfortunately, you've made some little mistakes. Remember that
'help is free to use (except on parse dialect words...), so don't be afraid
to use it:
>> help parse
USAGE:
PARSE input rules /all /case
DESCRIPTION:
Parses a series according to rules.
PARSE is a native value.
ARGUMENTS:
input -- Input series to parse (Type: series)
rules -- Rules to parse by (Type: block string none)
REFINEMENTS:
/all -- Parses all chars including spaces.
/case -- Uses case-sensitive comparison.
Note the /all refinement: when it is used, parse treats spaces as ordinary
input that the rules must match; otherwise it skips over spaces in the input.
Similarly for the /case refinement: if specified, character comparison is
case-sensitive; otherwise it is not. So it's best to decide right at the top
whether whitespace is important, and whether case is important. Let's assume
that spaces are important, and that case is important, as I noticed that the
data values have upper case letters and the leading code "datalinex" is lower
case:
line: "datalinex HFD 111.11 N012.11.099 E034.31.114"
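To make the effect of the two refinements concrete, here is a small sketch.
It assumes REBOL 2 parse behaviour, as described in the help text above; the
sample strings are made up for illustration:

```rebol
REBOL []

; Without /all, parse skips the space between "HFD" and "111".
print parse "HFD 111" ["HFD" "111"]          ; true

; With /all, the space is ordinary input, so this rule no longer matches...
print parse/all "HFD 111" ["HFD" "111"]      ; false

; ...unless the rule mentions the space explicitly.
print parse/all "HFD 111" ["HFD" " " "111"]  ; true

; Without /case, string comparison ignores case.
print parse/all "hfd" ["HFD"]                ; true

; With /case, it does not.
print parse/all/case "hfd" ["HFD"]           ; false
```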
Next mistake is here:
digits: charset "0123456789"
Note that a charset only matches one character. So the above character set
should be the singular (one) "digit" not the plural (many) "digits". Let's
make that change now:
digit: charset "0123456789"
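A quick way to see that a charset matches exactly one character (a sketch;
the test strings are made up):

```rebol
REBOL []

digit: charset "0123456789"

; A bare charset consumes a single character.
print parse/all "7" [digit]         ; true

; On "777" it still consumes only one character, so the input is not
; exhausted and parse reports failure.
print parse/all "777" [digit]       ; false

; Use a count or SOME to match several digits.
print parse/all "777" [3 digit]     ; true
print parse/all "777" [some digit]  ; true
```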
Next mistake is here:
alpha: charset "abcdefghijklmnopqrstuvwxyz"
Note that these are only the lower case letters, not the upper case letters,
so "alpha" is the wrong name for it. Better is a separate rule for each case,
plus a combined one, like these (from my %Patterns.r script):
Upper: charset [#"A" - #"Z"]
Lower: charset [#"a" - #"z"]
Alpha: union Upper Lower
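As a small illustration of how these three charsets behave (a sketch; /case
is used throughout so that the comparison is explicitly case-sensitive):

```rebol
REBOL []

Upper: charset [#"A" - #"Z"]
Lower: charset [#"a" - #"z"]
Alpha: union Upper Lower

print parse/all/case "HFD" [3 Upper]            ; true
print parse/all/case "HFD" [3 Lower]            ; false, no lower case letters
print parse/all/case "datalinex" [some Lower]   ; true
print parse/all/case "dataHFD" [some Alpha]     ; true, Alpha accepts both cases
```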
Another mistake is:
whitespace: #" "
That's actually just a space character, not whitespace in general. What's
better is this:
SP: #" "
HT: #"^-"
LWS: charset reduce [SP HT #"^(A0)"]
WS: charset reduce [SP HT newline CR LF]
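For example, the difference between the single character SP and the
character set LWS shows up as soon as a tab sneaks into the data (a sketch;
the input string with a space followed by a tab is made up):

```rebol
REBOL []

SP: #" "
HT: #"^-"
LWS: charset reduce [SP HT #"^(A0)"]

; LWS accepts the space and the tab between the two fields.
print parse/all "HFD ^-111" ["HFD" some LWS "111"]   ; true

; SP matches exactly one space, so the tab that follows makes this fail.
print parse/all "HFD ^-111" ["HFD" SP "111"]         ; false
```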
Now onto the rule:
rule: [any alpha 1 whitespace 3 digits 1 dot 2 digits]
which, apart from the word changes above, needs to be carefully checked
against the data format:
line: "datalinex HFD 111.11 N012.11.099 E034.31.114"
Note that the "HFD" doesn't have a match in the rule, so the rule will fail.
So let's write it out in English:
some lower-case letters, a space, three upper-case letters, a space, three
digits, a period (or dot), two digits, a space, an upper-case letter, 3
digits, dot, 2 digits, dot, 3 digits, a space, an upper-case letter, 3
digits, dot, 2 digits, dot, 3 digits.
That's pretty easy to convert to Rebol parse words now:
Rule: [
    some Lower
    SP
    3 Upper SP 3 Digit Dot 2 Digit
    SP
    Upper 3 Digit Dot 2 Digit Dot 3 Digit
    SP
    Upper 3 Digit Dot 2 Digit Dot 3 Digit ; Note the duplication!
]
probe parse/all/case line Rule
And trying it out:
>> probe parse/all/case line Rule
true
== true
Success!
Notice that there is a duplication in the rule? It's a really good idea to
remove the duplicated values and say the rule once and once only.
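One way to do that is to give the repeated part its own name and use it
twice. This is only a sketch: the word "Coordinate" is a name I've picked for
illustration, and the other definitions are repeated here so the script runs
on its own:

```rebol
REBOL []

line: "datalinex HFD 111.11 N012.11.099 E034.31.114"

Digit: charset "0123456789"
Upper: charset [#"A" - #"Z"]
Lower: charset [#"a" - #"z"]
SP: #" "
Dot: #"."

; The duplicated part of the rule, said once (hypothetical name).
Coordinate: [Upper 3 Digit Dot 2 Digit Dot 3 Digit]

Rule: [
    some Lower SP
    3 Upper SP
    3 Digit Dot 2 Digit SP
    Coordinate SP
    Coordinate
]

probe parse/all/case line Rule   ; true
```

If the coordinate format ever changes, only the one definition needs
editing.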
> You mentioned a "pattern.r-file" written by you - is there any
> documentation available on the actual usage of your library file ?
It hardly needs documenting. Just a simple 'do before using it in other
scripts. I've sent a copy directly to you.
Andrew J Martin
Speaking in tongues and performing miracles.
ICQ: 26227169
http://www.rebol.it/Valley/
http://valley.orcon.net.nz/
http://Valley.150m.com/