Mailing List Archive: 49091 messages

[REBOL] Re: REBOL Newbie tries to convert C source to REBOL (long posting) Part

From: AJMartin:orcon at: 24-Dec-2003 22:49

Mike wrote:
> REBOL []
> line: "datalinex HFD 111.11 N012.11.099 E034.31.114"; example line
> digits: charset "0123456789"; all numbers that are allowed
> alpha: charset "abcdefghijklmnopqrstuvwxyz"; all characters that are
> allowed (not case specific)
>
> dot: #"."; a dot
> whitespace: #" "; a whitespace
>
> rule: [any alpha 1 whitespace 3 digits 1 dot 2 digits]; an abbreviated
> to match the first three elements of data
>
> print parse line [rule]; to check whether the line contains data in the
> format specified by rule
> ;--------------------------------------

Very close! Unfortunately, you've made some little mistakes. Remember that 'help is free to use (except on parse dialect words...), so don't be afraid to use it:
>> help parse
USAGE:
    PARSE input rules /all /case

DESCRIPTION:
     Parses a series according to rules.
     PARSE is a native value.

ARGUMENTS:
     input -- Input series to parse (Type: series)
     rules -- Rules to parse by (Type: block string none)

REFINEMENTS:
     /all -- Parses all chars including spaces.
     /case -- Uses case-sensitive comparison.

Note the /all refinement: if it is used, parse treats spaces like any other character; otherwise, it skips over spaces in the input. Similarly for the /case refinement: if it is specified, character comparison is case-sensitive; otherwise, it is not.

So it's best to decide right at the top whether whitespace is important or not, and whether case is important or not. Let's assume that spaces are important, and that case is important, as I noticed that the data values have upper-case letters and the leading code "datalinex" is lower case:

    line: "datalinex HFD 111.11 N012.11.099 E034.31.114"

The next mistake is here:

    digits: charset "0123456789"

Note that a charset only matches one character. So the above character set should be the singular (one) "digit", not the plural (many) "digits". Let's make that change now:

    digit: charset "0123456789"

The next mistake is here:

    alpha: charset "abcdefghijklmnopqrstuvwxyz"

Note that these are only the lower-case letters, not the upper-case letters, so the name is misleading. Better is a separate rule for upper-case and lower-case letters, like these (from my %Patterns.r script):

    Upper: charset [#"A" - #"Z"]
    Lower: charset [#"a" - #"z"]
    Alpha: union Upper Lower

Another mistake is:

    whitespace: #" "

That's actually just a space character, not whitespace in general. What's better is this:

    SP: #" "
    HT: #"^-"
    LWS: charset reduce [SP HT #"^(A0)"]
    WS: charset reduce [SP HT newline CR LF]

Now onto the rule:

    rule: [any alpha 1 whitespace 3 digits 1 dot 2 digits]

which, apart from the word changes above, needs to be carefully checked against the data format:

    line: "datalinex HFD 111.11 N012.11.099 E034.31.114"

Note that the "HFD" doesn't have a match in the rule, so the rule will fail.
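To see both points concretely, here is a minimal sketch. The test string "77" is invented for illustration, and the last line is Mike's rule with only the word names changed as described above; nothing here comes from %Patterns.r beyond the definitions already shown.

```rebol
REBOL []

line: "datalinex HFD 111.11 N012.11.099 E034.31.114"

Digit: charset "0123456789"
Lower: charset [#"a" - #"z"]
SP: #" "
Dot: #"."

; A charset consumes exactly one character per match.
print parse/all "77" [Digit]        ; false: one Digit cannot consume two characters
print parse/all "77" [some Digit]   ; true: the repetition belongs in the rule

; Mike's rule with only the names changed: after "datalinex" and the space,
; the rule expects three digits, but the input continues with "HFD".
print parse/all/case line [any Lower SP 3 Digit Dot 2 Digit]   ; false
```

Parse only returns true when the rule consumes the input right to the end, which is why the single Digit fails on "77".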
So let's write it out in English: some lower-case letters, a space, three upper-case letters, a space, three digits, a period (or dot), two digits, a space, an upper-case letter, 3 digits, dot, 2 digits, dot, 3 digits, a space, an upper-case letter, 3 digits, dot, 2 digits, dot, 3 digits.

That's pretty easy to convert to Rebol parse words now:

    Rule: [
        some Lower SP 3 Upper SP 3 Digit Dot 2 Digit SP
        Upper 3 Digit Dot 2 Digit Dot 3 Digit SP
        Upper 3 Digit Dot 2 Digit Dot 3 Digit ; Note the duplication!
        ]
    probe parse/all/case line Rule

And trying it out:
>> probe parse/all/case line Rule
true
== true

Success! Notice that there is a duplication in the rule? It's a really good idea to remove the duplicated values and say the rule once and once only.

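One way to say the rule once only is to move the repeated part into a named sub-rule. This is just a sketch: the names Reading and Coordinate are made up here for illustration and are not taken from %Patterns.r.

```rebol
REBOL []

line: "datalinex HFD 111.11 N012.11.099 E034.31.114"

; Character classes as defined earlier in this message.
Digit: charset "0123456789"
Upper: charset [#"A" - #"Z"]
Lower: charset [#"a" - #"z"]
SP: #" "
Dot: #"."

; Hypothetical sub-rules: each pattern is written once.
Reading: [3 Digit Dot 2 Digit]                       ; e.g. 111.11
Coordinate: [Upper 3 Digit Dot 2 Digit Dot 3 Digit]  ; e.g. N012.11.099

Rule: [
    some Lower SP 3 Upper SP Reading SP
    Coordinate SP Coordinate
    ]

probe parse/all/case line Rule
```

If the coordinate format ever changes, only the Coordinate sub-rule needs editing.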
> You mentioned a "pattern.r-file" written by you - is there any
> documentation available on the actual usage of your library file ?

It hardly needs documenting. Just a simple 'do before using it in other scripts. I've sent a copy directly to you.

Andrew J Martin
Speaking in tongues and performing miracles.
ICQ: 26227169