
simple regex function

 [1/1] from: greggirwin::starband::net at: 3-Sep-2001 21:26


Here's a little RegEx engine I've mangled together. It isn't much. Just beyond what you can do with find/any, and not complete or debugged, but I've already learned a lot just building this little function. I'd welcome any comments people have for ways to improve it (you can laugh at it too, I'm sure I will in a month <g>). There are at least a couple other ways to attack it, I know, so thoughts on efficiency issues (e.g. exploiting parse versus a foreach approach) as well as maintenance or extensibility gotchas would be great.

--Gregg

REBOL [
    Title:   "wild-card"
    Date:    03-Sep-2001
    Version: 0.0.1
    File:    %wild-card.r
    Author:  "Gregg Irwin"
    Email:   [greggirwin--starband--net]
    Purpose: {
        The match function is a first crack at something like VBs Like
        operator. I.e. A simple RegEx engine. The real purpose was to
        help me get acquainted with parse.
    }
    Comment: {
        *        Zero or more characters
        ?        Any single character
        #        Any single digit
        [list]   Any single char in list (character class)
        ![list]  Any single char not in list

        Meta chars, except "]", can be used in character classes.
        "]" can be used by itself, as a regular char, but not in a
        character class.

        The order for negated character classes should actually be "[!"
        but I'll have to figure out how to make that work. I used "!["
        for now because it was one less thing to figure out to get it
        up and running.

        TBD  Ranges in character classes are not supported yet
        BUG  Putting *[ in a pattern chokes it
        TBD  Change the name. wild-card just doesn't work.
    }
    History: [
        0.0.1 [03-Sep-2001 "Initial Release." "Gregg"]
    ]
    Example: {
        test-wild-card/show-exp-pat "abc_()!@^#%_defÿz" "abc*def?[xyz]"
        test-wild-card/show-exp-pat "abc_defx" "abc*def[xyz]"
        test-wild-card/show-exp-pat "abc_defx" "abc?def[xyz]"
        test-wild-card/show-exp-pat "abcdxxxxxx" "abc?*"
        test-wild-card/show-exp-pat "avbcvz" "a*z"
        test-wild-card/show-exp-pat "12345_xxx" "*_*"
        test-wild-card/show-exp-pat "filename.txtdfdf" "*.txt*"
        test-wild-card/show-exp-pat "ab*&&&fgÿ?$^^^ `~¨019" "ab[*]*f[ghi]ÿ[?]?* ??![©»§]###"
    }
]

wild-card: make object! [
    any-char: complement charset ""
    digit: charset "0123456789"
    any-single-digit: [1 digit]
    any-single-char: [1 any-char]
    ;any-multi-char: [any any-char]
    ;any-multi-char-to: [any any-char to]
    wild-chars: charset "*?![#"
    non-wild-chars: complement wild-chars
    valid-group-chars: complement charset "]"
    to-next-real-char: 'thru
    to-end: [to end]

    expanded-pattern: none
    tmp: none

    plain-chars: [copy tmp some non-wild-chars (emit copy tmp)]
    dig: ["#" (emit 'any-single-digit)]
    star: ["*" (emit 'to-next-real-char)]
    any-one: ["?" (emit 'any-single-char)]
    char-group: ["[" copy tmp some valid-group-chars "]" (emit charset copy tmp)]
    not-char-group: ["![" copy tmp some valid-group-chars "]" (emit complement charset copy tmp)]

    emit: func [arg][
        append expanded-pattern arg
    ]

    expand-pattern: func [p[string!]][
        expanded-pattern: make block! length? p
        parse/all p [some [
            plain-chars | dig | star | any-one | char-group | not-char-group
        ]]
        ; If the last thing in our pattern is thru, it won't work so we remove the trailing
        ; thru and replace it with "to end".
        if (last expanded-pattern) =? 'to-next-real-char [
            remove back tail expanded-pattern
            append/only expanded-pattern 'to-end
        ]
        return head expanded-pattern
    ]

    match: func [
        "match a string against a pattern containing wildcards"
        s[string!] "The string you want to check"
        p[string!] "The pattern you want to check s against"
    ][
        return parse/all s expand-pattern p
    ]
]

test-wild-card: func [str[string!] pat[string!] /show-exp-pat][
    prin ["RegEx: " tab pat newline]
    prin ["Str:   " tab str newline]
    prin ["Result:" tab wild-card/match str pat newline]
    if show-exp-pat [
        prin ["Parse: " tab remold wild-card/expand-pattern pat newline]
    ]
    prin newline
]

;test-wild-card/show-exp-pat "abc_()!@^#%_defÿz" "abc*def?[xyz]"
;test-wild-card/show-exp-pat "abc_defx" "abc*def[xyz]"
;test-wild-card/show-exp-pat "abc_defx" "abc?def[xyz]"
;test-wild-card/show-exp-pat "abcdxxxxxx" "abc?*"
;test-wild-card/show-exp-pat "avbcvz" "a*z"
;test-wild-card/show-exp-pat "12345_xxx" "*_*"
;test-wild-card/show-exp-pat "filename.txtdfdf" "*.txt*"
;test-wild-card/show-exp-pat "abcdefg" "ab*f[ghi]"
test-wild-card/show-exp-pat "]ab*&&&fgÿ?$^^^ `~¨019[" "]ab[*]*f[ghi]ÿ[?]?* ??![©»§]###[[]"

halt
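
The header comment says the negated class should really be written "[!list]" rather than "![list]". One possible way to get there, offered only as a rough sketch and not as the author's method, is to let a single rule accept an optional "!" right after the opening bracket and remember it in a flag. The names negated?, result and class-to-bitset below are made up for illustration, and the snippet assumes REBOL 2 (it uses parse/all, as the script above does).

```rebol
REBOL [Title: "Negated character class sketch (hypothetical)"]

; Same definition as in the wild-card object: anything but "]".
valid-group-chars: complement charset "]"

negated?: false   ; hypothetical flag, set when "!" follows "["
tmp: none         ; receives the characters listed in the class
result: none      ; receives the bitset built from the class

; Accepts "[list]" or "[!list]". The optional "!" flips the flag,
; and the paren at the end builds the plain or complemented bitset.
char-group: [
    "[" (negated?: false)
    opt ["!" (negated?: true)]
    copy tmp some valid-group-chars
    "]"
    (result: either negated? [complement charset tmp] [charset tmp])
]

; Hypothetical helper: returns the bitset for a class spec,
; or none when the spec does not parse completely.
class-to-bitset: func [spec [string!]] [
    result: none
    either parse/all spec char-group [result] [none]
]

vowels: class-to-bitset "[aeiou]"
non-vowels: class-to-bitset "[!aeiou]"

print parse/all "e" reduce [vowels]       ; expected: true
print parse/all "x" reduce [non-vowels]   ; expected: true
print parse/all "e" reduce [non-vowels]   ; expected: false
```

In the original object this would presumably replace both char-group and not-char-group, with the paren calling emit instead of setting result. Note one limitation of the sketch: a class holding only "!" (written "[!]") is rejected, because the "!" is taken as the negation marker and no list characters remain.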