[REBOL] simple regex function
From: greggirwin::starband::net at: 3-Sep-2001 21:26
Here's a little RegEx engine I've mangled together. It isn't much: it goes just
beyond what you can do with find/any, and it isn't complete or debugged, but I've
already learned a lot just building this little function. I'd welcome any
comments on ways to improve it (you can laugh at it too; I'm
sure I will in a month <g>). I know there are at least a couple of other ways to
attack it, so thoughts on efficiency (e.g., exploiting parse
versus a foreach approach), as well as maintenance or extensibility gotchas,
would be great.
--Gregg
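Since the post asks how a loop-based ("foreach") approach compares with parse, here is a rough sketch of one, written in Python rather than REBOL purely for illustration. It assumes the wildcard syntax listed in the script's Comment header. The names `tokenize`, `char_ok`, `match_from`, and `match` are hypothetical and not part of the script. The main design choice: after a `*`, it retries the rest of the pattern at every later position (backtracking), instead of jumping only to the first occurrence of what follows, which is what the `thru` emitted for `*` in the script does.

```python
DIGITS = "0123456789"


def tokenize(pattern):
    """Split a wild-card pattern into (kind, value) tokens.

    Assumed syntax, taken from the script's Comment header:
      *        zero or more characters
      ?        any single character
      #        any single digit
      [list]   any single char in list
      ![list]  any single char not in list
    Ranges inside [...] are not supported, as in the original.
    """
    tokens = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "*":
            tokens.append(("star", None))
        elif c == "?":
            tokens.append(("any", None))
        elif c == "#":
            tokens.append(("digit", None))
        elif c == "[" or pattern.startswith("![", i):
            negate = c == "!"
            start = i + (2 if negate else 1)
            end = pattern.find("]", start)
            if end <= start:  # no closing "]" or an empty class
                raise ValueError(f"bad character class at index {i}")
            tokens.append(("notin" if negate else "in", set(pattern[start:end])))
            i = end + 1
            continue
        else:
            # Assumption: anything else, including "]" and a lone "!",
            # is treated as a literal character.
            tokens.append(("lit", c))
        i += 1
    return tokens


def char_ok(kind, val, ch):
    """Check one non-star token against one character."""
    if kind == "lit":
        return ch == val
    if kind == "any":
        return True
    if kind == "digit":
        return ch in DIGITS
    if kind == "in":
        return ch in val
    return ch not in val  # "notin"


def match_from(tokens, s, ti, si):
    """Match tokens[ti:] against s[si:], backtracking on each '*'."""
    while ti < len(tokens):
        kind, val = tokens[ti]
        if kind == "star":
            # Try the rest of the pattern at every remaining position.
            return any(match_from(tokens, s, ti + 1, k)
                       for k in range(si, len(s) + 1))
        if si == len(s) or not char_ok(kind, val, s[si]):
            return False
        ti += 1
        si += 1
    # Like parse/all, succeed only if the whole string was consumed.
    return si == len(s)


def match(s, pattern):
    return match_from(tokenize(pattern), s, 0, 0)
```

For example, `match("abxbc", "a*b?")` is true because the matcher goes back and tries the second "b". The cost is that patterns with many `*`s can take exponential time in the worst case. Memoizing `match_from` on `(ti, si)` would keep it polynomial.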
REBOL [
Title: "wild-card"
Date: 03-Sep-2001
Version: 0.0.1
File: %wild-card.r
Author: "Gregg Irwin"
Email: [greggirwin--starband--net]
Purpose: {
The match function is a first crack at something like
VB's Like operator, i.e., a simple RegEx engine. The real
purpose was to help me get acquainted with parse.
}
Comment: {
* Zero or more characters
? Any single character
# Any single digit
[list] Any single char in list (character class)
![list] Any single char not in list
Meta chars, except "]", can be used in character classes.
"]" can be used by itself, as a regular char, but not in a character class.
The order for negated character classes should actually be "[!",
but I'll have to figure out how to make that work. I used "!["
for now because it was one less thing to figure out to get it
up and running.
TBD Ranges in character classes are not supported yet
BUG Putting *[ in a pattern chokes it
TBD Change the name. wild-card just doesn't work.
}
History: [
0.0.1 [03-Sep-2001 "Initial Release." "Gregg"]
]
Example: {
test-wild-card/show-exp-pat "abc_()!@^#%_defÿz" "abc*def?[xyz]"
test-wild-card/show-exp-pat "abc_defx" "abc*def[xyz]"
test-wild-card/show-exp-pat "abc_defx" "abc?def[xyz]"
test-wild-card/show-exp-pat "abcdxxxxxx" "abc?*"
test-wild-card/show-exp-pat "avbcvz" "a*z"
test-wild-card/show-exp-pat "12345_xxx" "*_*"
test-wild-card/show-exp-pat "filename.txtdfdf" "*.txt*"
test-wild-card/show-exp-pat "ab*&&&fgÿ?$^^^ `~¨019" "ab[*]*f[ghi]ÿ[?]?* ??![©»§]###"
}
]
wild-card: make object! [
any-char: complement charset ""
digit: charset "0123456789"
any-single-digit: [1 digit]
any-single-char: [1 any-char]
;any-multi-char: [any any-char]
;any-multi-char-to: [any any-char to]
wild-chars: charset "*?![#"
non-wild-chars: complement wild-chars
valid-group-chars: complement charset "]"
to-next-real-char: 'thru
to-end: [to end]
expanded-pattern: none
tmp: none
plain-chars: [copy tmp some non-wild-chars (emit copy tmp)]
dig: ["#" (emit 'any-single-digit)]
star: ["*" (emit 'to-next-real-char)]
any-one: ["?" (emit 'any-single-char)]
char-group: ["[" copy tmp some valid-group-chars "]" (emit charset copy tmp)]
not-char-group: ["![" copy tmp some valid-group-chars "]" (emit complement charset copy tmp)]
emit: func [arg][
append expanded-pattern arg
]
expand-pattern: func [p[string!]][
expanded-pattern: make block! length? p
parse/all p [some [ plain-chars | dig | star | any-one | char-group
| not-char-group ]]
; If the last thing in our pattern is thru, it won't work, so we remove the
; trailing thru and replace it with "to end".
if (last expanded-pattern) =? 'to-next-real-char [
remove back tail expanded-pattern
append/only expanded-pattern 'to-end
]
return head expanded-pattern
]
match: func ["match a string against a pattern containing wildcards"
s[string!] "The string you want to check"
p[string!] "The pattern you want to check s against"][
return parse/all s expand-pattern p
]
]
test-wild-card: func [str[string!] pat[string!] /show-exp-pat][
prin ["RegEx: " tab pat newline]
prin ["Str: " tab str newline]
prin ["Result:" tab wild-card/match str pat newline]
if show-exp-pat [
prin ["Parse: " tab remold wild-card/expand-pattern pat newline]
]
prin newline
]
;test-wild-card/show-exp-pat "abc_()!@^#%_defÿz" "abc*def?[xyz]"
;test-wild-card/show-exp-pat "abc_defx" "abc*def[xyz]"
;test-wild-card/show-exp-pat "abc_defx" "abc?def[xyz]"
;test-wild-card/show-exp-pat "abcdxxxxxx" "abc?*"
;test-wild-card/show-exp-pat "avbcvz" "a*z"
;test-wild-card/show-exp-pat "12345_xxx" "*_*"
;test-wild-card/show-exp-pat "filename.txtdfdf" "*.txt*"
;test-wild-card/show-exp-pat "abcdefg" "ab*f[ghi]"
test-wild-card/show-exp-pat "]ab*&&&fgÿ?$^^^ `~¨019[" "]ab[*]*f[ghi]ÿ[?]?* ??![©»§]###[[]"
halt
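For comparison with expand-pattern, which turns the wildcard pattern into a parse rule block, the same translation can target a regular expression instead. This is a minimal sketch in Python, assuming the token set from the script's Comment header; `expand_pattern` and `match` here are hypothetical Python counterparts, not part of the script.

```python
import re


def expand_pattern(pattern):
    """Translate a wild-card pattern into a Python regex string.

    Assumes the same syntax as the REBOL script: * ? # [list] ![list].
    As in the original, ranges inside [...] are not supported, so a
    "-" inside a class is escaped and treated literally.
    """
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "*":
            out.append(".*")
        elif c == "?":
            out.append(".")
        elif c == "#":
            out.append("[0-9]")
        elif c == "[" or pattern.startswith("![", i):
            negate = c == "!"
            start = i + (2 if negate else 1)
            end = pattern.find("]", start)
            if end <= start:  # no closing "]" or an empty class
                raise ValueError(f"bad character class at index {i}")
            # Escape each member so ^, ], \ and - stay literal in the class.
            body = "".join(re.escape(ch) for ch in pattern[start:end])
            out.append(("[^" if negate else "[") + body + "]")
            i = end + 1
            continue
        else:
            out.append(re.escape(c))
        i += 1
    return "".join(out)


def match(s, pattern):
    # fullmatch anchors both ends, like parse/all succeeding only when the
    # whole string is consumed; DOTALL lets * and ? match newlines too,
    # since any-char in the script includes every character.
    return re.fullmatch(expand_pattern(pattern), s, re.DOTALL) is not None
```

Here the regex engine does the backtracking, so the translation step stays as simple as expand-pattern, and `expand_pattern("a#![xy]")` yields `a[0-9][^xy]`.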