Mailing List Archive: 49091 messages

[REBOL] Re: Fast way to remove all non-numerical chars from a string

From: kpeters:otaksoft at: 24-Sep-2007 10:25

Wow - this seemingly "little" question really sparked some responses! I like it when that happens because it really shows off the brilliance of Rebol and the people mastering it. All solutions will go into my library collection since they all shine in their own way and I can learn from all of them - so I thank you all.

As you likely have guessed, I asked because I need to re-format phone numbers. The vast majority of these will arrive formatted by various people according to what they consider proper formatting - sometimes quite creative and riddled with typos as well. At any time, I have to be prepared for the occasional complete junk string.

The numbers may reside in MySQL tables or in text files with one phone record (number & address) per line. Each of these tables or text files will be processed exactly once (as far as the phone number standardizing goes) - speed is important, but an extra handful of seconds per file (containing between 500,000 and 1,000,000 numbers) won't hurt anybody.

The phone numbers are stored with a max of 15 characters each prior to processing - these strings will be overwritten with a standardized phone number string if they contain a valid number and will be emptied otherwise.

For now, all phone numbers hail from North America - so valid lengths are

a) 7 digits - local number
b) 10 digits - area code included
c) 11 digits - leading 1 in front of area code

Here's the function logic I intend to use:

1) Lose all non-numerical characters from ph#-string
2) If length not in (7,10,11) return empty string because phone# is invalid
3) If length = 11 and first char = 1 then chop off first char
   // now only 2 possibilities left
4) If length = 10 then frame the three leftmost digits with a pair of parentheses and insert a '1' in front
5) Insert hyphen before fourth character from the end of string

Does this sound like a good strategy or are there other, maybe radically different (but speedy) ways to do this?

TIA,
Kai
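
For illustration only, the five steps above can be sketched as follows. The sketch is in Python rather than Rebol, and it rests on several assumptions that the post does not settle: the function name `standardize_phone` is hypothetical, an 11-digit string that does not start with 1 is treated as invalid, and the 10-digit output is assumed to look like `1(604)555-1234`, with no spaces.

```python
import re

# Matches every character that is not an ASCII digit.
NON_DIGITS = re.compile(r"[^0-9]")


def standardize_phone(raw: str) -> str:
    """Return a standardized phone string, or "" if the input is invalid.

    Hypothetical helper; the output layout is an assumption, not taken
    from the original post.
    """
    # Step 1: lose all non-numerical characters.
    digits = NON_DIGITS.sub("", raw)

    # Step 2: only 7, 10 or 11 digits can be a valid number.
    if len(digits) not in (7, 10, 11):
        return ""

    # Step 3: an 11-digit number must start with 1; chop it off.
    # Assumption: 11 digits with any other leading digit are junk.
    if len(digits) == 11:
        if digits[0] != "1":
            return ""
        digits = digits[1:]

    # Step 5: hyphen before the fourth character from the end.
    local = digits[-7:-4] + "-" + digits[-4:]

    # Step 4: parenthesize the area code and put a '1' in front.
    if len(digits) == 10:
        return "1(" + digits[:3] + ")" + local
    return local
```

Under these assumptions, `standardize_phone("(604) 555-1234")` and `standardize_phone("1-604-555-1234")` both give `1(604)555-1234`, `standardize_phone("555 1234")` gives `555-1234`, and a junk string comes back empty. Building the result by slicing rather than by repeated in-place inserts keeps each step independent of the order in which the others run.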