
[REBOL] Re: Perl is too stupid to understand this 1 liner.

From: joel:neely:fedex at: 15-Dec-2001 4:40

Carl Read wrote:
> Well, you didn't say it was for any old phone numbers in any
> old text, did you?
No, but I did say "memo"; I suspect most of us would consider

    Albert Jones-Smythe ... on 12-01-2001 ... of 28-Nov-2001

as neither unusual text nor phone numbers.  ;-)
> Will your Perl script get this right for instance...
> So these are standard NZ phone numbers...
>
> National: 1-2-345 6789
> Local: 345 6789
> 0800: 0800 123 456
No, it wouldn't.  I should have noted my deliberate decision to handle only a common form of US phone numbers (but the regular expression was certainly clear on that point ;-).

Of course, the big danger in writing code that handles arbitrary human-written/readable text is that humans can use their context-sensitive intelligence to interpret (and even correct) a wide range of syntactic variation.  The only way around that AFAIK is to limit the amount of variation allowed before the program either gives up or raises the case for discussion.

For example, the simplest substitution rule that covers all of your NZ samples, as well as common variations on US phone numbers, would be

    s/\b[- .()\d]{8,}\b/####/g

which would replace *any* run of at least 8 digits, hyphens, spaces, dots, and parentheses with the "blot-out" string.  But, of course, that would also affect strings such as

    31-12-2001
    3.1415926535

which a human would likely *not* interpret as phone numbers.

IIRC, there was a thread a few months ago about trying to come up with a PARSE rule that would recognize phone numbers from as many countries as possible with as few errors (false positives and false negatives) as possible.  As I recall, the conclusion was that, as one added countries with differing conventions, the range of variation rapidly made the task infeasible.
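To make that trade-off concrete, here is a rough Python transliteration of the Perl blot-out rule `s/\b[- .()\d]{8,}\b/####/g` (Python's `re` accepts the same `\b`, character class and `{8,}` syntax). The names `BROAD` and `blot_out` and the sample list are illustrative assumptions, not part of the original script; the point is only that the NZ numbers, a US number, the date and the digits of pi are all treated alike.

```python
import re

# Python transliteration of the Perl rule  s/\b[- .()\d]{8,}\b/####/g :
# any run of 8+ digits, hyphens, spaces, dots or parentheses that starts
# and ends on a word boundary is replaced by the blot-out string.
BROAD = re.compile(r"\b[- .()\d]{8,}\b")


def blot_out(text: str) -> str:
    """Apply the broad rule to every match (re.sub is global, like /g)."""
    return BROAD.sub("####", text)


samples = [
    "1-2-345 6789",         # NZ national number
    "345 6789",             # NZ local number
    "0800 123 456",         # NZ 0800 number
    "800-555-1234",         # a common US form
    "31-12-2001",           # a date: false positive
    "3.1415926535",         # pi: false positive
    "28-Nov-2001",          # no run of 8 qualifying chars, left alone
    "Albert Jones-Smythe",  # ordinary text, left alone
]

for s in samples:
    print(f"{s:20} -> {blot_out(s)}")
```

Every sample except the last two comes out as `####`.  Because the class includes a space, a match that directly follows a word character can even begin on that space, so `"on 12-01-2001"` becomes `"on####"`.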
> rebol[]c: charset "0123456789" parse/all f: read %memo.txt[some
> [a: 1 2[3 c "-"]4 c b:(change/part a "####" b) | skip]]print f
Very nice! I find it hard to imagine a shorter solution in REBOL.
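For comparison with the broad Perl rule, here is a rough Python approximation of the pattern inside Carl's PARSE rule, `1 2 [3 c "-"] 4 c` with `c` the digit charset: one or two groups of three digits, each followed by a hyphen, then four digits.  The regex and the names `US_RULE` and `blot_us` are illustrative assumptions, not part of either original script.  Like the original Perl one-liner, it targets only a common US form, so none of the NZ samples is touched.

```python
import re

# Approximation of the REBOL PARSE rule  1 2 [3 c "-"] 4 c
# (c: charset "0123456789"): one or two "ddd-" groups, then four digits.
US_RULE = re.compile(r"(?:\d{3}-){1,2}\d{4}")


def blot_us(text: str) -> str:
    """Replace each US-style number found while scanning left to right."""
    return US_RULE.sub("####", text)


memo = "Call 555-1234 or 800-555-1234.\nNZ: 1-2-345 6789, 345 6789, 0800 123 456"
print(blot_us(memo))
```

The narrower pattern also leaves `12-01-2001` and `3.1415926535` alone; it avoids those false positives only by refusing every format it was not written for.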
> I couldn't have done it without Petr's example though, as what
> I was trying wasn't working.  (:  But I now know a lot more about
> parsing than I did yesterday...
And *THAT*, to my way of thinking, is the real payoff of such parlor games as this -- we improve our grasp of what is (or isn't) feasible with one tool or another, and learn to use our tool(s) better.

-jn-
--
One job listing included the qualification "sense of humor" just after "ability to meet stringent deadlines".  Man, ain't that the truth!
                                                  -- Erik Naggum

joel:dot:neely:at:fedex:FIX:PUNCTUATION:dot:com