Mailing List Archive: 49091 messages

[REBOL] Re: Perl is too stupid to understand this 1 liner.

From: petr:krenzelok:trz:cz at: 15-Dec-2001 18:09

Joel Neely wrote:
>Carl Read wrote:
>
>>Well, you didn't say it was for any old phone numbers in any
>>old text, did you?
>>
>
>No, but I did say "memo"; I suspect most of us would consider
>
>    Albert Jones-Smythe ... on 12-01-2001 ... of 28-Nov-2001
>
>as neither unusual text nor phone numbers. ;-)
>
>>Will your Perl script get this right for instance...
>>So these are standard NZ phone numbers...
>>
>>    National: 1-2-345 6789
>>    Local: 345 6789
>>    0800: 0800 123 456
>>
>
>No, it wouldn't. I should have noted my deliberate decision to
>handle only a common form of US phone numbers (but the regular
>expression was certainly clear on that point ;-).
>
>Of course, the big danger of trying to write code that handles
>arbitrary human-written/readable text is that humans can use their
>context-sensitive intelligence to interpret (and even correct)
>a wide range of syntactical variation. The only way around that
>AFAIK is to impose limits on the amount of variation before the
>program either gives up or raises the case for discussion. For
>example, the simplest substitution rule that covers all of your
>NZ samples, as well as common variations on US phone numbers
>would be
>
>    s/\b[- .()\d]{8,}\b/####/g
>
>which would replace *any* run (of at least 8 characters) of digits,
>hyphens, spaces, dots, and parentheses with the "blot-out" string.
>
>But, of course, that also would affect strings such as
>
>    31-12-2001
>
>and
>
>    3.1415926535
>
>which a human would likely *not* interpret as phone numbers.
>
>IIRC, there was a thread a few months ago about trying to come
>up with a PARSE rule which would recognize phone numbers from
>as many countries as possible with as few errors (false positives
>and false negatives) as possible. As I recall, the result was
>that the range of variation as one added countries with differing
>conventions rapidly made the task infeasible.
>
>>rebol[]c: charset "0123456789" parse/all f: read %memo.txt[some
>>[a: 1 2[3 c "-"]4 c b:(change/part a "####" b) | skip]]print f
>>
>
>Very nice! I find it hard to imagine a shorter solution in REBOL.
>
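Joel's catch-all rule s/\b[- .()\d]{8,}\b/####/g can be approximated in REBOL with the same some [a: ... b: (change/part ...) | skip] loop that Carl's one-liner uses. This is only a sketch, assuming REBOL/Core 2.x; `blot-char` and `blot-runs` are hypothetical names. Unlike the Perl rule it ignores the \b word-boundary anchors, so a run may also swallow neighbouring spaces or dots. It also resets the parse position with :b after each replacement: because "####" is usually shorter than the matched run, PARSE's position would otherwise end up a few characters past the replacement.

```rebol
REBOL [Title: "Catch-all blot-out sketch"]

; The character class [- .()\d] from the Perl rule:
; hyphen, space, dot, parentheses and decimal digits
blot-char: charset "- .()0123456789"

blot-runs: func [
    "Replace every run of 8 or more blot-chars with ####"
    text [string!] "Modified in place"
    /local a b
][
    parse/all text [
        some [
            ; a: and b: mark the start and end of a maximal run
            a: some blot-char b: (
                ; runs shorter than 8 characters are left alone
                if (offset? a b) >= 8 [
                    ; change/part returns the position just after "####"
                    b: change/part a "####" b
                ]
            )
            :b      ; resume parsing at b in the (possibly shortened) text
            | skip  ; not a blot-char: step over it
        ]
    ]
    text
]

print blot-runs "a:345-6789;b:31-12-2001;c:3.1415926535;d:2001"
; a:####;b:####;c:####;d:2001
```

The output shows the false positives Joel describes: the date and the digits of pi are blotted out along with the phone number, and only the 4-character "2001" survives.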
Eh, I can't read it right now - what is 1 2 [3 c "-"] 4 c doing? :-) Thanks... I also found out there is a strange 'opt keyword in parse rules :-)
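To answer the question: with c defined as the digit charset, 1 2 [3 c "-"] means "match the sub-rule [3 c "-"] (exactly three digits followed by a hyphen) at least once and at most twice", and 4 c then requires exactly four digits. The sketch below, assuming REBOL/Core 2.x, checks a few strings with parse/all, which returns true only when the whole string matches. It also shows opt, which lets a sub-rule match zero or one time. The names `phone` and `phone-opt` and the optional "1-" prefix are hypothetical illustrations, not part of Carl's script.

```rebol
REBOL [Title: "Phone rule walkthrough"]

c: charset "0123456789"   ; c matches any single decimal digit

; 1 2 [3 c "-"]  the sub-rule [3 c "-"] (exactly three digits, then a
;                hyphen) must match at least once and at most twice
; 4 c            then exactly four digits
phone: [1 2 [3 c "-"] 4 c]

; parse/all returns true only if the rule consumes the whole string
print parse/all "555-1234" phone         ; true  (one 3-digit group)
print parse/all "800-555-1234" phone     ; true  (two 3-digit groups)
print parse/all "5551234" phone          ; false (no hyphen)
print parse/all "1-800-555-1234" phone   ; false ("1-" is not 3 digits)

; opt: the hypothetical "1-" prefix may appear zero or one time
phone-opt: [opt "1-" 1 2 [3 c "-"] 4 c]
print parse/all "1-800-555-1234" phone-opt   ; true
print parse/all "800-555-1234" phone-opt     ; true
```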
>
>>I couldn't have done it without Petr's example though, as what
>>I was trying wasn't working. (: But I now know a lot more about
>>parsing than I did yesterday...
>>
>
>And *THAT*, to my way of thinking, is a real payoff to such
>parlor games as this -- we improve our grasp of what is (or isn't)
>feasible with one tool or another, and learn to use our tool(s)
>better.
>
Yes! That's exactly it. Language flame wars come up so often that I am no longer interested in them :-) On the other hand, I have no time to study other languages. Sometimes, though, I call a friend who knows PHP, for example, and ask him how he would solve the problem with the tool he uses. Then I think it over, map it onto what I know about Rebol, and try to find an adequate solution...

-pekr-