Mailing List Archive: Re: Perl is to stupid to understand this 1 liner.

[REBOL] Re: Perl is to stupid to understand this 1 liner.

From: joel:neely:fedex at: 15-Dec-2001 4:40


Carl Read wrote:
> Well, you didn't say it was for any old phone numbers in any
> old text, did you?
>

No, but I did say "memo"; I suspect most of us would consider

    Albert Jones-Smythe ... on 12-01-2001 ... of 28-Nov-2001

as neither unusual text nor phone numbers.  ;-)

> Will your Perl script get this right for instance...
> So these are standard NZ phone numbers...
>
>     National: 1-2-345 6789
>     Local: 345 6789
>     0800: 0800 123 456
>

No, it wouldn't.  I should have noted my deliberate decision to
handle only a common form of US phone numbers (but the regular
expression was certainly clear on that point ;-).

Of course, the big danger of trying to write code that handles
arbitrary human-written/readable text is that humans can use their
context-sensitive intelligence to interpret (and even correct)
a wide range of syntactic variation, and a program cannot.  The only
way around that, AFAIK, is to limit how much variation the program
accepts before it either gives up or raises the case for discussion.  For
example, the simplest substitution rule that covers all of your
NZ samples, as well as common variations on US phone numbers,
would be

    s/\b[- .()\d]{8,}\b/####/g

which would replace *any* run (of at least 8 characters) of digits,
hyphens, spaces, dots, and parentheses with the "blot-out" string.

But, of course, that also would affect strings such as

    31-12-2001

and

    3.1415926535

which a human would likely *not* interpret as phone numbers.
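
To make the over-matching concrete, here is a minimal sketch in Python
rather than Perl (the pattern is copied unchanged from the s/// rule;
the names BROAD and blot are made up for illustration).  It runs the
rule over the three NZ samples and the two non-phone strings:

```python
import re

# Same pattern as the Perl s/// rule; Python's re module accepts it as-is.
# BROAD and blot are hypothetical names, not part of the original script.
BROAD = re.compile(r"\b[- .()\d]{8,}\b")

def blot(text: str) -> str:
    """Replace every match of the broad pattern with the blot-out string."""
    return BROAD.sub("####", text)

samples = [
    "1-2-345 6789",   # NZ national
    "345 6789",       # NZ local
    "0800 123 456",   # NZ 0800
    "31-12-2001",     # a date, not a phone number
    "3.1415926535",   # digits of pi, not a phone number
]

for s in samples:
    # Every sample, phone number or not, comes back as "####".
    print(f"{s!r:>16} -> {blot(s)!r}")
```

One further wrinkle, as far as I can tell: because the space is in the
character class and \b also matches between a letter and a space, a
number embedded in running text can take its surrounding spaces with
it, so "call 345 6789 now" becomes "call####now".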

IIRC, there was a thread a few months ago about trying to come
up with a PARSE rule which would recognize phone numbers from
as many countries as possible with as few errors (false positives
and false negatives) as possible.  As I recall, the result was
that the range of variation as one added countries with differing
conventions rapidly made the task infeasible.

> rebol[]c: charset "0123456789" parse/all f: read %memo.txt[some
> [a: 1 2[3 c "-"]4 c b:(change/part a "####" b) | skip]]print f
>

Very nice!  I find it hard to imagine a shorter solution in REBOL.
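
For anyone reading along without REBOL at hand, my reading of the rule
is: one or two groups of three digits followed by a hyphen, then four
digits, with the whole match replaced by "####".  A rough Python
equivalent, offered as a sketch only (the names US_PHONE and blot_us
and the sample phone numbers are invented; the dates come from the
memo line quoted earlier):

```python
import re

# Hypothetical Python rendering of the quoted PARSE rule:
#   1 2 [3 c "-"]  ->  one or two groups of three digits plus a hyphen
#   4 c            ->  exactly four digits
# [0-9] mirrors the charset "0123456789" used in the REBOL version.
US_PHONE = re.compile(r"(?:[0-9]{3}-){1,2}[0-9]{4}")

def blot_us(text: str) -> str:
    """Blot out 555-1234 and 800-555-1234 style numbers only."""
    return US_PHONE.sub("####", text)

memo = "Call 555-1234 or 800-555-1234 on 12-01-2001 ... of 28-Nov-2001"
print(blot_us(memo))
```

Being this narrow is what keeps the dates untouched: neither
12-01-2001 nor 28-Nov-2001 contains three digits followed by a hyphen.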

> I couldn't have done it without Petr's example though, as what
> I was trying wasn't working. (:  But I now know a lot more about
> parsing than I did yesterday...
>

And *THAT*, to my way of thinking, is the real payoff of such
parlor games as this -- we improve our grasp of what is (or isn't)
feasible with one tool or another, and learn to use our tool(s)
better.

-jn-

--
One job listing included the qualification "sense of humor" just
after "ability to meet stringent deadlines". Man, ain't that the
truth!
                                                      -- Erik Naggum
joel:dot:neely:at:fedex:FIX:PUNCTUATION:dot:com