[REBOL] Re: Perl is to stupid to understand this 1 liner.
From: joel:neely:fedex at: 17-Dec-2001 19:14
Hi, Gregg,
Gregg Irwin wrote:
...
> Now, you run it and it shows you the results for confirmation,
> just like a new assistant you hire. You give it feedback, which
> it remembers, and over time it builds new rules to account for
> your style. At some point, it may do so well that you tell it
> "Don't ask me to proof your work unless you have some doubt about
> something". If something, like your example, is of a critical
> nature, you can always tell it "run this by me before you send it
> out", or "Add your own 'signature' so, if something is wrong,
> they can blame you." :)
>
That's an interesting (although severely non-trivial) approach to
the development issue, but I was describing a property of the
problem space itself. Using myself as a case in point, I made up
a list of the ways I've actually seen US phone numbers written or
typed/typeset:
    551-1211
    800-552-1212
    1-800-553-1213
    1+800-554-1214
    800/555-1215
    (800)556-1216
    (800) 557-1217
    800.558.1218
    1.800.559.1219
Even handling this short list (with nicely commented/named code ;-)
to find phone numbers in a text file required something roughly
like the following:
8<------------------------------------------------------------
phones: make object! [
    defaultarea: "123"
    areadata: none
    exchdata: none
    linedata: none
    digits: charset "0123456789"
    plusminus: charset "+-"
    ldcode: ["1" plusminus | none]
    optgap: [" " | none]
    area: [copy areadata 3 digits]
    exch: [copy exchdata 3 digits]
    line: [copy linedata 4 digits]
    phonepatterns: [
        exch "-" line (areadata: none)
        | ldcode area "-" exch "-" line
        | area "/" exch "-" line
        | "(" area ")" optgap exch "-" line
        | ["1." | none] area "." exch "." line
    ]
    findphones: func [st [string!] /local result] [
        result: clear []
        parse/all st [
            any [
                phonepatterns (
                    append result rejoin [
                        any [areadata defaultarea]
                        "-" exchdata "-" linedata
                    ]
                )
                | skip
            ]
        ]
        result
    ]
    run: func [fn [file!] /local text] [
        text: read fn
        print rejoin [{"^/} text {^/"}]
        foreach phone findphones text [
            print [tab phone]
        ]
        print ""
    ]
]
phones/run %phones.txt
8<------------------------------------------------------------
I'm sure someone could tighten it up a bit, but that's not my
main point here. This quick draft version is still sensitive
to false positives, e.g. a line containing a product number
resembling
    #AB-1234-56789
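The false positive is easy to reproduce in isolation. The following is only a sketch and not part of the draft above: naive and guarded are hypothetical helper names, only the seven-digit form is handled, and the guard (no digit immediately before or after the match) is an assumed heuristic. It would still accept any look-alike that happens to be set off by non-digits.

```rebol
digits: charset "0123456789"
nondigit: complement digits

; Naive scan: any 3-digit "-" 4-digit run counts as a phone number.
naive: func [st [string!] /local found hit] [
    found: copy []
    parse/all st [
        any [
            copy hit [3 digits "-" 4 digits] (append found hit)
            | skip
        ]
    ]
    found
]

; Guarded scan (assumed heuristic): the match must be followed by the
; end of input or a non-digit, and a failed attempt swallows the whole
; digit run so the scan cannot restart in the middle of it.
guarded: func [st [string!] /local found hit here] [
    found: copy []
    parse/all st [
        any [
            copy hit [3 digits "-" 4 digits]
            here: [end | nondigit] :here   ; peek at next char, then step back
            (append found hit)
            | some digits
            | skip
        ]
    ]
    found
]

probe naive "#AB-1234-56789"       ; should pick up the bogus 234-5678
probe guarded "#AB-1234-56789"     ; should come back empty
probe guarded "call 555-1212 now"  ; a real number should still be found
```

The guard costs two more alternatives for one class of false positive, which is itself a small taste of how the rule set grows.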
In addition, when I described this to a colleague, she immediately
asked, "What about extensions?", raising the issue of multi-line
phone systems where the numbers might be written/typed as
    800-555-1212x123
    808-554-1212 x 234
    888-556-1232 ext 456
    889-567-1242 ext. 9876
    898-576-1252/1234
(with the extension in combination with other phone number formats
as appear in the code above).
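To give a feel for what the extension suffixes cost, here is a minimal sketch that bolts an optional extension onto just one of the base formats (area-exchange-line with hyphens). The names extphones, extdata, extmark, ext and phonepattern are made up for this illustration, and the limit of 1 to 5 extension digits is an assumption, not a rule of any phone system.

```rebol
extphones: make object! [
    areadata: none
    exchdata: none
    linedata: none
    extdata: none
    digits: charset "0123456789"
    optgap: [" " | none]
    area: [copy areadata 3 digits]
    exch: [copy exchdata 3 digits]
    line: [copy linedata 4 digits]
    ; alternatives are tried in order, so "ext." must precede "ext"
    extmark: ["ext." | "ext" | "x" | "/"]
    ; assumption: an extension is 1 to 5 digits
    ext: [optgap extmark optgap copy extdata 1 5 digits]
    phonepattern: [
        area "-" exch "-" line (extdata: none)
        [ext | none]
    ]
    findphones: func [st [string!] /local result] [
        result: copy []
        parse/all st [
            any [
                phonepattern (
                    append result rejoin [
                        areadata "-" exchdata "-" linedata
                        either extdata [join " x" extdata] [""]
                    ]
                )
                | skip
            ]
        ]
        result
    ]
]

foreach phone extphones/findphones
    {800-555-1212x123, 889-567-1242 ext. 9876} [print phone]
```

Note that "/" now means "extension follows" here while it separates area code from exchange in 800/555-1215, so every base format would need its own check for whether the suffix rule can be attached safely.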
Whether a human manufactures the rules, or a piece of AI software
attempts to do so (and I suspect the human will do a better job at
this point in history), the problem remains that the size of the
rule set itself undergoes a combinatorial explosion as we try to
take into account the variations in the data.
And we haven't even tackled odd cases like the following:
    You may call me at my office at 1-800-555-1212--I expect to be
    there until 5:00PM--to discuss our presentation.
which ultimately require an actual *understanding* of the text to
achieve high accuracy.
Hence my description of the problem domain itself as being
metaphorically fractal.
-jn-