
[REBOL] Re: Perl is too stupid to understand this 1 liner.

From: joel:neely:fedex at: 17-Dec-2001 19:14

Hi, Gregg,

Gregg Irwin wrote:
> Now, you run it and it shows you the results for confirmation,
> just like a new assistant you hire. You give it feedback, which
> it remembers, and over time it builds new rules to account for
> your style. At some point, it may do so well that you tell it
> "Don't ask me to proof your work unless you have some doubt about
> something". If something, like your example, is of a critical
> nature, you can always tell it "run this by me before you send it
> out", or "Add your own 'signature' so, if something is wrong,
> they can blame you." :)
>

That's an interesting (although severely non-trivial) approach to
the development issue, but I was describing a property of the
problem space itself. Using myself as a case in point, I made up
a list of the ways I've actually seen US phone numbers written or
typed/typeset:

    551-1211
    800-552-1212
    1-800-553-1213
    1+800-554-1214
    800/555-1215
    (800)556-1216
    (800) 557-1217
    800.558.1218
    1.800.559.1219

Even handling this short list (with nicely commented/named code ;-)
to find phone numbers in a text file required something roughly
like the following:

8<------------------------------------------------------------
phones: make object! [
    defaultarea: "123"      ; used when a number has no area code
    areadata: none          ; set by the AREA rule
    exchdata: none          ; set by the EXCH rule
    linedata: none          ; set by the LINE rule

    digits:    charset "0123456789"
    plusminus: charset "+-"

    ldcode: ["1" plusminus | none]      ; optional long-distance prefix
    optgap: [" " | none]                ; optional single space
    area:   [copy areadata 3 digits]
    exch:   [copy exchdata 3 digits]
    line:   [copy linedata 4 digits]

    ; one alternative per written format
    phonepatterns: [
          exch "-" line (areadata: none)
        | ldcode area "-" exch "-" line
        | area "/" exch "-" line
        | "(" area ")" optgap exch "-" line
        | ["1." | none] area "." exch "." line
    ]

    ; scan ST and return every number found, normalized to AAA-EEE-LLLL
    findphones: func [st [string!] /local result] [
        result: clear []
        parse/all st [
            any [
                phonepatterns (
                    append result rejoin [
                        any [areadata defaultarea]
                        "-" exchdata "-" linedata
                    ]
                )
              | skip
            ]
        ]
        result
    ]

    ; echo the file, then list the numbers found in it
    run: func [fn [file!] /local text] [
        text: read fn
        print rejoin [{"^/} text {^/"}]
        foreach phone findphones text [
            print [tab phone]
        ]
        print ""
    ]
]

phones/run %phones.txt
8<------------------------------------------------------------

I'm sure someone could tighten it up a bit, but that's not my main
point here. This quick draft version is still sensitive to false
positives (e.g. a line with a product number in it resembling
#AB-1234-56789).

In addition, when I described this to a colleague, she immediately
asked, "What about extensions?", raising the issue of multi-line
phone systems where the numbers might be written/typed as

    800-555-1212x123
    808-554-1212 x 234
    888-556-1232 ext 456
    889-567-1242 ext. 9876
    898-576-1252/1234

(with the extension in combination with other phone number formats
as appear in the code above).

Whether a human manufactures the rules, or a piece of AI software
attempts to do so (and I suspect the human will do a better job at
this point in history), the problem remains that the size of the
rule set itself undergoes a combinatorial explosion as we try to
take into account the variations in the data.

And we haven't even tackled odd cases like the following:

    You may call me at my office at 1-800-555-1212--I expect
    to be there until 5:00PM--to discuss our presentation.

which ultimately require an actual *understanding* of the text to
achieve high accuracy. Hence my description of the problem domain
itself as being metaphorically fractal.

-jn-
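
As a rough illustration of the extension question raised above, here is a minimal sketch of one way an optional extension rule might be bolted onto a phone pattern. It is not part of the original code: the names `extdata`, `extmark`, `extension`, and `basic` are hypothetical, the list of markers is taken only from the five sample formats, and the limit of 1 to 5 extension digits is an assumption.

```rebol
REBOL [Title: "Extension rule sketch (hypothetical)"]

digits:  charset "0123456789"
optgap:  [" " | none]
extdata: none

; Assumed extension markers. "ext." is listed before "ext" so that
; the trailing dot is consumed when it is present.
extmark: ["ext." | "ext" | "x" | "/"]

; Optional gap, marker, optional gap, then 1 to 5 digits (assumed limit).
extension: [optgap extmark optgap copy extdata [1 5 digits]]

; Simplified stand-in for just one of the phone patterns.
basic: [3 digits "-" 3 digits "-" 4 digits]

foreach sample [
    "800-555-1212x123"
    "808-554-1212 x 234"
    "888-556-1232 ext 456"
    "889-567-1242 ext. 9876"
    "898-576-1252/1234"
    "800-552-1212"
][
    extdata: none
    if parse/all sample [basic opt extension end] [
        print [sample "->" any [extdata "(no extension)"]]
    ]
]
```

Even this small rule has four markers and two optional gaps, and it would have to be appended to each of the five alternatives in `phonepatterns`. The "/" marker also overlaps with the 800/555-1215 format, which hints at the growth in rules, and in ambiguity, described above.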