Mailing List Archive: 49091 messages

[REBOL] Re: Perl is to stupid to understand this 1 liner.

From: joel:neely:fedex at: 18-Dec-2001 1:37

Hi, Carl, [Metaphysicomputational rambling ahead; proceed at your own risk!] Carl Read wrote:
> On 18-Dec-01, Joel Neely wrote:
>
> > Whether a human manufactures the rules, or a piece of AI
> > software attempts to do so (and I suspect the human will
> > do a better job at this point in history), the problem
> > remains that the size of the rule set itself undergoes a
> > combinatorial explosion as we try to take into account the
> > variations in the data.
>
> Perhaps, instead of trying to make software understand documents
> written any old which way by humans, we should create strictly
> formal versions of current human languages that can be tested for
> correctness by computer? We'd then be able to have documents that
> could be examined by computer without the need to worry about an
> infinite number of special cases.
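The combinatorial explosion mentioned in the quote can be made concrete with a small sketch (in Python rather than REBOL or Perl; the variation lists are invented for illustration, not an exhaustive survey of date notations). Each independent axis of variation multiplies the number of surface forms a recognizer must cover:

```python
# Sketch: independent variations in date notation multiply.
# The lists below are illustrative, not exhaustive.
from itertools import product

orders = ["Y-M-D", "D-M-Y", "M-D-Y"]          # field order
separators = ["/", "-", ".", " "]             # field separator
month_styles = ["01", "1", "Jan", "January"]  # month spelling
year_styles = ["2001", "01"]                  # 4- vs 2-digit year

rules = list(product(orders, separators, month_styles, year_styles))
print(len(rules))  # 3 * 4 * 4 * 2 = 96 distinct surface forms
```

Adding just one more axis (say, optional leading zeros on the day) doubles the count again, which is why hand-maintained rule sets fall behind so quickly.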
That's been tried before. It was called COBOL. ;-) More recently, it's been tried again, and called mS Word. =8-0

Seriously, I think the proposal breaks down for two main reasons:

1) It merely displaces the issue -- whatever tool(s) enforce your "formal versions" and "correctness" rules would *STILL* need the rules to be defined, and users would *STILL* be annoyed with poor performance, false positives, and false negatives.

2) It assumes that we know in advance which rules are to be enforced, based on possible future uses of the text being created/vetted.

To elaborate (optional reading ;-)...

1) Word tries to vet spelling, grammar, punctuation, and typography "on the fly" as the user is typing. This process is

a) incredibly annoying/distracting, especially when writing a draft of a document I know I will subsequently revise and tidy up, but am trying to get "on paper" quickly;

b) merely a displacement of the recognition problem to mS, because code/rules *still* must be designed to determine when something resembles e.g. a date or phone number closely enough that it can be put in "standard" format or challenged with a "Did you really mean...?" message (see point (1.a) above!); and this is even more annoying/frustrating when the rules are wrong, incomplete, or inadequate; and

c) frightening, as I don't want a commercial entity taking control of my language, whatever its agenda. See the story titled "Microsoft Edits English", which begins: "An article in the 23-Oct-2000 issue of the New York Times ... talks about how Microsoft has eliminated words from its thesaurus so as to 'not suggest words that may have offensive uses or provide offensive definitions for any words'. Entering a word like 'idiot' yields no hits in Word 2000, unlike the numerous hits in Word 97."
d) problematic due to international/cultural variation; consider the controversy over conversion of all European currencies to the Euro, and the decades-old-and-still-barely-begun efforts to get the US public to use the metric system. It is quite clear to me that the *ONLY* reasonable way to write dates is 2001/12/18, and I can't understand why you haven't already figured that out for yourself (...I'm JOKING!!! ;-)

2) Human language is "living" and dynamic in the same sense as other human activities.

a) We may not know in advance that we'd need to scan my memos from last year to find all of the email addresses, dates, phone numbers, street addresses, names of people who worked in the department then but have transferred to other jobs, program names and version numbers, hostnames for servers in one of our labs...

b) New usages, abbreviations, conventions, etc. are being created all the time, because what we have to say, and the frequency with which we have to say it, is constantly changing; rigid standardization stifles expressivity and leaves us in a bland, barren, and plastic-laminated mental landscape -- ya' want fries with that memo?

c) Humans are excellent at recognizing patterns, even in the presence of noise, many kinds of errors, and considerable variation (even of the never-seen-before kind). When I'm writing to another human being, I can move quickly because I can trust her/him to understand me even if I make a tpyo.

Finally, the discussion of how dates appear in running text is IMHO only a basic exemplar of a much more pervasive issue: whenever we (and especially our programs/systems) interact with human beings in the "real world", the burden should be on *us* and *our artifacts* to do the adapting to their way of doing/expressing things. That's as true in the design of physical workflow as it is in the design of computational artifacts.
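The memo-scanning scenario in point (2.a) can be sketched briefly (Python for illustration; the memo text and address are invented, and the two patterns are deliberately simplistic -- real-world address and date grammars admit far more variation than these regexes capture, which is exactly the point of the surrounding discussion):

```python
import re

# Illustrative patterns only: each handles one narrow surface form.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
ISO_DATE = re.compile(r"\b\d{4}/\d{2}/\d{2}\b")

memo = "Met carl@example.org on 2001/12/18 re: the parser."
print(EMAIL.findall(memo))     # ['carl@example.org']
print(ISO_DATE.findall(memo))  # ['2001/12/18']
```

Note what the second pattern silently misses: "18-Dec-2001", "Dec 18", "18/12/01", and every format not anticipated when the rule was written -- the false-negative problem from point (1) above.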
As you can tell from my email address, I work for a big company that employs lots of people in lots of places/cultures to do lots of work that must happen with high speed and a low (preferably zero ;-) error rate. However, it is often the case that what works well in some contexts (either physically or computationally) is suboptimal in other settings. Finding the balance -- and perhaps I should really say "keeping the balance, in a constantly changing world" -- between standards enforcement and flexibility for local/personal preferences/needs is the underlying "fractal" challenge of which date formatting is just the tiniest tip of the iceberg.

OBTW, let's not forget humor. See the random sig of the moment...

-jn-

--
Outside of a dog, a book is man's best friend.
Inside of a dog, it's too dark to read. -- Groucho Marx
FIX PUNCTUATION: joel dot neely at fedex dot com