Mailing List Archive: Re: Perl is to stupid to understand this 1 liner.

[REBOL] Re: Perl is to stupid to understand this 1 liner.

From: joel:neely:fedex at: 18-Dec-2001 1:37


Hi, Carl,

[Metaphysicomputational rambling ahead; proceed at your own risk!]

Carl Read wrote:
> On 18-Dec-01, Joel Neely wrote:
>
> > Whether a human manufactures the rules, or a piece of AI
> > software attempts to do so (and I suspect the human will
> > do a better job at this point in history), the problem
> > remains that the size of the rule set itself undergoes a
> > combinatorial explosion as we try to take into account the
> > variations in the data.
>
> Perhaps, instead of trying to make software understand documents
> written any old which way by humans, we should create strictly
> formal versions of current human languages that can be tested for
> correctness by computer?  We'd then be able to have documents that
> could be examined by computer without the need to worry about an
> infinite number of special cases.
>

That's been tried before.  It was called COBOL.  ;-)

More recently, it's been tried again, and called MS Word.  =8-0

Seriously, I think the proposal breaks down for two main reasons:

1)  It merely displaces the issue -- whatever tool(s) enforce your
   "formal versions" and "correctness" rules would *STILL* need
    the rules to be defined, and users would *STILL* be annoyed
    with poor performance, false positives, and false negatives.

2)  It assumes that we know in advance which rules are to be
    enforced based on possible future uses of the text being
    created/vetted.

To elaborate (optional reading ;-)...

1)  Word tries to vet spelling, grammar, punctuation, and typography
   "on the fly" as the user is typing.  This process is

  a) incredibly annoying/distracting, especially when writing
     a draft of a document I know I will subsequently revise
     and tidy up, but am trying to get "on paper" quickly;

  b) merely displaces the recognition problem to MS, because
     code/rules *still* must be designed to determine when
     something resembles e.g., a date or phone number closely
     enough that it can be put in "standard" format or challenged
     with a "Did you really mean...?" message (see point (1.a)
     above!); and this is even more annoying/frustrating when
     the rules are wrong, incomplete, or inadequate; and

  c) frightening, as I don't want a commercial entity taking
     control of my language, regardless of their own agenda.
     See

     http://slashdot.org/article.pl?sid=01/10/26/1334257&mode=thread

     for a story titled "Microsoft Edits English" that begins

    "An article in the 23-Oct-2000 issue of the New York Times
     ... talks about how Microsoft has eliminated words from its
     thesaurus so as to "not suggest words that may have offensive
     uses or provide offensive definitions for any words". Entering
     a word like "idiot" yields no hits in Word 2000 unlike the
     numerous hits in Word 97."

  d) problematic due to international/cultural variation; consider
     the controversy over conversion of all European currencies to
     the Euro, and the decades-old-and-still-barely-begun efforts
     to get the US public to use the metric system.  It is quite
     clear to me that the *ONLY* reasonable way to write dates is
     2001/12/18, and I can't understand why you haven't already
     figured that out for yourself (...I'm JOKING!!! ;-)

2) Human language is "living" and dynamic in the same sense as
   other human activities.

  a) We may not know in advance that we'd need to scan my memos
     from last year to find all of the email addresses, dates,
     phone numbers, street addresses, names of people who worked
     in the department then but have transferred to other jobs,
     program names and version numbers, hostnames for servers in
     one of our labs...

  b) New usages, abbreviations, conventions, etc. are being
     created all the time, because what we have to say, and the
     frequency with which we have to say it, is constantly
     changing; rigid standardization stifles expressivity and
     leaves us in a bland, barren, and plastic-laminated mental
     landscape -- ya' want fries with that memo?

  c) Humans are excellent at recognizing patterns, even in the
     presence of noise, many kinds of errors, and considerable
     variation (even of the never-seen-before kind).  When I'm
     writing to another human being, I can move quickly because
     I can trust her/him to understand me even if I make a tpyo.
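
To make point (1.b) concrete, here is a minimal Python sketch of a
rule-based date recognizer.  Everything in it (the RULES table, the
find_dates name, the two formats chosen) is a made-up illustration,
not anything Word or any real tool is known to do.  It handles only
the 2001/12/18 and 18-Dec-2001 styles; every further convention a
human might use needs yet another hand-written rule.

```python
import re
from datetime import date

# Hypothetical lookup for three-letter English month abbreviations.
MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

# Hypothetical rule set: one (pattern, builder) pair per written convention.
RULES = [
    # 2001/12/18 or 2001-12-18
    (re.compile(r"\b(\d{4})[/-](\d{1,2})[/-](\d{1,2})\b"),
     lambda m: date(int(m[1]), int(m[2]), int(m[3]))),
    # 18-Dec-2001
    (re.compile(r"\b(\d{1,2})-([A-Za-z]{3})-(\d{4})\b"),
     lambda m: date(int(m[3]), MONTHS[m[2].lower()], int(m[1]))),
]


def find_dates(text):
    """Return, in sorted order, the dates that some rule recognizes in text."""
    found = []
    for pattern, build in RULES:
        for match in pattern.finditer(text):
            try:
                found.append(build(match))
            except (KeyError, ValueError):
                # Looked like a date but was not one (bad month or day).
                pass
    return sorted(found)


print(find_dates("Sent 18-Dec-2001, due 2001/12/24."))
print(find_dates("Sent December 18th, 2001."))  # no rule for this style
```

The second call finds nothing, because nobody wrote a rule for that
style yet.  Multiply that by two-digit years, day/month ordering, and
other languages, and the rule set grows quickly.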

Finally, the discussion of how dates appear in running text is
IMHO only a basic exemplar of a much more pervasive issue:  whenever
we (and especially our programs/systems) interact with human beings
in the "real world" the burden should be on *us*and*our*artifacts*
to do the adapting to their way of doing/expressing things.  That's
as true in the design of physical workflow as it is in the design
of computational artifacts.

As you can tell from my email address, I work for a big company that
employs lots of people in lots of places/cultures to do lots of work
that must happen with high speed and low (preferably zero ;-) error
rate.  However, it is often the case that what works well in some
contexts (either physically or computationally) is suboptimal in
other settings.

Finding the balance -- and perhaps I should really say "keeping the
balance, in a constantly changing world" -- between standards
enforcement and flexibility for local/personal preferences/needs
is the underlying "fractal" challenge of which date formatting is
just the tiniest tip of the iceberg.

OBTW, let's not forget humor.  See the random sig of the moment...

-jn-

--
Outside of a dog, a book is man's best friend. Inside of a dog, it's
too dark to read.
                                                     -- Groucho Marx
FIX?PUNCTUATION?joel?dot?neely?at?fedex?dot?com