Mailing List Archive: Re: Rugby source code cleaner? :-)

[REBOL] Re: Rugby source code cleaner? :-)

From: joel:neely:fedex at: 28-Jan-2002 8:10


Hi, Petr,

WARNING:  My remarks are not intended as complaints, criticisms,
nor negativity, but simply as thinking out loud about how the
unique nature of REBOL means that things we are used to doing with
other languages may not apply.  However, experience indicates that
some folks may take offense.

None intended.

Petr Krenzelok wrote:
> Hi,
>
> I know it is very sensitive topic, but could anyone write source
> code cleaner for Rugby to have standard Rebol code output as
> suggested by RT?
>

One of the factors which drove me to think about alternative styles
was the difficulty (i.e. impossibility) of algorithmically laying
out REBOL code in the general case.  (Not that The Style That Must
Not Be Mentioned solved that problem by any stretch of the
imagination!)

To show why I believe that to be the case, below are the guidelines
from the RCUG on the RT web site, along with my observations.  I've
re-ordered them to get the easy ones out of the way first.

The lexical issues are easiest to handle.

      5.1.1. Indent Content for Clarity [beginning]

      The contents of a block are indented, but the block's
      enclosing square brackets [ ] are not. That's because
      the square brackets belong to the prior level of syntax,
      as they define the block but are not contents of the
      block. Also, it's easier to spot breaks between adjacent
      blocks when the brackets stand out.

Tracking nesting of/within square brackets (including the presence
of strings delimited by braces and quotes) is a SMOP.
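
To make that concrete, here is a minimal sketch of such a tracker,
written in Python purely for illustration.  The helper name
`indent_levels` is hypothetical, and the lexical rules are simplifying
assumptions: "..." strings stay on one line, {...} strings may nest and
span lines, ^ escapes the next character inside a string, and ; starts
a comment that runs to the end of the line.

```python
def indent_levels(source):
    """Return the indent level (bracket depth) for each line of source."""
    depth = 0          # open [ and ( seen so far, outside strings
    brace = 0          # nesting depth inside a {...} string, 0 if outside
    in_quote = False   # True while inside a "..." string
    levels = []
    for line in source.splitlines():
        # A line that begins with a closer is shown at the outer level,
        # since the bracket belongs to the prior level of syntax.
        starts_with_closer = not brace and line.lstrip()[:1] in ("]", ")")
        levels.append(max(depth - 1, 0) if starts_with_closer else depth)
        i = 0
        while i < len(line):
            ch = line[i]
            if in_quote or brace:
                if ch == "^":                  # escape: skip next character
                    i += 1
                elif in_quote and ch == '"':
                    in_quote = False
                elif brace and ch == "{":
                    brace += 1
                elif brace and ch == "}":
                    brace -= 1
            elif ch == ";":                    # comment to end of line
                break
            elif ch == '"':
                in_quote = True
            elif ch == "{":
                brace = 1
            elif ch in ("[", "("):
                depth += 1
            elif ch in ("]", ")"):
                depth = max(depth - 1, 0)
            i += 1
        in_quote = False   # assumption: a "..." string never spans lines
    return levels
```

Multiplying each level by four spaces gives the indentation of 5.1.1.
Note that this only decides how far to indent lines that already
exist; it says nothing about where the line breaks should go.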

      5.1.2. Standard Tab Size

      REBOL standard tab size is four spaces. Because people
      use different editors and readers for scripts, you can
      elect to use spaces rather than tabs.

      5.1.3. Detab Before Posting

      The tab character (ASCII 9) does not indent four spaces
      in many viewers, browsers, or shells, so use an editor
      or REBOL to detab a script before publishing it to the net.

Once the indentation issues are addressed, using spaces instead of
tabs is trivial.  I find the you-can vs. you-must of the above two
points interesting.
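
For what it's worth, the detab step is nearly a one-liner in most
languages.  A minimal Python sketch (the helper name `detab` is my own,
and it assumes no tab characters need to be preserved inside string
literals):

```python
def detab(text, tab_size=4):
    """Replace each tab with spaces up to the next multiple of tab_size."""
    # str.expandtabs restarts its column count after every newline,
    # so the whole script can be converted in a single call.
    return text.expandtabs(tab_size)
```

A tab in the middle of a line is expanded to the next tab stop, not to
a fixed four spaces, which is what most editors do as well.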

      5.1.4. Limit Line Lengths to 80 Characters

      For ease of reading and portability among editors and
      email readers, limit lines to 80 characters.

This one looks easy, but raises a crucial question: how do we know
*where* within a longer line one should insert the break(s) to
conform to the 80-character (including indentation) length limit?

That takes us from the lexical level directly to the semantic
level, which is where life gets "interesting"...

      5.1.1. Indent Content for Clarity [continued]

      Where possible, an opening square bracket remains
      on the line with its associated expression. The
      closing bracket can be followed by more expressions
      of that same level. These same rules apply equally
      to parenthesis ( ) and braces { }.

      An exception is made for expressions that normally
      belong on a single line, but extend to multiple lines: ...

      This also applies to grouped values that belong
      together, but must be wrapped to fit on the line: ...

I don't know of any syntactic definition of "expression" for REBOL.
This is both due to the "syntax-free" nature of REBOL source code,
and the dynamic/latent/weak typing (use whichever buzzword you wish)
which by-and-large leaves data type issues until evaluation time.

With many other languages, one can look at a string of source text
in complete isolation and determine where the "expressions" and
statements are, based on such things as keywords, required
punctuation, syntax and precedence for various built-in operators,
etc.  Since REBOL has none of those things, all of the clues for
doing the analysis at a strictly lexical level are gone.  The only
way I know of to define "expression" in REBOL is to understand the
meaning of code, which can only be done at a given point in run
time and with knowledge of the entire set of words defined at that
point.

The same, of course, applies to the notions of whether things
belong on a single line and how one can find "grouped values
that belong together" within an arbitrary piece of source text.

How many "expressions" are on the following line?

    a b c 1 d [e f g] 2 h

Of course, the answer depends on the definitions of A thru H at
the/each moment that we try to answer the question!  It is easy
(e.g., by using longer words or adding more values in the same
vein) to make the above line more than 80 characters long.  In
that case, the question of how/where to break it for meaningful
wrapping can only be answered if we know what meanings we're
dealing with.
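
As a rough model of that dependence, here is a small Python sketch
(Python only for illustration).  It assumes a strictly prefix reading
in which every word takes a fixed number of arguments, looked up in a
hypothetical `arity` table; infix operators, refinements, set-words and
the contents of blocks are all ignored.  The names
`top_level_expressions` and `LINE` are mine.

```python
def top_level_expressions(tokens, arity):
    """Group a flat token list into top-level expressions.

    tokens: words are Python strings; any other value (a number, or a
            nested list standing in for a block) is a literal.
    arity:  hypothetical table mapping a word to its argument count;
            words missing from the table are taken to need no arguments.
    """
    def consume(pos):
        # Consume one complete expression starting at tokens[pos] and
        # return the index of the token that follows it.
        if pos >= len(tokens):
            raise ValueError("a word is missing one of its arguments")
        token = tokens[pos]
        pos += 1
        if isinstance(token, str):
            for _ in range(arity.get(token, 0)):
                pos = consume(pos)
        return pos

    groups, pos = [], 0
    while pos < len(tokens):
        end = consume(pos)
        groups.append(tokens[pos:end])
        pos = end
    return groups


# The line from above: a b c 1 d [e f g] 2 h
LINE = ["a", "b", "c", 1, "d", ["e", "f", "g"], 2, "h"]
```

With an empty table (every word is a plain value) the line holds eight
expressions.  With `{"a": 2, "d": 2}` it holds four, grouped as
`a b c`, `1`, `d [e f g] 2` and `h`.  With
`{"a": 1, "b": 1, "c": 1, "d": 3}` it holds two.  The tokens never
change; only the table does, and the sensible places to wrap the line
move with it.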

In the cases of source text which contains

-  references to other source text not present
   (e.g. DO %SOMEFILE to pull in "library" code)

-  multiple definitions for words

-  functions which take functions as arguments

we (humans) may be unable to guess such basic details as which
words are functions and which are not, and how many arguments
a function takes.

Even if we restrict ourselves to the notion of just writing
expressions in "stock" REBOL without any user-defined words,
the only general solutions I've been able to think of are:

-  building a gigantic case-driven syntax analyzer which uses
   knowledge of the argument structure of every built-in word
   of REBOL, or

-  building a very intelligent analyzer which uses the current
   content of the REBOL word list to look up and interpret the
   "expressional" requirements/properties of every word/token
   in the source string being analyzed.

Neither of these techniques survives in the face of source
code in which the user is allowed to define new words.

Of course, if I've overlooked something obvious, I'd be glad
to know what it is!

As I said at the beginning, this is not a criticism; it is just
my attempt to describe the state of affairs as I see it.  To
me, the punch-line is that REBOL (with respect to the issue
we're talking about) resembles more closely a human language than
a traditional programming language.

AI researchers have been working for years (essentially as long
as there have been computers!) on the question of how to analyze,
recognize, interpret, and act on the meaning of human language.
Some remarkable things have been done, but they typically require:

-  lots of computing horsepower,

-  highly specialized programming skills,

-  a human to intervene at some point.

Consider the old classic example:

    Time flies like an arrow.

which can be parsed as:

-   a command to determine the flight speed of arrow-shaped
    household insects,

-   a statement about the dietary preference of insects who
    live in clocks at archery ranges, or

-   a philosophical musing on how swiftly time moves.

OBTW, this last one is by far the most complex, since speed is
calculated as distance divided by time; therefore the notion of
the "speed of time" is one which is totally outside the realm
of computation!

-jn-

--
; sub REBOL {}; sub head ($) {@_[0]}
REBOL []
# despam: func [e] [replace replace/all e ":" "." "#" "@"]
; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"}
print head reverse despam "moc:xedef#yleen:leoj" ;