Mailing List Archive: 49091 messages

[REBOL] Re: Rugby source code cleaner? :-)

From: joel:neely:fedex at: 28-Jan-2002 8:10

Hi, Petr,

WARNING: My remarks are not intended as complaints, criticisms, or negativity, but simply as thinking out loud about how the unique nature of REBOL means that things we are used to doing with other languages may not apply. However, experience indicates that some folks may take offense. None intended.

Petr Krenzelok wrote:
> Hi,
>
> I know it is very sensitive topic, but could anyone write source
> code cleaner for Rugby to have standard Rebol code output as
> suggested by RT?
>
One of the factors which drove me to think about alternative styles was the difficulty (i.e. impossibility) of algorithmically laying out REBOL code in the general case. (Not that The Style That Must Not Be Mentioned solved that problem by any stretch of the imagination!)

To show why I believe that to be the case, below are the guidelines from the RCUG on the RT web site, along with my observations. I've re-ordered them to get the easy ones out of the way first.

The lexical issues are easiest to handle.

5.1.1. Indent Content for Clarity [beginning]

The contents of a block are indented, but the block's enclosing square brackets [ ] are not. That's because the square brackets belong to the prior level of syntax, as they define the block but are not contents of the block. Also, it's easier to spot breaks between adjacent blocks when the brackets stand out.

Tracking nesting of/within square brackets (including the presence of strings delimited by braces and quotes) is a SMOP (simple matter of programming).

5.1.2. Standard Tab Size

REBOL standard tab size is four spaces. Because people use different editors and readers for scripts, you can elect to use spaces rather than tabs.

5.1.3. Detab Before Posting

The tab character (ASCII 9) does not indent four spaces in many viewers, browsers, or shells, so use an editor or REBOL to detab a script before publishing it to the net.

Once the indentation issues are addressed, using spaces instead of tabs is trivial. I find the you-can vs. you-must of the above two points interesting.

5.1.4. Limit Line Lengths to 80 Characters

For ease of reading and portability among editors and email readers, limit lines to 80 characters.

This one looks easy, but raises a crucial question: how do we know *where* within a longer line to insert the break(s) needed to conform to the 80-character limit (which includes indentation)? That takes us from the lexical level directly to the semantic level, which is where life gets "interesting"...

5.1.1. Indent Content for Clarity [continued]

Where possible, an opening square bracket remains on the line with its associated expression. The closing bracket can be followed by more expressions of that same level. These same rules apply equally to parenthesis ( ) and braces { }.

An exception is made for expressions that normally belong on a single line, but extend to multiple lines:

...

This also applies to grouped values that belong together, but must be wrapped to fit on the line:

...

I don't know of any syntactic definition of "expression" for REBOL. This is due both to the "syntax-free" nature of REBOL source code and to the dynamic/latent/weak typing (use whichever buzzword you wish) which by and large leaves data type issues until evaluation time.

With many other languages, one can look at a string of source text in complete isolation and determine where the "expressions" and statements are, based on such things as keywords, required punctuation, syntax, and precedence for various built-in operators. Since REBOL has none of those things, all of the clues for doing the analysis at a strictly lexical level are gone.

The only way I know of to define "expression" in REBOL is to understand the meaning of code, which can only be done at a given point in run time and with knowledge of the entire set of words defined at that point. The same, of course, applies to the notions of whether things belong on a single line and how one can find "grouped values that belong together" within an arbitrary piece of source text.

How many "expressions" are on the following line?

    a b c 1 d [e f g] 2 h

Of course, the answer depends on the definitions of A thru H at the/each moment that we try to answer the question! It is easy (e.g., by using longer words or adding more values in the same vein) to make the above line more than 80 characters long. In that case, the question of how/where to break it for meaningful wrapping can only be answered if we know what meanings we're dealing with.
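A minimal sketch makes the point concrete by letting the interpreter itself find the expression boundaries, one step at a time, with DO/NEXT. It assumes REBOL/Core 2.x, where DO/NEXT returns a block of [result next-position]. The helper COUNT-EXPRS and both sets of definitions for the words on the line above are hypothetical, made up purely for illustration. Note that it has to actually *evaluate* the code, side effects and all, to get an answer.

```rebol
REBOL [Title: "Counting top-level expressions with DO/NEXT (sketch)"]

; Hypothetical helper: count top-level expressions in a block by letting
; the interpreter evaluate it one expression at a time. Assumes REBOL 2,
; where DO/NEXT returns a block of [result next-position].
count-exprs: func [
    "Return the number of top-level expressions in CODE (evaluates CODE!)"
    code [block!]
    /local n result
][
    n: 0
    while [not tail? code] [
        result: do/next code    ; evaluate exactly one expression
        code: second result     ; continue from where it stopped
        n: n + 1
    ]
    n
]

code: [a b c 1 d [e f g] 2 h]

; Definition set 1 (hypothetical): every word is a zero-argument function
a: b: c: d: h: func [] [none]
print count-exprs code          ; prints 8

; Definition set 2 (hypothetical): some words now take arguments
a: func [x y] [reduce [x y]]    ; A consumes "b c" and "1"
b: func [x] [x]                 ; B consumes "c"
c: func [] [0]
d: func [blk n] [n]             ; D consumes "[e f g]" and "2"
h: func [] [none]
print count-exprs code          ; prints 3
```

The text of the line never changes, yet it holds eight expressions under the first set of definitions and three under the second. And because DO/NEXT runs the code to find each boundary, this is an evaluator rather than a layout tool.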
In the case of source text that contains

- references to other source text not present (e.g. DO %SOMEFILE to pull in "library" code),
- multiple definitions for words, or
- functions which take functions as arguments,

we (humans) may be unable to guess such basic details as which words are functions and which are not, and how many arguments a function takes.

Even if we restrict ourselves to the notion of just writing expressions in "stock" REBOL without any user-defined words, the only general solutions I've been able to think of are:

- building a gigantic case-driven syntax analyzer which uses knowledge of the argument structure of every built-in word of REBOL, or
- building a very intelligent analyzer which uses the current content of the REBOL word list to look up and interpret the "expressional" requirements/properties of every word/token in the source string being analyzed.

Neither of these techniques survives in the face of source code in which the user is allowed to define new words.

Of course, if I've overlooked something obvious, I'd be glad to know what it is!

As I said at the beginning, this is not a criticism; it is just my attempt to describe the state of affairs as I see it. To me, the punch-line is that REBOL (with respect to the issue we're talking about) more closely resembles a human language than a traditional programming language.

AI researchers have been working for years (essentially as long as there have been computers!) on the question of how to analyze, recognize, interpret, and act on the meaning of human language. Some remarkable things have been done, but they typically require:

- lots of computing horsepower,
- highly specialized programming skills,
- a human to intervene at some point.

Consider the old classic example:

    Time flies like an arrow.
which can be parsed as:

- a command to determine the flight speed of arrow-shaped household insects,
- a statement about the dietary preference of insects who live in clocks at archery ranges, or
- a philosophical musing on how swiftly time moves.

OBTW, this last one is by far the most complex, since speed is calculated as distance divided by time; therefore the notion of the "speed of time" is one which is totally outside the realm of computation!

-jn-
--
; sub REBOL {}; sub head ($) {@_[0]}
REBOL []
# despam: func [e] [replace replace/all e ":" "." "#" "@"]
; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"}
print head reverse despam "moc:xedef#yleen:leoj" ;