Rugby source code cleaner? :-)

[1/4] from: petr:krenzelok:trz:cz at: 25-Jan-2002 7:46

Hi,

I know it is a very sensitive topic, but could anyone write a source code cleaner for Rugby to have standard Rebol code output as suggested by RT? :-)

-pekr-

[2/4] from: joel:neely:fedex at: 28-Jan-2002 8:10

Hi, Petr,

WARNING: My remarks are not intended as complaints, criticisms, nor negativity, but simply as thinking out loud about how the unique nature of REBOL means that things we are used to doing with other languages may not apply. However, experience indicates that some folks may take offense. None intended.

Petr Krenzelok wrote:

> Hi,
>
> I know it is a very sensitive topic, but could anyone write a source
> code cleaner for Rugby to have standard Rebol code output as
> suggested by RT?
>

One of the factors which drove me to think about alternative styles was the difficulty (i.e. impossibility) of algorithmically laying out REBOL code in the general case. (Not that The Style That Must Not Be Mentioned solved that problem by any stretch of the imagination!)

To show why I believe that to be the case, below are the guidelines from the RCUG on the RT web site, along with my observations. I've re-ordered them to get the easy ones out of the way first.

The lexical issues are easiest to handle.

5.1.1. Indent Content for Clarity [beginning]

    The contents of a block are indented, but the block's enclosing square brackets [ ] are not. That's because the square brackets belong to the prior level of syntax, as they define the block but are not contents of the block. Also, it's easier to spot breaks between adjacent blocks when the brackets stand out.

Tracking nesting of/within square brackets (including the presence of strings delimited by braces and quotes) is a SMOP.

5.1.2. Standard Tab Size

    REBOL standard tab size is four spaces. Because people use different editors and readers for scripts, you can elect to use spaces rather than tabs.

5.1.3. Detab Before Posting

    The tab character (ASCII 9) does not indent four spaces in many viewers, browsers, or shells, so use an editor or REBOL to detab a script before publishing it to the net.

Once the indentation issues are addressed, using spaces instead of tabs is trivial. I find the you-can vs. you-must of the above two points interesting.

5.1.4. Limit Line Lengths to 80 Characters

    For ease of reading and portability among editors and email readers, limit lines to 80 characters.

This one looks easy, but begs a crucial question: how do we know *where* within a longer line one should insert the break(s) to conform to the 80-character (including indentation) length limit? That takes us from the lexical level directly to the semantic level, which is where life gets "interesting"...

5.1.1. Indent Content for Clarity [continued]

    Where possible, an opening square bracket remains on the line with its associated expression. The closing bracket can be followed by more expressions of that same level. These same rules apply equally to parenthesis ( ) and braces { }.

    An exception is made for expressions that normally belong on a single line, but extend to multiple lines: ...

    This also applies to grouped values that belong together, but must be wrapped to fit on the line: ...

I don't know of any syntactic definition of "expression" for REBOL. This is both due to the "syntax-free" nature of REBOL source code, and the dynamic/latent/weak typing (use whichever buzzword you wish) which by-and-large leaves data type issues until evaluation time.

With many other languages, one can look at a string of source text in complete isolation and determine where the "expressions" and statements are, based on such things as keywords, required punctuation, syntax and precedence for various built-in operators, etc. Since REBOL has none of those things, all of the clues for doing the analysis at a strictly lexical level are gone.

The only way I know of to define "expression" in REBOL is to understand the meaning of code, which can only be done at a given point in run time and with knowledge of the entire set of words defined at that point. The same, of course, applies to the notions of whether things belong on a single line and how one can find "grouped values that belong together" within an arbitrary piece of source text.

How many "expressions" are on the following line?

    a b c 1 d [e f g] 2 h

Of course, the answer depends on the definitions of A thru H at the/each moment that we try to answer the question!

It is easy (e.g., by using longer words or adding more values in the same vein) to make the above line more than 80 characters long. In that case, the question of how/where to break it for meaningful wrapping can only be answered if we know what meanings we're dealing with.
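The question about the line `a b c 1 d [e f g] 2 h` can be tried out directly. The following is only a minimal sketch, assuming REBOL/Core 2.x, where DO/NEXT returns a two-item block holding the result and the position just past the evaluated expression. The helper name `count-expressions` and both sets of definitions for A thru H are made up for illustration.

```rebol
REBOL [Title: "Expression count depends on word definitions"]

; Count top-level expressions by evaluating one expression at a time.
; Assumption: REBOL/Core 2.x semantics for DO/NEXT (result + new position).
count-expressions: func [code [block!] /local n] [
    n: 0
    while [not tail? code] [
        code: second do/next code
        n: n + 1
    ]
    n
]

line: [a b c 1 d [e f g] 2 h]

; Hypothetical definitions, set 1: every word is a plain value.
a: b: c: d: h: 0
print count-expressions line

; Hypothetical definitions, set 2: A and D become two-argument functions,
; so "a b c" and "d [e f g] 2" each collapse into a single expression.
a: func [x y] [x + y]
d: func [blk n] [n]
print count-expressions line
```

Under those assumptions the first PRINT should show 8 and the second 4, for exactly the same source text. Note that the count is obtained by evaluating the code, which is the point being made: no purely lexical pass over the text could produce it.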
In the cases of source text which contains

- references to other source text not present (e.g. DO %SOMEFILE to pull in "library" code)
- multiple definitions for words
- functions which take functions as arguments

we (humans) may be unable to guess such basic details as which words are functions and which are not, and how many arguments a function takes.

Even if we restrict ourselves to the notion of just writing expressions in "stock" REBOL without any user-defined words, the only general solutions I've been able to think of are:

- building a gigantic case-driven syntax analyzer which uses knowledge of the argument structure of every built-in word of REBOL, or
- building a very intelligent analyzer which uses the current content of the REBOL word list to look up and interpret the "expressional" requirements/properties of every word/token in the source string being analyzed.

Neither of these techniques survives in the face of source code in which the user is allowed to define new words.

Of course, if I've overlooked something obvious, I'd be glad to know what it is! As I said at the beginning, this is not a criticism; it is just my attempt to describe the state of affairs as I see them.

To me, the punch-line is that REBOL (with respect to the issue we're talking about) resembles more closely a human language than a traditional programming language. AI researchers have been working for years (essentially as long as there have been computers!) on the question of how to analyze, recognize, interpret, and act on the meaning of human language. Some remarkable things have been done, but they typically require:

- lots of computing horsepower,
- highly specialized programming skills,
- a human to intervene at some point.

Consider the old classic example:

    Time flies like an arrow.

which can be parsed as:

- a command to determine the flight speed of arrow-shaped household insects,
- a statement about the dietary preference of insects who live in clocks at archery ranges, or
- a philosophical musing on how swiftly time moves.

OBTW, this last one is by far the most complex, since speed is calculated as distance divided by time; therefore the notion of the "speed of time" is one which is totally outside the realm of computation!

-jn-

--
; sub REBOL {}; sub head ($) {@_[0]}
REBOL []
# despam: func [e] [replace replace/all e ":" "." "#" "@"]
; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"}
print head reverse despam "moc:xedef#yleen:leoj" ;

[3/4] from: joel:neely:fedex at: 30-Jan-2002 9:04

Hello, all,

Joel Neely wrote:

> One of the factors which drove me to think about alternative styles
> was the difficulty (i.e. impossibility) of algorithmically laying
> out REBOL code in the general case. (Not that The Style That Must
> Not Be Mentioned solved that problem by any stretch of the
> imagination!)
>

Motivated to see how far I could get with purely lexical/syntactic concepts (and hoping someone else could benefit/contribute), I hacked up a "pretty-printer" for REBOL structures. At this point it only handles blocks, objects, functions, and files (assumed to contain REBOL source), or words that are set to one of those types.

PPR assumes that all whitespace (non-string, of course!) in an input file or block is ignorable -- after all, PPR is supposed to be doing the layout, and should assume that any pre-existing style in the source is obsolete! ;-)

If PPR has any value (other than as a stimulus for further ideas), it would be as a "pre-pass" over code to perform basic structural layout. Real-world use would still require a human to go back and insert optional whitespace (horizontal and vertical) as desired to supply semantically-oriented grouping/separation, and to fix the mangled comments (see below).

PPR attempts to implement the following rules, which I believe to be consistent with the RCUG style rules:

- Use indentation (4 columns per level) to show nested structure.
- Limit line length to 72 characters.
- If the representation of a value will fit entirely on the current line, put it there.
- If it will not fit on the current line, and it's not a block, start a new line (appropriately indented).
- If it won't fit, and it's a block, indent the contents of the block; the opening bracket goes at the end of the previous line, and the closing bracket begins a new line resuming the previous indentation level.
- Multiply-nested blocks can begin on the same line (as in the previous point) but their closing brackets will be on distinct lines at the appropriate "outdent".

Additional heuristics are:

- The result of PPR is a string (allowing it to be saved to a file for source cleanup). To simply see the output for testing purposes, PRINT PPR ... is the simplest thing to do.
- If the argument to PPR is a word, the output resembles the setting of that word to a pretty-printed presentation of its value (see type limitations above!)
- If the argument is a file, the content of the file is presented as if LOADed instead of DOne.

Since REBOL doesn't recognize the distinction between data and code, no attempt was made to guess the meaning/usage of the content of a block, with one exception:

- A new line is begun (at the appropriate level of indentation) whenever a set-word value (or sequence of consecutive set-word values) is encountered in a block which will not fit on a single line.

This has the advantage of making objects appear in an intuitively obvious form (and function bodies, as well) but has the disadvantage that set-words embedded within larger running expressions don't appear as subordinate.

In general, no attempt is made to fathom/guess the meaning/intent of a block, so the semantic concepts of "expression" and "belonging together" are essentially absent.

One additional limitation: comments in source files are hopelessly mangled. The use of comment { ... } will allow the output of PPR to be valid for re-loading, but all layout is lost. OTOH, the use of ;-style comments will likely produce text that will choke REBOL upon reloading unless manual intervention occurs.

This email is already too long, but as a quick demo (for those who may not want to bother trying it out themselves), here's a source file which contains three copies of the same function with various layouts (RCUG, Style R, and Obfuscated):

8<------------------------------------------------------------
REBOL []

bignum: func [/local a aa b c d] [ a: aa: 1.0 until [ aa: a b: 1.0 c: 1.0 until [ any [ error? try [ d: a + b if d > a [print [d b c]] a: d ] error? try [ c: c + c b: c ] ] ] print a a = aa ] print a ]

bignum2: func [ /local a aa b c d ][ a: aa: 1.0 until [ aa: a b: 1.0 c: 1.0 until [ any [ error? try [ d: a + b if d > a [print [d b c]] a: d] error? try [ c: c + c b: c] ] ] print a a = aa ] print a ]

bignum3: func[/local a aa b c d][a: aa: 1.0 until[aa: a b: 1.0 c: 1.0 until[any[error? try[d: a + b if d > a[print[d b c]]a: d]error? try[c: c + c b: c]]]print a a = aa]print a]
8<------------------------------------------------------------

And here's how PPR renders that file:

8<------------------------------------------------------------

>> print ppr %/a/bignum.r

[   bignum: func [/local a aa b c d] [
        a: aa: 1 until [
            aa: a
            b: 1
            c: 1 until [
                any [
                    error? try [d: a + b if d > a [print [d b c]] a: d]
                    error? try [c: c + c b: c]
                ]
            ]
            print a a = aa
        ]
        print a
    ]
    bignum2: func [/local a aa b c d] [
        a: aa: 1 until [
            aa: a
            b: 1
            c: 1 until [
                any [
                    error? try [d: a + b if d > a [print [d b c]] a: d]
                    error? try [c: c + c b: c]
                ]
            ]
            print a a = aa
        ]
        print a
    ]
    bignum3: func [/local a aa b c d] [
        a: aa: 1 until [
            aa: a
            b: 1
            c: 1 until [
                any [
                    error? try [d: a + b if d > a [print [d b c]] a: d]
                    error? try [c: c + c b: c]
                ]
            ]
            print a a = aa
        ]
        print a
    ]
]
8<------------------------------------------------------------

and an individual function after the file is DOne:

8<------------------------------------------------------------

>> print ppr 'bignum3

bignum3:
func [/local a aa b c d] [
    a: aa: 1 until [
        aa: a
        b: 1
        c: 1 until [
            any [
                error? try [d: a + b if d > a [print [d b c]] a: d]
                error? try [c: c + c b: c]
            ]
        ]
        print a a = aa
    ]
    print a
]
8<------------------------------------------------------------

Without further ado, here's PPR:

8<------------------------------------------------------------
_ppr: make object! [
    linelimit: 72
    textbuffer: copy ""
    delimiter: ""
    linebuffer: copy ""

    linelength?: func [] [length? linebuffer]

    pad: func [] [
        either all [
            0 < linelength?
            #" " <> last linebuffer
        ][" "][""]
    ]

    reset: func [] [textbuffer: copy linebuffer: copy delimiter: ""]

    roomfor?: func [s [string!]] [
        (length? s) + linelength? + (length? pad) <= linelimit
    ]

    endline: func [] [
        if 0 < linelength? [
            append textbuffer join delimiter linebuffer
            delimiter: newline
            linebuffer: copy ""
        ]
    ]

    output: func [s [string!] inset [integer!] /left] [
        inset: inset * 4
        either left [
            if inset < linelength? [endline]
        ][
            append linebuffer pad
        ]
        while [inset > linelength?] [append linebuffer " "]
        append linebuffer s
    ]

    ppcontent: func [
        arg [block!] indent [integer!]
        /local moldeditem firstitem wasblock wassetword
    ][
        firstitem: true
        wasblock: false
        wassetword: false
        foreach item arg [
            either roomfor? moldeditem: mold :item [
                either any [
                    firstitem
                    all [wasblock not block? :item]
                    all [set-word? :item not wassetword]
                ][
                    output/left moldeditem indent
                ][
                    output moldeditem indent
                ]
            ][
                either block? :item [
                    ppblock item indent
                ][
                    output/left moldeditem indent
                ]
            ]
            firstitem: false
            wasblock: block? :item
            wassetword: set-word? :item
        ]
    ]

    ppblock: func [arg [block!] indent [integer!] /local moldedarg] [
        either roomfor? moldedarg: mold arg [
            output moldedarg indent
        ][
            output "[" indent
            ppcontent arg indent + 1
            output/left "]" indent
        ]
    ]

    cleanblock: func [arg [block! string!]] [load trim/lines arg]

    run: func [arg [block! word! file! object!]] [
        reset
        if word? arg [
            output join to-string arg ":" 0
            arg: get arg
        ]
        switch/default to-string type? :arg [
            "file" [ppblock cleanblock read arg 0]
            "block" [ppblock cleanblock mold arg 0]
            "object" [ppcontent cleanblock mold :arg 0]
            "function" [ppcontent cleanblock mold :arg 0]
        ][
            output mold :arg 0
        ]
        endline
        textbuffer
    ]
]

ppr: func [arg [block! word! file! object!]] [_ppr/run arg]
8<------------------------------------------------------------

Enjoy!

-jn-

--
; sub REBOL {}; sub head ($) {@_[0]}
REBOL []
# despam: func [e] [replace replace/all e ":" "." "#" "@"]
; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"}
print head reverse despam "moc:xedef#yleen:leoj" ;
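As a usage note on the PPR code above: since the result is a plain string, saving it is a one-line WRITE, and the ;-comment limitation described in the post can be seen with a tiny input. This is only a sketch, assuming REBOL/Core 2.x with `_ppr` and `ppr` already loaded. The file names `%messy.r` and `%clean.r` and the word `source` are made up for illustration.

```rebol
REBOL [Title: "PPR usage sketch"]

; Assumption: _PPR and PPR from the post have already been DOne, and
; %messy.r is a hypothetical script in the current directory.
cleaned: ppr %messy.r
write %clean.r cleaned    ; file input comes back as one bracketed block

; Comment caveat: CLEANBLOCK applies TRIM/LINES before LOAD, which joins
; all lines, so a ;-style comment swallows the code that followed it.
source: "a: 1 ; note^/b: 2"
probe load source                    ; both assignments are present
probe load trim/lines copy source    ; the assignment to b is gone
```

In this small case the code after the comment is silently dropped. If the swallowed text had contained a closing bracket, LOAD would instead fail on the unbalanced block, which matches the "choke REBOL upon reloading" behaviour described above.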

[4/4] from: greggirwin:mindspring at: 30-Jan-2002 11:22

Very cool Joel! Thanks for posting that! --Gregg