Rugby source code cleaner? :-)
[1/4] from: petr:krenzelok:trz:cz at: 25-Jan-2002 7:46
Hi,
I know it is a very sensitive topic, but could anyone write a source code
cleaner for Rugby, so that its code follows the standard REBOL style
suggested by RT?
:-)
-pekr-
[2/4] from: joel:neely:fedex at: 28-Jan-2002 8:10
Hi, Petr,
WARNING: My remarks are not intended as complaints, criticisms,
nor negativity, but simply as thinking out loud about how the
unique nature of REBOL means that things we are used to doing with
other languages may not apply. However, experience indicates that
some folks may take offense.
None intended.
Petr Krenzelok wrote:
> Hi,
>
> I know it is very sensitive topic, but could anyone write source
> code cleaner for Rugby to have standard Rebol code output as
> suggested by RT?
>
One of the factors which drove me to think about alternative styles
was the difficulty (i.e. impossibility) of algorithmically laying
out REBOL code in the general case. (Not that The Style That Must
Not Be Mentioned solved that problem by any stretch of the
imagination!)
To show why I believe that to be the case, below are the guidelines
from the RCUG on the RT web site, along with my observations. I've
re-ordered them to get the easy ones out of the way first.
The lexical issues are easiest to handle.
5.1.1. Indent Content for Clarity [beginning]
The contents of a block are indented, but the block's
enclosing square brackets [ ] are not. That's because
the square brackets belong to the prior level of syntax,
as they define the block but are not contents of the
block. Also, it's easier to spot breaks between adjacent
blocks when the brackets stand out.
Tracking nesting of/within square brackets (including the presence
of strings delimited by braces and quotes) is a SMOP.
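As a rough illustration of why this part is mechanical, here is a minimal sketch assuming REBOL/Core 2.x, where LOAD/ALL always returns a block. It lets REBOL's own lexer pair the brackets and recognize strings, then walks the loaded result. NESTING-DEPTH is a hypothetical helper name, not part of REBOL.

```rebol
REBOL []

; NESTING-DEPTH is a hypothetical helper, not part of REBOL.
; It returns how many levels of [ ] or ( ) nesting appear inside BLK.
nesting-depth: func [
    blk [block! paren!]
    /local deepest
][
    deepest: 0
    foreach item blk [
        if any [block? :item paren? :item] [
            deepest: max deepest (1 + nesting-depth :item)
        ]
    ]
    deepest
]

; LOAD/ALL lets REBOL's lexer pair the brackets and recognize strings,
; so the "]" inside the braced string is not taken for a bracket.
print nesting-depth load/all "a [b {]} [c]]"
```

The example prints 2: the lexer treats {]} as a string, so it never counts as a closing bracket.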
5.1.2. Standard Tab Size
REBOL standard tab size is four spaces. Because people
use different editors and readers for scripts, you can
elect to use spaces rather than tabs.
5.1.3. Detab Before Posting
The tab character (ASCII 9) does not indent four spaces
in many viewers, browsers, or shells, so use an editor
or REBOL to detab a script before publishing it to the net.
Once the indentation issues are addressed, using spaces instead of
tabs is trivial. I find the you-can vs. you-must of the above two
points interesting.
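For the record, in REBOL/Core 2.x the conversion is a single call to the built-in DETAB function. The sketch below assumes that version; the script text and the file names in the comment are made up for illustration.

```rebol
REBOL []

; DETAB is built into REBOL/Core 2.x; /size gives the tab width
; explicitly (4 is also its default).
source: "f: func [] [^/^-print 1^/]"
clean: detab/size copy source 4
print clean

; For a script on disk (hypothetical file names) the same idea is:
; write %myscript-clean.r detab read %myscript.r
```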
5.1.4. Limit Line Lengths to 80 Characters
For ease of reading and portability among editors and
email readers, limit lines to 80 characters.
This one looks easy, but raises a crucial question: how do we know
*where* within a longer line one should insert the break(s) to
conform to the 80-character (including indentation) length limit?
That takes us from the lexical level directly to the semantic
level, which is where life gets "interesting"...
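Finding the offending lines is the easy half and needs no knowledge of meaning at all. A minimal sketch, assuming REBOL/Core 2.x; LONG-LINES is a hypothetical helper, not a built-in:

```rebol
REBOL []

; LONG-LINES (hypothetical helper): return the 1-based numbers of the
; lines in TEXT that are longer than LIMIT characters.
long-lines: func [
    text [string!] limit [integer!]
    /local result n line-end
][
    result: copy []
    n: 0
    while [not tail? text] [
        n: n + 1
        ; the current line ends at the next newline, or at the end of TEXT
        line-end: any [find text newline tail text]
        if limit < ((index? line-end) - (index? text)) [append result n]
        text: next line-end
    ]
    result
]

sample: join "short line^/" head insert/dup copy "" #"x" 90
print mold long-lines sample 80
```

It prints [2]. Deciding *where* to break line 2 is exactly the question it cannot answer.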
5.1.1. Indent Content for Clarity [continued]
Where possible, an opening square bracket remains
on the line with its associated expression. The
closing bracket can be followed by more expressions
of that same level. These same rules apply equally
to parenthesis ( ) and braces { }.
An exception is made for expressions that normally
belong on a single line, but extend to multiple lines: ...
This also applies to grouped values that belong
together, but must be wrapped to fit on the line: ...
I don't know of any syntactic definition of "expression" for REBOL.
This is both due to the "syntax-free" nature of REBOL source code,
and the dynamic/latent/weak typing (use whichever buzzword you wish)
which by-and-large leaves data type issues until evaluation time.
With many other languages, one can look at a string of source text
in complete isolation and determine where the "expressions" and
statements are, based on such things as keywords, required
punctuation, syntax and precedence for various built-in operators,
etc. Since REBOL has none of those things, all of the clues for
doing the analysis at a strictly lexical level are gone. The only
way I know of to define "expression" in REBOL is to understand the
meaning of code, which can only be done at a given point in run
time and with knowledge of the entire set of words defined at that
point.
The same, of course, applies to the notions of whether things
belong on a single line and how one can find "grouped values
that belong together" within an arbitrary piece of source text.
How many "expressions" are on the following line?
a b c 1 d [e f g] 2 h
Of course, the answer depends on the definitions of A thru H at
the/each moment that we try to answer the question! It is easy
(e.g., by using longer words or adding more values in the same
vein) to make the above line more than 80 characters long. In
that case, the question of how/where to break it for meaningful
wrapping can only be answered if we know what meanings we're
dealing with.
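DO/NEXT makes the point concrete: in REBOL/Core 2.x it evaluates one expression from a block and returns a block holding the result and the position just after that expression. The sketch below counts the expressions in the example line under two invented sets of definitions; COUNT-EXPRESSIONS and both sets of definitions are hypothetical. Note that it has to *run* the code to find the boundaries, which is precisely what a layout tool cannot afford to do.

```rebol
REBOL []

; COUNT-EXPRESSIONS (hypothetical helper): evaluate CODE one expression
; at a time with DO/NEXT and count how many steps it takes.
count-expressions: func [code [block!] /local n] [
    n: 0
    while [not tail? code] [
        code: second do/next code   ; DO/NEXT returns [value next-position]
        n: n + 1
    ]
    n
]

sample-line: [a b c 1 d [e f g] 2 h]

; Definitions 1 (invented): every word is a plain value
a: b: c: d: h: 0
print count-expressions sample-line

; Definitions 2 (invented): A takes three arguments, D takes two
a: func [x y z] [x]
d: func [blk n] [n]
print count-expressions sample-line
```

With plain values the line holds 8 expressions; once A takes three arguments and D takes two, the very same line holds 3.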
In the cases of source text which contains
- references to other source text not present
(e.g. DO %SOMEFILE to pull in "library" code)
- multiple definitions for words
- functions which take functions as arguments
we (humans) may be unable to guess such basic details as which
words are functions and which are not, and how many arguments
a function takes.
Even if we restrict ourselves to the notion of just writing
expressions in "stock" REBOL without any user-defined words,
the only general solutions I've been able to think of are:
- building a gigantic case-driven syntax analyzer which uses
knowledge of the argument structure of every built-in word
of REBOL, or
- building a very intelligent analyzer which uses the current
content of the REBOL word list to look up and interpret the
"expressional" requirements/properties of every word/token
in the source string being analyzed.
Neither of these techniques survives in the face of source
code in which the user is allowed to define new words.
Of course, if I've overlooked something obvious, I'd be glad
to know what it is!
As I said at the beginning, this is not a criticism; it is just
my attempt to describe the state of affairs as I see it. To
me, the punch-line is that REBOL (with respect to the issue
we're talking about) resembles more closely a human language than
a traditional programming language.
AI researchers have been working for years (essentially as long
as there have been computers!) on the question of how to analyze,
recognize, interpret, and act on the meaning of human language.
Some remarkable things have been done, but they typically require:
- lots of computing horsepower,
- highly specialized programming skills,
- a human to intervene at some point.
Consider the old classic example:
Time flies like an arrow.
which can be parsed as:
- a command to determine the flight speed of arrow-shaped
household insects,
- a statement about the dietary preference of insects who
live in clocks at archery ranges, or
- a philosophical musing on how swiftly time moves.
OBTW, this last one is by far the most complex, since speed is
calculated as distance divided by time; therefore the notion of
the "speed of time" is one which is totally outside the realm
of computation!
-jn-
--
; sub REBOL {}; sub head ($) {@_[0]}
REBOL []
# despam: func [e] [replace replace/all e ":" "." "#" "@"]
; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"}
print head reverse despam "moc:xedef#yleen:leoj" ;
[3/4] from: joel:neely:fedex at: 30-Jan-2002 9:04
Hello, all,
Joel Neely wrote:
> One of the factors which drove me to think about alternative styles
> was the difficulty (i.e. impossibility) of algorithmically laying
> out REBOL code in the general case. (Not that The Style That Must
> Not Be Mentioned solved that problem by any stretch of the
> imagination!)
>
Motivated to see how far I could get with purely lexical/syntactic
concepts (and hoping someone else could benefit/contribute), I
hacked up a "pretty-printer" for REBOL structures. At this point
it only handles blocks, objects, functions, and files (assumed to
contain REBOL source), or words that are set to one of those types.
PPR assumes that all whitespace (non-string, of course!) in an
input file or block is ignorable -- after all, PPR is supposed to
be doing the layout, and should assume that any pre-existing style
in the source is obsolete! ;-)
If PPR has any value (other than as a stimulus for further ideas),
it would be as a "pre-pass" over code to perform basic structural
layout. Real-world use would still require a human to go back and
insert optional whitespace (horizontal and vertical) as desired to
supply semantically-oriented grouping/separation, and to fix the
mangled comments (see below).
PPR attempts to implement the following rules, which I believe to be
consistent with the RCUG style rules:
- Use indentation (4 columns per level) to show nested structure.
- Limit line length to 72 characters.
- If the representation of a value will fit entirely on the
current line, put it there.
- If it will not fit on the current line, and it's not a block,
start a new line (appropriately indented).
- If it won't fit, and it's a block, indent the contents of the
block; the opening bracket goes at the end of the previous
line, and the closing bracket begins a new line resuming the
previous indentation level.
- Multiply-nested blocks can begin on the same line (as in the
previous point) but their closing brackets will be on distinct
lines at the appropriate "outdent".
Additional heuristics are:
- The result of PPR is a string (allowing it to be saved to a
file for source cleanup). To simply see the output for testing
purposes, PRINT PPR ... is the simplest thing to do.
- If the argument to PPR is a word, the output resembles the
setting of that word to a pretty-printed presentation of its
value (see type limitations above!)
- If the argument is a file, the content of the file is presented
as if LOADed instead of DOne.
Since REBOL doesn't recognize the distinction between data and code,
no attempt was made to guess the meaning/usage of the content of a
block, with one exception:
- A new line is begun (at the appropriate level of indentation)
whenever a set-word value (or sequence of consecutive set-word
values) is encountered in a block which will not fit on a single
line. This has the advantage of making objects appear in an
intuitively obvious form (and function bodies, as well) but has
the disadvantage that set-words embedded within larger running
expressions don't appear as subordinate.
In general, no attempt is made to fathom/guess the meaning/intent
of a block, so the semantic concepts of "expression" and "belonging
together" are essentially absent.
One additional limitation: comments in source files are hopelessly
mangled. The use of
comment { ... }
will allow the output of PPR to be valid for re-loading, but all
layout is lost. OTOH, the use of ;-style comments will likely
produce text that will choke REBOL upon reloading unless manual
intervention occurs.
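A quick check, assuming REBOL/Core 2.x, shows where the difference comes from: the lexer discards ;-comments before any LOAD-based tool ever sees them, whereas COMMENT {...} is ordinary data (a word followed by a string) that survives the round trip. The variable names are arbitrary.

```rebol
REBOL []

; The ;-comment is dropped by the lexer, leaving only the words A and B.
semi: load "a ; a note^/b"
print mold semi

; COMMENT {...} loads as the word COMMENT followed by a string value.
braced: load "comment {a note} a b"
print mold braced
```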
This email is already too long, but as a quick demo (for those who
may not want to bother trying it out themselves), here's a source
file which contains three copies of the same function with various
layouts (RCUG, Style R, and Obfuscated):
8<------------------------------------------------------------
REBOL []
bignum: func [/local a aa b c d] [
    a: aa: 1.0
    until [
        aa: a
        b: 1.0 c: 1.0
        until [
            any [
                error? try [
                    d: a + b
                    if d > a [print [d b c]]
                    a: d
                ]
                error? try [
                    c: c + c
                    b: c
                ]
            ]
        ]
        print a
        a = aa
    ]
    print a
]
bignum2: func
[ /local a aa b c d
][ a: aa: 1.0
until
[ aa: a b: 1.0 c: 1.0
until
[ any
[ error? try
[ d: a + b if d > a [print [d b c]] a: d]
error? try
[ c: c + c b: c]
] ]
print a a = aa
]
print a
]
bignum3: func[/local a aa b c d][a: aa: 1.0 until[aa: a b: 1.0 c:
1.0 until[any[error? try[d: a + b if d > a[print[d b c]]a: d]error?
try[c: c + c b: c]]]print a a = aa]print a]
8<------------------------------------------------------------
And here's how PPR renders that file:
8<------------------------------------------------------------
>> print ppr %/a/bignum.r
[   bignum: func [/local a aa b c d] [
        a: aa: 1 until [
            aa: a
            b: 1
            c: 1 until [
                any [
                    error? try [d: a + b if d > a [print [d b c]] a: d]
                    error? try [c: c + c b: c]
                ]
            ]
            print a a = aa
        ]
        print a
    ]
    bignum2: func [/local a aa b c d] [
        a: aa: 1 until [
            aa: a
            b: 1
            c: 1 until [
                any [
                    error? try [d: a + b if d > a [print [d b c]] a: d]
                    error? try [c: c + c b: c]
                ]
            ]
            print a a = aa
        ]
        print a
    ]
    bignum3: func [/local a aa b c d] [
        a: aa: 1 until [
            aa: a
            b: 1
            c: 1 until [
                any [
                    error? try [d: a + b if d > a [print [d b c]] a: d]
                    error? try [c: c + c b: c]
                ]
            ]
            print a a = aa
        ]
        print a
    ]
]
8<------------------------------------------------------------
and an individual function after the file is DOne:
8<------------------------------------------------------------
>> print ppr 'bignum3
bignum3:
func [/local a aa b c d] [
    a: aa: 1 until [
        aa: a
        b: 1
        c: 1 until [
            any [
                error? try [d: a + b if d > a [print [d b c]] a: d]
                error? try [c: c + c b: c]
            ]
        ]
        print a a = aa
    ]
    print a
]
8<------------------------------------------------------------
Without further ado, here's PPR:
8<------------------------------------------------------------
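_ppr: make object! [
    linelimit: 72           ; maximum length of an output line
    textbuffer: copy ""     ; lines completed so far
    delimiter: ""           ; goes before the next completed line
    linebuffer: copy ""     ; line currently being built
    linelength?: func [] [length? linebuffer]
    ; a separating space, unless the line is empty or ends in a space
    pad: func [] [
        either all [
            0 < linelength?
            #" " <> last linebuffer
        ][" "][""]
    ]
    reset: func [] [textbuffer: copy linebuffer: copy delimiter: ""]
    ; would S (plus a separating space) fit on the current line?
    roomfor?: func [s [string!]] [
        (length? s) + linelength? + (length? pad) <= linelimit
    ]
    ; move the current line (if not empty) into TEXTBUFFER
    endline: func [] [
        if 0 < linelength? [
            append textbuffer join delimiter linebuffer
            delimiter: newline
            linebuffer: copy ""
        ]
    ]
    ; append S at nesting level INSET (4 columns per level); with /LEFT,
    ; begin a new line if the current one already extends past that
    ; column; without it, separate S from the preceding text by a space
    output: func [s [string!] inset [integer!] /left] [
        inset: inset * 4
        either left [
            if inset < linelength? [endline]
        ][
            append linebuffer pad
        ]
        while [inset > linelength?] [append linebuffer " "]
        append linebuffer s
    ]
    ; lay out the items of ARG at nesting level INDENT: the first item,
    ; a non-block after a block, and the first set-word of a run go to
    ; the indentation column (on a new line if needed); other items that
    ; fit continue the line; a block that doesn't fit goes to PPBLOCK
    ppcontent: func [
        arg [block!] indent [integer!]
        /local moldeditem firstitem wasblock wassetword
    ][
        firstitem: true
        wasblock: false
        wassetword: false
        foreach item arg [
            either roomfor? moldeditem: mold :item [
                either any [
                    firstitem
                    all [wasblock not block? :item]
                    all [set-word? :item not wassetword]
                ][
                    output/left moldeditem indent
                ][
                    output moldeditem indent
                ]
            ][
                either block? :item [
                    ppblock item indent
                ][
                    output/left moldeditem indent
                ]
            ]
            firstitem: false
            wasblock: block? :item
            wassetword: set-word? :item
        ]
    ]
    ; a block that fits stays on the current line; otherwise "[" ends the
    ; line, the contents go one level deeper, and "]" starts a new line
    ppblock: func [arg [block!] indent [integer!] /local moldedarg] [
        either roomfor? moldedarg: mold arg [
            output moldedarg indent
        ][
            output "[" indent
            ppcontent arg indent + 1
            output/left "]" indent
        ]
    ]
    ; TRIM/LINES turns line breaks and runs of spaces into single spaces
    ; (inside string literals too) before loading, discarding old layout
    cleanblock: func [arg [block! string!]] [load trim/lines arg]
    ; reset the buffers, lay out ARG, and return the text as a string;
    ; a word is shown as a set-word followed by its value
    run: func [arg [block! word! file! object!]] [
        reset
        if word? arg [
            output join to-string arg ":" 0
            arg: get arg
        ]
        switch/default to-string type? :arg [
            "file" [ppblock cleanblock read arg 0]
            "block" [ppblock cleanblock mold arg 0]
            "object" [ppcontent cleanblock mold :arg 0]
            "function" [ppcontent cleanblock mold :arg 0]
        ][
            output mold :arg 0
        ]
        endline
        textbuffer
    ]
]
ppr: func [arg [block! word! file! object!]] [_ppr/run arg]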
8<------------------------------------------------------------
Enjoy!
-jn-
[4/4] from: greggirwin:mindspring at: 30-Jan-2002 11:22
Very cool Joel!
Thanks for posting that!
--Gregg