Mailing List Archive: Re: to-char

[REBOL] Re: to-char

From: joel:neely:fedex at: 14-Feb-2002 9:08


Hi, Jason, Carl, and all...

Sorry that the following is so long.  I don't have time to
write it in fewer words.  English, combined with a bag of
prior experience with other programming languages, is really
a *TERRIBLE* medium for explaining REBOL!!!  ;-)

As always, I'll be happy for any corrections or useful
revisions of this discussion.

Carl Read wrote:
> > So now let me ask an even more basic [FA] question:
>
> A question for you - what's "FA"? (:
>

FA as in FAQ

> > What is happening, what is the real meaning of
>
> > someblock: []
>
> An empty block is created, as is the word 'someblock, which
> points to (references) the block.
>

I must respectfully disagree.  The issue of when values are
created is an entirely different discussion than the issue
of what happens when those values are evaluated.  There is
not much documentation that separates out these issues, so
this puzzle comes up in almost every REBOL programmer's
path to enREBOLment.

As I understand it, when REBOL *loads* a string of the form

    foo: <value>

(where <value> is a single value such as a number, string, or
block) it creates an internal REBOL structure (which can serve
as both code and data, depending on how it is subsequently
used).  In that structure there is a distinct REBOL value for
each syntactical element in the source string.  However, to
know some of the details, we have to know where that string
came from.

Console input to the interpreter is submitted to a "load-and-do"
cycle which takes the input string, loads (translates) it into
a REBOL structure, then DOes that structure.  Starting with a
fresh REBOL process, we can model that process as follows:

    REBOL/View 1.2.1.3.1 21-Jun-2001
    ... more verbiage suppressed ...
    *** Obtain REBOL/View/Pro from http://www.rebol.com/view-sales.html
    >> print foo
    ** Script Error: foo has no value
    ** Near: print foo

(Just to show that REBOL has no preconceived notion of FOO...)

    >> console-input-1: "foo: []"
    == "foo: []"
    >> length? console-input-1
    == 7

The console input is a string of 7 characters.

    >> console-struct-1: load console-input-1
    == [foo: []
    ]
    >> length? console-struct-1
    == 2

LOADing the console input produces a block containing two values.

    >> type? console-struct-1/1
    == set-word!
    >> type? console-struct-1/2
    == block!

LOAD created a SET-WORD! value from the string "foo:" and created a
block from the string "[]" and put those two values into a new block.
The (empty at this point) block doesn't have any more baggage, but
there's a hidden issue with the SET-WORD! value.

Each REBOL word belongs to a context (known in other languages as an
environment or other terms even less useful to us right now! ;-)
I think of an environment as a dictionary that pairs the name of a
word with a value (reference for some types), but that pairing is
relevant only within that context.  (If that last phrase is unclear,
hang on; we'll try to shed more light on it Real Soon Now.)  I think
of the internal representation of a word as "containing" a reference
to the string that is its name and a reference to its context.

NB:  THAT IS A DESCRIPTIVE MENTAL MODEL.  THERE ARE MANY WAYS THAT
THIS COULD ACTUALLY BE IMPLEMENTED; I DON'T KNOW WHICH OF THEM
IS/ARE ACTUALLY USED IN THE VARIOUS FLAVORS OF REBOL.

In order to create the SET-WORD! value for "foo:", it must have a
context.  For the above case, since there's nothing to specify
otherwise, that will be the global context.  So a word value is
created with a name of "foo" and a reference to the global context;
a new definition is added to the global context containing a word
name of "foo" but with no associated value at this point.
(AGAIN, THINK OF THIS AS METAPHORICAL.)
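
As a small sketch of the "dictionary" idea (CTX is a made-up name of
mine, not part of the session above, and this assumes that the global
FOO has still not been given a value): a word named FOO inside an
object lives in that object's context, so giving it a value there
does nothing to the global FOO.

    >> ctx: make object! [foo: "only in ctx"]
    >> value? 'foo
    == false
    >> get in ctx 'foo
    == "only in ctx"

Same name, two contexts, two separate value slots.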

Finally (whew! this guy is long-winded! ;-) we're ready to talk
about the DO step.

    >> do console-struct-1
    == []
    >> print foo

    >> print mold foo
    []

DOing a block requires evaluating each value within the block.
When evaluating a SET-WORD! the interpreter does something like
the following (AGAIN, METAPHORICAL):

1)  put the word in question "on hold";
2)  evaluate the following value/expression (let's not go into
    that too much right now) which in this case is a reference
    to a block;
3)  evaluating a block (NOT the same as DOing the block!) simply
    yields a reference to that block;
4)  take the word left on hold in (1) and find its context;
5)  within that specific context/dictionary, alter the value slot
    associated with that word, so that now the value slot contains
    the value (or reference to ... yadda yadda) produced in (3);
6)  in addition, the value from (3) now serves as the value for
    this entire process (in case this evaluation occurred within
    a larger evaluation -- the issue we skipped over in (2)).
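
A quick way to see step (6) at work (BAR and BAZ are throwaway names
of mine, assumed to be unused so far): because evaluating a SET-WORD!
yields the value it just stored, set-words can be chained, and both
words end up referring to one and the same block.

    >> bar: baz: []
    == []
    >> same? bar baz
    == true
    >> append bar 1
    == [1]
    >> baz
    == [1]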

At this point, the interpreter would be able to throw away a string
typed into the console, and the block created from that string,
since there are no surviving references to them.

In the case of our little modeling exercise, that doesn't happen
because we actually have words that are set to the string and
block we're playing with.  We'll keep them around for a little
longer to make a point.

Let's model the load-and-do cycle on another string (this time
without so much verbiage):

    >> console-input-2: "oof: foo"
    == "oof: foo"
    >> console-struct-2: load console-input-2
    == [oof: foo
    ]
    >> do console-struct-2
    == []

There's a new word in the global context now.  Its value (in that
context) is set to refer to the *same* block that (global) FOO is
set to.  Let's keep modeling the load-and-do cycle...

    >> console-input-3: "append oof 1"
    == "append oof 1"
    >> console-struct-3: load console-input-3
    == [append oof 1
    ]
    >> do console-struct-3
    == [1]

I'm sure we can all describe that one, and would anticipate the
result of cheating and looking at the actual words we're playing
with in our model:

    >> mold foo
    == "[1]"
    >> mold oof
    == "[1]"

However, ("Finally!" you're probably thinking ;-) now I can get
to my first punch line.  Let's go back and look at our input
strings, and then look at the structures that represent those
strings in REBOL internal form (with MOLDing and some added
whitespace for clarity):

    >> foreach thing reduce [
    [    console-input-1 console-input-2 console-input-3
    [    ][print mold thing]

    "foo: []"

    "oof: foo"

    "append oof 1"

    >> foreach thing reduce [
    [    console-struct-1 console-struct-2 console-struct-3
    [    ][print mold thing]

    [foo: [1]
    ]

    [oof: foo
    ]

    [append oof 1
    ]

What's with the value of CONSOLE-STRUCT-1???  Remember that it
originally contained two values -- a set-word! and an empty
block (created empty at the time that CONSOLE-INPUT-1 was
LOADed).

And that's the key.

DOing CONSOLE-STRUCT-1 didn't *create* either the set-word or the
empty block.  They were created when CONSOLE-INPUT-1 was LOADed.
All that happened when CONSOLE-STRUCT-1 was DOne was that the
value (in the global context) for FOO was set to (a reference
to) the block which *already* existed and was referred to in
the second position of CONSOLE-STRUCT-1.

DOing CONSOLE-STRUCT-2 set (global) OOF (created when we LOADed
CONSOLE-INPUT-2) to refer to that same block (still empty at
that time).

At that point there were three references to that block:  the
original reference in CONSOLE-STRUCT-1, in the global context
for FOO, and in the global context for OOF.

The first of those only remained in existence because of our
modeling; if we simply typed the console input strings in at
the prompt, the older strings and blocks would have already
gone back to the recycling plant as soon as the next input
was typed.
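
SAME? offers a more direct check, as a sketch (it tests whether two
values are the very same series, not merely equal ones).  Run at this
point in the session, before anything else is changed, I would expect:

    >> same? foo oof
    == true
    >> same? foo console-struct-1/2
    == true
    >> same? foo copy foo
    == false
    >> equal? foo copy foo
    == true

The last two lines are there for contrast: a COPY has equal content
but is a different block.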

Just to prove that these are three references to the same
block, let's cheat on our model.  We'll set OOF directly and
see the consequences.

    >> oof: "no block here!"
    == "no block here!"

    >> foreach thing reduce [
    [    console-struct-1 console-struct-2 console-struct-3
    [    ][print mold thing]

    [foo: [1]
    ]

    [oof: foo
    ]

    [append oof 1
    ]

By setting OOF to a new string (created when this new console
input was loaded -- outside our model), we simply change the
global dictionary definition for OOF to something else.  We
haven't altered the value with which OOF was associated
before that point.

    >> oof: append foo "I'm back!"
    == [1 "I'm back!"]

Now we've reSET the value of OOF to the value of an expression
that *also* mutates the value to which FOO is set.  Therefore,
we now see the effect of that mutation through all references
to that same value:

    >> foreach thing reduce [
    [    console-struct-1 console-struct-2 console-struct-3
    [    ][print mold thing]

    [foo: [1 "I'm back!"]
    ]

    [oof: foo
    ]

    [append oof 1
    ]

With all of that in place, let's fast-forward to Carl's comments
on functions:

> More on functions:  A series is created when the function is
> first created and not each time the function is called, which is
> why what's in a series will persist from function-call to
> function-call unless you specifically clear it or make a copy of
> it.  Whether to use 'copy or 'clear (or neither for that matter)
> will depend on the behaviour you want from the series.
>

I agree 100% with what Carl meant, but -- with apologies -- let me
try to reword a little bit by continuing our modeling exercise.

    >> console-input-4: "trick: func [/local foo] [foo: append [] 1]"
    == "trick: func [/local foo] [foo: append [] 1]"
    >> console-struct-4: load console-input-4
    == [trick: func [/local foo] [foo: append [] 1]
    ]
    >> do console-struct-4

Now there's a global word TRICK which is set to a FUNCTION! value.
The SECOND part of a FUNCTION! value is a block -- the "body" of
the function.

    >> second :trick
    == [foo: append [] 1]

When was that FUNCTION! value created?  When CONSOLE-STRUCT-4 was
DOne.  When FUNC is applied to two blocks, it constructs a new
FUNCTION! value with a process something like this:

1)  create a new (empty) context;
2)  add to that context every argument and refinement in the
    first block given to FUNC;
3)  make a deep copy of the second block offered to FUNC, but
    whenever a word appears in that copy that is also in the
    first block, change the context of the word IN THE COPY to
    be the context created in (1) and filled in (2);
4)  create a new FUNCTION! value that is based on the results
    of (2) and (3), and return that FUNCTION! as the value of
    (this invocation of) FUNC to whatever caused FUNC to be
    invoked.

The third element in the body of TRICK is (at this moment!) an
empty block.  It is there because LOAD created an empty block as
the third element of the fourth element of CONSOLE-STRUCT-4, and
then FUNC copied that empty block at (3) to create the third
element of the block that serves as the body of the FUNCTION!
created in (4).

So, at this moment, the third element of the body of TRICK is
an empty block created by copying an empty block created by
LOADing a string that contained -- in part -- a #"[" followed
by a #"]".  To save me some typing and you some reading, let's
call that block ~HERBIE~ (the weird punctuation is to remind
us that this is only our conversational name for something,
it is not REBOL terminology nor notation).

Now when we evaluate the (global) TRICK, we get the same behavior
that we were discussing earlier:

    >> trick
    == [1]
    >> trick
    == [1 1]
    >> trick
    == [1 1 1]

The reason is that -- in the body of TRICK -- the (local to
TRICK) word FOO is set to the value of an expression that
mutates its first argument, which is ~HERBIE~.  ~HERBIE~ started
off empty when the function was created.  Each time that the
function is evaluated, the (local to TRICK) word FOO is set
to the value of an expression that modifies ~HERBIE~.  Since
the third element of TRICK's body is a reference to ~HERBIE~,
we will see the effects of those mutations when we look at
TRICK's body.

    >> second :trick
    == [foo: append [1 1 1] 1]

Since TRICK's body was created by sort-of-copying a block that's
still in CONSOLE-STRUCT-4, we *will*not* see the effects of the
mutations there.

    >> console-struct-4
    == [trick: func [/local foo] [foo: append [] 1]
    ]

Since the first element of TRICK's body is a word that has a
different context than the global one, all of this SETting of
that word has no effect on the global FOO as seen below:

    >> foo
    == [1 "I'm back!"]

We can "tunnel" into the body of TRICK and see the value
of TRICK's local FOO as follows:

    >> get first second :trick
    == [1 1 1]

As with our console input modeling above, there's still a chain
of references to ~HERBIE~ so the value of ~HERBIE~ persists.
And, since the body of TRICK is just a block, we can do block
operations on it:

    >> poke second :trick 3 [2]
    == [foo: append [2] 1]

Now the body of TRICK no longer contains a reference to ~HERBIE~
because I POKEd a different value into the place where that
reference to ~HERBIE~ used to be.

However, there's still another reference to ~HERBIE~:

    >> get first second :trick
    == [1 1 1]

But look what happens when I pull another TRICK ...

    >> trick
    == [2 1]
    >> second :trick
    == [foo: append [2 1] 1]
    >> get first second :trick
    == [2 1]

I've killed ~HERBIE~ !!!  (Good thing he was only a virtual name,
or I'd be arrested!  ;-)

Now both the (local to TRICK) word FOO and the third element of
TRICK's body refer to a new block.  OBTW, that block was
created when I typed the string "poke second :trick 3 [2]" into
the console and REBOL LOADed it.  It was then mutated when the
body of TRICK was evaluated.

If you've read this far, you deserve an Olympic medal for the
marathon!!!

The reason for using COPY in front of a "literal" block inside
a REBOL function would be to prevent mutations through a
reference to that block from persisting -- i.e. you want a fresh
instance of that block's content every time.

I know that the above discussion was painful and laborious to
read, but I hope it makes clear that there's already some COPYing
going on.  Knowing *when* the COPYing happens and *what* values
are COPYed makes a lot of difference IMHO.

As for CLEAR, it simply discards the content of a series, but
doesn't replace the series itself.

    >> foo
    == [1 "I'm back!"]
    >> oof
    == [1 "I'm back!"]
    >> clear foo
    == []
    >> oof
    == []

so that OOF and FOO still both refer to the same series, it's just
that a (particularly severe!) mutation to that series's value
occurred.

The consequence of THAT fact is illustrated with these two little
tweedles:

    >> dee: func [/local foo] [foo: append copy [] 1]
    >> dum: func [/local foo] [foo: append clear [] 1]

I hope that I've belabored my model to the point that all of the
following now make sense to you, most admirable and persistent
reader!

    >> a: dee
    == [1]
    >> b: dee
    == [1]
    >> c: dum
    == [1]
    >> d: dum
    == [1]
    >> append a 2
    == [1 2]
    >> b
    == [1]
    >> second :dee
    == [foo: append copy [] 1]
    >> append c 2
    == [1 2]
    >> d
    == [1 2]
    >> second :dum
    == [foo: append clear [1 2] 1]
    >> e: dee
    == [1]
    >> f: dum
    == [1]
    >> second :dum
    == [foo: append clear [1] 1]

Or, if you'll pardon the pun, I hope that it's all CLEAR now!

-jn-

--
; sub REBOL {}; sub head ($) {@_[0]}
REBOL []
# despam: func [e] [replace replace/all e ":" "." "#" "@"]
; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"}
print head reverse despam "moc:xedef#yleen:leoj" ;