Mailing List Archive: Re: On mutability and sameness

[REBOL] Re: On mutability and sameness

From: joel:neely:fedex at: 15-Jun-2001 2:02


Hi, again, briefly (long day today...)

Ladislav Mecir wrote:
> > 1)  A question:  Given the transcript below:
> >
> >     >> block-o-words: to-block "pie"    == [pie]
> >     >> pie: "apple"                     == "apple"
> >     >> append block-o-words [pie]       == [pie pie]
> >     >> do func [n /local pie] [
> >     [    pie: n
> >     [    append block-o-words [pie]
> >     [    ] 17                           == [pie pie pie]
> >
> >     >> get second block-o-words         == "apple"
> >     >> get third block-o-words          == 17
> >
> >     >> print block-o-words
> >     ** Script Error: pie is not defined in this context
> >     ** Where: halt-view
> >     ** Near: pie pie pie
> >
> > would you say that BLOCK-O-WORDS contains three words whose
> > names are all spelled the same, or would you say that
> > BLOCK-O-WORDS contains three occurrences of a word with a
> > different context attribute on each?
> >
>
> ... For me ... the words aren't the same. I would rather say
> that BLOCK-O-WORDS contains three words whose "names" ...
> are equal.
>

So would I.  In that case, I'd say that

    word-root: "pi"
    block: to block! word-root ; == [pi]
    bind block 'system         ; == [pi]

is mutating BLOCK by replacing the word which is its only
element with a different word.

> > On the other hand, the REBOL documentation clearly states
> > that (for example) TIME! values have HOUR, MINUTE, and
> > SECOND components.
>
> where?
>

REBOL/Core User Guide Version 2.3, page 2-2 and pages A-71
through A-76 (although RCUG uses the term "fields" instead
of the term "components", I assume we agree that the meaning
is the same).

> Now I think I know what you are after. The central notion of
> your approach is SHARABILITY, while the central notion of my
> description is MUTABILITY.
>

The central notion of my approach is that sharability and
mutability are distinct issues.  (Although I can't think of
any sharable immutable types in REBOL, the concept is well-
established in other languages, e.g. the Java string type.)

From my perspective, the central distinction is that your
approach integrates those two issues.

> The interesting thing is, that there is a simple way how to
> translate between them. Where is the difference?
>

I must confess I didn't follow all of the description of how
you perceived the differences in our perspectives.  Let me
try to describe it the way I see it, and tell me if this
makes sense to you.

I believe that you and I are building models to answer the
same puzzle:  how do we explain what is going on with these
similar-looking cases?

    >> b1: [23 34 45]    == [23 34 45]
    >> b2: b1            == [23 34 45]
    >> b3: [23 34 45]    == [23 34 45]
    >> same? b1 b2       == true
    >> same? b1 b3       == false
    >> equal? b1 b2      == true
    >> equal? b1 b3      == true

    >> b1/1: 22          == [22 34 45]

    >> b1                == [22 34 45]
    >> b2                == [22 34 45]
    >> b3                == [23 34 45]

    ; Question 1: Why did the value for B2 change?

    >> same? b1 b2       == true
    >> same? b1 b3       == false
    >> equal? b1 b3      == false

    >> t1: 23:34:45    == 23:34:45
    >> t2: t1          == 23:34:45
    >> t3: 23:34:45    == 23:34:45
    >> same? t1 t2     == true
    >> same? t1 t3     == true

    ; Question 2: Why does SAME? T1 T3 evaluate to TRUE ?

    >> t1/hour: 22     == 22

    ; Question 3: What just happened?

    >> t1              == 22:34:45
    >> t2              == 23:34:45
    >> t3              == 23:34:45

    ; Question 4: Why didn't the value of T2 change?

    >> same? t1 t2     == false
    >> same? t1 t3     == false
    >> same? t2 t3     == true

    ; Question 5: Why are T1 and T2 now not SAME? when
    ;             formerly they were?

    >> t4: to-time reduce [2 * 11 + 1  3 * 11 + 1  4 * 11 + 1]
                       == 23:34:45
    >> same? t3 t4     == true

    ; Question 6: How did REBOL figure out that T3 and T4
    ;             are the SAME? value?

The differences in our models arise from the difference in
what each of us assumes as a basis, and what each of us
consequently must explain in a less intuitive fashion.

MY MODEL:

a)  All set-path expressions cause something to be mutated.
    The "something" is identified by the set-path up to the
last #"/", and the part of that "something" to be changed is
identified by what follows the last #"/".  Any such "something"
is mutable.

b)  A scalar (Holger's "simple") value is stored directly,
    represented by the data value itself.  A reference value
is stored indirectly, with a reference to the data which is
stored elsewhere.  This is true whether we're talking about
a value "in" a variable or "in" a data structure (block,
object, etc.)

c)  Since a reference can be duplicated without duplicating
    whatever it refers to, reference values can be shared.
Since there are no references for scalar values, they cannot
be shared.

d)  Independently-constructed reference values (blocks, strings,
    etc.) may be EQUAL? in that they have equivalent content,
but they are not the SAME? values.  When a block is constructed,
REBOL doesn't have to search all of memory to see whether there
is already a block in existence with that content; it can just
build it.  Therefore, EQUAL? and SAME? are not identical.

Therefore, I have to answer the questions in the puzzle above
in the following way:

1)  The value for B2 changed because both B1 and B2 refer to
    the same data value (block).  Using either reference to
modify the data creates a change visible through both.

2)  The SAME? test for reference types compares the references
    (an identity test); the SAME? test for scalar types compares
the data values (an equivalence test; there are no references).
Therefore, for scalar types, SAME? and EQUAL? behave
identically.

3)  The HOUR component of the data value for T1 was altered.

4)  The value of T2 didn't change because it is an independent
    scalar value from T1 (whose data value *was* altered).

5)  Same answer as for Question 2 -- scalar SAME? compares the
    data values, which are no longer equivalent (or EQUAL? for
that matter).

6)  Same answer as for Question 2 -- scalar SAME? compares the
    data values, regardless of how those data values were
computed.

NOTE:  It doesn't bother me that SAME? is implemented in a
different way for scalar types and reference types; after all,
EQUAL? is a simple test for INTEGER! values, a fairly simple
loop for STRING! values, and a much more complex (potentially
recursive) process for BLOCK! values.
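
As a minimal sketch of answers 1 through 5 under my model, the
following script replays the puzzle in condensed form.  It uses
COPY to obtain the independently constructed block (B3 was typed
in separately in the transcript above); the expected results in
the comments are the ones the transcript already shows.

```rebol
REBOL []

b1: copy [23 34 45]
b2: b1                ; a second reference to the same block
b3: copy b1           ; an independent block with equal content

print same? b1 b2     ; true
print same? b1 b3     ; false
print equal? b1 b3    ; true

b1/1: 22              ; mutate the block through one reference

print mold b2         ; [22 34 45]  (shared, so the change shows)
print mold b3         ; [23 34 45]  (independent, so it does not)
print equal? b1 b3    ; false

t1: 23:34:45
t2: t1                ; scalar: the data value itself is stored in T2
t1/hour: 22           ; alters only the value held by T1

print t1              ; 22:34:45
print t2              ; 23:34:45
print same? t1 t2     ; false
```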

YOUR MODEL:

It seems to me that your model starts with the assumption that
all values are sharable and that SAME? is always an identity
test.

This forces you to answer the questions as follows:

1)  The value for B2 changed because SAME? B1 B2 was TRUE, so
    whatever happens to one happens to the other.

2)  There can be only one occurrence of the TIME! value of
    23:34:45 and it is shared among all TIME! value references.
Therefore, REBOL must have somehow discovered that this value
was already in use and set T3 to refer to the same TIME! value
as T1 and T2.

3)  A new TIME! value was constructed using the value following
    the set-path for the HOUR, but using the MINUTE and SECOND
components from the TIME! referred to by T1.  Then T1 was set
to refer to that newly-constructed value.  In other words,

    t1/hour: 22

must be understood as an abbreviation of

    t1: make time! reduce [22 t1/minute t1/second]

In fact, all set-path expressions for "simple" types must be
understood as analogous abbreviations to the above (although
the retained/replaced content will vary with type).  This is
different from set-paths for non-"simple" types, which may
still be understood as changing only a part of an unreplaced
whole.

4)  Because the value of T1 was *replaced* and not *modified*.

5)  Because of the answers to (3) and (4); although T1 and T2
    were previously sharing a value, they no longer do so
because T1's value was replaced.

6)  Some unspecified mechanism must be used to search all data
    (or at least all existing TIME! values) in memory to see if
the new value for T4 already exists.  Since it does, and is
referred to by T3, T4 can be set to refer to that same value,
which T3 and T4 now share.

The reason I prefer the first model to the second is that it
offers what seems to me to be simpler answers to the six
questions posed in the puzzle.  (And please feel free to
correct me if I have incorrectly or unfairly speculated about
any of your answers!)

> What about the BM dialect bm [a/0: 1]?
>

What about it?  ;-)

You've managed to construct a clever function (BM) that
implements a dialect that does something different with
set-path expressions than REBOL does.  I'm trying to build
a model for what REBOL does, not for how it might be extended
with a dialect to do something else/new.

> According to that, the time values aren't sharable,
> otherwise we would have seen the mutation when we looked at
> 'b. This leads to another conclusion: SAME? lies a bit, when
> it says we are sharing values. We are not, we have copies
> sometimes. There is a problem: what does SAME? say then?
>

Your statement seems to summarize our differences.

I do not believe that SAME? lies.  Think of SAME? as meaning
"guaranteed equal".

I believe that SAME? tells us whether two values are so much
alike that they may be freely interchanged for all practical
purposes.  I believe that, for instance, any occurrence of 2
is totally equivalent to any other occurrence of 2, even if one
were calculated from 1 + 1 and the other were calculated from
(29 - 21) / 4 or any other such expressions you can imagine.
And I don't have to believe that there can only exist one
occurrence of 2 in order to believe their equivalence.

On the other hand, two strings that both currently contain
the message "Hi" may not be totally interchangeable for all
purposes, if there is a way I can modify the second letter of
one of them to #"a" without doing so to the other.  Since
strings are mutable, this is possible (unless both are actually
references to the same underlying sequence of characters).
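
A small sketch of that contrast, with COPY used so that the two
"Hi" strings are certainly built independently (the names S1, S2
and S3 are just for illustration):

```rebol
REBOL []

s1: copy "Hi"
s2: copy "Hi"         ; equal content, built independently
s3: s1                ; a second reference to S1's characters

print equal? s1 s2    ; true
print same? s1 s2     ; false
print same? s1 s3     ; true

poke s1 2 #"a"        ; change the second letter through S1

print s1              ; Ha
print s2              ; Hi
print s3              ; Ha

print same? 2 1 + 1   ; true, however the 2 was computed
```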

I do believe that SAME? is implemented differently for scalar
values than for reference values, but I think that difference
is both reasonable and explainable.

But please play along with me for a moment...

EXPERIMENT:

Whether or not you accept that REBOL has some values that are
not "shareable", I trust that you'll agree with me that it is
conceivable that *some* languages have non-sharable types.  It
would be A Good Thing to have a function which tells us whether
two sharable values are in fact sharing the underlying data.

So...  Imagine we're designing such a language.  What should
our SAME-DATA? function do if given two non-sharable values?
I can only think of three alternatives:

a)  Throw an error.  This seems slightly hostile; we'd like our
    functions to return something reasonable if at all possible.

b)  Be strict and return FALSE, since non-sharable values by
    definition can't be shared.  This is a defensible position
but somewhat hard-nosed.  We might really only need a test of
equivalence, in which case we'd have to choose between using
SAME-DATA? and EQUAL? depending on the types of the values we
want to compare.

c)  Be generous and test EQUAL?, since non-sharable, immutable
    values are indistinguishable (even if they are distinct
copies of equal data).

I believe REBOL does (c), which has as its only tricky exception
the fact that -- with non-sharable MUTABLE values -- two values
will stop being SAME? if one of them is mutated.
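
To make alternative (c) concrete, here is a rough sketch of such
a function written in REBOL itself.  SAME-DATA? is hypothetical,
and treating only series and objects as the "sharable" types is
an assumption made for the sketch.  It also leans on the built-in
SAME? for the identity half, so it illustrates only the choice
between the two tests, not how REBOL implements either.

```rebol
REBOL []

; SAME-DATA? is a hypothetical name; which types count as
; "sharable" here is an assumption, not a statement about REBOL.
same-data?: func [a b] [
    either any [series? :a  object? :a] [
        same? :a :b     ; sharable: identity test on the reference
    ][
        equal? :a :b    ; non-sharable: alternative (c), equivalence
    ]
]

b: [1 2 3]
print same-data? b b                            ; true
print same-data? b copy b                       ; false
print same-data? 2 1 + 1                        ; true
print same-data? 23:34:45 to-time [23 34 45]    ; true
```

Alternative (b) would differ only in the second branch, which
would return FALSE unconditionally.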

SIDE REMARK:

I also believe that our preconceived notions about the English
word "same" are contributing to the confusion.  In English, the
words "equal", "equivalent", "same", and "identical" are often
used with highly overlapping (but not exactly identical)
meanings.  This simply highlights the grave pitfalls that
surround any attempt to use natural languages as a basis for
"user-friendly" languages for "non-programmers".

Natural languages are fuzzy and ambiguous, and are great for
poetry, flirting, and jokes.  They are also notoriously bad for
precise specification (such as treaties, contracts, and program
specs).

Programming languages must be precise and unambiguous.

END OF SIDE REMARK

I hope this hasn't been too tedious for you...  It certainly
forced me to think hard about what (I think ;-) I understand
and how to verbalize it.  Thanks for all your hard work!

-jn-

------------------------------------------------------------
Programming languages: compact, powerful, simple ...
 Pick any two!
              joel'dot'neely'at'fedex'dot'com