[REBOL] Problem with try [ open/direct/binary tcp://... ] Re:(3)

From: joel:neely:fedex at: 4-Oct-2000 8:38

Hello, Elan,

I'd like to suggest a minor tweak in terminology that I think would help avoid confusion in contexts other than the present problem. I think it is in harmony with your fundamental point, but extends it to cover a bit more territory.

Anyone who doesn't want the long discussion is invited to skip ahead to the heading "THE TROUBLE WITH STRINGS!!!" -- or even the heading "THE PUNCH LINE!!!", where there are some specific questions about type conversion confusion.

-jn-

[elan--loop--com] wrote:
[...problem-specific background and examples snipped...]
> Nevertheless the results are surprising in some instances.
> That is because one would expect that REBOL uses a SYMMETRICAL
> CONVERSION model, which it doesn't.
> What I mean is:
>
> A SYMMETRICAL CONVERSION MODEL in REBOL would look something
> like this:
>
> 1. In REBOL data always occurs in association with a datatype.
> The combination of data with a datatype is a value.
>
> 2. CONVERSION means disassociating some data from its current
> datatype and associating it with a different datatype, while
> leaving the data unmodified. Its value representation changes
> because it is now associated with a different datatype.
>
This process -- changing the type/interpretation WITHOUT changing the internal data -- is referred to in other languages as TYPECASTING or (in the aggressive-abbreviation mentality of C) simply CASTING.

I'm used to seeing the word "conversion" used in the more general sense of creating/returning a value of a different type which is "equivalent" (in a well-defined way!) to the original. This process may require changing the representation. Consider these examples:
>> fee: {abc}
== "abc"
>> length? fee
== 3
>> fie: to-binary fee
== #{616263}
>> length? fie
== 3

Without knowing the internals of REBOL's implementation, one can still safely say that it's POSSIBLE that this is simply casting, as there's no logical necessity for the data bits to change, but only the type bits.
>> foe: #"A"
== #"A"
>> length? foe
** Script Error: length? expected series argument of type: series port tuple struct.
** Where: length? foe
>> fum: to-string foe
== "A"
>> length? fum
== 1

This would be an example of conversion, where BY DEFINITION there is information required for a string (e.g., its length) that has nothing to do with the value of a single character. (And, with the phrase "by definition" I refer to the inherent concept of string, not the details of REBOL's implementation of them.)
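
For readers who know the distinction from other languages, here is a minimal sketch in Python rather than REBOL (chosen only for illustration; the function names are hypothetical and nothing here describes REBOL's internals). It contrasts a cast, which reinterprets the same 32 bits under a different type, with a conversion, which builds an "equivalent" value that may need a different representation.

```python
import struct

def cast_float_to_bits(x: float) -> int:
    """Cast: reinterpret the 32-bit IEEE-754 pattern of x as an unsigned integer.
    The data bits are unchanged; only their interpretation differs."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def cast_bits_to_float(bits: int) -> float:
    """Inverse cast: reinterpret the same 32 bits as a float again."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def convert_float_to_int(x: float) -> int:
    """Conversion: produce an integer 'equivalent' to x.
    The representation changes, and the fractional part is discarded."""
    return int(x)

bits = cast_float_to_bits(1.0)
print(hex(bits))                  # 0x3f800000: same bits, read as an integer
print(cast_bits_to_float(bits))   # 1.0: the cast round-trips exactly
print(convert_float_to_int(1.0))  # 1: an equivalent value, different bits
print(float(convert_float_to_int(1.5)) == 1.5)  # False: this conversion lost data
```

The cast always round-trips because no data was touched, whereas the conversion round-trips only when the target type can hold everything the source value carried.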
> 3. CONVERSION INTEGRITY means that converting a value to a
> different datatype and back to the original datatype must
> result in the same value it originally had. The reason is that
> the data was not modified during the conversion.
>
This is an important point. My only suggestion is to call this TYPECASTING INTEGRITY, so that we can then let the phrase conversion integrity (or perhaps "...consistency") cover more ground. If we read the following remarks with "typecasting" substituted for "conversion", I think the point is undamaged.
> REBOL does not follow this symmetrical conversion model.
> REBOL's conversion does not maintain CONVERSION INTEGRITY.
> REBOL's CONVERSION implementation is not always symmetrical.
> This means that in some instances REBOL does not limit
> itself to modifying the datatype associated with the data that
> is being converted, instead, the data itself is modified as well.
>
> Some of the examples you show show REBOL supporting symmetrical
> conversion. For instance
>
[...examples snipped...]
> 5. I think it is reasonable to expect that REBOL's conversion
> routines maintain CONVERSION INTEGRITY. Unfortunately that is
> not true. Examples:
>
[... more examples snipped...]
> Is REBOL's current implementation of ASYMMETRICAL CONVERSION,
> i.e. a conversion that permits the modification of the data
> and does not limit itself to modifying the representation of
> the data by associating the data with a different datatype,
> more useful - albeit occasionally surprising - than a
> symmetrical conversion would be?
>
As an aside to this last point, let me add that inconsistency has a cost -- it requires extra effort to learn/teach/use the concepts where inconsistencies emerge. While I'm not saying that inconsistencies are automatically evil, I would suggest they are an unnecessary cost unless there's some significant benefit that offsets that cost.

Now, back to the main idea... Let's add the idea of "conversion consistency" as follows (with my numbers an extension of Elan's list):

6. I think it's reasonable to have the idea of canonical (or "standard", if one prefers) internal and external representations for each datatype in the language. The basic conversion mechanism should ensure that when a value is converted to a different type (assuming, of course, that the conversion makes logical sense!) it is converted to the standard representation.

7. Then we can discuss the idea of "conversion consistency", in which converting a value to another type (or even through a chain of such conversions!) then back to the original type results in a value that is indistinguishable from the original.

Extending the conversion example from above,
>> foe: #"A"
== #"A"
>> fum: to-string foe
== "A"
>> to-char fum
== #"A"
>> foe = to-char fum
== true
>> fum = to-string to-char fum
== true

I suspect that these all make sense to us. But sometimes REBOL gives us a surprise!
>> phee: 1.1
== 1.1
>> type? phee
== decimal!
>> phie: to-money phee
== $1.10
>> type? phie
== money!
>> to-decimal phie
** Script Error: Invalid argument: $1.10.
** Where: to decimal! :value
>> make decimal! phie
** Script Error: Invalid argument: $1.10.
** Where: make decimal! phie
>> second phie
== 1.1
>> type? second phie
== decimal!

The decimal/money case, to my mind, is very much like the char/string case, in that going one way we need to add some information which, going the other way, we need to strip off. But doing so should be possible, and should give us back an equivalent value.

Note that there may be different options or representations for some data types, which can also surprise us:
>> phie/1: "US"
== "US"
>> phie
== US$1.10
>> phie/1: "HK"
== "HK"
>> phie
== HK$1.10
>> type? phie
== money!

...so far no astonishments (and converting any of these to decimal would seem to make sense -- but would require losing the added information), but then...
>> phie + $1.50
** Script Error: HK$1.10 not same denomination as $1.50.
** Where: phie + $1.50

...arguably a safety measure that would prevent the same kind of bug that cost us a Mars probe, but...
>> $1.50 + phie
== $2.60
>> to-money phie
== HK$1.10

...apparently not consistently applied (?), requiring that we now remember when addition is symmetric and when it is not!!!
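
As a point of comparison, here is a minimal sketch (in Python, not REBOL) of what a consistently applied denomination check might look like. The `Money` class, its fields, and the use of an empty string for an unmarked amount are all assumptions made for illustration; this is not how REBOL's money! type is implemented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Money:
    """Hypothetical money value: an amount in cents plus a denomination tag."""
    cents: int
    denomination: str = ""  # "" stands for an unmarked amount such as $1.50

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        # The check does not depend on which operand comes first,
        # so a + b and b + a either both succeed or both fail.
        if self.denomination != other.denomination:
            raise ValueError(
                f"{self.denomination}$ not same denomination as {other.denomination}$"
            )
        return Money(self.cents + other.cents, self.denomination)

plain = Money(150)
hk = Money(110, "HK")

print(plain + Money(110))  # Money(cents=260, denomination='')

for left, right in ((hk, plain), (plain, hk)):
    try:
        left + right
    except ValueError as err:
        print("refused:", err)  # both orderings are refused
```

Whether mixed denominations should be refused or silently coerced is a design decision; the point of the sketch is only that the same rule applies regardless of operand order.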
>> 1 + 2.5
== 3.5
>> type? 1 + 2.5
== decimal!
>> 2.5 + 1
== 3.5
>> type? 2.5 + 1
== decimal!
>> (1 + 2.5) = (2.5 + 1_
** Syntax Error: Invalid integer -- 1_.
** Where: (line 1) (1 + 2.5) = (2.5 + 1_
>> (1 + 2.5) = (2.5 + 1)
== true
>> ($1.50 + phie) = (phie + $1.50)
** Script Error: HK$1.10 not same denomination as $1.50.
** Where: phie + $1.50

THE TROUBLE WITH STRINGS!!!

Having a string! datatype offers us lots of landmines to step on. Unlike almost all other types, a string! value can actually be a value all on its own...
>> phoe: "Hello"
== "Hello"
>> phum: join phoe [", " {world} "!"]
== "Hello, world!"
>> print phum
Hello, world!

...but can also serve as the external representation for values of other datatypes...
>> a: "1.2"
== "1.2"
>> b: do a
== 1.2
>> type? a
== string!
>> type? b
== decimal!
>> c: "http://www.rebol.com"
== "http://www.rebol.com"
>> d: load c
== http://www.rebol.com
>> type? c
== string!
>> type? d
== url!

This is even trickier, because when I type (pardon the pun ;-)
>> e: 1.2
== 1.2
>> f: http://www.rebol.com
== http://www.rebol.com
>> type? e
== decimal!
>> type? f
== url!

I'm actually typing strings the whole time. REBOL is interpreting and converting all along the way.

THE PUNCH LINE!!!

Now, the payoff question is this: How do we understand conversion from one datatype (TYPE1) to another (TYPE2)? There are several options:

1) The conversion may not make sense, so we barf.
>> to-money (1 = 1)
** Script Error: Invalid argument: true.
** Where: to money! :value

2) We base our conversion on some sort of semantic equivalence, and can do an exact (invertible) equivalence.
>> to-decimal 13
== 13
>> type? to-decimal 13
== decimal!

Note that in this case the external representation reinforces our notion of "equivalence" for integer! and decimal! types:
>> to-decimal 13.00000000000000000000
== 13

3) We base our conversion on some sort of partial semantic equivalence, where the notion of "the best one can do" makes sense to us.
>> 13.0 = to-integer 13.0
== true
>> to-integer 13.99
== 13
>> 13.99 = to-integer 13.99
== false

Here REBOL is using "discard the fractional part" as the implicit definition of "the best one can do". Other languages use "round to the nearest" instead. Either is defensible -- as long as it is clearly stated -- but it is nice to have an option to use the non-default choice as well!
>> round: func [x [number!]] [
[    to-integer (x +
[        either x < 0 [-0.5] [0.5])
[    ]
>> round .2
== 0
>> round .7
== 1
>> round -.2
== 0
>> round -.7
== -1

Finally, we have a tricky option!

4) We do our conversion in two steps:

    TYPE1 -> ??? -> TYPE2

where the "???" represents some other type that is implicit in the implementation but not asked for by us, and maybe not even desired!
>> to-binary 100
== #{313030}
>> to-binary 1.25
== #{312E3235}

Here we can infer that the "???" is string!, which is not what we might have expected (or desired?). Admittedly there is an open question: given the varieties of sizes of integer values and the whole "endedness" question, what is the bitstring representation of one hundred? #{00000064}, #{0064}, #{64}, #{00640000} ???

Is the following REALLY independent of platform?
>> charset [#"0" - #"9"]
== make bitset! #{
000000000000FF03000000000000000000000000000000000000000000000000
}

BUT, add a dose of inconsistency, and the gravy gets VERY lumpy:
>> to-string "hi"
== "hi"
>> to-binary to-string "hi"
== #{6869}
>> to-binary "hi"
== #{6869}

...no surprise...
>> to-string first ["hi"]
== "hi"
>> to-binary to-string first ["hi"]
== #{6869}
>> to-binary first ["hi"]
== #{6869}

...still no surprise...
>> to-string ["hi"]
== "hi"
>> to-binary to-string ["hi"]
== #{6869}
>> to-binary ["hi"]
** Script Error: Invalid argument: hi.
** Where: to binary! :value

...LUMPY GRAVY! Not only is it a big surprise that the conversion fails, the error message is highly misleading!

When does it make sense for string! to be an intermediate (and implicit, at that) type in a conversion between two other types? If/when it does, which kind of string! should be used -- the internal one, or the external one?

Shouldn't the rules and rationales for typecasting and conversion be spelled out in agonizing detail? (And if I've missed it, please tell me where; I'll be glad to shut up and go read for a while. ;-)

-jn-