Mailing List Archive: 49091 messages

Mold, load and Core 2.6

 [1/7] from: holger:rebol at: 6-Mar-2002 10:34


Here are some comments regarding the recent discussion on mold and load, explanations of which of those observations represent bugs and which do not, and what is going to change in the upcoming Core 2.6 release.

First of all, the *intended* use of load and mold for data is the following:

    stored data -> load -> data in memory -> mold -> stored data

Technically, instead of "load" you should use "first load/all", because load/all is more "transparent" in that it does not try to interpret the header or remove an outermost block. If used in this way, i.e. always starting off with load, there are, as far as we are aware, no bugs in mold or load.

Sometimes people try to use load and mold in a different way:

    data in memory -> mold -> stored data -> load -> data in memory

i.e. to serialize data. If used for serialization, mold and load have a number of limitations, just like any other serialization system in any language. Some of the limitations are unavoidable (how do you serialize an open socket connection ?), others could be removed by improvements in the implementation.

We are currently aware of the following issues when using load and mold for serialization:

1. Serialization means creating literal representations of data. Unfortunately not all datatypes *have* literal representations. This leads to two types of problems:

1.a) Some values, when molded, become words, and are thus indistinguishable from regular words. For instance "mold 'none" and "mold none" both result in

    none

which, when loaded back, becomes the word none, not the value none.

Comment: Will be addressed in Core 2.6.

1.b) Some values, when molded, become a sequence of values that represent instructions how to recreate the value. For instance molding a hash! results in something like "make hash! [1 2 3]". The problem with that is that it requires the loading script to evaluate the resulting block (which it ordinarily is not supposed to do, because other items in the block, e.g. words, cannot be evaluated, plus evaluation is a security risk if the data is untrusted).

Comment: Will be addressed in Core 2.6.

2. Series indices are not included in the molded data. For instance a string series "abc" with an index of 2 becomes "bc" after mold and load. Molding drops the data before the index.

Comment: Will be addressed in Core 2.6.

3. It is not possible to create an object! without performing an evaluation of the value fields, i.e. molding and loading an object! represents a security risk, even if the loader is careful and explicitly checks for the words "make" and "object!", because the spec block may contain expressions with side effects.

3.a) A related issue: Objects have to be molded in such a way that the resulting object spec block can be evaluated. This causes some problems and ambiguities, e.g. objects containing lit-words or set-words as values are not correctly molded and loaded, because during evaluation such values do not behave as regular values, but have side effects.

Comment: Will be addressed in Core 2.6.

4. In memory it is possible to create values of certain datatypes that include characters or expressions that are usually not valid for that particular datatype. For instance it is possible to create an email! value which does not contain an "@" sign, or an issue which contains a semicolon. mold and load do not handle this correctly, because the scanner requires certain hints to identify datatypes.

Comment: Not a bug, will not be addressed. Creating datatypes with invalid contents is simply an invalid operation. REBOL allows you to do it (because type restrictions are only checked when scanning), but that does not mean that it is a valid thing to do. If you want to be able to process arbitrary data then you need to use string! and binary! only. Other string series have limitations on their structure which have to be complied with for mold and load to work.

5. Various issues regarding references (circular or otherwise). This is always difficult to handle in the context of serialization. There are three cases:

5.a) The data represents a tree, i.e. each item is referenced no more than once, and there are no cycles. For this type of data mold and load should work without problems, and this is the type of data organization recommended if data needs to be serialized.

5.b) The data represents a directed, acyclic graph, i.e. there are no cycles, but data items can be referenced more than once. For this type of data mold and load should still work, but the referenced items may be included in the molded data separately for each reference, i.e. after loading the data back the references point to separate items.

5.c) The data represents a general, directed graph, with cycles. This is strongly discouraged :-). mold and load will not work at all with this kind of data.

Comment: No changes are planned regarding this, and the current behavior is not considered to be a bug. Serialization of data structures with non-tree-based references requires special serialization functions. mold and load are not suitable for this.

6. Word bindings are not preserved by mold.

Comment: Not a bug. No changes are planned regarding this. It would be pretty much impossible to correctly preserve word bindings without saving the complete REBOL machine state :-). Load always binds words into the global context.

As far as mold and save are concerned, the major change for Core 2.6 is the /all refinement, which makes mold and save more suitable for serialization. The /all refinement has the following effects:

- (Almost) all data types are molded in a literal form. Datatypes which already have a natural literal form continue to use this form (integer, words etc.). Datatypes which so far have not had a literal form will use a new notation that acts as a pseudo-literal. This notation is "#[type! description]" or, for some datatypes, "#[value]" (without the quotes). For instance:

  Value-oriented pseudo-literals:

      true   -> #[true]
      false  -> #[false]
      none   -> #[none]
      unset! -> #[unset!]

  Datatype-oriented pseudo-literals:

      object -> #[object! [a: 1 b: 2 ...]]
      list   -> #[list! [a b c ...]]
      etc.

  When loading pseudo-literals back no special refinements have to be used with 'load. The 'load function recognizes pseudo-literals just like all other literals.

- When a series with an index different than the head of the series is molded then the complete series is molded while preserving the index. To do this, the series is molded in its pseudo-literal form, with the index following the content. For instance the string "abc" with an index at position 2 is molded as

      #[string! "abc" 2]

  Loading the string back results in a string "abc" with an index at position 2.

- When object pseudo-literals are loaded the spec block is not treated as a block to be executed under the object's context, but strictly as a name/value pair block. This allows objects containing set-words, lit-words etc. to be loaded correctly. For instance

      #[object! [a: val: b: 1]]

  results in the object

      [
          a: val:     (a containing the set-word val)
          b: 1
      ]

  instead of the object

      [
          a: 1
          val: 1
          b: 1
      ]

  as you would get from make object! [a: val: b: 1].

  Also, the value items are not evaluated before storing them in the object. They are treated as literals. These changes should make it completely safe to send molded objects across untrusted communication lines and load them back at the receiver.

Please note that the normal output format of mold and load is not affected at all. The changes only affect the output of mold/all and load/all. The intended use is:

- If you start with a string representation or a file, load that file, manipulate the resulting block, and then write the block back into a file, then use mold or save without the /all refinement.

- If you start with some data structure in memory, mold it for serialization purposes, store it on disk or send it across a network, and then load it back, then use mold or save with the /all refinement.

--
Holger Kruse
[kruse--nordicglobal--com]
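
The round trips described in this message can be tried at a console. What follows is a minimal sketch, assuming REBOL/Core 2.6 or later (where mold/all is available); the words plain, exact, s, item, dag and copy-of-dag are illustrative names and do not come from the message.

```rebol
REBOL [Title: "mold vs. mold/all round trips (illustrative sketch)"]

; Issue 1.a: with plain mold the value none is written as the text "none",
; which loads back as a word rather than as the value none.
plain: load mold none
probe type? plain

; With mold/all the pseudo-literal #[none] is written instead,
; which loads back as the value none.
exact: load mold/all none
probe none? exact

; Issue 2: plain mold drops the data before the series index,
; while mold/all keeps the whole series together with the index.
s: next "abc"                       ; a string positioned at index 2
probe load mold s
probe index? load mold/all s

; Issue 5.b: an item referenced twice is written out twice,
; so after loading the two references point to separate strings.
item: "shared"
dag: reduce [item item]
probe same? first dag second dag
copy-of-dag: load mold dag
probe same? first copy-of-dag second copy-of-dag
```

The sketch uses plain load for brevity; for stored data the message recommends "first load/all" so that headers and outermost blocks are left alone.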

 [2/7] from: m::koopmans2::chello::nl at: 6-Mar-2002 20:48


Hi All,

This makes life easier for everyone. I am very aware of the mentioned limitations as I see them when implementing Rugby. My current approach has been to use mold do and do mold. What happens when deserializing with do in 2.6 ?

So... why these changes in particular? Floating code?

I'll put out a Rugby for 2.6 that uses the new serialization scheme asap.

--Maarten

 [3/7] from: greggirwin:mindspring at: 6-Mar-2002 13:52


Thanks Holger! Excellent info, as usual. --Gregg

 [4/7] from: al:bri:xtra at: 7-Mar-2002 16:29


Holger wrote:
> Serialization of data structures with non-tree-based references requires
> special serialization functions.

Once the experimental Core 2.6 comes out, I'll have revised versions of 'freeze and 'melt, which will handle this.

Andrew Martin
ICQ: 26227169
http://valley.150m.com/

 [5/7] from: nitsch-lists:netcologne at: 7-Mar-2002 18:17


RE: [REBOL] Mold, load and Core 2.6 [holger--rebol--net] wrote:
> Here are some comments regarding the recent discussion on mold and load,
> explanations of which of those observations represent bugs and which do not,
<<quoted lines omitted: 58>>
> limitations on their structure which have to be complied with for mold and load
> to work.

haha. change then import-email. or, use it, have some mad spammer and destroy
your archive next time you save. since it "to-emails" whatever this guy thinks could
be a nice broken address.
i think this is generally, one will not check everywhere for proper formatting,
to-email does the job 99,99% of time, then some crazy data destroys all.
remembering {{} . i expect a fix in a year or so?
#[email! "/badguy-hahaha"] would be so easy..

> 5. Various issues regarding references (circular or otherwise). This is always
> difficult to handle in the context of serialization. There are three cases:
<<quoted lines omitted: 18>>
> much impossible to correctly preserve word bindings without saving the complete
> REBOL machine state :-). Load always binds words into the global context.

why binding global ?
we get kicked whenever somebody inserts a paren! cleverly.
having load/unbound would be more secure.
and binding to a restricted set of words in a fresh context, like
[make object! true false none].
(to-block hangs sometimes here, and is not exactly the same)

> As far as mold and save are concerned, the major change for Core 2.6 is the
> /all refinement, which makes mold and save more suitable for serialization.
<<quoted lines omitted: 36>>
> b: 1
> ] as you would get from make object! [a: val: b: 1].

i would think [a: #[val:] b: 1] are more obvious?

> Also, the value items are not evaluated before storing them in the object.
> They are treated as literals. These changes should make it completely safe
<<quoted lines omitted: 9>>
> serialization purposes, store it on disk or send it across a network,
> and then load it back, then use mold or save with the /all refinement.

all in all sounds great. makes load/mold for serialisation pretty usable.
drawbacks are:
-unparsable data breaks all -> no use of handy parsings like import-email.
 at least some check while molding would be nice,
 instead of something like [equal? mold data mold load mold data] as today..
-global binding -> paren! kills security (or use :this :that everywhere..),
-crazy molded set-words
-oh yes, and if the newline-tag could be set by programm..
 i don't like having 4K-lines after reduce, unable to fix it in block-form.
 having to mold everything by hand and reload isnt the best solution..

> --
> Holger Kruse
> [kruse--nordicglobal--com]
> --
-volker

 [6/7] from: holger:rebol at: 7-Mar-2002 10:24


On Thu, Mar 07, 2002 at 06:17:59PM +0100, [nitsch-lists--netcologne--de] wrote:
> haha. change then import-email. or, use it, have some mad spammer and destroy
> your archive next time you save. since it "to-emails" whatever this guy thinks could
> be a nice broken address.
> i think this is generally, one will not check everywhere for proper formatting,
> to-email does the job 99,99% of time, then some crazy data destroys all.

We will have a look at that.

> remembering {{} . i expect a fix in a year or so?

???

> why binding global ?
> we get kicked whenever somebody inserts a paren! cleverly.

No. paren!, set-word! etc. are evaluated less aggressively in Core 2.6. Carl mentioned that in his last message. Basically, if the intermediate result of an evaluation (e.g. looking up a word) is a paren!, set-word! etc., then in Core 2.6 the interpreter no longer continues the evaluation at that point, but rather uses the intermediate value. For instance:

Up until now:

>> a: to-paren [1]
>> type? a
== integer!

In Core 2.6:

>> a: to-paren [1]
>> type? a
== paren!

Same for set-paths, set-words etc. After loading the data you will still have to check for function! types though (using get-words), if you later want to be able to safely access your block without having to use get-words.

> having load/unbound would be more secure.

That's what to-block does.

> and binding to a restricted set of words in a fresh context, like
> [make object! true false none].

That would not help. Words not found in that context would still be bound into the global context. Besides, with literals for true/false/none you no longer have to use the corresponding words to represent the values in molded data.

> (to-block hangs sometimes here, and is not exactly the same)

We are not aware of any such bug. Please send details to [feedback--rebol--com].

> all in all sounds great. makes load/mold for serialisation pretty usable.
> drawbacks are:
> -unparsable data breaks all -> no use of handy parsings like import-email.
> at least some check while molding would be nice,
> instead of something like [equal? mold data mold load mold data] as today..

Possibly...

> -global binding -> paren! kills security (or use :this :that everywhere..),

No. See above. The only danger is function! types, which are now available as pseudo-literals. Perhaps we will add something like load/safe to prevent that additional work.

> -crazy molded set-words

???

> -oh yes, and if the newline-tag could be set by programm..
> i don't like having 4K-lines after reduce, unable to fix it in block-form.
> having to mold everything by hand and reload isnt the best solution..

Perhaps a mold and save refinement to add word wrapping...

--
Holger Kruse
[kruse--nordicglobal--com]
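
The check for function! values mentioned in this reply could look roughly like the following. This is only a sketch under stated assumptions: safe-block? and incoming are hypothetical names (safe-block? is not a REBOL built-in), and the helper inspects nested blocks only, not the fields of loaded objects.

```rebol
REBOL [Title: "Reject function values after load (illustrative sketch)"]

; safe-block? is a hypothetical helper, not part of REBOL.
; It walks a loaded block and returns false as soon as it finds a
; function value at any depth. The get-word :item fetches the value
; without calling it, which is the point of using get-words here.
safe-block?: func [data [block!] /local item] [
    foreach item data [
        if any-function? :item [return false]
        if block? :item [
            if not safe-block? :item [return false]
        ]
    ]
    true
]

; incoming stands in for data received from an untrusted source.
incoming: load/all {name "example" values [1 2 3]}

either safe-block? incoming [
    print "no function values found"
][
    print "function value found, data rejected"
]
```

Once such a check has passed, the items of the block can be accessed with plain words, without the risk of calling a function that arrived with the data.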

 [7/7] from: lmecir:mbox:vol:cz at: 15-Mar-2002 18:48


Hi all,

trying to catch up with my mailbox.

Holger, your explanation is nice and comprehensive. I do agree with you everywhere. I have got just one simple question:

<<Holger>>
...snip...
5. Various issues regarding references (circular or otherwise). This is always difficult to handle in the context of serialization. There are three cases:
...snip...
5.c) The data represents a general, directed graph, with cycles. This is strongly discouraged :-). mold and load will not work at all with this kind of data.

Comment: No changes are planned regarding this, and the current behavior is not considered to be a bug. Serialization of data structures with non-tree-based references requires special serialization functions. mold and load are not suitable for this.
<</Holger>>

I do agree with you on this. The problem is, that I want the MOLD (and/or LOAD) function to fail gracefully, i.e. to fire an error instead of "blowing up" the interpreter. Is that too much I am asking for?

Best regards
Ladislav
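
Until mold itself fails gracefully, a script can test for cycles before molding. The following is a minimal sketch, not a feature of REBOL: cyclic? is a hypothetical helper, the words tree and loop-data are illustrative, and only nested block! values are examined (objects and other series types are ignored).

```rebol
REBOL [Title: "Check a block for cycles before molding (illustrative sketch)"]

; cyclic? is a hypothetical helper, not a REBOL built-in.
; It returns true if a block contains itself, directly or through nested
; blocks. The /seen refinement is used internally to pass along the chain
; of blocks currently being visited.
cyclic?: func [
    blk [block!]
    /seen stack [block!]
    /local item ancestor
][
    stack: any [stack copy []]
    ; A block that is already on the chain of ancestors closes a cycle.
    foreach ancestor stack [
        if same? head ancestor head blk [return true]
    ]
    append/only stack blk
    foreach item blk [
        if block? :item [
            if cyclic?/seen :item stack [return true]
        ]
    ]
    ; Leaving this block: take it off the chain again, so that a block
    ; referenced twice without a cycle (case 5.b) is not reported.
    remove back tail stack
    false
]

tree: [1 [2 3]]
loop-data: copy [1 2]
append/only loop-data loop-data     ; the block now contains itself

probe cyclic? tree
probe cyclic? loop-data
if not cyclic? tree [print mold tree]
```

Comparing with same? on the head of each series checks identity rather than content, so the test itself never has to walk into the cycle.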

Notes
  • Quoted lines have been omitted from some messages.
    View the message alone to see the lines that have been omitted