[REBOL] Mold, load and Core 2.6

From: holger:rebol at: 6-Mar-2002 10:34

Here are some comments regarding the recent discussion on mold and load, explanations of which of those observations represent bugs and which do not, and what is going to change in the upcoming Core 2.6 release.

First of all, the *intended* use of load and mold for data is the following:

    stored data -> load -> data in memory -> mold -> stored data

Technically, instead of "load" you should use "first load/all", because load/all is more "transparent" in that it does not try to interpret the header or remove an outermost block. If used in this way, i.e. always starting off with load, there are, as far as we are aware, no bugs in mold or load.

Sometimes people try to use load and mold in a different way:

    data in memory -> mold -> stored data -> load -> data in memory

i.e. to serialize data. If used for serialization, mold and load have a number of limitations, just like any other serialization system in any language. Some of the limitations are unavoidable (how do you serialize an open socket connection?), others could be removed by improvements in the implementation.

We are currently aware of the following issues when using load and mold for serialization:

1. Serialization means creating literal representations of data. Unfortunately not all datatypes *have* literal representations. This leads to two types of problems:

1.a) Some values, when molded, become words, and are thus indistinguishable from regular words. For instance "mold 'none" and "mold none" both result in none, which, when loaded back, becomes the word none, not the value none.

Comment: Will be addressed in Core 2.6.

1.b) Some values, when molded, become a sequence of values that represent instructions on how to recreate the value. For instance molding a hash! results in something like "make hash! [1 2 3]". The problem with that is that it requires the loading script to evaluate the resulting block (which it ordinarily is not supposed to do, because other items in the block, e.g. words, cannot be evaluated, plus evaluation is a security risk if the data is untrusted).

Comment: Will be addressed in Core 2.6.

2. Series indices are not included in the molded data. For instance a string series "abc" with an index of 2 becomes "bc" after mold and load. Molding drops the data before the index.

Comment: Will be addressed in Core 2.6.

3. It is not possible to create an object! without performing an evaluation of the value fields, i.e. molding and loading an object! represents a security risk, even if the loader is careful and explicitly checks for the words "make" and "object!", because the spec block may contain expressions with side effects.

3.a) A related issue: Objects have to be molded in such a way that the resulting object spec block can be evaluated. This causes some problems and ambiguities, e.g. objects containing lit-words or set-words as values are not correctly molded and loaded, because during evaluation such values do not behave as regular values, but have side effects.

Comment: Will be addressed in Core 2.6.

4. In memory it is possible to create values of certain datatypes that include characters or expressions that are usually not valid for that particular datatype. For instance it is possible to create an email! value which does not contain an "@" sign, or an issue! which contains a semicolon. mold and load do not handle this correctly, because the scanner requires certain hints to identify datatypes.

Comment: Not a bug, will not be addressed. Creating datatypes with invalid contents is simply an invalid operation. REBOL allows you to do it (because type restrictions are only checked when scanning), but that does not mean that it is a valid thing to do. If you want to be able to process arbitrary data then you need to use string! and binary! only. Other string series have limitations on their structure which have to be complied with for mold and load to work.

5. Various issues regarding references (circular or otherwise). This is always difficult to handle in the context of serialization. There are three cases:

5.a) The data represents a tree, i.e. each item is referenced no more than once, and there are no cycles. For this type of data mold and load should work without problems, and this is the type of data organization recommended if data needs to be serialized.

5.b) The data represents a directed, acyclic graph, i.e. there are no cycles, but data items can be referenced more than once. For this type of data mold and load should still work, but the referenced items may be included in the molded data separately for each reference, i.e. after loading the data back the references point to separate items.

5.c) The data represents a general, directed graph, with cycles. This is strongly discouraged :-). mold and load will not work at all with this kind of data.

Comment: No changes are planned regarding this, and the current behavior is not considered to be a bug. Serialization of data structures with non-tree-based references requires special serialization functions. mold and load are not suitable for this.

6. Word bindings are not preserved by mold.

Comment: Not a bug. No changes are planned regarding this. It would be pretty much impossible to correctly preserve word bindings without saving the complete REBOL machine state :-). Load always binds words into the global context.

As far as mold and save are concerned, the major change for Core 2.6 is the /all refinement, which makes mold and save more suitable for serialization. The /all refinement has the following effects:

- (Almost) all datatypes are molded in a literal form. Datatypes which already have a natural literal form continue to use this form (integers, words etc.). Datatypes which so far have not had a literal form will use a new notation that acts as a pseudo-literal. This notation is "#[type! description]" or, for some datatypes, "#[value]" (without the quotes). For instance:

  Value-oriented pseudo-literals:

      true   -> #[true]
      false  -> #[false]
      none   -> #[none]
      unset! -> #[unset!]

  Datatype-oriented pseudo-literals:

      object -> #[object! [a: 1 b: 2 ...]]
      list   -> #[list! [a b c ...]]
      etc.

  When loading pseudo-literals back no special refinements have to be used with 'load. The 'load function recognizes pseudo-literals just like all other literals.

- When a series whose index is not at the head of the series is molded, the complete series is molded and the index is preserved. To do this, the series is molded in its pseudo-literal form, with the index following the content. For instance the string "abc" with an index at position 2 is molded as

      #[string! "abc" 2]

  Loading the string back results in a string "abc" with an index at position 2.

- When object pseudo-literals are loaded the spec block is not treated as a block to be executed under the object's context, but strictly as a name/value pair block. This allows objects containing set-words, lit-words etc. to be loaded correctly. For instance

      #[object! [a: val: b: 1]]

  results in the object

      [ a: val: (a containing the set-word val) b: 1 ]

  instead of the object

      [ a: 1 val: 1 b: 1 ]

  as you would get from make object! [a: val: b: 1]. Also, the value items are not evaluated before storing them in the object. They are treated as literals.

These changes should make it completely safe to send molded objects across untrusted communication lines and load them back at the receiver.

Please note that the normal output format of mold and load is not affected at all. The changes only affect the output of mold/all and load/all. The intended use is:

- If you start with a string representation or a file, load that file, manipulate the resulting block, and then write the block back into a file, then use mold or save without the /all refinement.

- If you start with some data structure in memory, mold it for serialization purposes, store it on disk or send it across a network, and then load it back, then use mold or save with the /all refinement.

--
Holger Kruse
[kruse--nordicglobal--com]
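
The message notes that these limitations apply to serialization in any language, so the behavior described in points 5.a and 5.b can be reproduced outside REBOL. The following is a minimal Python sketch of that idea, not REBOL code and not part of the original message: `repr` stands in for mold and `ast.literal_eval` (which parses literals without evaluating code) stands in for load. The names `round_trip`, `tree`, `shared` and `dag` are made up for this illustration.

```python
import ast


def round_trip(value):
    """Write a value out as literal text, then parse that text back.

    repr plays the role of mold here; ast.literal_eval plays the role of
    load and only accepts literals, so nothing in the text is executed.
    """
    return ast.literal_eval(repr(value))


# Case 5.a: a tree. Every inner list is referenced exactly once.
tree = [[1, 2], [3, 4]]
tree_copy = round_trip(tree)

# Case 5.b: a directed acyclic graph. The same inner list appears twice.
shared = [1, 2]
dag = [shared, shared]
dag_copy = round_trip(dag)

# In memory, a change made through one reference shows up in the other.
dag[0].append(3)

# After the round trip each reference was written out separately, so the
# two inner lists are now independent objects.
dag_copy[0].append(3)

print(tree_copy == tree)           # the tree survives intact
print(dag[0] is dag[1])            # sharing in the original data
print(dag_copy[0] is dag_copy[1])  # sharing after the round trip
print(dag)
print(dag_copy)
```

The tree comes back equal to the original, while the graph comes back with the shared item duplicated: appending to one copy leaves the other unchanged. This matches the recommendation in 5.a to keep serialized data tree-shaped, and it is why data with shared or cyclic references needs a dedicated serialization routine.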