[REBOL] Re: Mold, load and Core 2.6
From: nitsch-lists:netcologne at: 7-Mar-2002 18:17
RE: [REBOL] Mold, load and Core 2.6
[holger--rebol--net] wrote:
>
> Here are some comments regarding the recent discussion on mold and load,
> explanations of which of those observations represent bugs and which do not,
> and what is going to change in the next upcoming Core 2.6 release:
>
> First of all, the *intended* use of load and mold for data is in the following
> way:
>
> stored data -> load -> data in memory -> mold -> stored data.
>
> Technically, instead of "load" you should use "first load/all", because
> load/all is more "transparent" in that it does not try to interpret the header
> or remove an outermost block.
>
> If used in this way, i.e. always starting off with load, there are, as far as
> we are aware, no bugs in mold or load.
>
> Sometimes people try to use load and mold in a different way:
>
> data in memory -> mold -> stored data -> load -> data in memory
>
> i.e. to serialize data. If used for serialization, mold and load have a number
> of limitations, just like any other serialization system in any language. Some
> of the limitations are unavoidable (how do you serialize an open socket
> connection ?), others could be removed by improvements in the implementation.
>
> We are currently aware of the following issues when using load and mold for
> serialization:
>
> 1. Serialization means creating literal representations of data. Unfortunately
> not all datatypes *have* literal representations. This leads to two types of
> problems:
>
> 1.a) Some values, when molded, become words, and are thus indistinguishable
> from regular words. For instance "mold 'none" and "mold none" both result in
> "none", which, when loaded back, becomes the word none, not the value none.
>
> Comment: Will be addressed in Core 2.6.
>
> 1.b) Some values, when molded, become a sequence of values that represent
> instructions on how to recreate the value. For instance, molding a hash! results in
> something like "make hash! [1 2 3]". The problem with that is that it requires
> the loading script to evaluate the resulting block (which it ordinarily is not
> supposed to do, because other items in the block, e.g. words, cannot be
> evaluated, plus evaluation is a security risk if the data is untrusted).
>
> Comment: Will be addressed in Core 2.6.
>
> 2. Series indices are not included in the molded data. For instance a string
> series "abc" with an index of 2 becomes "bc" after mold and load. Molding drops
> the data before the index.
>
> Comment: Will be addressed in Core 2.6.
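A small console sketch of points 1.a, 1.b and 2, assuming the behaviour of plain mold and load as described above (no /all refinement). The variable name s is made up for the example:

```rebol
; 1.a: the value none and the word none mold to the same text
probe mold none                  ; "none"
probe mold 'none                 ; "none"
probe type? load mold none       ; word!

; 1.b: a hash! molds to an expression, so load returns a block of
; three values (the words make and hash!, plus a block), not a hash!
probe mold make hash! [1 2 3]              ; "make hash! [1 2 3]"
probe type? load mold make hash! [1 2 3]   ; block!

; 2: the index is lost, along with the data before it
s: next "abc"
probe index? s                   ; 2
probe mold s                     ; {"bc"}
probe index? load mold s         ; 1
```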
>
> 3. It is not possible to create an object! without performing an evaluation of
> the value fields, i.e. molding and loading an object! represents a security
> risk, even if the loader is careful and explicitly checks for the words "make"
> and "object!", because the spec block may contain expressions with side
> effects.
>
> 3.a) A related issue: Objects have to be molded in such a way that the
> resulting object spec block can be evaluated. This causes some problems
> and ambiguities, e.g. objects containing lit-words or set-words as values
> are not correctly molded and loaded, because during evaluation such values
> do not behave as regular values, but have side effects.
>
> Comment: Will be addressed in Core 2.6.
>
> 4. In memory it is possible to create values of certain datatypes that include
> characters or expressions that are usually not valid for that particular
> datatype. For instance it is possible to create an email! value which does not
> contain an "@" sign, or an issue which contains a semicolon. mold and load do not
> handle this correctly, because the scanner requires certain hints to identify
> datatypes.
>
> Comment: Not a bug, will not be addressed. Creating datatypes with invalid
> contents is simply an invalid operation. REBOL allows you to do it (because
> type restrictions are only checked when scanning), but that does not mean that
> it is a valid thing to do. If you want to be able to process arbitrary data
> then you need to use string! and binary! only. Other string series have
> limitations on their structure which have to be complied with for mold and load
> to work.
>
haha. then change import-email. otherwise: use it, meet one mad spammer, and your
archive is destroyed the next time you save, since it "to-emails" whatever this guy
thinks could be a nice broken address.
i think this holds generally: one will not check everywhere for proper formatting.
to-email does the job 99.99% of the time, then some crazy data destroys everything.
remembering {{} . i expect a fix in a year or so?
#[email! "/badguy-hahaha"] would be so easy..
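A rough sketch of the kind of check one could run before saving, assuming plain mold and load/all behave as described in the quoted text. round-trips? and loaded are hypothetical names, not part of REBOL:

```rebol
; hypothetical helper: true only if the value comes back from
; mold + load with the same datatype and an equal value
round-trips?: func [value /local loaded] [
    ; load may fail outright, or return an empty block, on odd content
    if error? try [loaded: first load/all mold value] [return false]
    to-logic all [
        equal? type? value type? loaded
        equal? value loaded
    ]
]

probe round-trips? to-email "joe@example.com"   ; true
probe round-trips? to-email "badguy"            ; false, reloads as a word
```

Comparing only the molded strings would miss the second case, because the email and the word mold to the same text.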
> 5. Various issues regarding references (circular or otherwise). This is always
> difficult to handle in the context of serialization. There are three cases:
>
> 5.a) The data represents a tree, i.e. each item is referenced no more than
> once, and there are no cycles. For this type of data mold and load should work
> without problems, and this is the type of data organization recommended if data
> needs to be serialized.
>
> 5.b) The data represents a directed, acyclic graph, i.e. there are no cycles,
> but data items can be referenced more than once. For this type of data
> mold and load should still work, but the referenced items may be included in the
> molded data separately for each reference, i.e. after loading the data back the
> references point to separate items.
>
> 5.c) The data represents a general, directed graph, with cycles. This is
> strongly discouraged :-). mold and load will not work at all with this kind of
> data.
>
> Comment: No changes are planned regarding this, and the current behavior is not
> considered to be a bug. Serialization of data structures with non-tree-based
> references requires special serialization functions. mold and load are not
> suitable for this.
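Case 5.b can be seen with a block that is referenced twice. This is only a sketch, and the names shared, data and reloaded are invented for it:

```rebol
shared: [1 2]
data: reduce [shared shared]     ; both items are the same block

probe same? first data second data             ; true

reloaded: first load/all mold data
probe reloaded                                 ; [[1 2] [1 2]]
probe same? first reloaded second reloaded     ; false, now two copies
```

The reloaded values are still equal, so the problem only shows up once one of the copies is modified.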
>
> 6. Word bindings are not preserved by mold.
>
> Comment: Not a bug. No changes are planned regarding this. It would be pretty
> much impossible to correctly preserve word bindings without saving the complete
> REBOL machine state :-). Load always binds words into the global context.
>
why bind globally?
we get bitten whenever somebody cleverly inserts a paren!.
having load/unbound would be more secure,
as would binding to a restricted set of words in a fresh context, like
[make object! true false none].
(to-block hangs sometimes here, and is not exactly the same)
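For reference, point 6 in a few lines. A sketch only, with ctx and w as made-up names:

```rebol
x: 99                            ; global x
ctx: make object! [x: 10]        ; ctx has its own x
w: first bind [x] in ctx 'x      ; the word x, bound to ctx

probe get w                      ; 10
probe get load mold w            ; 99, the reloaded word is bound globally
```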
> As far as mold and save are concerned, the major change for Core 2.6 is the
> /all refinement, which makes mold and save more suitable for serialization.
> The /all refinement has the following effects:
>
> - (Almost) all data types are molded in a literal form. Datatypes which already
> have a natural literal form continue to use this form (integer, words etc.)
> Datatypes which so far have not had a literal form will use a new notation
> that acts as a pseudo-literal. This notation is "#[type! description]" or,
> for some datatypes, "#[value]" (without the quotes). For instance:
>
> Value-oriented pseudo-literals:
>
> true -> #[true]
> false -> #[false]
> none -> #[none]
> unset! -> #[unset!]
>
> Datatype-oriented pseudo-literals:
>
> object -> #[object! [a: 1 b: 2 ...]]
> list -> #[list! [a b c ...]]
>
> etc.
>
> When loading pseudo-literals back no special refinements have to be used
> with 'load. The 'load function recognizes pseudo-literals just like all
> other literals.
>
> - When a series with an index different than the head of the series is molded
> then the complete series is molded while preserving the index. To do this,
> the series is molded in its pseudo-literal form, with the index following the
> content. For instance the string "abc" with an index at position 2 is molded
> as #[string! "abc" 2]. Loading the string back results in a string "abc"
> with an index at position 2.
>
> - When object pseudo-literals are loaded the spec block is not treated as a
> block to be executed under the object's context, but strictly as a name/value
> pair block. This allows objects containing set-words, lit-words etc. to be
> loaded correctly. For instance #[object! [a: val: b: 1]] results in the
> object [
>     a: val:    (a containing the set-word val)
>     b: 1
> ]
> instead of
> object [
>     a: 1
>     val: 1
>     b: 1
> ]
> as you would get from make object! [a: val: b: 1].
>
i would think [a: #[val:] b: 1] is more obvious?
> Also, the value items are not evaluated before storing them in the object.
> They are treated as literals. These changes should make it completely safe
> to send molded objects across untrusted communication lines and load them
> back at the receiver.
>
> Please note that the normal output format of mold and save is not affected
> at all. The changes only affect the output of mold/all and save/all. The
> intended use is:
>
> - If you start with a string representation or a file, load that file,
> manipulate the resulting block, and then write the block back into a
> file, then use mold or save without the /all refinement.
>
> - If you start with some data structure in memory, mold it for
> serialization purposes, store it on disk or send it across a network,
> and then load it back, then use mold or save with the /all refinement.
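What the serialization case might look like at the console, assuming a release in which /all is implemented as announced above (it will not work on earlier releases). The name s is invented for the example:

```rebol
; value-oriented pseudo-literal
probe mold/all none                  ; "#[none]"
probe type? load mold/all none       ; none!

; series index preserved
s: next "abc"
probe mold/all s                     ; {#[string! "abc" 2]}
probe index? load mold/all s         ; 2
probe head load mold/all s           ; "abc"
```

Note that load itself needs no refinement here, as stated in the announcement.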
>
all in all sounds great. makes load/mold pretty usable for serialisation.
drawbacks are:
- unparsable data breaks everything -> no use of handy parsers like import-email.
  at least some check while molding would be nice,
  instead of something like [equal? mold data mold load mold data] as today..
- global binding -> paren! kills security (or use :this :that everywhere..)
- crazy molded set-words
- oh yes, and if the newline-tag could be set by program..
  i don't like getting 4K-long lines after reduce, with no way to fix it in block form.
  having to mold everything by hand and reload isn't the best solution..
> --
> Holger Kruse
> [kruse--nordicglobal--com]
> --
-volker