What's the 'none' for in the parse-xml result?

[1/3] from: gavin::mckenzie::sympatico::ca at: 11-Jul-2001 7:36

Folks,

I know that at a minimum the parse-xml function will return a block structure rooted with the following:

['document none none]

where the second none value will be replaced by the parsed document content. So, if I do:

parse-xml "<foo>bar</foo>"

I get the result:

[document none [["foo" none ["bar"]]]]

But my question is, has the purpose of the first none value (immediately after 'document) ever been explained?

I'm writing up an extended version of parse-xml that addresses many of the non-compliance issues with the built-in parse-xml (such as lack of CDATA section support, namespaces, etc.), and I'm betting that the first none value is intended for future use to hold the document's prolog (such as the internal DTD subset).

Has the purpose of the first none value ever been discussed/revealed?

Gavin.

[2/3] from: joel:neely:fedex at: 11-Jul-2001 2:20

Hi, Gavin,

Gavin F. McKenzie wrote:

...

> ['document none none]

...

> parse-xml "<foo>bar</foo>"
>
> I get the result:
>
> [document none [["foo" none ["bar"]]]]

Just take your example a little further.

>> parse-xml {<foo top="up" size="big">Hello!</foo>}

== [document none [["foo" ["top" "up" "size" "big"] ["Hello!"]]]]

The block REBOL produces for an XML element contains the element name, attribute list, and content, in that order. The following aliases are handy...

>> alias 'third "content-of"

== content-of

>> alias 'second "attributes-of"

== attributes-of

An element that has no attributes has NONE for its second part, just as an element that has no content has NONE for its third part. Each item in the content block (if there is one) will either be a string or a block (of similar structure) for a subordinate element.

>> parse-xml {<foo><bletch /></foo>}

== [document none [["foo" none [["bletch" none none]]]]]

When attributes are present, they are presented in a block of name/value pairs suitable for searching with SELECT/SKIP

>> x: parse-xml {<socks color="navy" fiber="cotton" />}

== [document none [["socks" ["color" "navy" "fiber" "cotton"] none]]]

>> select/skip attributes-of first content-of x "color" 2

== ["navy"]
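As a rough illustration of the structure described above (this is not code from the original message), a recursive walk over a parse-xml result might look like the sketch below. The name list-elements is made up for the example, and the sketch assumes every node has exactly the three parts shown in the transcripts: name, attributes, content.

```rebol
REBOL [Title: "Walk a parse-xml result (illustrative sketch)"]

; list-elements is a hypothetical helper, not part of REBOL.
; Assumption: each node is [name attributes content], and the
; top-level node carries the word DOCUMENT in place of a name string.
list-elements: func [node [block!] indent [string!]] [
    ; print the node name, indented by nesting depth
    print rejoin [indent form first node]
    ; content is NONE for an empty element, otherwise a block
    if block? third node [
        foreach item third node [
            ; strings are character data; blocks are child elements
            if block? item [list-elements item rejoin [indent "  "]]
        ]
    ]
]

list-elements parse-xml {<foo><bletch /><bar>text</bar></foo>} ""
```

Run against the input shown, this should print the names document, foo, bletch and bar, each indented two further spaces per nesting level. Character data is skipped because only block items are treated as elements.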

> I'm writing up an extended version of parse-xml that
> addresses many of the non-compliance issues with the
> built-in parse-xml (such as lack of CDATA section support,
> namespaces etc.), and I'm betting that the first none value
> is intended for future use to hold the document's prolog
> (such as the internal DTD subset).

Based on looking at the code for XML-LANGUAGE, my conclusion was that the block for the top-level document was simply another block that followed the above structure (to avoid fencepost issues).

I wrote extensions to handle comments and CDATA a while back, and had thought about doing an article on XML in REBOL. (Are you interested in collaborating?) But I'm not sure what you have in mind for namespaces. Were you thinking of actually writing a validating parser?

-jn-

---------------------------------------------------------------
There are two types of science: physics and stamp collecting!
-- Sir Arthur Eddington
joel-dot-neely-at-fedex-dot-com

[3/3] from: gavin:mckenzie:sympatico:ca at: 11-Jul-2001 9:42

On July 11, 2001 3:20 AM Joel Neely wrote:

>Just take your example a little further.
>[snip]

<<quoted lines omitted: 6>>

>is one) will either be a string or a block (of similar
>structure) for a subordinate element.

Yes...I did know this, and I've enjoyed your previous submissions on helper functions for accessing the sub-structures of a parsed-xml block.

>>[snip]
>Based on looking at the code for XML-LANGUAGE, my conclusion
>was that the block for the top-level document was simply
>another block that followed the above structure (to avoid
>fencepost issues).

You may be right. I may be reading too much into it.

The reason why I assumed that it might be intentional was because the notion of a top-level 'document' structure that contains meta-information about the document (such as the DocumentType enclosing the prolog) is itself consistent with the W3C XML DOM. Check out the IDL at:

http://www.w3.org/TR/DOM-Level-2-Core/idl-definitions.html

In normal DOM-based XML processing I'm used to dealing with a "document" object that contains a handle to the "document element", i.e. the root element of the document. This is consistent with the block structure returned by parse-xml.

>I wrote extensions to handle comments and CDATA a while back,
>and had thought about doing an article on XML in REBOL. (Are
>you interested in collaborating?) But I'm not sure what you
>have in mind for namespaces. Were you thinking of actually
>writing a validating parser?

Nooo...I wasn't going to go down the validation route, that's more than I need. It's just that without some support for entities and CDATA sections, it's hard to process real-world XML data. By real-world XML data, I mean XML data that someone else created, hence you don't have the ability to constrain the amount of XML 1.0 functionality employed.

Same thing for namespaces. If you have to deal with any sort of XML applications that package/envelope the content (e.g. SOAP, BizTalk, most XML EDI applications) then invariably you end up with one of two common circumstances:

1. Your XML data is enclosed in an 'envelope' denoted by a namespace

2. Your XML data contains data belonging to a namespace foreign to your original data

Either of these circumstances requires the ability to filter/mask or at least recognize namespace information. My plan was to add namespace info into the block structure.

I've also created a SAX-style callback interface for occasions when you want to process an XML document in a streaming manner rather than suck the whole document into memory.

Interested in collaborating? Heck...I'd be pleased. Though your REBOL expertise would outclass mine. I can offer XML expertise...XML (and its associated specs Namespaces/Schema/XSLT/DSig/etc.) is all I've been doing for four years.

I'll post my parse-xml replacement tonight for (critical) review. Basically I've pretty much just used the BNF production rules from the XML 1.0 spec.

Gavin.
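For readers unfamiliar with the namespace issue raised above: recognizing namespace information usually starts with splitting a qualified name such as "soap:Envelope" into its prefix and local part. The sketch below is only an illustration of that first step, not the replacement parser mentioned in this message. The name split-qname is hypothetical, and it assumes that element names reach the caller as plain strings that may contain a colon.

```rebol
REBOL [Title: "Split a qualified XML name (illustrative sketch)"]

; split-qname is a hypothetical helper, not part of REBOL or of
; any posted parse-xml replacement.
; Returns [prefix local-name]; prefix is NONE when there is no colon.
split-qname: func [name [string!] /local pos] [
    either pos: find name ":" [
        ; text before the colon is the prefix, text after it the local name
        reduce [copy/part name pos  copy next pos]
    ][
        reduce [none copy name]
    ]
]

probe split-qname "soap:Envelope"
probe split-qname "foo"
```

Mapping a prefix to its namespace URI would additionally require tracking the xmlns attributes in scope at each element, which this sketch leaves out.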
