Mailing List Archive: Re: What's the 'none' for in the parse-xml result?

[REBOL] Re: What's the 'none' for in the parse-xml result?

From: gavin:mckenzie:sympatico:ca at: 11-Jul-2001 9:42


On July 11, 2001 3:20 AM Joel Neely wrote:
>Just take your example a little further.
>[snip]
>The block REBOL produces for an XML element contains the
>element name, attribute list, and content, in that order.
>[snip]
>An element that has no attributes has NONE for its second
>part, just as an element that has no content has NONE for
>its third part.  Each item in the content block (if there
>is one) will either be a string or a block (of similar
>structure) for a subordinate element.

Yes...I did know this, and I've enjoyed your previous submissions on helper
functions for accessing the sub-structures of a parsed-xml block.

>>[snip]
>Based on looking at the code for XML-LANGUAGE, my conclusion
>was that the block for the top-level document was simply
>another block that followed the above structure (to avoid
>fencepost issues).

You may be right.  I may be reading too much into it.  I assumed it might be
intentional because the notion of a top-level 'document' structure that
contains meta-information about the document (such as the DocumentType
enclosing the prolog) is consistent with the W3C XML DOM.

Check out the IDL at:
 http://www.w3.org/TR/DOM-Level-2-Core/idl-definitions.html

In normal DOM based XML processing I'm used to dealing with a "document"
object that contains a handle to the "document element", i.e. the root
element of the document.  This is consistent with the block structure
returned by parse-xml.
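
To make the shape concrete, here is a rough sketch of that layout.  It is
written in Python rather than REBOL, purely as an illustration, and it is not
how parse-xml is implemented.  The helper names to_block and parse_to_block
are made up for this example; None stands in for REBOL's NONE, and the
outer "document" wrapper follows the structure discussed in this thread.

```python
import xml.etree.ElementTree as ET

def to_block(elem):
    """Return [name, attributes-or-None, content-or-None] for one element."""
    # Flatten attributes to [name, value, name, value, ...]; None if there are none.
    attrs = [part for pair in elem.attrib.items() for part in pair] or None
    content = []
    if elem.text:
        content.append(elem.text)
    for child in elem:
        content.append(to_block(child))   # nested block of the same shape
        if child.tail:
            content.append(child.tail)    # text following the child element
    return [elem.tag, attrs, content or None]

def parse_to_block(xml_text):
    """Wrap the root element in a top-level 'document' block of the same shape."""
    root = ET.fromstring(xml_text)
    return ["document", None, [to_block(root)]]

result = parse_to_block('<a b="1">text<c/></a>')
print(result)
```

The top-level block has no attributes of its own, so its second slot is None
for the same reason an attribute-less element's is, and its content holds the
root element's block, much like a DOM Document holding its document element.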

>I wrote extensions to handle comments and CDATA a while back,
>and had thought about doing an article on XML in REBOL.  (Are
>you interested in collaborating?)  But I'm not sure what you
>have in mind for namespaces.  Were you thinking of actually
>writing a validating parser?

Nooo...I wasn't going to go down the validation route, that's more than I
need.

It's just that without some support for entities, and CDATA sections, it's
hard to process real-world XML data.  By real-world XML data, I mean XML
data that someone else created, hence you don't have the ability to
constrain the amount of XML 1.0 functionality employed.
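
As a small illustration of what "support" means here (a Python sketch with a
made-up input string, not REBOL code): a parser that handles entity
references and CDATA sections hands the application plain text in both
cases.

```python
import xml.etree.ElementTree as ET

# Hypothetical input mixing a predefined entity, a character reference,
# and a CDATA section.
source = "<note>Fish &amp; chips &#169; <![CDATA[<b>not markup</b>]]></note>"

note = ET.fromstring(source)
text = note.text   # references are expanded; CDATA content is kept literally
print(text)
```

The "<b>" inside the CDATA section arrives as character data, not as a child
element, which is exactly what a parser without CDATA support gets wrong.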

Same thing for namespaces.  If you have to deal with any sort of XML
applications that package/envelope the content (e.g. SOAP, BizTalk, most XML
EDI applications) then invariably you end up with one of two common
circumstances:
1. Your XML data is enclosed in an 'envelope' denoted by a namespace
2. Your XML data contains data belonging to a namespace foreign to your
original data

Either of these circumstances requires the ability to filter/mask or at least
recognize namespace information.
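
A minimal sketch of that kind of filtering, in Python for illustration only.
The namespace URIs, the sample document, and the helpers split_name and
elements_in_namespace are all hypothetical; the point is just that once each
element carries its namespace URI, separating the envelope from the payload
is a simple comparison.

```python
import xml.etree.ElementTree as ET

ENVELOPE_NS = "http://example.org/envelope"   # hypothetical envelope namespace
ORDER_NS = "http://example.org/order"         # hypothetical payload namespace

doc = f"""<env:Envelope xmlns:env="{ENVELOPE_NS}" xmlns:o="{ORDER_NS}">
  <env:Body>
    <o:order id="7"><o:item>widget</o:item></o:order>
  </env:Body>
</env:Envelope>"""

def split_name(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag

def elements_in_namespace(root, uri):
    """Yield every element, in document order, whose namespace URI matches."""
    for elem in root.iter():
        if split_name(elem.tag)[0] == uri:
            yield elem

root = ET.fromstring(doc)
payload = [split_name(e.tag)[1] for e in elements_in_namespace(root, ORDER_NS)]
print(payload)
```

Matching on the URI rather than on the "env:" or "o:" prefix matters, because
the sender is free to choose any prefix for the same namespace.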

My plan was to add namespace info into the block structure.

I've also created a SAX-style callback interface for occasions when you want
to process an XML document in a streaming manner rather than suck the whole
document into memory.
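
For anyone unfamiliar with the SAX style, here is a rough sketch of the idea
using Python's standard xml.sax module.  It is not my REBOL interface, only
an illustration of the callback pattern; the EventRecorder class and the
sample input are made up for this example.

```python
import xml.sax

class EventRecorder(xml.sax.ContentHandler):
    """Record parse events as they stream past; no tree is ever built."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # The parser may deliver text in several chunks; skip pure whitespace.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))

handler = EventRecorder()
xml.sax.parseString(b'<a b="1">hi<c/></a>', handler)

for event in handler.events:
    print(event)
```

The parser calls back as it reads, so the application only keeps whatever
state it chooses to keep, which is what makes this style suitable for
documents too large to hold in memory.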

Interested in collaborating? Heck...I'd be pleased.  Though your REBOL
expertise would outclass mine.  I can offer XML expertise...XML (and its
associated specs Namespaces/Schema/XSLT/DSig/etc.) is all I've been doing
for four years.

I'll post my parse-xml replacement tonight for (critical) review.  Basically
I've pretty much just used the BNF production rules from the XML 1.0 spec.

Gavin.