
[REBOL] Re: object2XML

From: bry:itnisk at: 10-Mar-2004 11:06

> What do you wish to do with namespaces? They aren't at all as
> straightforward as they seem. They get inherited, and the namespace
> prefixes can be reused within the nesting of the document all the while
> resolving to totally different namespace URIs. The namespace processing,
> if it has a chance, should be put into the parser itself and not in
> xml-to-object or in some higher level processing. Adding in a namespace
> aware SAX2-style handler into parse-xml is IMHO the only workable way to
> go.

I'm talking about having a library of functions that call parse-xml and then do the namespace conformance checking. Why would this be a good idea?

For one thing, XML 1.0 does not depend on the namespace specification. The current version of the spec carries this note:

  The Namespaces in XML Recommendation [XML Names] assigns a meaning to names containing colon characters. Therefore, authors should not use the colon in XML names except for namespace purposes, but XML processors must accept the colon as a name character.

Most processors do not accept the colon as a name character without a namespace declaration, but as the text above shows, that is incorrect. One can therefore have XML documents with elements called blah:text and still have those documents be well-formed, although of course that is not industry standard practice. (If you examine the SVG put out by Illustrator, Photoshop etc. you will notice that when an xlink: namespace prefix is used there is no xlink namespace declaration in the document. This of course violates the XLink spec but not the XML spec.)

Because of this it might be preferable to layer the namespace handling so that one can build stricter levels of conformance to the specification(s).

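
To make the layering idea concrete, here is a rough sketch of a namespace check that runs after a lenient parse instead of inside the parser. It is written in Python rather than REBOL, with nested lists standing in for the blocks parse-xml returns. The [name, attrs, children] layout is an assumption about that output, and the function name undeclared_prefixes is made up for illustration. It only tracks whether a prefix is declared, not which URI it resolves to.

```python
# Sketch only: namespace checking as a separate layer over a lenient parser.
# Assumption: each element is [name, attrs, children], where attrs is a flat
# [name, value, ...] list or None, and children is None or a list of strings
# (text nodes) and nested elements.

def undeclared_prefixes(element, in_scope=frozenset({"xml"})):
    """Return prefixed element/attribute names whose prefix is not declared.

    The "xml" prefix is bound by definition, so it starts out in scope.
    """
    name, attrs, children = element
    attrs = attrs or []
    pairs = list(zip(attrs[0::2], attrs[1::2]))
    # Declarations on an element apply to that element and its whole subtree.
    declared = {n.split(":", 1)[1] for n, _ in pairs if n.startswith("xmlns:")}
    scope = in_scope | declared
    # Check the element name and every attribute that is not a declaration.
    candidates = [name] + [
        n for n, _ in pairs if n != "xmlns" and not n.startswith("xmlns:")
    ]
    problems = [
        q for q in candidates if ":" in q and q.split(":", 1)[0] not in scope
    ]
    for child in children or []:
        if isinstance(child, list):  # skip text nodes
            problems.extend(undeclared_prefixes(child, scope))
    return problems


# An SVG-like tree that uses xlink: without declaring it.
doc = ["svg", None, [["image", ["xlink:href", "a.png"], None]]]
print(undeclared_prefixes(doc))  # ['xlink:href']
```

A strict application would treat a non-empty result as an error, while a lenient one could ignore it and still use the parsed tree, which is the point of keeping the check out of the parser.
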
> > Of course it should be noted that the namespaces are placed in a block
> > with the attributes but I don't think that is a major problem although
> > there should of course be functions for returning just attributes
> > without namespaces.
>
> But that's the thing: namespace declarations *look* like attributes, but
> they really aren't as far as XML is concerned. They need to be treated
> specially.

Hence my making a differentiation between them in my post. Again, to a straight conformant XML 1.0 processor, the fact that an attribute is called xmlns:hi means absolutely nothing. To a processor that understands both namespaces and XML 1.0 it does mean something. Therefore, again, I think it may be useful to keep namespace handling as functions separate from parse-xml.

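
As an illustration of the kind of helper meant here, the following Python sketch splits an attribute list into ordinary attributes and namespace declarations. The flat [name, value, ...] layout is an assumption modelled on the attribute blocks parse-xml produces, and split_attributes and the urn:example:hi URI are hypothetical names used only for the example.

```python
# Sketch only: separate namespace declarations from ordinary attributes.
# Assumption: attrs is a flat [name, value, name, value, ...] list or None.

def split_attributes(attrs):
    """Return (plain_attributes, namespace_declarations) as lists of pairs."""
    plain, declarations = [], []
    attrs = attrs or []
    for name, value in zip(attrs[0::2], attrs[1::2]):
        # Both the default declaration (xmlns) and prefixed ones (xmlns:hi)
        # count as declarations for a namespace-aware layer.
        is_declaration = name == "xmlns" or name.startswith("xmlns:")
        (declarations if is_declaration else plain).append((name, value))
    return plain, declarations


plain, declarations = split_attributes(["xmlns:hi", "urn:example:hi", "id", "a1"])
print(plain)         # [('id', 'a1')]
print(declarations)  # [('xmlns:hi', 'urn:example:hi')]
```

A plain XML 1.0 layer would simply never call this and would see xmlns:hi as one more attribute.
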
> > What I find more irritating is the textnodes:
> >
> > I have 4 textnodes:
> >
> > "^/stuff here ^/"
> > ["more"]
> > "^/"
> > and again
> > "^/"
> >
> > now none is used in an empty tag, but "^/" is used for any empty
> > textnode, and "^/ string^/" seems to be used for any textnode that has
> > a sibling node, whereas textnodes that are only children are
> > represented as a block with one string value. it would probably be
> > better to just do that as another ["^/string value^/"]
>
> It depends what you want. Dropping any whitespace is a decision that can
> only be made by the processing application and not the parser. The
> parse-xml+ code has a set of default handlers, but you could choose to
> implement your own. xml-to-object is intended to work with "data" styles
> of XML and hence whitespace is more easily discarded in such XML without
> too much risk.

Again, that was not what I was complaining about. I found the difference in how a textnode is represented disconcerting for a stricter parser built on top of parse-xml. It seems to me that "^/string value here" is a reasonable way to signify that a node is a textnode, since an element name can't start with a ^, and one would simply not check whether a node is a textnode or an element inside an attribute block.

NOTE: again, this is discussing the possibility of a generic XML processing library of functions on top of parse-xml, so that you could have a strip-empty-text func that takes a rebolxmldom parameter and returns the rebolxmldom with all empty textnodes stripped out.

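
A minimal sketch of what strip-empty-text could do, in Python with nested lists standing in for the rebolxmldom. The [name, attrs, children] layout is an assumption, and turning an emptied child list back into None (to match how none marks an empty tag) is a design choice of this sketch, not something parse-xml prescribes.

```python
# Sketch only: drop whitespace-only text nodes from a parsed tree.
# Assumption: each element is [name, attrs, children]; children is None for
# an empty tag, otherwise a list of strings (text) and nested elements.

def strip_empty_text(element):
    """Return a copy of element with whitespace-only text nodes removed."""
    name, attrs, children = element
    if children is None:
        return [name, attrs, None]
    kept = []
    for child in children:
        if isinstance(child, str):
            if child.strip():          # keep text that has real content
                kept.append(child)
        else:
            kept.append(strip_empty_text(child))
    # If nothing is left, fall back to None, like an empty tag.
    return [name, attrs, kept or None]


doc = ["p", None, ["\nstuff here \n", ["b", None, ["more"]], "\n",
                   ["i", None, None], "\n"]]
print(strip_empty_text(doc))
# ['p', None, ['\nstuff here \n', ['b', None, ['more']], ['i', None, None]]]
```

The original tree is left untouched, so the application decides whether to work on the stripped copy or the full one.
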
> > One of the things that should probably be considered for any functions
> > for working with xml in rebol is optimizations for working with various
> > types of xml, for example a document like structure such as we see
> > above (for which I would say the rule is that a document structure has
> > multiple textnodes, that an element which has as a direct child a
> > textnode and an element is a document structure) as opposed to the more
> > programmer friendly data type structure:
>
> Agreed, the decisions about "optimizing" have to be done in light of the
> type of XML you're processing and at a processing level above the parser,
> not down in the parser itself.

I'm suggesting that rather than having parse-xml as the first and final way to read a document, one should have a library built around parse-xml. So I'm not saying that parse-xml should be fixed; I've come to the conclusion that it is reasonably okay as a starting point. Why is it reasonably okay? Because frankly there is a lot of non-conformant XML out there that is, in practice, accepted by different applications and processors. As a general rule I would be against working with such stuff but, for example, msxml accepts elements named xml, and according to the recommendation that name is reserved:

  [Definition: A Name is a token beginning with a letter or one of a few punctuation characters, and continuing with letters, digits, hyphens, underscores, colons, or full stops, together known as name characters.] Names beginning with the string "xml", or with any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification.

That of course wouldn't be so bad, but a lot of Microsoft markup comes with elements named xml in it. A lot of people using only msxml have XML documents with element names like xml-metadata and suchlike. It would probably be a good thing if one could accept those documents.
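
One way a library layer could accept such documents while still flagging them is to report reserved names instead of rejecting the parse. The Python sketch below assumes the same hypothetical [name, attrs, children] layout as a stand-in for parse-xml output; reserved_names is an illustrative name, not an existing function.

```python
# Sketch only: report element names that start with "xml" in any letter case,
# which XML 1.0 reserves, without refusing to process the document.
# Assumption: each element is [name, attrs, children].

def reserved_names(element):
    """Collect element names beginning with x/X, m/M, l/L, in document order."""
    name, _attrs, children = element
    found = [name] if name[:3].lower() == "xml" else []
    for child in children or []:
        if isinstance(child, list):  # skip text nodes
            found.extend(reserved_names(child))
    return found


doc = ["doc", None, [["xml-metadata", None, None], "\n", ["item", None, None]]]
print(reserved_names(doc))  # ['xml-metadata']
```

A strict caller can treat a non-empty list as a conformance failure; a lenient one can log it and carry on.
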