Mailing List Archive: Re: object2XML

[REBOL] Re: object2XML

From: bry:itnisk at: 10-Mar-2004 11:06


> What do you wish to do with namespaces? They aren't at all as
> straightforward as they seem.  They get inherited, and the namespace
> prefixes can be reused within the nesting of the document all the while
> resolving to totally different namespace URIs.  The namespace processing,
> if it has a chance, should be put into the parser itself and not in
> xml-to-object or in some higher level processing.  Adding in a namespace
> aware SAX2-style handler into parse-xml is IMHO the only workable way to
> go.
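The inheritance and prefix reuse described in the quote can be sketched as a pass over an already-parsed tree. This is a minimal Python sketch, not REBOL, and it assumes a model of parse-xml's output: each element is a list [name, attributes, children], attributes is a flat name/value list or None, children is a list or None, and text nodes are plain strings. The function name resolve and its strict flag are hypothetical.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"

def resolve(node, scope=None, strict=False, out=None):
    """Return (element name, namespace URI) pairs for a parse-xml style tree.

    node is [name, attributes, children] (assumed model, see lead-in).
    scope maps prefix -> URI ("" is the default namespace) and is
    inherited from the ancestors.
    """
    if scope is None:
        scope = {"xml": XML_NS}  # the xml prefix is always bound
    if out is None:
        out = []
    name, attrs, children = node
    scope = dict(scope)  # copy, so declarations only affect this subtree
    pairs = attrs or []
    for attr, value in zip(pairs[0::2], pairs[1::2]):
        if attr == "xmlns":
            scope[""] = value or None  # xmlns="" undeclares the default
        elif attr.startswith("xmlns:"):
            scope[attr[len("xmlns:"):]] = value
    prefix = name.split(":", 1)[0] if ":" in name else ""
    if strict and prefix and prefix not in scope:
        raise ValueError(f"undeclared prefix {prefix!r} on <{name}>")
    # Lenient mode: an undeclared prefix simply resolves to None, which is
    # all a plain XML 1.0 processor would care about.
    out.append((name, scope.get(prefix)))
    for child in children or []:
        if not isinstance(child, str):
            resolve(child, scope, strict, out)
    return out

doc = ["root", ["xmlns:a", "urn:one"], [
    ["a:item", None, ["text"]],
    ["inner", ["xmlns:a", "urn:two"], [["a:item", None, None]]],
]]
for qname, uri in resolve(doc):
    print(qname, uri)
```

Here the same prefix a resolves to urn:one and then to urn:two in the nested element. Calling resolve(["doc", None, [["blah:text", None, ["hi"]]]]) in the default lenient mode returns the blah:text element with URI None, while strict=True raises ValueError.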

I'm talking about having a library of functions that call parse-xml
and then do the namespace conformance checking. Why would this be a
good idea?

1. XML 1.0 itself has no connection to the Namespaces in XML
specification. The current version of the XML spec contains this note:
"The Namespaces in XML Recommendation [XML Names] assigns a meaning to
names containing colon characters. Therefore, authors should not use the
colon in XML names except for namespace purposes, but XML processors
must accept the colon as a name character." Most processors do not
accept the colon as a name character without a namespace declaration,
but as the text above shows, that is incorrect. One can therefore have
XML documents with elements called blah:text and still have those
documents be well-formed, although of course that is not standard
industry practice. (If you examine the SVG put out by Illustrator,
Photoshop, etc., you will notice that when an xlink: namespace prefix is
used there is no xlink namespace declaration in the document. This of
course violates the namespace and XLink specs, but not the XML spec.)

Because of this it might be preferable to layer the namespace handling
in such a way that one can build stricter levels of conformance to the
specification(s).

> > Of course it should be noted that the
> > namespaces are placed in a block with the
> > attributes but I don't think that is a major
> > problem although there should of course be
> > functions for returning just attributes
> > without namespaces.
> >
>
> But that's the thing: namespace declarations *look* like attributes, but
> they really aren't as far as XML is concerned.  They need to be treated
> specially.
>
Hence my making a distinction between them in my post. Again, to a
strictly conformant XML 1.0 processor, an attribute called xmlns:hi
means absolutely nothing. To a processor that understands both
namespaces and XML 1.0 it does mean something. Therefore, again, I
suppose it may be useful to keep namespace handling in functions
separate from parse-xml.
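Keeping that distinction in a separate function could look like the following minimal Python sketch. It assumes parse-xml hands back attributes as a flat name/value list (or None), as in the post's description of namespaces sharing the attribute block; the function name split_attributes is hypothetical.

```python
def split_attributes(attrs):
    """Split a flat attribute list into (attributes, namespace declarations).

    attrs is [name1, value1, name2, value2, ...] or None (assumed model).
    Declarations (xmlns and xmlns:prefix) go into a dict keyed by prefix,
    with "" for the default namespace; everything else stays an attribute.
    """
    attributes, decls = {}, {}
    pairs = attrs or []
    for name, value in zip(pairs[0::2], pairs[1::2]):
        if name == "xmlns":
            decls[""] = value
        elif name.startswith("xmlns:"):
            decls[name[len("xmlns:"):]] = value
        else:
            attributes[name] = value
    return attributes, decls

attrs, decls = split_attributes(
    ["id", "n1", "xmlns:hi", "urn:example", "xmlns", "urn:default"])
print(attrs)  # {'id': 'n1'}
print(decls)  # {'hi': 'urn:example', '': 'urn:default'}
```

A plain XML 1.0 layer can ignore the second dict entirely, while a namespace-aware layer uses it to build the in-scope prefix bindings.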

> > What I find more irritating is the textnodes:
> >
> > I have 4 textnodes:
> >
> > "^/stuff here ^/"
> > ["more"]
> > "^/"
> > and again
> > "^/"
> >
> > now none is used in an empty tag, but "^/"
> > is used for any empty textnode, and
> > "^/string^/" seems to be used for any textnode
> > that has a sibling node, whereas textnodes
> > that are only children are represented as a
> > block with one string value. it would
> > probably be better to just do that as
> > another ["^/string value^/"]
>
> It depends what you want.  Dropping any whitespace is a decision that can
> only be made by the processing application and not the parser.  The
> parse-xml+ code has a set of default handlers, but you could choose to
> implement your own.  xml-to-object is intended to work with "data" styles
> of XML and hence whitespace is more easily discarded in such XML without
> too much risk.

Again, that was not what I was complaining about. I found the
inconsistency in how text nodes are represented disconcerting for anyone
building a stricter parser on top of parse-xml. It seems to me that
"^/string value here" is a reasonable way to signify that a node is a
text node, since an element name can't start with a ^, and one would
simply not check whether a node is a text node or an element inside an
attribute block.
NOTE: again, this is about the possibility of a generic XML processing
library of functions on top of parse-xml, so that you could have a
strip-empty-text function that takes a rebolxmldom parameter and
returns the rebolxmldom with all empty text nodes stripped out.
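A strip-empty-text function of that kind might look like this minimal Python sketch. It models the "rebolxmldom" as nested lists [name, attributes, children], with text nodes as plain strings, children as a list or None, and None for an empty element as the post describes; the Python name strip_empty_text simply mirrors the post's hypothetical function.

```python
def strip_empty_text(node):
    """Build a new tree with whitespace-only text nodes removed.

    node is [name, attributes, children] (assumed model); attribute lists
    are shared, not copied. A children list that ends up empty becomes
    None, matching how an empty element is represented.
    """
    name, attrs, children = node
    kept = []
    for child in children or []:
        if isinstance(child, str):
            if child.strip():          # keep text with real content
                kept.append(child)
        else:
            kept.append(strip_empty_text(child))
    return [name, attrs, kept or None]

# The post's example, with REBOL's ^/ written as Python's \n.
doc = ["doc", None, ["\nstuff here \n", ["b", None, ["more"]], "\n",
                     ["i", None, None], "\n"]]
print(strip_empty_text(doc))
# ['doc', None, ['\nstuff here \n', ['b', None, ['more']], ['i', None, None]]]
```

Text with content keeps its surrounding newlines; trimming those as well would be a separate policy decision for the processing application.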

> >
> > One of the things that should probably be
> > considered for any functions for working
> > with xml in rebol is optimizations for
> > working with various types of xml, for
> > example a document like structure such as we
> > see above (for which I would say the rule is
> > that a document structure has multiple
> > textnodes, that an element which has as a
> > direct child a textnode and an element is a
> > document structure) as opposed to the more
> > programmer friendly data type structure:
> >
>
> Agreed, the decisions about "optimizing" have to be done in light of the
> type of XML you're processing and at a processing level above the parser,
> not down in the parser itself.
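The quoted rule for telling a document-like structure from a data-like one can be checked above the parser. This is a minimal Python sketch over the same assumed list model ([name, attributes, children], text nodes as strings); the function name is_document_style is hypothetical, and ignoring whitespace-only text is an added assumption, since otherwise indentation alone would make almost any pretty-printed data file count as document-style.

```python
def is_document_style(node):
    """True if any element directly contains both text and an element.

    node is [name, attributes, children] (assumed model).
    Assumption: whitespace-only text nodes are ignored.
    """
    _name, _attrs, children = node
    children = children or []
    has_text = any(isinstance(c, str) and c.strip() for c in children)
    has_elem = any(not isinstance(c, str) for c in children)
    if has_text and has_elem:
        return True
    return any(is_document_style(c) for c in children if not isinstance(c, str))

prose = ["p", None, ["Some ", ["b", None, ["bold"]], " text"]]
data = ["person", None, ["\n  ", ["name", None, ["Bob"]], "\n  ",
                         ["age", None, ["42"]], "\n"]]
print(is_document_style(prose))  # True
print(is_document_style(data))   # False
```

A library could use such a check to pick a processing strategy, for example discarding whitespace only when the input looks data-style.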

I'm suggesting that rather than having parse-xml be the first and final
way to read a document, one should have a library built around
parse-xml. So I'm not saying that parse-xml should be fixed; I've come
to the conclusion that it is reasonably okay as a starting point. Why is
it reasonably okay? Because frankly there is a lot of non-conformant XML
out there that is, in practice, accepted by various applications and
processors. As a general rule I would be against working with such
stuff, but, for example, MSXML accepts elements named xml, even though
according to the recommendation that name is reserved:

[Definition: A Name is a token beginning with a letter or one of a few
punctuation characters, and continuing with letters, digits, hyphens,
underscores, colons, or full stops, together known as name characters.]
Names beginning with the string "xml", or with any string which would
match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization
in this or future versions of this specification.

That of course wouldn't be so bad, but a lot of Microsoft markup comes
with elements named xml in it, and a lot of people using only MSXML
have XML documents with element names like xml-metadata and the like.
It would probably be a good thing if one could accept those documents.
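Accepting such documents while still noticing them could be one more optional layer: report names that start with the reserved string instead of rejecting the document. This is a minimal Python sketch over the same assumed list model ([name, attributes, children], text nodes as strings); the function name reserved_names is hypothetical, and a stricter layer could raise an error instead of collecting names.

```python
import re

# Names starting with (('X'|'x') ('M'|'m') ('L'|'l')) are reserved by XML 1.0.
RESERVED = re.compile(r"[Xx][Mm][Ll]")

def reserved_names(node, found=None):
    """Collect element names that start with 'xml' in any letter case.

    node is [name, attributes, children] (assumed model). Names are only
    reported, so lenient processing can continue.
    """
    if found is None:
        found = []
    name, _attrs, children = node
    if RESERVED.match(name):
        found.append(name)
    for child in children or []:
        if not isinstance(child, str):
            reserved_names(child, found)
    return found

doc = ["root", None, [["xml", None, None], ["xml-metadata", None, ["x"]],
                      ["XMLData", None, None], ["item", None, None]]]
print(reserved_names(doc))  # ['xml', 'xml-metadata', 'XMLData']
```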