[REBOL] Re: object2XML
From: rebol:gavinmckenzie:fastmail:fm at: 9-Mar-2004 18:00
Comments below...
On Tue, 9 Mar 2004 21:52:59 CET, [bry--itnisk--com] said:
> Well, I'm reasonably knowledgeable in matters
> of XML usage, etc., although my REBOL
> knowledge is shit, given that I just use it
> for small scripting hacks here and there.
>
> If I use Gavin's xml-object and a function
> to clean up the output a bit:
>
> doc-tree: func [unpickedDom] [pick third unpickedDom 1]
>
> I get
>
> >> xmldom: parse-xml read %t.xml
You meant parse-xml+, right? parse-xml is REBOL's built-in parser.
>[snip]
> Now, in this case I don't think the
> namespaces are a problem. I don't understand
> xml-object well enough to know whether it fails
> on namespaces, but a namespace function
> could easily be built to go through the
> block, collect all referenced namespaces,
> and check usages against them whenever
> one is encountered.
What do you want to do with namespaces? They aren't nearly as
straightforward as they seem. Declarations are inherited by descendant
elements, and a prefix can be reused within the nesting of the document
while resolving to a completely different namespace URI. If namespace
processing is to work at all, it should be put into the parser itself,
not into xml-to-object or some higher-level processing. Adding a
namespace-aware SAX2-style handler to parse-xml is IMHO the only workable
way to go.
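To make the parser-level point concrete, here is a minimal sketch in Python rather than REBOL. It uses the standard library's SAX2 interface with namespace processing turned on, and is not a proposal for the parse-xml+ API. The sample document and the names `NameRecorder` and `resolved_element_names` are made up for illustration. The prefix `a` resolves to two different URIs depending on where it appears, and the handler never has to track declarations itself, because the parser hands it resolved (URI, local name) pairs.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

# The prefix "a" is bound to urn:one on the root, then rebound to urn:two
# inside <inner>. Only a namespace-aware parser can tell the two apart.
DOC = b"""<a:root xmlns:a="urn:one">
  <a:item/>
  <inner xmlns:a="urn:two">
    <a:item/>
  </inner>
</a:root>"""


class NameRecorder(ContentHandler):
    """Collects each element's resolved (namespace URI, local name) pair."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElementNS(self, name, qname, attrs):
        # With feature_namespaces enabled, the parser has already resolved
        # the prefix, so `name` is a (uri, localname) tuple.
        self.names.append(name)


def resolved_element_names(data: bytes):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = NameRecorder()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.names


if __name__ == "__main__":
    for uri, local in resolved_element_names(DOC):
        print(uri, local)
```

The two `a:item` elements come out as `("urn:one", "item")` and `("urn:two", "item")`. A function that walks the finished tree and matches prefixes against a flat list of declarations would have to rebuild this scoping itself.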
> Of course, it should be noted that the
> namespaces are placed in a block with the
> attributes, but I don't think that is a major
> problem, although there should of course be
> functions that return just the attributes
> without the namespaces.
>
But that's the thing: namespace declarations *look* like attributes, but
they really aren't as far as XML is concerned. They need to be treated
specially.
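Here is a small illustration of that distinction. It is again a Python sketch that relies on the standard library's expat-based SAX parser, not on anything in parse-xml+. With namespace processing on, `xmlns:p` arrives through a separate `startPrefixMapping` callback and does not appear among the element's attributes, while the prefixed attribute `p:id` comes through already resolved. The class `DeclVsAttr` and the helper `split_declarations` are hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

# xmlns:p is a namespace declaration; p:id and plain are real attributes.
DOC = b'<r xmlns:p="urn:p" p:id="1" plain="2"/>'


class DeclVsAttr(ContentHandler):
    """Keeps namespace declarations apart from ordinary attributes."""

    def __init__(self):
        super().__init__()
        self.declarations = []  # (prefix, uri) pairs from xmlns:* declarations
        self.attributes = {}    # {(uri, localname): value}

    def startPrefixMapping(self, prefix, uri):
        # Declarations are delivered here, not in the attribute list.
        self.declarations.append((prefix, uri))

    def startElementNS(self, name, qname, attrs):
        # Keys are (uri, localname) tuples; uri is None for unprefixed names.
        self.attributes.update(attrs.items())


def split_declarations(data: bytes) -> DeclVsAttr:
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = DeclVsAttr()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler


if __name__ == "__main__":
    h = split_declarations(DOC)
    print("declarations:", h.declarations)
    print("attributes:", h.attributes)
```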
> What I find more irritating are the text nodes.
>
> I have four text nodes:
>
> "^/stuff here ^/"
> ["more"]
> "^/"
> and again
> "^/"
>
> Now, the value none is used for an empty tag,
> but "^/" is used for any empty text node, and
> "^/string^/" seems to be used for any text node
> that has a sibling node, whereas text nodes
> that are only children are represented as a
> block with one string value. It would
> probably be better to represent the sibling
> case the same way, as ["^/string value^/"].
>
It depends on what you want. Dropping whitespace is a decision that can
only be made by the processing application, not by the parser. The
parse-xml+ code has a set of default handlers, but you could choose to
implement your own. xml-to-object is intended to work with "data"-style
XML, where whitespace can be discarded with little risk.
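As an example of the kind of choice an application, not the parser, might make, here is a Python sketch using `xml.etree.ElementTree`. It discards whitespace-only text after parsing. The function name `drop_whitespace_text` is hypothetical, and the approach is only safe for data-style XML.

```python
import xml.etree.ElementTree as ET


def drop_whitespace_text(elem: ET.Element) -> ET.Element:
    """Remove whitespace-only text and tails from elem and its descendants.

    This is an application-level policy applied after parsing; the parser
    itself still reports every text node.
    """
    if elem.text is not None and not elem.text.strip():
        elem.text = None
    for child in elem:
        drop_whitespace_text(child)
        # A child's tail is the text between its end tag and the next sibling.
        if child.tail is not None and not child.tail.strip():
            child.tail = None
    return elem


if __name__ == "__main__":
    data = ET.fromstring("<a>\n  <b>x</b>\n</a>")
    print(ET.tostring(drop_whitespace_text(data)))
```

Applied to document-style XML such as `<p><b>bold</b> <i>italic</i></p>`, the same function also removes the meaningful space between the two words and yields `<p><b>bold</b><i>italic</i></p>`. That is exactly why the decision belongs to the application.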
> One of the things that should probably be
> considered for any functions for working
> with XML in REBOL is optimization for
> different kinds of XML: for example, a
> document-like structure such as we see
> above (my rule would be that a document
> structure has multiple text nodes, i.e. any
> element that has both a text node and an
> element as direct children makes it a
> document structure), as opposed to the more
> programmer-friendly data-type structure:
>
> <customers>
>   <customer>
>     <name>
>       <fname>John</fname>
>       <lname>Simpson</lname>
>     </name>
>     ....
>   </customer>
> </customers>
>
Agreed, the decisions about "optimizing" have to be done in light of the
type of XML you're processing and at a processing level above the parser,
not down in the parser itself.
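Bryan's rule for telling document-style XML from data-style XML can be checked above the parser, in line with that. Here is a minimal Python sketch. It assumes ElementTree's `text`/`tail` model of text nodes and treats whitespace-only text as insignificant. `has_mixed_content` is a hypothetical helper, not part of xml-object or parse-xml+.

```python
import xml.etree.ElementTree as ET


def has_mixed_content(root: ET.Element) -> bool:
    """True if any element has both non-whitespace text and child elements.

    In ElementTree, an element's direct text children are elem.text plus
    the .tail of each child element.
    """
    for elem in root.iter():
        children = list(elem)
        if not children:
            continue
        texts = [elem.text] + [child.tail for child in children]
        if any(t is not None and t.strip() for t in texts):
            return True
    return False


if __name__ == "__main__":
    data_style = ET.fromstring(
        "<customers><customer><name><fname>John</fname>\n"
        "<lname>Simpson</lname>\n</name></customer></customers>"
    )
    doc_style = ET.fromstring("<doc>\nstuff here\n<more>more</more>\n</doc>")
    print(has_mixed_content(data_style), has_mixed_content(doc_style))
```

An application could, for instance, switch to whitespace-dropping, data-oriented handling only when this check returns False, and keep every text node otherwise.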