Mailing List Archive: 49091 messages

[REBOL] Re: object2XML

From: rebol:gavinmckenzie:fastmail:fm at: 9-Mar-2004 18:00

Comments below...

On Tue, 9 Mar 2004 21:52:59 CET, [bry--itnisk--com] said:
> Well I'm reasonably knowledgable in matters
> of xml usage, etc. although my rebol
> knowledge is shit, given that I just use it
> for small scripting hacks here and there
>
> if I use gavin's xml-object and a function
> to clean-up the output a bit:
>
> doc-tree: func[unpickedDom][pick third unpickedDom 1]
>
> I get
>
> >> xmldom: parse-xml read %t.xml
You meant parse-xml+, right? parse-xml is the parser built into REBOL.
> [snip]
> now in this case I don't think the
> namespaces are a problem, I don't understand
> xml-object well enough to know if it fails
> on namespace problems, but a namespace
> function could be built easily enough to go
> through the block getting all referenced
> namespaces and checking against those
> references whenever a usage is encountered.
What do you wish to do with namespaces? They aren't at all as straightforward as they seem. Declarations are inherited by nested elements, and a prefix can be redeclared deeper in the document, so the same prefix can resolve to totally different namespace URIs at different levels of nesting. Namespace processing, if it is to have a chance of working, should be put into the parser itself and not in xml-to-object or in some higher-level processing. Adding a namespace-aware SAX2-style handler to parse-xml is IMHO the only workable way to go.
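To make the prefix-reuse point concrete, here is a minimal sketch in Python rather than REBOL, using the standard library's SAX2 parser with namespace processing switched on. It is only an illustration of the behaviour, not a proposal for how parse-xml+ should be written; the names NamespaceRecorder, resolve_elements and SAMPLE, and the urn:example URIs, are made up for the example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NamespaceRecorder(ContentHandler):
    """Hypothetical handler: records (namespace URI, local name) per element."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElementNS(self, name, qname, attrs):
        # With namespace processing on, the parser has already resolved the
        # prefix; `name` arrives as a (uri, localname) pair.
        uri, local = name
        self.seen.append((uri, local))


def resolve_elements(xml_bytes):
    """Parse xml_bytes and return the resolved name of every element."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = NamespaceRecorder()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.seen


# The prefix "p" is declared on the root, redeclared on an inner element,
# and then falls back to the outer declaration once that element closes.
SAMPLE = b"""<root xmlns:p="urn:example:one">
  <p:item>
    <p:item xmlns:p="urn:example:two"/>
  </p:item>
  <p:item/>
</root>"""

resolved = resolve_elements(SAMPLE)
```

The three p:item elements come back with two different URIs even though the prefix is spelled the same each time, which is why a single pass that collects "all referenced namespaces" into one table cannot get the answer right.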
> Of course it should be noted that the
> namespaces are placed in a block with the
> attributes but I don't think that is a major
> problem although there should of course be
> functions for returning just attributes
> without namespaces.
>
But that's the thing: namespace declarations *look* like attributes, but they really aren't as far as XML is concerned. They need to be treated specially.
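As a rough sketch of what "treated specially" means at the lowest level, the Python function below separates namespace declarations from ordinary attributes in a flat list of name/value pairs. The function name and the pair-list input are assumptions for illustration only; this is not how parse-xml or parse-xml+ represents attributes.

```python
def split_namespace_declarations(raw_attributes):
    """Split (name, value) pairs into namespace declarations and attributes.

    Returns (declarations, attributes), where declarations maps a prefix to
    its URI (the empty string stands for the default namespace) and
    attributes keeps the remaining pairs in their original order.
    """
    declarations = {}
    attributes = []
    for name, value in raw_attributes:
        if name == "xmlns":
            # Default namespace declaration: xmlns="..."
            declarations[""] = value
        elif name.startswith("xmlns:"):
            # Prefixed declaration: xmlns:prefix="..."
            declarations[name[len("xmlns:"):]] = value
        else:
            attributes.append((name, value))
    return declarations, attributes


declared, plain = split_namespace_declarations(
    [("xmlns", "urn:example:a"), ("xmlns:x", "urn:example:b"),
     ("id", "1"), ("x:ref", "2")]
)
```

Note that this only does the separating. The declarations still have to be applied to the element they appear on and to its descendants, which is the part that belongs in the parser.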
> What I find more irritating is the textnodes:
>
> I have 4 textnodes:
>
> "^/stuff here ^/"
> ["more"]
> "^/"
> and again
> "^/"
>
> now none is used in an empty tag, but "^/"
> is used for any empty textnode, and "^/ string^/"
> seems to be used for any textnode
> that has a sibling node, whereas textnodes
> that are only children are represented as a
> block with one string value. it would
> probably be better to just do that as
> another ["^/string value^/"]
>
It depends what you want. Dropping any whitespace is a decision that can only be made by the processing application and not the parser. The parse-xml+ code has a set of default handlers, but you could choose to implement your own. xml-to-object is intended to work with "data" styles of XML and hence whitespace is more easily discarded in such XML without too much risk.
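For illustration, here is a minimal Python sketch of whitespace being dropped by the application after parsing, not by the parser. It assumes a simplified tree in which text nodes are strings and nested content is a list; that is a stand-in for the example, not the actual structure parse-xml+ produces, and the function name is hypothetical.

```python
def drop_whitespace_only_text(children):
    """Return a copy of children without whitespace-only text nodes.

    Strings are treated as text nodes and lists as nested content. Text that
    contains anything other than whitespace is kept exactly as it was,
    including its leading and trailing newlines.
    """
    cleaned = []
    for child in children:
        if isinstance(child, str):
            if child.strip():
                cleaned.append(child)
        else:
            cleaned.append(drop_whitespace_only_text(child))
    return cleaned


# The four text nodes from the quoted message, with ^/ written as \n.
nodes = ["\nstuff here \n", ["more"], "\n", "\n"]
cleaned_nodes = drop_whitespace_only_text(nodes)
```

An application handling document-style XML would simply not call this, which is the reason for keeping the decision out of the parser.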
> One of the things that should probably be
> considered for any functions for working
> with xml in rebol is optimizations for
> working with various types of xml, for
> example a document like structure such as we
> see above (for which I would say the rule is
> that a document structure has multiple
> textnodes, that an element which has as a
> direct child a textnode and an element is a
> document structure) as opposed to the more
> programmer friendly data type structure:
>
> <customers>
>   <customer>
>     <name><fname>John</fname>
>       <lname>Simpson</lname>
>     </name>
>     ....
>   </customer>
> </customers>
>
Agreed, the decisions about "optimizing" have to be made in light of the type of XML you're processing, and at a processing level above the parser, not down in the parser itself.
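The "document structure" rule from the quoted message (an element with both a text node and an element as direct children) can be checked above the parser. Below is a minimal Python sketch using the standard library's ElementTree; the function names are hypothetical, and treating whitespace-only text as insignificant is an assumption made for the example.

```python
import xml.etree.ElementTree as ET


def has_mixed_content(element):
    """True if element has child elements and non-whitespace text among them."""
    children = list(element)
    if not children:
        return False
    # Text directly inside an element is its own .text plus each child's .tail.
    pieces = [element.text] + [child.tail for child in children]
    return any(piece and piece.strip() for piece in pieces)


def is_document_style(root):
    """True if any element in the tree, including root, has mixed content."""
    return any(has_mixed_content(element) for element in root.iter())


data_xml = ET.fromstring(
    "<customers>\n  <customer>\n    <name><fname>John</fname>\n"
    "      <lname>Simpson</lname>\n    </name>\n  </customer>\n</customers>"
)
document_xml = ET.fromstring("<p>\nstuff here \n<b>more</b>\n</p>")

data_is_document = is_document_style(data_xml)
document_is_document = is_document_style(document_xml)
```

A check like this runs on the parsed tree, so the parser stays neutral and the application picks the handling that suits the XML it has.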