Mailing List Archive: 49091 messages

[REBOL] Re: Rebol & XML

From: bry:itnisk at: 5-Aug-2003 15:06

> At the moment, my thoughts are going towards a DOM model,
> because Rebol is oriented that way, I feel, in reading and writing all of a
> file at once.

Definitely should be DOM. DOM is more familiar to most developers and more popular than SAX.

[
The DOM model builds a tree in memory. I want to access the various values with path! values in Rebol. Here's a little XML (XMLSS from MS Excel 2002):

XML: {<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="">
 <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
  <Author>Andrew John Martin</Author>
  <LastAuthor>Andrew John Martin</LastAuthor>
  <Created>2003-08-05T02:10:56Z</Created>
  <LastSaved>2003-08-05T02:10:57Z</LastSaved>
  <Company>Colenso High School</Company>
  <Version>10.4219</Version>
 </DocumentProperties>
 <OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">
  <DownloadComponents/>
  <LocationOfComponents HRef="file:///\\"/>
 </OfficeDocumentSettings>
</Workbook>
}
]
> I'd like to process the above and then access the author's name with
> script like:
>
>     XML/Workbook/DocumentProperties/Author

Which is basically an XPath. I think it should probably be something like:

    xpath XML "/Workbook/DocumentProperties/Author"
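
For comparison, here is roughly what that lookup looks like in a language that already ships a DOM-ish library. This is a minimal sketch in Python (not Rebol), using the standard xml.etree.ElementTree module on a cut-down copy of the workbook above. The prefix map NS is my own choice for the example. ElementTree only handles a subset of XPath and wants paths relative to the element you start from, so the leading "/Workbook" is dropped.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the workbook quoted earlier in this message.
XML = """<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office">
 <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
  <Author>Andrew John Martin</Author>
 </DocumentProperties>
</Workbook>"""

# Prefix map used only by the query; "o" is an arbitrary label chosen here.
NS = {"o": "urn:schemas-microsoft-com:office:office"}

root = ET.fromstring(XML)  # root is the Workbook element
author = root.find("o:DocumentProperties/o:Author", NS)
print(author.text)  # Andrew John Martin
```

Note that the query has to name the namespace of DocumentProperties and Author, which is the kind of detail a plain Rebol path! would hide or lose.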
> And set it with Rebol script like:
>
>     XML/Workbook/DocumentProperties/Author: "Andrew Martin"

Yeah, that was something I was also considering: the possibility of an XPath setting syntax in Rebol.
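
A path-based setter could be sketched like this, again in Python rather than Rebol. The helper name set_text is hypothetical, and namespaces are left out of the sample document to keep it short.

```python
import xml.etree.ElementTree as ET

def set_text(root, path, value):
    """Set the text of the first element matching path (hypothetical helper)."""
    node = root.find(path)
    if node is None:
        # Nothing matched: fail loudly instead of silently creating nodes.
        raise KeyError(path)
    node.text = value
    return node

root = ET.fromstring(
    "<Workbook><DocumentProperties><Author>Andrew John Martin</Author>"
    "</DocumentProperties></Workbook>"
)
set_text(root, "DocumentProperties/Author", "Andrew Martin")
print(ET.tostring(root, encoding="unicode"))
```

Whether a setter should create missing elements along the path, or refuse as this one does, is a design decision a Rebol version would have to make too.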
> Also we should think about several tags at the same level of nesting,
> in table:
>     row
>         cell
>         cell
>         cell

In the XPath data model of XML this would be taken care of via position(), so that one has

    row/cell[last()]

returning the last cell node under row, and

    row/cell[position() = 2]

or

    row/cell[2]

returning the second. My idea was to have an object hierarchy that could be navigated in the normal Rebol manner, then have an XPath parser that would parse XPath strings to figure out the Rebol path to something. This might have problems though.
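
To make the positional forms concrete, here is a small Python sketch (not Rebol) using the standard xml.etree.ElementTree module. The table and its cell contents are made-up sample data. ElementTree's documented XPath subset covers the short forms cell[2] and cell[last()], but not the longer position() = 2 spelling, so only the short forms are shown.

```python
import xml.etree.ElementTree as ET

# Made-up sample data: one row with three cells at the same nesting level.
table = ET.fromstring(
    "<table><row><cell>a</cell><cell>b</cell><cell>c</cell></row></table>"
)

second = table.find("row/cell[2]")     # positions are 1-based, as in XPath
last = table.find("row/cell[last()]")
print(second.text, last.text)  # b c
```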
> Unfortunately, there's a problem with accessing the attributes of a tag!
> For example, what's the path! value for accessing the value of the "xmlns"
> attribute in the "DocumentProperties" tag?
>
>     XML/Workbook/DocumentProperties/________
>
> Or perhaps I could use:
>
>     XML/Workbook/DocumentProperties/_Attribute/xmlns
>
> Where "_Attribute" is the magic word for accessing attributes of a tag?
> What do people think? Is there a better or simpler way that I've
> overlooked?
xmlns is a namespace declaration and as such not an actual attribute. It depends on what specifications your parser supports: a completely valid XML parser supporting just the original XML specification would consider it an attribute, but most parsers do not, because they also support namespaces.

Well, I think it needs to be abstracted one level, so the information we get out is something like this (this is probably horribly wrong, since I haven't had much occasion to use make object!, and what I did have was a while ago):

    xml: make object! [
        element: make object! [
            name: "Workbook"
            attributes: []
            default-namespace: "urn:schemas-microsoft-com:office:spreadsheet"
            namespaces: [
                o: "urn:schemas-microsoft-com:office:office"
                x: "urn:schemas-microsoft-com:office:excel"
                ss: "urn:schemas-microsoft-com:office:spreadsheet"
                html: ""
            ]
            childtree: make object! [
                element: make object! [
                    name: "DocumentProperties"
                    ; .................... and so forth ....................
                ]
            ]
        ]
    ]

Consider if this has to handle XML like the following:

    <doc> <section>hi <p att="here">text</p> some more text</section> </doc>

There has to be a way to get ahold of the various text nodes. There are three text nodes under section (two directly under it and one inside p). So we would need something like this:

    xml: make object! [
        element: make object! [
            name: "doc"
            childtree: make object! [
                element: make object! [
                    name: "section"
                    childtree: make object! [
                        t1: "hi"
                        element: make object! [
                            name: "p"
                            attributes: [att: "here"]
                            t1: "text"
                        ]
                        t2: "some more text"
                    ]
                ]
            ]
        ]
    ]

Okay, enough of that, you get the point. It could probably be better designed, but there are problems here: if the name of an element has a namespace prefix:

    element: "svg:svg"

then of course the svg prefix needs to be associated somewhere with the svg namespace.
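
One way to keep the text nodes reachable without inventing names like t1 and t2 is to hold the children in an ordered list, with plain strings for text and nested records for elements. The following is only a sketch of that idea in Python (not Rebol), built on the standard xml.dom.minidom module. The helper name to_tree and the dictionary keys are my own for the example, and whitespace-only text is dropped as an assumption.

```python
from xml.dom import minidom

def to_tree(node):
    """Turn a DOM element into a dict with an ordered list of children.

    Hypothetical helper: text nodes become stripped strings (whitespace-only
    text is skipped), child elements become nested dicts.
    """
    children = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            text = child.data.strip()
            if text:
                children.append(text)
        elif child.nodeType == child.ELEMENT_NODE:
            children.append(to_tree(child))
    attributes = {name: value for name, value in node.attributes.items()}
    return {"name": node.tagName, "attributes": attributes, "children": children}

doc = minidom.parseString(
    '<doc> <section>hi <p att="here">text</p> some more text</section> </doc>'
)
tree = to_tree(doc.documentElement)
section = tree["children"][0]
print(section["children"])
```

The section's children come out in document order: the string "hi", the p element (with its attribute and its own text child), then "some more text". A Rebol block! of alternating strings and objects could play the same role as the list here.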
The same goes if an attribute is associated with a namespace prefix (this is very rare).

Namespaces can be tricky. People have a lot of preconceptions about them that do not always bear out, and different XML dialects have subtly different namespace processing models. A case in point is the SVG processing model, which insists that if an SVG-namespaced element is within an element in a namespace the processor is unfamiliar with, then the SVG-namespaced element is removed from the parse tree. Most XML dialects of course have a model of ignoring the unknown namespace and forging ahead.

It might be possible to have a top-level object that holds all document namespaces, and use this as a way to optimize namespace checking. Most of the time namespaces are declared on the document element; if a namespace isn't found there one can then try checking for it in the local tree, but if it is there then one does not have to check in the local tree.

The structure above of course means that you can't have, as you wanted before,

    XML/Workbook/DocumentProperties

But one could build an XPath interpreter on top of it, or a lightweight one really quick, that allowed you to write that and then went through the steps. It would then also allow us to have functions like:

    documentElement myxml

which would return "Workbook", i.e. it would be possible to actually have something similar to a DOM implementation for Rebol.
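
The namespace shortcut and the documentElement function can be sketched together. This is a rough model in Python (not Rebol); the class names, document_element and resolve_prefix are all hypothetical. It assumes a prefix declared on the document element is never rebound to a different URI further down the tree. If that can happen, the fast path has to go and the lookup should always walk up from the element.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Element:
    name: str
    namespaces: Dict[str, str] = field(default_factory=dict)  # declared on this element only
    parent: Optional["Element"] = field(default=None, repr=False, compare=False)
    children: List["Element"] = field(default_factory=list)

    def add(self, child: "Element") -> "Element":
        child.parent = self
        self.children.append(child)
        return child

@dataclass
class Document:
    root: Element

    @property
    def namespaces(self) -> Dict[str, str]:
        # Document-level table: whatever is declared on the document element.
        return self.root.namespaces

def document_element(doc: Document) -> str:
    """Name of the document element, like `documentElement myxml` above."""
    return doc.root.name

def resolve_prefix(doc: Document, element: Element, prefix: str) -> Optional[str]:
    # Fast path: check the document-level table first.
    # ASSUMPTION: prefixes declared there are not redeclared lower down.
    uri = doc.namespaces.get(prefix)
    if uri is not None:
        return uri
    # Fallback: walk the local tree from the element up towards the root.
    node = element
    while node is not None:
        if prefix in node.namespaces:
            return node.namespaces[prefix]
        node = node.parent
    return None

doc = Document(Element("Workbook", {"o": "urn:schemas-microsoft-com:office:office"}))
props = doc.root.add(Element("DocumentProperties"))
drawing = props.add(Element("svg:svg", {"svg": "http://www.w3.org/2000/svg"}))

print(document_element(doc))                # Workbook
print(resolve_prefix(doc, drawing, "o"))    # found in the document-level table
print(resolve_prefix(doc, drawing, "svg"))  # found by walking the local tree
```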