[REBOL] Re: Rebol & XML
From: bry:itnisk at: 5-Aug-2003 15:06
> At the moment, my thoughts are going towards a DOM model,
>because Rebol is oriented that way, I feel, in reading and writing all of a
>file at once.
It should definitely be DOM. DOM is more familiar to most developers and
more popular than SAX.
[ The DOM model builds a tree in memory. I want to access the
various values with path! values in Rebol. Here's a little XML (XMLSS from
MS Excel 2002):
XML: {<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author>Andrew John Martin</Author>
<LastAuthor>Andrew John Martin</LastAuthor>
<Created>2003-08-05T02:10:56Z</Created>
<LastSaved>2003-08-05T02:10:57Z</LastSaved>
<Company>Colenso High School</Company>
<Version>10.4219</Version>
</DocumentProperties>
<OfficeDocumentSettings
xmlns="urn:schemas-microsoft-com:office:office">
<DownloadComponents/>
<LocationOfComponents HRef="file:///\\"/>
</OfficeDocumentSettings>
</Workbook>
}
]
>I'd like to process the above and then access the author's name with Rebol
>script like:
> XML/Workbook/DocumentProperties/Author
That is basically an XPath expression. I think it should probably be
something like:
xpath XML "/Workbook/DocumentProperties/Author"
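To make the idea concrete, here is a minimal sketch of what such an xpath function might look like. Everything in it is an assumption for illustration: the [name attributes children] block layout, the "#document" wrapper node, and the helper name find-child are all hypothetical, and only plain child steps are handled (no predicates, axes, or namespaces).

```rebol
REBOL [Title: "Hypothetical xpath sketch"]

; Assumed layout: every element is a block [name attributes children].
; The "#document" wrapper lets an absolute path start at the root element.
doc: [
    "#document" [] [
        ["Workbook" [] [
            ["DocumentProperties" [] [
                ["Author" [] ["Andrew John Martin"]]
            ]]
        ]]
    ]
]

; Return the first child element of node with the given name, or none.
find-child: func [node [block!] name [string!]] [
    foreach child third node [
        if all [block? child  name = first child] [return child]
    ]
    none
]

; Walk the tree one path step at a time; give up with none on a miss.
xpath: func [node [block!] path [string!] /local steps] [
    steps: parse path "/"
    remove-each step steps [empty? step]
    foreach step steps [
        node: find-child node step
        if none? node [return none]
    ]
    node
]

; The first child of the matched element is its text.
print first third xpath doc "/Workbook/DocumentProperties/Author"
```

Taking the path as a string keeps the call close to real XPath syntax, at the cost of having to split and interpret the string by hand.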
>And set it with Rebol script like:
> XML/Workbook/DocumentProperties/Author: "Andrew Martin"
Yeah, that was something I was also considering: the possibility of an
XPath setting syntax in Rebol.
>Also we should think about several tags at the same level of nesting, like
>in table:
> row
> cell
> cell
> cell
In the XPath data model of XML this would be taken care of via position()
http://www.w3.org/TR/xpath#section-Node-Set-Functions
so that one has
row/cell[last()] returning the last cell node under row
row/cell[position() = 2] or row/cell[2] returning the second.
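As a rough illustration of how those positional predicates could map onto Rebol, the sketch below collects the matching children into a block and then uses pick and last. The [name attributes children] layout and the helper name children-named are assumptions made up for this example, not part of any existing library.

```rebol
REBOL [Title: "Hypothetical positional predicate sketch"]

; Assumed layout: every element is a block [name attributes children].
row: ["row" [] [
    ["cell" [] ["a"]]
    ["cell" [] ["b"]]
    ["cell" [] ["c"]]
]]

; Collect every child element of node with the given name, in document order.
children-named: func [node [block!] name [string!] /local result] [
    result: copy []
    foreach child third node [
        if all [block? child  name = first child] [append/only result child]
    ]
    result
]

cells: children-named row "cell"

print first third pick cells 2    ; row/cell[2] or row/cell[position() = 2]
print first third last cells      ; row/cell[last()]
```

Both XPath positions and Rebol series indexes start at 1, so the number in the predicate can be passed to pick unchanged.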
My idea was to have an object hierarchy that could be navigated in the
normal Rebol manner, then have an XPath parser that would parse
XPath strings to work out the Rebol path to something.
This might have problems though.
>Unfortunately, there's a problem with accessing the attributes of a tag! For
>example, what's the path! value for accessing the value of the "xmlns"
>attribute in the "DocumentProperties" tag?
>
> XML/Workbook/DocumentProperties/________
>Or perhaps I could use:
> XML/Workbook/DocumentProperties/_Attribute/xmlns
>Where "_Attribute" is the magic word for accessing attributes of a tag?
>What do people think? Is there a better or simpler way that I've
>overlooked?
xmlns is a namespace declaration and, as such, not an actual attribute.
It depends on which specifications your parser supports: a fully
conforming XML parser that supports only the original XML specification
would treat it as an attribute, but most parsers also support namespaces
and so do not treat it as one.
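A namespace-aware layer could do that separation itself. The sketch below is only an assumption about how it might look: it takes a flat block of attribute name/value strings (a hypothetical layout), and the function name split-attributes and the ordinary "id" attribute are invented for the example.

```rebol
REBOL [Title: "Hypothetical namespace declaration filter"]

; Split a flat [name value name value ...] block into ordinary attributes
; and namespace declarations (xmlns or xmlns:prefix).
split-attributes: func [attrs [block!] /local plain decls] [
    plain: copy []
    decls: copy []
    foreach [name value] attrs [
        either any [name = "xmlns"  find/match name "xmlns:"] [
            append decls reduce [name value]
        ][
            append plain reduce [name value]
        ]
    ]
    reduce [plain decls]
]

set [plain decls] split-attributes [
    "xmlns"   "urn:schemas-microsoft-com:office:spreadsheet"
    "xmlns:o" "urn:schemas-microsoft-com:office:office"
    "id"      "wb1"    ; hypothetical ordinary attribute
]

probe plain    ; only the "id" pair remains here
probe decls    ; the two xmlns declarations
```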
Well, I think it needs to be abstracted one level,
so the information we get out is something like this (this is probably
horribly wrong, since I haven't had much occasion to use make object!,
and what use I did have was a while ago):
xml: make object! [
    element: make object! [
        name: "Workbook"
        attributes: []
        default-namespace: "urn:schemas-microsoft-com:office:spreadsheet"
        namespaces: [
            o: "urn:schemas-microsoft-com:office:office"
            x: "urn:schemas-microsoft-com:office:excel"
            ss: "urn:schemas-microsoft-com:office:spreadsheet"
            html: "http://www.w3.org/TR/REC-html40"
        ]
        childtree: make object! [
            element: make object! [
                name: "DocumentProperties"
                ; .................... and so forth....................
            ]
        ]
    ]
]
Consider what happens if this has to handle XML like the following:
<doc>
<section>hi <p att="here">text</p> some more text</section>
</doc>
There has to be a way to get hold of the various text nodes.
There are three text nodes under section (two direct children and one
inside p), so we would need something like this:
xml: make object! [
    element: make object! [
        name: "doc"
        childtree: make object! [
            element: make object! [
                name: "section"
                childtree: make object! [
                    t1: "hi"
                    element: make object! [
                        name: "p"
                        attributes: [
                            att: "here"
                        ]
                        t1: "text"
                    ]
                    t2: "some more text"
                ]
            ]
        ]
    ]
]
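One thing to watch with the object layout is that an object can hold only one value per word, so a second element: at the same level would replace the first. A possible alternative, sketched here purely as an assumption, keeps children in an ordered block so that text nodes and repeated elements can sit side by side. The [name attributes children] layout and the function name text-nodes are hypothetical.

```rebol
REBOL [Title: "Hypothetical mixed-content sketch"]

; Assumed layout: every element is [name attributes children], and the
; children block holds strings (text nodes) and nested element blocks.
section: ["section" [] [
    "hi "
    ["p" ["att" "here"] ["text"]]
    " some more text"
]]

; Gather every text node under node, depth first, in document order.
text-nodes: func [node [block!] /local result] [
    result: copy []
    foreach child third node [
        either string? child [
            append result child
        ][
            append result text-nodes child
        ]
    ]
    result
]

probe text-nodes section    ; a block of the three text nodes
```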
Okay, enough of that; you get the point. It could probably be better
designed, but there are problems here.
If the name of an element has a namespace prefix:
element: "svg:svg"
then of course the svg prefix needs to be associated somewhere with the
SVG namespace.
The same applies if an attribute has a namespace prefix (this is
very rare).
Namespaces can be tricky. People have a lot of preconceptions about them
that do not always bear out, and different XML dialects have subtly
different namespace processing models. A case in point is the SVG
processing model, which insists that if an SVG-namespaced element is
inside an element in a namespace the processor is unfamiliar with, then
the SVG-namespaced element is removed from the parse tree. Most XML
dialects, of course, have a model of ignoring the unknown namespace and
forging ahead.
It might be possible to have a top-level object that holds all document
namespaces and use it to optimize namespace checking. Most of the time
namespaces are declared on the document element. If a namespace isn't
found there, one can then check for it in the local tree, but if it is
there, then one does not have to check the local tree.
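That lookup order could be sketched roughly as follows. The names document-namespaces and resolve-prefix, the flat [prefix uri ...] block layout, and the "my" prefix with its example.org URI are all assumptions for illustration. The sketch also assumes a prefix is never rebound to a different URI deeper in the tree.

```rebol
REBOL [Title: "Hypothetical namespace lookup sketch"]

; Namespaces declared on the document element: [prefix uri prefix uri ...]
document-namespaces: [
    "o"  "urn:schemas-microsoft-com:office:office"
    "ss" "urn:schemas-microsoft-com:office:spreadsheet"
]

; Try the document-level declarations first and fall back to the
; declarations found in the local tree. Returns none if neither has it.
resolve-prefix: func [prefix [string!] local-namespaces [block!]] [
    any [
        select document-namespaces prefix
        select local-namespaces prefix
    ]
]

print resolve-prefix "o" []                                ; document level
print resolve-prefix "my" ["my" "http://example.org/my"]   ; local fallback
```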
The structure above of course means that you can't have, as you wanted before,
XML/Workbook/DocumentProperties
But with this one could build an XPath interpreter on top of it, or a
lightweight one really quickly, that allowed you to write that and then
went through the steps.
It would then also allow for us to have functions like:
documentElement myxml
which would return "Workbook"
i.e. it would be possible to actually have something similar to a DOM
implementation for Rebol.
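For what it's worth, with the object layout sketched earlier such a function could be very small. This is only a sketch under that assumption: the element and name fields come from the structure above, and myxml is a cut-down stand-in for a parsed document.

```rebol
REBOL [Title: "Hypothetical documentElement sketch"]

; A cut-down stand-in for the parsed document object.
myxml: make object! [
    element: make object! [
        name: "Workbook"
    ]
]

; Return the name of the document (root) element.
documentElement: func [doc [object!]] [
    doc/element/name
]

print documentElement myxml
```

Returning the name follows the example in the text. A closer match to the W3C DOM would return the element object itself and leave the name to a separate accessor.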