[REBOL] Re: Rebol & XML
From: rebol:gavinmckenzie:fastmail:fm at: 8-Aug-2003 8:11
On Tue, 5 Aug 2003 22:04:47 +1200, "A J Martin" <[AJMartin--orcon--net--nz]> said:
> Thanks, Bryan and Will!
>
> Bryan wrote:
> > I've thought about doing the same, mainly cause I want to have xpath in
> > Rebol, and to do that I need a decent xml parser. I'm sure you're better
> > qualified than me for doing it but if you need any help on the project
> > I'd be glad to help.
>
> I've discovered that Gavin's parse-xml is based on SAX or the event model of
> processing XML. At the moment, my thoughts are going towards a DOM model,
> because Rebol is oriented that way, I feel, in reading and writing all of a
> file at once. The DOM model builds a tree in memory. I want to access the
> various values with path! values in Rebol. Here's a little XML (XMLSS from
> MS Excel 2002):
As I said, I would still recommend building a DOM implementation on top
of xml-parse or some other XML parser implementation. There were (and maybe
still are?) very real and significant shortcomings in REBOL's built-in
parser, so you will want a better parser implementation underneath
your DOM.
That said, my xml-parse implementation is slower than REBOL's. Hey, I'm
not the world's best REBOL developer, and I basically brute-force
translated the EBNF grammar productions from the XML spec into REBOL's
most excellent parse dialect.
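To make the "DOM on top of an event parser" layering concrete, here is a rough sketch. It is written in Python rather than REBOL purely for illustration, and it is not how xml-parse works internally. The names TreeBuilder and build_tree are made up for this example, and the cut-down sample document is only loosely based on the XMLSS quoted below.

```python
import xml.sax


class TreeBuilder(xml.sax.ContentHandler):
    """Hypothetical handler: collects SAX-style events into nested dicts."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []  # chain of currently open elements

    def startElement(self, name, attrs):
        node = {"name": name, "attrs": dict(attrs.items()),
                "text": "", "children": []}
        if self.stack:
            self.stack[-1]["children"].append(node)
        else:
            self.root = node
        self.stack.append(node)

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.stack:
            self.stack[-1]["text"] += content

    def endElement(self, name):
        node = self.stack.pop()
        node["text"] = node["text"].strip()


def build_tree(xml_text):
    """Run the event parser and return the tree it produced."""
    handler = TreeBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.root


# Cut-down sample, assumed for illustration only.
SAMPLE = (
    '<?xml version="1.0"?>'
    '<Workbook>'
    '<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">'
    '<Author>Andrew John Martin</Author>'
    '<Company>Colenso High School</Company>'
    '</DocumentProperties>'
    '</Workbook>'
)

tree = build_tree(SAMPLE)
props = tree["children"][0]
print(props["name"], [child["name"] for child in props["children"]])
```

The point of the design is that the tree builder knows nothing about XML syntax. It only reacts to start, text, and end events, so the parser underneath can be swapped for a better one without touching the tree code.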
> XML: {<?xml version="1.0"?>
> <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
> xmlns:o="urn:schemas-microsoft-com:office:office"
> xmlns:x="urn:schemas-microsoft-com:office:excel"
> xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
> xmlns:html="http://www.w3.org/TR/REC-html40">
> <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
> <Author>Andrew John Martin</Author>
> <LastAuthor>Andrew John Martin</LastAuthor>
> <Created>2003-08-05T02:10:56Z</Created>
> <LastSaved>2003-08-05T02:10:57Z</LastSaved>
> <Company>Colenso High School</Company>
> <Version>10.4219</Version>
> </DocumentProperties>
> <OfficeDocumentSettings
> xmlns="urn:schemas-microsoft-com:office:office">
> <DownloadComponents/>
> <LocationOfComponents HRef="file:///\\"/>
> </OfficeDocumentSettings>
> </Workbook>
> }
>
> I'd like to process the above and then access the author's name with Rebol
> script like:
>
> XML/Workbook/DocumentProperties/Author
>
> And set it with Rebol script like:
>
> XML/Workbook/DocumentProperties/Author: "Andrew Martin"
>
The xml-object script will let you do that. Check out the web-archived
docs at:
http://web.archive.org/web/20020210063622/www3.sympatico.ca/gavin.mckenzie/rebol/xml-object-info.html
> Also we should think about several tags at the same level of nesting, like
> in table:
>
> row
>     cell
>     cell
>     cell
>
> Unfortunately, there's a problem with accessing the attributes of a tag! For
> example, what's the path! value for accessing the value of the "xmlns"
> attribute in the "DocumentProperties" tag?
>
> XML/Workbook/DocumentProperties/________
I haven't found a separate syntax for addressing attributes to be
helpful. I just consider the attribute to be a child of the element.
Yes, in theory it is possible to have an attribute and a child element
of the same name, but in practice I've never seen such an XML file in
five-plus years of working with XML. Or, at least, not within the same
namespace.
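As a rough illustration of treating an attribute as just another child, here is a sketch. Again it is in Python rather than REBOL, and it is not the xml-object implementation. Node, ObjectBuilder, and parse_to_object are hypothetical names, and the sample is a trimmed version of the quoted XMLSS. Attributes and child elements are stored side by side on the same object, so one path style reaches both.

```python
import xml.sax


class Node:
    """Hypothetical element object: attributes and children share one namespace."""

    def __init__(self):
        self.text = ""


class ObjectBuilder(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.document = Node()      # holder for the root element
        self.stack = [self.document]

    def startElement(self, name, attrs):
        node = Node()
        for key, value in attrs.items():
            setattr(node, key, value)        # attribute stored like a child
        # Limitation of this sketch: a repeated sibling name overwrites the
        # earlier one, and an attribute or element named "text" would collide.
        setattr(self.stack[-1], name, node)
        self.stack.append(node)

    def characters(self, content):
        self.stack[-1].text += content

    def endElement(self, name):
        node = self.stack.pop()
        if set(vars(node)) == {"text"}:
            # No attributes and no children: collapse to a plain string.
            setattr(self.stack[-1], name, node.text.strip())


def parse_to_object(xml_text):
    builder = ObjectBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), builder)
    return builder.document


# Trimmed sample, assumed for illustration only.
SAMPLE = (
    '<Workbook>'
    '<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">'
    '<Author>Andrew John Martin</Author>'
    '</DocumentProperties>'
    '</Workbook>'
)

doc = parse_to_object(SAMPLE)
props = doc.Workbook.DocumentProperties
print(props.Author)             # child element
print(props.xmlns)              # attribute, reached the same way
props.Author = "Andrew Martin"  # setting works like any other assignment
```

The trade-off is exactly the one mentioned above: if an element had both an attribute and a child element with the same name, one would shadow the other in this scheme.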
> [...]
Gavin.