[REBOL] parse-xml cannot be reversed Re:
From: d95-mjo:nada:kth:se at: 30-Jul-2000 6:02
On Sun, 30 Jul 2000 [bhandley--zip--com--au] wrote:
> I attempted to write a function that would take the structure that parse-xml
> generates and export it back into a valid xml file.
> But, I found that it cannot be reliably done.
>
> Here's an example.
> >> parse-xml {<a>teststring<b/><c/></a>}
> == [document none [["a" none ["teststring" ["b" none none] ["c" none
> none]]]]]
>
> Just looking at the structure would lead you (or your program) to conclude
> that "b" was an attribute of an element "teststring", until you realise that
> attribute lists should not have an odd number of elements.
I have a theory, but I'm not sure if it's correct. Here it is anyway:
(Sorry about the messy code, it's a quick-n-dirty hack.)
Just looking at the structure may be a little bit confusing, but I
don't think a program would have a problem understanding that the
string data is not an element, if the program is constructed in the
right way. An element consists of:
[elementname [attributes] [subelements]]
(either of the two inner blocks may be none, as in your example)
where a subelement is either:
1) A block, which means it's an element.
2) A string, which means it's character data.
A recursive function for traversing the tree could look something
like this:
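traverse-tree: func [element] [
    ; element is [name attributes subelements]; attributes are ignored here
    either not none? element/3 [
        prin rejoin ["<" element/1 ">"]
        foreach subelement element/3 [
            either block? subelement [
                traverse-tree subelement    ; nested element: recurse
            ][
                prin subelement             ; string data: print as-is
            ]
        ]
        prin rejoin ["</" element/1 ">"]
    ][
        ; no subelements: emit an empty-element tag
        prin rejoin ["<" element/1 "/>"]
    ]
]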
I tested it on your example:
>> traverse-tree parse-xml {<a>teststring<b/><c/></a>}
<document><a>teststring<b/><c/></a></document>
With a few adjustments, it should be able to handle all xml-parsed
trees, afaik... but it's 5:48am right now, so I may be wrong. :-)
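
One possible shape for those adjustments, as a rough sketch rather than
a finished exporter: build a string instead of printing, and write out
the attributes. The name emit-xml is made up for this example. It
assumes that attributes arrive as a flat block of name/value strings
(or none), and it does not escape <, > or & in text or attribute values.

```rebol
; Hypothetical helper, not part of REBOL: serialise one parse-xml
; element of the form [name attributes subelements] into a string.
emit-xml: func [element /local out] [
    out: copy ""
    append out join "<" element/1
    ; Assumption: attributes are a flat block of name/value pairs.
    if block? element/2 [
        foreach [name value] element/2 [
            append out rejoin [" " name {="} value {"}]
        ]
    ]
    either block? element/3 [
        append out ">"
        foreach child element/3 [
            ; a block is a nested element, anything else is string data
            append out either block? child [emit-xml child] [child]
        ]
        append out rejoin ["</" element/1 ">"]
    ][
        ; no subelements: empty-element tag
        append out "/>"
    ]
    out
]
```

To skip the <document> wrapper, pass it the first element inside the
document block, for example:
print emit-xml first third parse-xml {<a>teststring<b/><c/></a>}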
You can also parse the whole parse-xml structure with the new block
parser in /View and /Core 2.3. It only takes about 6 lines of
code. :-)
Try this:
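; top level: the word document, no attributes, then the subelements
doc-rule: ['document none! subtags-rule]
; subelements: none, or a block mixing elements and strings
subtags-rule: [none! | into [some [tag-rule | substring-rule]]]
; an element: a block holding name, attributes, subelements
tag-rule: [into [string! parameters-rule subtags-rule]]
substring-rule: [string!]
; attributes: none, or a block
parameters-rule: [none! | block!]
parse (parse-xml {<a>teststring<b/><c/></a>}) doc-rule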
I have extended this into a callback XML parser that works a little
bit like the SAX parsers. It's very easy to extend for different
types of XML documents. Send me a mail if you are interested in
taking a look at it. I am using it to parse RSS newsfeeds,
Moreover newsfeeds, Slashdot headlines and a few of my own XML
documents.
/Martin Johannesson, [d95-mjo--nada--kth--se]