Mailing List Archive: parse-xml cannot be reversed Re:

[REBOL] parse-xml cannot be reversed Re:

From: d95-mjo:nada:kth:se at: 30-Jul-2000 6:02


On Sun, 30 Jul 2000 [bhandley--zip--com--au] wrote:

> I attempted to write a function that would take the structure that parse-xml
> generates and export it back into a valid xml file.
> But, I found that it cannot be reliably done.
>
> Here's an example.
> >> parse-xml {<a>teststring<b/><c/></a>}
> == [document none [["a" none ["teststring" ["b" none none] ["c" none
> none]]]]]
>
> Just looking at the structure would lead you (or your program) to conclude
> that "b" was an attribute of an element "teststring", until you realise that
> attribute lists should not have an odd number of elements.

I have a theory, but I'm not sure if it's correct. Here it is anyway:
(Sorry about the messy code, it's a quick-n-dirty hack.)

Just looking at the structure may be a little bit confusing, but I
don't think a program would have a problem understanding that the
string data is not an element, if it's constructed in the right
way. An element consists of:

[elementname [attributes] [subelements]]

where a subelement is either:

1) A block, which means it's a nested element.
2) A string, which means it's character data (text content).
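
To make that layout concrete, here is a small sketch that picks one
element apart. It only assumes the structure shown in the quoted
parse-xml output; the names doc, root and child are made up for the
illustration.

```rebol
; Sketch: inspect the "a" element from the quoted example.
doc: parse-xml {<a>teststring<b/><c/></a>}

; doc is [document none [...]], so the real root element is the
; first item of the document's subelement block.
root: first doc/3

print root/1          ; element name
print mold root/2     ; attribute slot (none when there are no attributes)

; Each subelement is either a block (an element) or a string (text).
foreach child root/3 [
    print either block? child [
        rejoin ["element: " child/1]
    ][
        rejoin ["string:  " child]
    ]
]
```

For the example document this should report the string "teststring"
followed by the elements "b" and "c", in document order.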

A recursive function for traversing the tree could look something
like this:

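; Prints the tree as XML. Attributes (element/2) are ignored here.
traverse-tree: func [element] [
  ; element/3 is none for an empty element such as <b/>
  either not none? element/3 [
    prin rejoin ["<" element/1 ">"]
    foreach subelement element/3 [
      either block? subelement [
        traverse-tree subelement  ; nested element: recurse
      ][
        prin subelement           ; string data: print as-is
      ]
    ]
    prin rejoin ["</" element/1 ">"]
  ][
    prin rejoin ["<" element/1 "/>"]
  ]
]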

I tested it on your example:

>> traverse-tree parse-xml {<a>teststring<b/><c/></a>}
<document><a>teststring<b/><c/></a></document>

With a few adjustments, it should be able to handle all xml-parsed
trees, afaik... but it's 5:48am right now, so I may be wrong. :-)
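
One such adjustment might be writing the attributes back out, which
traverse-tree skips. The sketch below is a guess at how that could
look, not a tested implementation. It assumes the attribute slot is
either none or a flat block of alternating name and value strings
(which fits the remark that attribute lists have an even number of
elements). The name emit-element is made up.

```rebol
; Hypothetical variant of traverse-tree that also prints attributes.
; ASSUMPTION: element/2 is none or ["name" "value" "name" "value" ...]
emit-element: func [element /local attr-text] [
    ; Build the attribute text once, for use in the opening tag.
    attr-text: copy ""
    if block? element/2 [
        foreach [name value] element/2 [
            append attr-text rejoin [" " name {="} value {"}]
        ]
    ]
    either block? element/3 [
        prin rejoin ["<" element/1 attr-text ">"]
        foreach subelement element/3 [
            either block? subelement [
                emit-element subelement   ; nested element: recurse
            ][
                prin subelement           ; string data: print as-is
            ]
        ]
        prin rejoin ["</" element/1 ">"]
    ][
        ; No subelements: emit an empty-element tag.
        prin rejoin ["<" element/1 attr-text "/>"]
    ]
]

; Skip the outer "document" wrapper by starting at the real root.
emit-element first third parse-xml {<a x="1">text<b/></a>}
```

Nothing is escaped here, so text or attribute values containing
characters such as & or < would still need handling before the output
can be called valid XML.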

You can also parse the whole parse-xml structure with the new block
parser in /View and /Core 2.3. It only takes about 6 lines of
code. :-)

Try this:

doc-rule: ['document none! subtags-rule]
subtags-rule: [none! | into [some [tag-rule | substring-rule]]]
tag-rule: [into [string! parameters-rule subtags-rule]]
substring-rule: [string!]
parameters-rule: [none! | block!]
parse (parse-xml {<a>teststring<b/><c/></a>}) doc-rule
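
The same rules can carry actions in parentheses, which gives a rough
callback style. The following is only a sketch of that idea and is
not the callback parser described in the next paragraph. The handler
names on-start and on-text, the cb- rule names, and the variables
tag-name and data are all made up for the example.

```rebol
; Hypothetical handlers, called while the tree is being matched.
on-start: func [name] [print ["start:" name]]
on-text:  func [data] [print ["text:" data]]

; Same grammar as above, with "set" capturing the matched value
; and a paren calling the handler.
cb-doc-rule:     ['document none! cb-subtags-rule]
cb-subtags-rule: [none! | into [some [cb-tag-rule | cb-text-rule]]]
cb-tag-rule:     [
    into [
        set tag-name string! (on-start tag-name)
        cb-attrs-rule
        cb-subtags-rule
    ]
]
cb-text-rule:    [set data string! (on-text data)]
cb-attrs-rule:   [none! | block!]

parse (parse-xml {<a>teststring<b/><c/></a>}) cb-doc-rule
```

An end-of-element callback would need more care: tag-name is a single
global word and gets overwritten while the nested rules run, so the
name would have to be saved (on a stack, for instance) before
recursing.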

I have extended this into a callback xml-parser that works a little
bit like the SAX parsers. It's very easy to extend for different
types of XML documents. Send me a mail if you are interested in
taking a look at it. I am using it to parse RSS newsfeeds,
Moreover newsfeeds, Slashdot headlines and a few of my own XML
documents.

/Martin Johannesson, [d95-mjo--nada--kth--se]