Mailing List Archive: 49091 messages

[REBOL] parse-xml cannot be reversed Re:

From: d95-mjo:nada:kth:se at: 30-Jul-2000 6:02

On Sun, 30 Jul 2000 [bhandley--zip--com--au] wrote:
> I attempted to write a function that would take the structure that parse-xml
> generates and export it back into a valid xml file.
> But, I found that it cannot be reliably done.
>
> Here's an example.
> >> parse-xml {<a>teststring<b/><c/></a>}
> == [document none [["a" none ["teststring" ["b" none none] ["c" none none]]]]]
>
> Just looking at the structure would lead you (or your program) to conclude
> that "b" was an attribute of an element "teststring", until you realise that
> attribute lists should not have an odd number of elements.

I have a theory, but I'm not sure if it's correct. Here it is anyway:
(Sorry about the messy code, it's a quick-n-dirty hack.)

Just looking at the structure may be a little bit confusing, but I don't
think a program would have a problem understanding that the string data is
not an element, if it's constructed in the right way.

An element consists of:

    [elementname [attributes] [subelements]]

where a subelement is either:

  1) A block, which means it's an element.
  2) A string, which means it's a string.

A recursive function for traversing the tree could look something like this:

    traverse-tree: func [element] [
        either not none? element/3 [
            prin rejoin ["<" element/1 ">"]
            foreach subelement element/3 [
                either block? subelement [
                    traverse-tree subelement
                ][
                    prin subelement
                ]
            ]
            prin rejoin ["</" element/1 ">"]
        ][
            prin rejoin ["<" element/1 "/>"]
        ]
    ]

I tested it on your example:

>> traverse-tree parse-xml {<a>teststring<b/><c/></a>}
<document><a>teststring<b/><c/></a></document>

With a few adjustments, it should be able to handle all xml-parsed trees,
afaik... but it's 5:48am right now, so I may be wrong. :-)

You can also parse the whole parse-xml structure with the new block parser
in /View and /Core 2.3. It only takes about 6 lines of code. :-)

Try this:

    doc-rule: ['document none! subtags-rule]
    subtags-rule: [none! | into [some [tag-rule | substring-rule]]]
    tag-rule: [into [string! parameters-rule subtags-rule]]
    substring-rule: [string!]
    parameters-rule: [none! | block!]

    parse (parse-xml {<a>teststring<b/><c/></a>}) doc-rule

I have extended this into a callback xml-parser that works a little bit
like the SAX parsers. It's very easy to extend for different types of XML
documents. Send me a mail if anyone is interested in taking a look at it.
I am using it to parse RSS-newsfeeds, Moreover-newsfeeds, Slashdot-headlines
and a few of my own XML documents.

/Martin Johannesson, [d95-mjo--nada--kth--se]
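
One possible shape for the "few adjustments" mentioned above is sketched here, not as a definitive implementation. It assumes that parse-xml stores attributes as a flat block of name/value strings, as the quoted message implies. The name to-xml is hypothetical and not part of REBOL. The function builds a string rather than printing, and it does not escape special characters in text or attribute values.

```rebol
REBOL []

; to-xml is a hypothetical helper name, not a REBOL built-in.
; Assumes each element is [name attributes subelements], where
; attributes is none or a flat block of name/value strings, and
; subelements is none or a block of strings and element blocks.
to-xml: func [element /local out] [
    out: rejoin ["<" element/1]
    ; Emit attributes as name="value" pairs (no escaping is done).
    if block? element/2 [
        foreach [name value] element/2 [
            append out rejoin [" " name {="} value {"}]
        ]
    ]
    either block? element/3 [
        append out ">"
        foreach child element/3 [
            ; A block is a nested element, anything else is text.
            append out either block? child [to-xml child] [child]
        ]
        append out rejoin ["</" element/1 ">"]
    ][
        ; No subelements: write an empty-element tag.
        append out "/>"
    ]
    out
]

; Skip the outer "document" wrapper by serializing only its children.
foreach root third parse-xml {<a>teststring<b/><c/></a>} [
    print to-xml root
]
```

Calling it on the children of the document block, rather than on the block itself, avoids the extra <document> wrapper seen in the traverse-tree output, so the example input should come back unchanged.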