
parse-xml cannot be reversed

 [1/4] from: bhandley:zip:au at: 30-Jul-2000 1:56


I attempted to write a function that would take the structure that parse-xml generates and export it back into a valid xml file. But, I found that it cannot be reliably done. Here's an example.
>> parse-xml {<a>teststring<b/><c/></a>}
== [document none [["a" none ["teststring" ["b" none none] ["c" none none]]]]]

Just looking at the structure would lead you (or your program) to conclude that "b" was an attribute of an element "teststring", until you realise that attribute lists should not have an odd number of elements.

Maybe parse-xml should be creating a normal three element block for #PCDATA but use none for the first two elements.

I haven't used this much so I would like to know if there are any comments or objections to this conclusion.

Brett.

 [2/4] from: d95-mjo:nada:kth:se at: 30-Jul-2000 6:02


On Sun, 30 Jul 2000 [bhandley--zip--com--au] wrote:
> I attempted to write a function that would take the structure that parse-xml
> generates and export it back into a valid xml file.
<<quoted lines omitted: 6>>
> that "b" was an attribute of an element "teststring", until you realise that
> attribute lists should not have an odd number of elements.
I have a theory, but I'm not sure if it's correct. Here it is anyway:
(Sorry about the messy code, it's a quick-n-dirty hack.)

Just looking at the structure may be a little bit confusing, but I don't think a program would have a problem understanding that the string data is not an element, if it's constructed in the right way.

An element consists of:

    [elementname [attributes] [subelements]]

where a subelement is either:

    1) A block, which means it's an element.
    2) A string, which means it's a string.

A recursive function for traversing the tree could look something like this:

    traverse-tree: func [element] [
        either not none? element/3 [
            prin rejoin ["<" element/1 ">"]
            foreach subelement element/3 [
                either block? subelement [
                    traverse-tree subelement
                ][
                    prin subelement
                ]
            ]
            prin rejoin ["</" element/1 ">"]
        ][
            prin rejoin ["<" element/1 "/>"]
        ]
    ]

I tested it on your example:
>> traverse-tree parse-xml {<a>teststring<b/><c/></a>}
<document><a>teststring<b/><c/></a></document>

With a few adjustments, it should be able to handle all xml-parsed trees, afaik... but it's 5:48am right now, so I may be wrong. :-)

You can also parse the whole parse-xml structure with the new block parser in /View and /Core 2.3. It only takes about 6 lines of code. :-) Try this:

    doc-rule: ['document none! subtags-rule]
    subtags-rule: [none! | into [some [tag-rule | substring-rule]]]
    tag-rule: [into [string! parameters-rule subtags-rule]]
    substring-rule: [string!]
    parameters-rule: [none! | block!]
    parse (parse-xml {<a>teststring<b/><c/></a>}) doc-rule

I have extended this into a callback xml-parser, that works a little bit like the SAX parsers. It's very easy to extend for different types of XML documents. Send me a mail if anyone is interested in taking a look at it. I am using it to parse RSS-newsfeeds, Moreover-newsfeeds, Slashdot-headlines and a few of my own XML documents.

/Martin Johannesson, [d95-mjo--nada--kth--se]
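
As a rough illustration of how block-parse rules like these can carry actions, the sketch below appends each element name to a block as it is matched. It is not Martin's callback parser, which is not shown in the thread. The words names, name, element-rule and content-rule are invented for this example, and it assumes a REBOL version whose block parser supports into and set (the thread mentions /View and /Core 2.3).

```rebol
names: copy []          ; hypothetical accumulator for the names found

; Content is either none (an empty element) or a block of strings and elements.
content-rule: [none! | into [any [string! | element-rule]]]

; An element is a block of: name string, attributes (none or a block), content.
element-rule: [
    into [
        set name string! (append names name)
        [none! | block!]
        content-rule
    ]
]

; The root is [document none [...]], so its third item holds the top-level elements.
parse third parse-xml {<a>teststring<b/><c/></a>} [some element-rule]
probe names
```

Trying string! before element-rule inside the content means into is only ever attempted on blocks, so character data never has to fail an element match first.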

 [3/4] from: bhandley:zip:au at: 31-Jul-2000 10:54


> With a few adjustments, it should be able to handle all xml-parsed
> trees, afaik... but it's 5:48am right now, so I may be wrong. :-)
>
I think I stand corrected. Which is good :)
> You can also parse the whole parse-xml structure with the new block
> parser in /View and /Core 2.3. It only takes about 6 lines of
<<quoted lines omitted: 6>>
> parameters-rule: [none! | block!]
> parse (parse-xml {<a>teststring<b/><c/></a>}) doc-rule
This is great. I was wanting to see an example of block parse with into in action. Brett.

 [4/4] from: joel:neely:fedex at: 31-Jul-2000 15:49


[bhandley--zip--com--au] wrote:
> I attempted to write a function that would take the structure that parse-xml
> generates and export it back into a valid xml file.
> But, I found that it cannot be reliably done.
>
Beg pardon, but it can be done.
> Here's an example.
> >> parse-xml {<a>teststring<b/><c/></a>}
<<quoted lines omitted: 3>>
> that "b" was an attribute of an element "teststring", until you realise that
> attribute lists should not have an odd number of elements.
No. Looking at that structure tells me that <a> has no attributes, but has three pieces of content: a string, a <b> element, and a <c> element. teststring is a string and not the name of an element. We know this because an XML element is always represented by a block with three parts:

    1) name: a string
    2) attributes: either none or a block of name/value pairs
    3) contents: either none or a block of content items, *each of which must be either a string or an element block*

Since "teststring" occurs AS A TOP-LEVEL MEMBER of the content of <a>, it must be a string. If "teststring" were the name of an element nested inside <a>, it would have to be the first element of its own block, something like:

    [document none [["a" none [["teststring" #1 #2] ["b" none none] #3]]]]

where

    #1 is the attribute list of <teststring> (or none)
    #2 is the content of <teststring> (or none)
    #3 is the rest of the content of <a>, after <teststring> and <b>

The code below should do what you want (except for the placement of ignorable-whitespace values, but that is left as an exercise for the reader ;-)

-jn-

    _xdump: func [
        b [block!] {xml structure}
        p [string!]
        /local tag pp was-string
    ][
        tag: trim to-string first b
        ; start tag: indent, then "<name"
        prin join copy p [join copy "<" tag]
        ; attributes, if any, as name="value" pairs
        if found? second b [
            foreach [n v] second b [
                prin join copy " " [trim n "=" mold v]
            ]
        ]
        either none? third b [
            ; no content: close the start tag as an empty element
            print "/>"
        ][
            print ">"
            pp: join copy p " "
            was-string: false
            foreach x third b [
                ; anything that is not a block is character data
                was-string: not any-block? x
                either was-string [
                    if 0 < length? trim x [
                        prin join copy pp x
                    ]
                ][
                    _xdump x pp
                ]
            ]
            if was-string [print ""]
            print [join copy p [copy "</" trim tag ">"]]
        ]
    ]

    xdump: func [
        b [block!] {the xml structure from parse-xml}
    ][
        ; skip the document wrapper and dump its first top-level element
        _xdump first third b copy ""
        print ""
    ]
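
One way to check the three-part rule described above is to rebuild the text as a string and compare it with the input. The following is only a sketch under stated assumptions: to-xml is a made-up helper, not part of REBOL or of anyone's posted code, and it assumes attribute values and character data contain nothing that needs escaping (no quotes, ampersands or angle brackets).

```rebol
; to-xml is a hypothetical helper, not a REBOL built-in.
to-xml: func [
    {Rebuild XML text from one element block of a parse-xml tree (sketch).}
    element [block!] {[name attributes content]}
    /local out
][
    out: join "<" element/1
    ; Attributes: none, or a flat block of name/value pairs.
    if block? element/2 [
        foreach [name value] element/2 [
            append out rejoin [" " name {="} value {"}]
        ]
    ]
    either block? element/3 [
        append out ">"
        foreach item element/3 [
            ; A block is a nested element; anything else is character data.
            append out either block? item [to-xml item] [item]
        ]
        append out rejoin ["</" element/1 ">"]
    ][
        append out "/>"        ; no content: emit an empty-element tag
    ]
    out
]

xml: {<a>teststring<b/><c/></a>}
print to-xml first third parse-xml xml
print equal? xml to-xml first third parse-xml xml
```

For the example in this thread the rebuilt string should match the input, since strings and element blocks are told apart purely by datatype. Documents with significant whitespace or entities would need more care than this sketch takes.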

Notes
  • Quoted lines have been omitted from some messages.