[REBOL] Re: parse-xml and build-tag

From: joel:neely:fedex at: 7-Oct-2001 16:38


Hi, Hallvard,

Hallvard Ystad wrote:
> 1) When I use the parse-xml function, here's what I get:
>
> >> xml-doc: parse-xml {<test><tag>This is inside "tag"</tag><goodForNothing/> And this is in the outer tag, the "test" tag.</test>}
> == [document none [["test" none [["tag" none [{This is inside "tag"}]] ["goodForNothing" none none] { And this is in the outer tag,...
> >> print mold xml-doc
> [document none [["test" none [["tag" none [{This is inside "tag"}]] ["goodForNothing" none none] { And this is in the outer tag, the "test" tag.}]]]]
> >>
>
> Is there some good documentation for the use of this function somewhere,
> and, not least, for the kind of block tree it returns?
>

I haven't seen it documented, but the returned block structure is
organized as follows:

*  content strings are represented as strings, with all ignorable
   whitespace retained (e.g., any leading/trailing newlines,
   indentation, etc.)

*  an XML element is represented by a three-element block

   [ elementname attributeblock contentblock ]

   where:

   *  elementname is a string giving the name of the element itself;
   *  attributeblock is either a block of name/value pairs or NONE,
      depending on whether attributes were present in the element; and
   *  contentblock is either a block of content items (strings and/or
      element blocks) or NONE, depending on whether the element had
      any contents.

*  the top level of the structure is a three-element block with the
   word DOCUMENT (note: not the string "document"!) as its first element,
   NONE as the second element (presumably no attributes), and the root
   XML element as the only member in its third block.
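
The layout described above can be walked with a short recursive function.
What follows is only a sketch: SHOW-XML is a hypothetical helper name, not
something shipped with REBOL, and it assumes every element block has exactly
the three items listed above.

```rebol
REBOL []

; show-xml is a hypothetical helper, not part of REBOL.
; It prints one line per element, attribute, and content string,
; indented according to nesting depth.
show-xml: func [
    node [block!] "An element block: [name attributes content]"
    indent [string!] "Prefix for the current nesting depth"
][
    ; First item: the element name (the word DOCUMENT at the top level).
    print join indent form first node

    ; Second item: NONE, or a flat block of name/value strings.
    if block? second node [
        foreach [name value] second node [
            print rejoin [indent "  @" name " = " mold value]
        ]
    ]

    ; Third item: NONE, or a block of strings and nested element blocks.
    if block? third node [
        foreach item third node [
            either block? item [
                show-xml item join indent "    "
            ][
                print rejoin [indent "    " mold item]
            ]
        ]
    ]
]

show-xml parse-xml {<foo where="here"><bar/>text</foo>} ""
```

Because the DOCUMENT block has the same three-item shape as an element
block, the same function can be applied to the whole result of PARSE-XML
without special-casing the top level.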

For example:

  >> parse-xml {<foo where="here" when="now"/>}
  == [document none [["foo" ["where" "here" "when" "now"] none]]]

which shows the DOCUMENT word (with no attributes) and a content of
only one item -- the "foo" element.  That element has two attributes
(with values, of course) and no content.  Similarly,

  >> parse-xml {<foo where="here" when="now"></foo>}
  == [document none [["foo" ["where" "here" "when" "now"] none]]]

shows that a start/end tag pair with no content parses the same as an
empty-element tag.  However,

  >> parse-xml {
  {    <foo where="here" when="now">
  {    </foo>
  {    }
  == [document none [["foo" ["where" "here" "when" "now"] ["^/"]]]]

shows that an ignorable-whitespace string (e.g., only a newline)
is retained as the content of the "foo" element.
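
If those whitespace-only strings get in the way, they can be dropped after
parsing.  This is a sketch under assumptions: STRIP-BLANK and BLANK are
hypothetical names, and whether such whitespace is safe to discard depends
on the document (it is not safe for mixed content where spacing matters).

```rebol
REBOL []

; Characters treated as whitespace: space, tab, newline, carriage return.
blank: charset " ^-^/^M"

; strip-blank is a hypothetical helper, not part of REBOL.
; It removes whitespace-only strings from an element's content block,
; recursing into nested elements, and modifies the block in place.
strip-blank: func [node [block!] /local content] [
    content: third node
    if block? content [
        ; Step through the content, removing whitespace-only strings.
        while [not tail? content] [
            either all [
                string? first content
                parse/all first content [any blank]
            ][
                remove content
            ][
                content: next content
            ]
        ]
        content: head content

        ; Clean nested element blocks as well.
        foreach item content [
            if block? item [strip-blank item]
        ]

        ; Content that is now empty becomes NONE, like an empty element.
        if empty? content [poke node 3 none]
    ]
    node
]

doc: parse-xml {<foo where="here" when="now">
</foo>}
probe strip-blank first third doc
```

Replacing an emptied content block with NONE keeps the result consistent
with how an empty element is represented in the examples above.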

> 2) There is a build-tag function, which isn't perfect, but it _is_.
> Has anyone written a good function to go the other way? I.e. to turn
> a tag into a block or into an object?
>

How about this?

  >> first third parse-xml {<foo where="here" when="now">}
  == ["foo" ["where" "here" "when" "now"] none]

IOW, let PARSE-XML do the work, then pluck out the first (and only)
element in the content of the (hypothetical) document containing
only that single tag.

Then you get a block structure that is consistent with the above
description (element name, attributes, and NONE).
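
Wrapped up as functions, the idea might look like the sketch below.
TAG-TO-BLOCK and GET-ATTR are hypothetical names, and the sketch assumes
PARSE-XML wants a string argument (so a TAG! value is molded first) and
accepts a lone opening tag as in the example above.

```rebol
REBOL []

; tag-to-block is a hypothetical helper, not part of REBOL.
; It returns [name attributes content] for a single tag.
tag-to-block: func [tag [tag! string!]] [
    ; MOLD keeps the angle brackets when turning a tag! into a string.
    if tag? tag [tag: mold tag]
    first third parse-xml tag
]

; get-attr is a hypothetical helper.
; It returns the value of the named attribute, or NONE if absent.
get-attr: func [element [block!] name [string!] /local attrs] [
    attrs: second element
    if block? attrs [
        ; Walk the name/value pairs so that a value is never
        ; mistaken for an attribute name.
        foreach [key value] attrs [
            ; == is a strict (case-sensitive) comparison.
            if key == name [return value]
        ]
    ]
    none
]

elem: tag-to-block <foo where="here" when="now">
probe elem
print get-attr elem "when"
```

Stepping through the attribute block two items at a time, rather than
calling SELECT on it directly, avoids a false match when some attribute's
value happens to equal the name being looked up.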

HTH!

-jn-

--
; Joel Neely  [joel--neely--fedex--com]  901-263-4460  38017/HKA/9677
REBOL []  foreach [order string]  sort/skip reduce [ true "!"
false  head reverse "rekcah"  none "REBOL "  prin "Just " "another "
] 2 [prin string] print ""