Mailing List Archive: Re: XML-Parsing?!?

[REBOL] Re: XML-Parsing?!?

From: joel:neely:fedex at: 25-Oct-2000 10:35


Hi, Petr,

Petr Krenzelok wrote:
> I just don't understand one thing yet, - why don't you folks use load/markup?
>
> invoice: load/markup %some-file-received.xml
>
> and then e.g. print invoice/<invoice-number>, as path selection works with
> tags. It's just pity it doesn't allow to use strings, it could be helpful
> sometimes ...
>

Among other reasons, because XML does two nice things:

1)  Abstracts semantic content away from syntactic details.  Consider
    the data file %sample.xml

        <sample>
          <emp id="123" lastn="Doaks" firstn="Joe"/>
          <emp id="234"
               firstn="John"
               lastn="Doe"
          />
          <emp id="345" firstn="Till" lastn="Eulenspiegel" />
          <dept id="012" title="Software Development" />
          <emp
            id="456"
            lastn="Zorro"
          />
        </sample>

    (The inconsistent layout is deliberate!)  Regardless of that layout,
    one can easily write code to process the block structure returned by
    PARSE-XML, as in:

        REBOL []

        emplist: make object! [
          emps: []
          build-emps: func [x [block!]] [
            foreach element x [
              if all [block? element  element/1 = "emp"] [
                append emps any [select element/2 "lastn"  ""]
                append emps any [select element/2 "firstn" ""]
              ]
            ]
          ]
          print-emps: func [/local temps] [
            foreach [ln fn] sort/skip temps: copy emps 2 [
              print [fn ln]
            ]
          ]
          run: func [] [
            build-emps third first third parse-xml read %sample.xml
            print-emps
          ]
        ]

    which does:

        >> do %emplist.r
        >> emplist/run
        Joe Doaks
        John Doe
        Till Eulenspiegel
        Zorro

    Whereas, if I had said:

        >> foo: load/markup %sample.xml
        == [<sample> "^/  " <emp id="123" lastn="Doaks" firstn="Joe"/>
        "^/  " <emp id="234"
               firstn="John"
               lastn="Doe"
          /> "^...
        >> foreach item foo [print mold item]
        <sample>
        "^/  "
        <emp id="123" lastn="Doaks" firstn="Joe"/>
        "^/  "
        <emp id="234"
               firstn="John"
               lastn="Doe"
          />
        "^/  "
        <emp id="345" firstn="Till" lastn="Eulenspiegel" />
        "^/  "
        <dept id="012" title="Software Development" />
        "^/  "
        <emp
            id="456"
            lastn="Zorro"
          />
        "^/"
        </sample>
        "^/^/"

    I trust that it's clear that there's still a lot of work to be done
    to find all the right data.  Furthermore,

        >> foo/<sample>
        == "^/  "
        >> foo/id
        ** Script Error: Invalid path value: id.
        ** Where: foo/id
        >> foo/<emp>
        ** Script Error: Invalid path value: <emp>.
        ** Where: foo/<emp>

    are fairly useless as building blocks for processing the content,
    especially as compared with what EMPLIST can do with the resulting
    block structure from PARSE-XML.
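
    In case the THIRD FIRST THIRD chain in EMPLIST/RUN looks cryptic, here
    is a minimal sketch of the shape I'm relying on, using a tiny inline
    string instead of the file.  The words DOC, ROOT and EMP are made up
    just for this illustration:

        REBOL []

        doc: parse-xml {<sample><emp id="123" lastn="Doaks"/></sample>}

        ; doc is a three-item block: the word DOCUMENT, an attribute
        ; slot (none at this level), and a block holding the root element
        root: first third doc         ; the block for <sample>
        print root/1                  ; element name: sample

        ; every element block has the same three slots:
        ; name, attribute block (or none), child block (or none)
        emp: first third root         ; first child of <sample>
        print select emp/2 "lastn"    ; attribute value: Doaks

    With a file laid out like %sample.xml, the child block may also hold
    the whitespace strings that sit between the elements, which is why
    BUILD-EMPS checks BLOCK? ELEMENT before looking at ELEMENT/1.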

2)  XML allows nested data structures to be represented nicely.  By using
    PARSE-XML, we get a nice recursive representation of that nested
    structure with no further work required before processing it.

    For example, given the data file %sample2.xml:

        <sample2>
          <page title="Home Page" url="http://www.foo.com/">
            <page title="About foo.com" url="about.html" />
            <page title="Contact Us!" url="contact.html" />
            <page title="Locations" url="locations/">
              <page title="London" url="gb.html" />
              <page title="Prague" url="cz.html" />
              <page title="Darmstadt" url="de.html" />
            </page>
            <page title="Products" url="products/">
              <page title="Widgets" url="widgets.html" />
              <page title="Blivets" url="blivets.html" />
            </page>
          </page>
        </sample2>

    one can easily write

        REBOL []

        pagetree: make object! [
          padding: "                              "
          pad: func [s [string!]] [ copy/part join s padding 30 ]
          print-tree: func [prefix [string!] x [block!]] [
            if x/1 = "page" [
              prefix: join prefix any [select x/2 "url" ""]
              print [
                pad any [select x/2 "title" ""]
                prefix
              ]
            ]
            if found? x/3 [
              foreach item x/3 [
                if block? item [print-tree prefix item]
              ]
            ]
          ]
          run: func [f [file!]] [
            print-tree "" first third parse-xml read f
          ]
        ]

    which does

        >> do %pagetree.r
        >> pagetree/run %sample2.xml
        Home Page                      http://www.foo.com/
        About foo.com                  http://www.foo.com/about.html
        Contact Us!                    http://www.foo.com/contact.html
        Locations                      http://www.foo.com/locations/
        London                         http://www.foo.com/locations/gb.html
        Prague                         http://www.foo.com/locations/cz.html
        Darmstadt                      http://www.foo.com/locations/de.html
        Products                       http://www.foo.com/products/
        Widgets                        http://www.foo.com/products/widgets.html
        Blivets                        http://www.foo.com/products/blivets.html

    LOAD/MARKUP, by contrast, gives me only a flat sequence of tags and
    strings, so I would have to write more code to work out which tags
    are nested inside which others.
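
    As a rough idea of that extra work, here is a minimal sketch (not a
    general solution) that recovers only the nesting depth from the
    LOAD/MARKUP output.  It assumes well-formed input with no comments or
    processing instructions, and that a self-closing tag ends in "/"
    directly before the ">".  The word DEPTH is made up for this
    illustration:

        REBOL []

        depth: 0
        foreach item load/markup %sample2.xml [
          if tag? item [
            either #"/" = first item [
              depth: depth - 1              ; closing tag, e.g. </page>
            ][
              print [depth item]            ; depth at which the tag opens
              ; a tag that is not self-closing encloses what follows it
              if #"/" <> last item [depth: depth + 1]
            ]
          ]
        ]

    Even with the depth in hand, the attributes are still buried inside
    each tag and would have to be picked apart separately.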

I prefer to let PARSE-XML do the work for me.

OBTW, I've also written a collection of "helper" objects and functions
that further simplify the common tasks of traversing the recursive block
structure and applying the right process at each node, but that's a
story for another day.

-jn-

--
; Joel Neely  [joel--neely--fedex--com]  901-263-4460  38017/HKA/9677
REBOL [] do [ do func [s] [ foreach [a b] s [prin b] ] sort/skip
do function [s] [t] [ t: "" foreach [a b] s [repend t [b a]] t ] {
| e s m!zauafBpcvekexEohthjJakwLrngohOqrlryRnsctdtiub} 2 ]