[REBOL] Re: XML-Parsing?!?
From: joel:neely:fedex at: 25-Oct-2000 10:35
Hi, Petr,
Petr Krenzelok wrote:
> I just don't understand one thing yet, - why don't you folks use load/markup?
>
> invoice: load/markup %some-file-received.xml
>
> and then e.g. print invoice/<invoice-number>, as path selection works with
> tags. It's just pity it doesn't allow to use strings, it could be helpful
> sometimes ...
>
Among other reasons, because XML does two nice things:
1) Abstracts semantic content away from syntactic details. Consider
the data file %sample.xml
    <sample>
      <emp id="123" lastn="Doaks" firstn="Joe"/>
      <emp id="234"
           firstn="John"
           lastn="Doe"
      />
      <emp id="345" firstn="Till" lastn="Eulenspiegel" />
      <dept id="012" title="Software Development" />
      <emp
        id="456"
        lastn="Zorro"
      />
    </sample>
(where the inconsistent layout is deliberate!) one can easily write
code to process the block structure from PARSE-XML, as in:
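    REBOL []

    emplist: make object! [
        emps: []    ; flat list of last-name/first-name pairs

        ; Collect the names from every "emp" element in a content block.
        build-emps: func [x [block!]] [
            foreach element x [
                ; skip strings and any element that is not an "emp"
                if all [block? element  element/1 = "emp"] [
                    ; element/2 is the attribute block; a missing attribute becomes ""
                    append emps any [select element/2 "lastn" ""]
                    append emps any [select element/2 "firstn" ""]
                ]
            ]
        ]

        ; Print "first last" sorted by last name; a copy is sorted, so emps is unchanged.
        print-emps: func [/local temps] [
            foreach [ln fn] sort/skip temps: copy emps 2 [
                print [fn ln]
            ]
        ]

        run: func [] [
            ; pass the content block of the first top-level element (<sample>)
            build-emps third first third parse-xml read %sample.xml
            print-emps
        ]
    ]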
which does:
>> do %emplist.r
>> emplist/run
Joe Doaks
John Doe
Till Eulenspiegel
Zorro
Whereas, if I had said:
>> foo: load/markup %sample.xml
== [<sample> "^/ " <emp id="123" lastn="Doaks" firstn="Joe"/>
"^/ " <emp id="234"
firstn="John"
lastn="Doe"
/> "^...
>> foreach item foo [print mold item]
<sample>
"^/ "
<emp id="123" lastn="Doaks" firstn="Joe"/>
"^/ "
<emp id="234"
firstn="John"
lastn="Doe"
/>
"^/ "
<emp id="345" firstn="Till" lastn="Eulenspiegel" />
"^/ "
<dept id="012" title="Software Development" />
"^/ "
<emp
id="456"
lastn="Zorro"
/>
"^/"
</sample>
"^/^/"
I trust that it's clear that there's still a lot of work to be done
to find all the right data. Furthermore,
>> foo/<sample>
== "^/ "
>> foo/id
** Script Error: Invalid path value: id.
** Where: foo/id
>> foo/<emp>
** Script Error: Invalid path value: <emp>.
** Where: foo/<emp>
are fairly useless as building blocks for processing the content,
especially as compared with what EMPLIST can do with the resulting
block structure from PARSE-XML.
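To look at that block structure directly, a small sketch like the following can help. It assumes nothing beyond PARSE-XML itself; the XML string is a made-up fragment for illustration, not the sample file above.

```rebol
REBOL []

; Made-up fragment, used only to inspect the shape of the result.
doc: parse-xml {<sample><emp id="123" lastn="Doaks"/></sample>}

; Each element is a block holding its name, attributes and content.
root: first third doc      ; the <sample> element
probe root/1               ; element name
probe root/3               ; content block holding the <emp> element

emp: first root/3
probe emp/2                ; attribute block of name/value strings
```

These are the same first, second and third positions that EMPLIST relies on when it tests element/1 and selects from element/2.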
2) XML allows nested data structures to be represented nicely. By using
PARSE-XML, we get a nice recursive representation of that nested
structure with no further work required before processing it.
For example:
    <sample2>
      <page title="Home Page" url="http://www.foo.com/">
        <page title="About foo.com" url="about.html" />
        <page title="Contact Us!" url="contact.html" />
        <page title="Locations" url="locations/">
          <page title="London" url="gb.html" />
          <page title="Prague" url="cz.html" />
          <page title="Darmstadt" url="de.html" />
        </page>
        <page title="Products" url="products/">
          <page title="Widgets" url="widgets.html" />
          <page title="Blivets" url="blivets.html" />
        </page>
      </page>
    </sample2>
one can easily write
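    REBOL []

    pagetree: make object! [
        padding: "                              "    ; 30 spaces
        ; pad (or truncate) a title to exactly 30 characters
        pad: func [s [string!]] [copy/part join s padding 30]

        ; Print one line per "page" element, appending each page's url
        ; to the prefix inherited from its ancestors.
        print-tree: func [prefix [string!] x [block!]] [
            if x/1 = "page" [
                prefix: join prefix any [select x/2 "url" ""]
                print [
                    pad any [select x/2 "title" ""]
                    prefix
                ]
            ]
            ; x/3 is the content block, or none for an empty element
            if found? x/3 [
                foreach item x/3 [
                    if block? item [print-tree prefix item]
                ]
            ]
        ]

        run: func [f [file!]] [
            print-tree "" first third parse-xml read f
        ]
    ]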
which does
>> do %pagetree.r
>> pagetree/run %sample2.xml
Home Page                      http://www.foo.com/
About foo.com                  http://www.foo.com/about.html
Contact Us!                    http://www.foo.com/contact.html
Locations                      http://www.foo.com/locations/
London                         http://www.foo.com/locations/gb.html
Prague                         http://www.foo.com/locations/cz.html
Darmstadt                      http://www.foo.com/locations/de.html
Products                       http://www.foo.com/products/
Widgets                        http://www.foo.com/products/widgets.html
Blivets                        http://www.foo.com/products/blivets.html
LOAD/MARKUP, by contrast, gives me only a linear enumeration of tags and
strings, so I would have to write more code to figure out which tags
are nested inside which others, and so on.
I prefer to let PARSE-XML do the work for me.
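For a sense of the extra code involved, here is a minimal sketch of rebuilding the nesting from a LOAD/MARKUP-style block. TAG-NAME and NEST-TAGS are hypothetical helpers written for this illustration, and the [name tag content] node layout is chosen for the example rather than being PARSE-XML's. The sketch assumes well-formed input, uses CHARSET and COMPLEMENT for the name characters, and ignores comments, processing instructions and attribute parsing.

```rebol
REBOL []

; Characters allowed in an element name for this sketch:
; anything except space, tab, newline and "/".
name-char: complement charset " ^-^//"

; Hypothetical helper: element name of a tag, e.g. <page url="x"> gives "page".
tag-name: func [t [tag!] /local name] [
    name: copy ""
    parse/all t [opt "/" copy name some name-char to end]
    to-string name
]

; Hypothetical helper: turn a flat block of tags and strings into
; nested [name tag content] nodes. Strings are simply skipped.
nest-tags: func [flat [block!] /local root stack node] [
    root: copy []
    stack: reduce [root]                    ; content blocks still open
    foreach item flat [
        if tag? item [
            either #"/" = first item [
                remove back tail stack      ; closing tag: pop
            ][
                node: reduce [tag-name item item copy []]
                append/only last stack node
                ; not an empty element: its content block becomes current
                if #"/" <> last item [append/only stack third node]
            ]
        ]
    ]
    root
]

; Flat input of the kind LOAD/MARKUP returns (hand-written here).
flat: [
    <page title="Locations" url="locations/">
    "^/"
    <page title="London" url="gb.html" />
    "^/"
    <page title="Prague" url="cz.html" />
    "^/"
    </page>
]

probe nest-tags flat
```

Even this much only recovers the nesting; the attributes are still buried inside each tag and would need parsing of their own, which PARSE-XML has already done.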
OBTW, I've also written a collection of "helper" objects and functions
that further simplify the common tasks of traversing the recursive block
structure and applying the right process at each node, but that's a
story for another day.
-jn-
--
; Joel Neely [joel--neely--fedex--com] 901-263-4460 38017/HKA/9677
REBOL [] do [ do func [s] [ foreach [a b] s [prin b] ] sort/skip
do function [s] [t] [ t: "" foreach [a b] s [repend t [b a]] t ] {
| e s m!zauafBpcvekexEohthjJakwLrngohOqrlryRnsctdtiub} 2 ]