Mailing List Archive: 49091 messages

[REBOL] Re: XML-Parsing?!?

From: joel:neely:fedex at: 25-Oct-2000 10:35

Hi, Petr,

Petr Krenzelok wrote:
> I just don't understand one thing yet, - why don't you folks use load/markup?
>
> invoice: load/markup %some-file-received.xml
>
> and then e.g. print invoice/<invoice-number>, as path selection works with
> tags. It's just pity it doesn't allow to use strings, it could be helpful
> sometimes ...
>

Among other reasons, because XML does two nice things:

1) Abstracts semantic content away from syntactic details.

Given the data file %sample.xml

    <sample>
      <emp id="123" lastn="Doaks" firstn="Joe"/>
      <emp id="234" firstn="John" lastn="Doe" />
      <emp id="345" firstn="Till" lastn="Eulenspiegel" />
      <dept id="012" title="Software Development" />
      <emp id="456" lastn="Zorro" />
    </sample>

(where the inconsistent layout is deliberate!), one can easily write
code to process the block structure from PARSE-XML, as in:

    REBOL []

    emplist: make object! [
        emps: []    ; flat list of last-name/first-name pairs
        build-emps: func [x [block!]] [
            foreach element x [
                ; skip text nodes; keep only <emp> elements
                if all [block? element  element/1 = "emp"] [
                    append emps any [select element/2 "lastn" ""]
                    append emps any [select element/2 "firstn" ""]
                ]
            ]
        ]
        print-emps: func [/local temps] [
            ; sort a copy by last name, two values per record
            foreach [ln fn] sort/skip temps: copy emps 2 [
                print [fn ln]
            ]
        ]
        run: func [] [
            ; third first third = content block of the root element
            build-emps third first third parse-xml read %sample.xml
            print-emps
        ]
    ]

which does:

>> do %emplist.r
>> emplist/run
Joe Doaks
John Doe
Till Eulenspiegel
Zorro

Whereas, if I had said:

>> foo: load/markup %sample.xml
== [<sample> "^/ " <emp id="123" lastn="Doaks" firstn="Joe"/> "^/ " <emp id="234" firstn="John" lastn="Doe" /> "^...
>> foreach item foo [print mold item]
<sample>
"^/ "
<emp id="123" lastn="Doaks" firstn="Joe"/>
"^/ "
<emp id="234" firstn="John" lastn="Doe" />
"^/ "
<emp id="345" firstn="Till" lastn="Eulenspiegel" />
"^/ "
<dept id="012" title="Software Development" />
"^/ "
<emp id="456" lastn="Zorro" />
"^/"
</sample>
"^/^/"

I trust that it's clear that there's still a lot of work to be done
to find all the right data. Furthermore,

>> foo/<sample>
== "^/ "
>> foo/id
** Script Error: Invalid path value: id.
** Where: foo/id
>> foo/<emp>
** Script Error: Invalid path value: <emp>.
** Where: foo/<emp>

are fairly useless as building blocks for processing the content,
especially as compared with what EMPLIST can do with the resulting
block structure from PARSE-XML.

2) XML allows nested data structures to be represented nicely.

By using PARSE-XML, we get a nice recursive representation of that
nested structure with no further work required before processing it.
For example, given

    <sample2>
      <page title="Home Page" url="http://www.foo.com/">
        <page title="About foo.com" url="about.html" />
        <page title="Contact Us!" url="contact.html" />
        <page title="Locations" url="locations/">
          <page title="London" url="gb.html" />
          <page title="Prague" url="cz.html" />
          <page title="Darmstadt" url="de.html" />
        </page>
        <page title="Products" url="products/">
          <page title="Widgets" url="widgets.html" />
          <page title="Blivets" url="blivets.html" />
        </page>
      </page>
    </sample2>

one can easily write

    REBOL []

    pagetree: make object! [
        padding: "                              "    ; 30 spaces
        pad: func [s [string!]] [
            ; pad (or truncate) s to exactly 30 characters
            copy/part join s padding 30
        ]
        print-tree: func [prefix [string!] x [block!]] [
            if x/1 = "page" [
                ; each page's url is relative to its parent's
                prefix: join prefix any [select x/2 "url" ""]
                print [
                    pad any [select x/2 "title" ""]
                    prefix
                ]
            ]
            if found? x/3 [
                foreach item x/3 [
                    if block? item [print-tree prefix item]
                ]
            ]
        ]
        run: func [f [file!]] [
            print-tree "" first third parse-xml read f
        ]
    ]

which does

>> do %pagetree.r
>> pagetree/run %sample2.xml
Home Page                      http://www.foo.com/
About foo.com                  http://www.foo.com/about.html
Contact Us!                    http://www.foo.com/contact.html
Locations                      http://www.foo.com/locations/
London                         http://www.foo.com/locations/gb.html
Prague                         http://www.foo.com/locations/cz.html
Darmstadt                      http://www.foo.com/locations/de.html
Products                       http://www.foo.com/products/
Widgets                        http://www.foo.com/products/widgets.html
Blivets                        http://www.foo.com/products/blivets.html

Whereas LOAD/MARKUP only gives me a linear enumeration of tags and
strings which requires that I write more code to figure out which
tags are to be nested inside which others, etc... I prefer to let
PARSE-XML do the work for me.

OBTW, I've also written a collection of "helper" objects and functions
that further simplify the common tasks of traversing the recursive
block structure and applying the right process at each node, but
that's a story for another day.

-jn-

--
; Joel Neely [joel--neely--fedex--com] 901-263-4460 38017/HKA/9677
REBOL [] do [ do func [s] [ foreach [a b] s [prin b] ] sort/skip
do function [s] [t] [ t: "" foreach [a b] s [repend t [b a]] t ]
{ | e s m!zauafBpcvekexEohthjJakwLrngohOqrlryRnsctdtiub} 2 ]
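
For reference, both scripts above lean on the same layout of the PARSE-XML result: the whole document is a block whose third item holds the root element, and each element is itself a block of name, attribute block (or none), and content block (or none). What follows is only a minimal sketch for poking at that layout, assuming REBOL/Core 2.x. The word WALK and the inline sample string are made up for illustration; they are not part of the scripts above, nor are they the "helper" functions mentioned earlier.

```rebol
REBOL []

; Hypothetical helper: print each element's name and attribute block,
; indented by nesting depth.
walk: func [node [block!] indent [string!]] [
    print [indent node/1 mold node/2]
    if block? node/3 [
        foreach child node/3 [
            ; content blocks mix strings (text) and blocks (child elements)
            if block? child [walk child join indent "  "]
        ]
    ]
]

; Illustrative input only, not one of the sample files above.
doc: parse-xml {<a x="1"><b y="2"/><c><d/></c></a>}

; The root element is FIRST THIRD of the PARSE-XML result.
walk first third doc ""
```

The point of the sketch is that nesting in the XML shows up directly as nesting of blocks, so one recursive function can visit every element without any tag matching.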