Mailing List Archive: 49091 messages

XML-Parsing?!?

 [1/16] from: pscheller:atos:ch at: 24-Oct-2000 15:57


Hi ya all :-)

I need to parse XML documents and build an HTML page from them. Even with all the functions related to that, I have been unable to extract any tag's value from an XML file. I know I could do all the parsing on my own, but I suspect that Rebol could do this for me in a more convenient way. Or am I wrong?

Could someone explain the concepts behind these functions a bit further? Or just tell me I'm completely wrong; I'm stuck right now.

parse-xml: returns a block which should contain the tags and values
load: it should parse the file, but it's no use to me because the tags are still not separated from their values
xml-language: What is this object good for??

Greets to all, pat le cat
--
___________________________________________________________________
Atos (Schweiz) AG
Patrick Scheller
Industriestrasse 19
CH-8304 Wallisellen

 [2/16] from: joel:neely:fedex at: 25-Oct-2000 7:50


Hi, Patrick, I've been playing with parse-xml for quite a while (in fact, that's what got me to using REBOL seriously in the first place), so let me give a couple of hints that may help. [rebol-bounce--rebol--com] wrote:
> I have the need to parse XML-Documents to form a HTML page from it. Now > with all the functions related to that I still was unable to extract any > tags value from a XML-file. I know I could do all the parsing on my own, > but I suspect that somehow Rebol could do this for me in a more > convenient way. Or am I wrong? >
Absolutely right! I do it all the time.
> Now if someone could explain me the concepts of these functions any > further. Or just tell me I'm completely wrong, I'm just stuck right > now. > > parse-xml: returns a block which should contain the tags and values >
PARSE-XML takes a string and gives you back a structure of nested blocks that represents the XML structure in the string. A typical example is:
>> foo: {<a>
{    <b>Hi, Patrick!</b>
{    <c type="demo" />
{    <d pos="last">
{     end
{    </d>
{    </a>}
== {<a>
<b>Hi, Patrick!</b>
<c type="demo" />
<d pos="last">
 end
</d>
</a>}
>> fum: parse-xml foo
== [document none [["a" none ["^/" ["b" none ["Hi, Patrick!"]] "^/" ["c" ["type" "demo"] none] "^/" ["d" ["pos" "last"] ["^/ end^/"...

You might also say

    fee: parse-xml read %fie.xml

etc...

PARSE-XML uses the following convention to represent an XML element:

    <name a0="v0" a1="v1" ...> ...content... </name>

is parsed into a block with three members

    [ "name" ["a0" "v0" "a1" "v1" ...] [...content...] ]

1) The first member is a string that is the name of the element;
2) The second member is:
   2a) if the element had attributes, a block containing name/value pairs for all attributes (each as a string); or
   2b) if the element did not have attributes, then NONE;
3) The third member is:
   3a) if the element had content, *even ignorable-whitespace*, a block containing each piece of content as a member; or
   3b) if the element was empty, then NONE.

Note that, in (3a) above, each contained element is a nested block, and each occurrence of PCDATA is represented as a string. In addition, any comment <!-- ... --> or PI <? ... ?> which may occur in the XML document is simply ignored. I have a modified version of PARSE-XML which retains them, but have almost never needed it for serious applications.

The nice thing about having the attributes as a name/value block is that you can say things like

    attribute-value: select some-element/2 "attribute-name"

and not worry about what order they were in, etc.

The current version of PARSE-XML is non-validating (which means that no checking is performed on which elements/attributes may/must occur at any point). It assumes that your arrangement of elements and attributes is what you wanted. It also does minimal syntax error handling and can be fooled into blowing up. For example, if you hand it the content of a large HTML document, it will likely have a stack overflow, as it thinks that tags such as <br> and <hr>, or unclosed instances of <p>, <tr>, <td> etc..., will be closed later on and nests everything following them.

You CAN use PARSE-XML on XHTML-conforming documents, however. Just be sure to close all non-empty tags, put attribute values in double-quotes, and write empty HTML tags as self-closing (as in <br /> and <hr />).

The other convention you must know is that the entire XML structure from the file is treated as the content of an imaginary element whose name is the *WORD* 'document and which has no attributes.

With all of that background, and using the results of the console transcript above, we can see:
>> fum/1
== document
>> fum/2
== none
>> fum/3
== [["a" none ["^/" ["b" none ["Hi, Patrick!"]] "^/" ["c" ["type" "demo"] none] "^/" ["d" ["pos" "last"] ["^/ end^/"]] "^/"]]]

Since FUM was the result of PARSE-XML, its first member is the word 'document and its second member is NONE. Its third member is a block containing only the top-level element of the original XML. (That's why FUM/3 appears to be doubly-nested; the content block is FUM/3 and contains only one element FUM/3/1, but that element is itself represented as a block!)
>> foreach el fum/3 [print mold el]
["a" none ["^/" ["b" none ["Hi, Patrick!"]] "^/" ["c" ["type" "demo"] none] "^/" ["d" ["pos" "last"] ["^/ end^/"]] "^/"]]

Remember from the console example that we had
>> foo: {<a>
{    <b>Hi, Patrick!</b>
{    <c type="demo" />
{    <d pos="last">
{     end
{    </d>
{    </a>}

so that the top-level element has a name of "a", no attributes, and three subordinate elements, <b>, <c ...> and <d ...>, in its content.
>> topelement: fum/3/1
== ["a" none ["^/" ["b" none ["Hi, Patrick!"]] "^/" ["c" ["type" "demo"] none] "^/" ["d" ["pos" "last"] ["^/ end^/"]] "^/"]]
>> topelement/1
== "a"
>> topelement/2
== none
>> topelement/3
== ["^/" ["b" none ["Hi, Patrick!"]] "^/" ["c" ["type" "demo"] none] "^/" ["d" ["pos" "last"] ["^/ end^/"]] "^/"]
>> foreach item topelement/3 [print mold item]
"^/"
["b" none ["Hi, Patrick!"]]
"^/"
["c" ["type" "demo"] none]
"^/"
["d" ["pos" "last"] ["^/ end^/"]]
"^/"

"Wait!" someone may think. "There are seven subordinate members here, not three!" Remember that ignorable-whitespace is retained by PARSE-XML, so the NEWLINE values between <a> and <b>, </b> and <c ...>, <c ...> and <d ...>, and </d> and </a> are also in the content block for the top-level element (<a>).

To wrap up, notice that the <b> element had no attributes, so its block representation has NONE as the second member. It contained only a single string (with no surrounding whitespace), so the third member of the block representing <b> is a block with only one string in it.

The <c> element had an attribute, but no content, so it is represented by a block whose second member is a block of name/value pair(s) and whose third member is NONE.

Finally, <d> had both attributes and content, so it is represented by a block with non-NONE values in the second and third positions. Note that the whitespace surrounding the string "end" is included in the content string.

To get you started writing REBOL to handle XML-derived data, here are a couple of utilities you may find useful:

    _xdump: func [
        b [block!] {xml structure}
        p [string!]
        /local tag pp was-string
    ][
        tag: trim to-string first b
        prin join copy p [join copy "<" tag]
        if found? second b [
            foreach [n v] second b [
                prin join copy " " [trim n "=" mold v]
            ]
        ]
        either none? third b [
            print " />"
        ][
            print ">"
            pp: join copy p "    "
            was-string: false
            foreach x third b [
                was-string: not any-block? x
                either was-string [
                    if 0 < length? trim x [
                        print join copy pp x
                    ]
                ][
                    _xdump x pp
                ]
            ]
            print [join copy p [copy "</" trim tag ">"]]
        ]
    ]

    xdump: func [
        b [block!] {the xml structure from parse-xml}
    ][
        _xdump first third b copy ""
        print ""
    ]

The XDUMP function simply pretty-prints a block structure from PARSE-XML to the console. It can serve as an example of the kind of recursive code you may be writing if you traverse general block structures.
>> xdump fum
<a>
    <b>
        Hi, Patrick!
    </b>
    <c type="demo" />
    <d pos="last">
        end

    </d>
</a>

Notice that it is not overly smart! The embedded ^/ in the content string for <d> causes an extra blank line.

Since most of my XML applications really don't care about the ignorable-whitespace, I also wrote the following, inspired by TRIM for STRING! data:

    trim-xml: func [
        b [block!]
        /local content item
    ][
        content: third b
        if found? content [
            while [not tail? content] [
                item: first content
                either block? item [
                    trim-xml item
                    content: next content
                ][
                    either 0 = length? trim item [
                        remove content
                    ][
                        content: next content
                    ]
                ]
            ]
            if 0 = length? head content [
                b/3: none
            ]
        ]
        b
    ]

Now we can say
>> trim-xml fum
== [document none [["a" none [["b" none ["Hi, Patrick!"]] ["c" ["type" "demo"] none] ["d" ["pos" "last"] ["end^/"]]]]]]
>> foreach item topelement/3 [print mold item]
["b" none ["Hi, Patrick!"]]
["c" ["type" "demo"] none]
["d" ["pos" "last"] ["end^/"]]

And the whitespace-only content strings are gone.
> load: it should parse the file but its no use for me because the tags > are still unseperated from their values >
I've never had to use LOAD for XML processing.
> xml-language: What is this object good for?? >
XML-LANGUAGE is the object that contains the support for PARSE-XML. In general it is A Good Thing to implement a complex function by writing

    complex-function-wrapper: make object! [
        ... support functions and data go here ...
        top-entry: func [...top-level-arguments...] [...body...]
    ]

so that all the support stuff doesn't pollute the global namespace, cause accidental name collisions, etc. You can then call the function either by

    complex-function-wrapper/top-entry ...arguments...

or by defining

    complex-function: func [...arguments...] [
        complex-function-wrapper/top-entry ...arguments...
    ]

just for pretty. XML-LANGUAGE fulfills that role for PARSE-XML.
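
A concrete instance of the wrapper pattern may make the skeleton easier to follow. This is only a minimal sketch: WORD-COUNTER, SPLIT-WORDS and COUNT-WORDS are hypothetical names invented for illustration, and nothing here is taken from the real XML-LANGUAGE object.

```rebol
REBOL []

; Hypothetical example of the wrapper pattern; all names are illustrative.
word-counter: make object! [
    ; support function: visible only through the object
    split-words: func [s [string!]] [
        parse s none            ; a NONE rule splits the string on whitespace
    ]
    ; the one entry point callers are meant to use
    top-entry: func [s [string!]] [
        length? split-words s
    ]
]

; global convenience wrapper, "just for pretty"
count-words: func [s [string!]] [
    word-counter/top-entry s
]

print count-words "le cat says purr"    ; four whitespace-separated words
```

Only WORD-COUNTER and COUNT-WORDS land in the global namespace; SPLIT-WORDS stays inside the object, which is the point of the pattern.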
> Greets to all, pat le cat >
Le cat says, "Purr", and thanks you! -jn-
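
The three-member convention described in this message ([name attributes content]) can be sketched as two small accessors. This is a minimal sketch under assumptions, not code from the thread: ATTR-OF and CHILDREN-OF are hypothetical helper names, and the inline XML string is made up for the demonstration.

```rebol
REBOL []

; Hypothetical helpers; each takes one element block of the form
; [name attributes content] as produced by PARSE-XML.

attr-of: func [
    "Return the value of attribute NAME, or NONE if it is absent"
    element [block!] name [string!]
][
    ; element/2 is NONE when the element has no attributes
    if block? element/2 [select element/2 name]
]

children-of: func [
    "Return the direct child elements called NAME, skipping PCDATA strings"
    element [block!] name [string!]
    /local result
][
    result: copy []
    ; element/3 is NONE when the element is empty
    if block? element/3 [
        foreach item element/3 [
            if all [block? item  item/1 = name] [append/only result item]
        ]
    ]
    result
]

doc: parse-xml {<a><b>Hi</b><c type="demo" /></a>}
top: first third doc            ; the real top-level element, here <a>
print mold attr-of first children-of top "c" "type"
print mold children-of top "b"
```

Checking BLOCK? before touching the second and third members covers the NONE cases (2b) and (3b) above, which are the usual source of script errors when walking the structure by hand.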

 [3/16] from: sharriff:aina:med-iq at: 25-Oct-2000 14:19


Cool stuff, Joel!!! Could one invoke Xerces with /COMMAND?

Sharriff Aina
med.iq information & quality in healthcare AG
Gutenbergstr. 42
41564 Kaarst
tel.: 02131-3669-0
fax: 02131-3669-599
www.med-iq.de

 [4/16] from: petr:krenzelok:trz:cz at: 25-Oct-2000 15:35


Joel Neely wrote:
> Hi, Patrick, > > I've been playing with parse-xml for quite a while (in fact, that's > what got me to using REBOL seriously in the first place), so let me > give a couple of hints that may help.
I just don't understand one thing yet: why don't you folks use load/markup?

    invoice: load/markup %some-file-received.xml

and then e.g. print invoice/<invoice-number>, as path selection works with tags. It's just a pity it doesn't allow strings to be used; that could be helpful sometimes ... I know you can't check XML doc conformity, but ....

-pekr-

 [5/16] from: brett:codeconscious at: 26-Oct-2000 1:30


Hi Petr,

I don't understand your example; could you expand on it please? As far as I know, load/markup gives back XML as a flat block of tags and strings - correct? Or is my version of Rebol out of date again? :)

Brett.

 [6/16] from: pscheller:atos:ch at: 25-Oct-2000 16:31


Petr Krenzelok wrote:
> Joel Neely wrote:
Hi Petr
> > I've been playing with parse-xml for quite a while (in fact, that's > > what got me to using REBOL seriously in the first place), so let me
<<quoted lines omitted: 4>>
> tags. It's just pity it doesn't allow to use strings, it could be helpful > sometimes ...
Well, actually I tried it all. But I still don't get it. Either it doesn't work the same way for me or I still have a big gap in my understanding.

All I need is the possibility to simply read an external XML file and extract values from certain tags like this:

Pseudo: print the value of Prozesse/cre

------------------------BEGIN XML----------------------------------
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!-- Diese Datei enthält die Stati der xxx relevanten Prozesse -->
<!-- 0=OK, 1=Fehler/fehlend, 2=Unbekannt -->
<test1 name = "Patrick" />
<test2 name = "Scheller" />
<Prozesse>
    <xxx = "1" />
    <cre = "2" />
    <ora = "3" />
    <xxxrec = "4" />
    <db_ppb = "5" />
</Prozesse>
<!-- Ende der Datei -->
-------------------------END XML-----------------------------------

How would you do this, Petr??

Thanx and greets from me :-)
___________________________________________________________________
Atos (Schweiz) AG
Patrick Scheller          Phone: ++41- 1-877 69 69
Industriestrasse 19       Fax: ++41- 1-877 69 99
CH-8304 Wallisellen       Internet: [pscheller--atos--ch]

 [7/16] from: pscheller:atos:ch at: 25-Oct-2000 16:35


Hi Joel

First of all, thank you very much for this thorough reply. I still need some time to read it all through. Still, as I wrote to Petr, I simply need a way to read an external XML file and extract values from certain tags like this:

Pseudo: print the value of Prozesse/cre

------------------------BEGIN XML----------------------------------
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!-- Diese Datei enthält die Stati der xxx relevanten Prozesse -->
<!-- 0=OK, 1=Fehler/fehlend, 2=Unbekannt -->
<test1 name = "Patrick" />
<test2 name = "Scheller" />
<Prozesse>
    <xxx = "1" />
    <cre = "2" />
    <ora = "3" />
    <xxxrec = "4" />
    <db_ppb = "5" />
</Prozesse>
<!-- Ende der Datei -->
-------------------------END XML-----------------------------------

Joel Neely wrote:
> Hi, Patrick, > I've been playing with parse-xml for quite a while (in fact, that's
<<quoted lines omitted: 6>>
> > convenient way. Or am I wrong? > Absolutely right! I do it all the time.
Glad to hear that :-)
> > Now if someone could explain me the concepts of these functions any > > further. Or just tell me I'm completely wrong, I'm just stuck right
<<quoted lines omitted: 22>>
> == [document none [["a" none ["^/" ["b" none ["Hi, Patrick!"]] "^/" > ["c" ["type" "demo"] none] "^/" ["d" ["pos" "last"] ["^/ end^/"...
.... Thanx again Joel "Live long and prosper" :-) Greets to all, pat le cat

 [8/16] from: joel:neely:fedex at: 25-Oct-2000 9:35


Hi, Sharriff, [rebol-bounce--rebol--com] wrote:
> Cool Stuff Joel!!! could one invocate Xerces mit /COMMAND? >
I'm not familiar with Xerces. Can you point me to a URL? Thanks! -jn- -- ; Joel Neely [joel--neely--fedex--com] 901-263-4460 38017/HKA/9677 REBOL [] do [ do func [s] [ foreach [a b] s [prin b] ] sort/skip do function [s] [t] [ t: "" foreach [a b] s [repend t [b a]] t ] { | e s m!zauafBpcvekexEohthjJakwLrngohOqrlryRnsctdtiub} 2 ]

 [9/16] from: brett:codeconscious at: 26-Oct-2000 1:50


Just to add to Joel's comprehensive description: you can also use the parse function to deal with the structure returned by the parse-xml function. Here is an example which Martin Johannesson described some time back. Just copy and paste this into a Rebol console session.

    xml-structure: ['document none! xml-element-contents]
    xml-element-node: [into [xml-element-name xml-element-attributes xml-element-contents]]
    xml-element-name: [set elt-name string! (print elt-name)]
    xml-element-attributes: [none! | block!]
    xml-element-contents: [none! | into [some [xml-element-node | string! ]]]
    xml-target: http://p.moreover.com/cgi-local/page?index_devoper+xml
    parse parse-xml read xml-target xml-structure

I put a bit of code into xml-element-name in order to print the element names as they occur. You could go on to adjust the parse rule for a specific DTD, which is what I did for moreover.com. It would be nice if someone wrote a DTD to parse-rule translator :)

Brett.
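
The same rules can be tried without a network connection by feeding PARSE-XML an inline string. The sketch below reuses the rules from this message unchanged except for the action in xml-element-name, which appends to a block instead of printing; NAMES and OK? are hypothetical names added for illustration, and the XML string is made up.

```rebol
REBOL []

names: copy []      ; hypothetical accumulator, not in the original post

xml-structure: ['document none! xml-element-contents]
xml-element-node: [into [xml-element-name xml-element-attributes xml-element-contents]]
; collect each element name instead of printing it
xml-element-name: [set elt-name string! (append names elt-name)]
xml-element-attributes: [none! | block!]
xml-element-contents: [none! | into [some [xml-element-node | string!]]]

; PARSE returns TRUE only if the whole structure matched the rules
ok?: parse parse-xml {<a><b>Hi</b><c type="demo" /></a>} xml-structure

print ok?
print mold names    ; element names in document order
```

Collecting into a block rather than printing keeps the rules reusable: the caller decides afterwards what to do with the names.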

 [10/16] from: sharriff:aina:med-iq at: 25-Oct-2000 15:45


It's being used in several packages, even commercial ones. It is a main component of the ENHYDRA SERVER (www.enhydra.org), a free Java application server.

Xerces ---> http://xml.apache.org/

Regards

Sharriff Aina
med.iq information & quality in healthcare AG
Gutenbergstr. 42
41564 Kaarst

 [11/16] from: pscheller:atos:ch at: 25-Oct-2000 17:10


Hi Shariff [Sharriff--Aina--med-iq--de] wrote:
> Its being used it several pakages, even commercial ones. It is a main > component of the ENHYDRA SERVER (www.enhydra.org), a free java application > server. > Xerces---> http://xml.apache.org/
I know XML.Apache.org and we are about to look into its libraries (as long as they're C++). Sounds very attractive indeed, but I think that with Rebol certain applications could be implemented much easier than with c++. Greets pat le cat

 [12/16] from: joel:neely:fedex at: 25-Oct-2000 10:35


Hi, Petr, Petr Krenzelok wrote:
> I just don't understand one thing yet, - why don't you folks use load/markup? > > invoice: load/markup %some-file-received.xml > > and then e.g. print invoice/<invoice-number>, as path selection works with > tags. It's just pity it doesn't allow to use strings, it could be helpful > sometimes ... >
Among other reasons, because XML does two nice things:

1) Abstracts semantic content away from syntactic details. Consider the data file %sample.xml

    <sample>
        <emp id="123" lastn="Doaks" firstn="Joe"/>
        <emp id="234" firstn="John" lastn="Doe" />
        <emp id="345" firstn="Till" lastn="Eulenspiegel" />
        <dept id="012" title="Software Development" />
        <emp id="456" lastn="Zorro" />
    </sample>

(where the inconsistent layout is deliberate!). One can easily write code to process the block structure from PARSE-XML, as in:

    REBOL []

    emplist: make object! [
        emps: []
        build-emps: func [x [block!]] [
            foreach element x [
                if all [block? element element/1 = "emp"] [
                    append emps any [select element/2 "lastn" ""]
                    append emps any [select element/2 "firstn" ""]
                ]
            ]
        ]
        print-emps: func [/local temps] [
            foreach [ln fn] sort/skip temps: copy emps 2 [
                print [fn ln]
            ]
        ]
        run: func [] [
            build-emps third first third parse-xml read %sample.xml
            print-emps
        ]
    ]

which does:
>> do %emplist.r
>> emplist/run
Joe Doaks
John Doe
Till Eulenspiegel
 Zorro

Whereas, if I had said:
>> foo: load/markup %sample.xml
== [<sample> "^/ " <emp id="123" lastn="Doaks" firstn="Joe"/> "^/ " <emp id="234" firstn="John" lastn="Doe" /> "^...
>> foreach item foo [print mold item]
<sample>
"^/ "
<emp id="123" lastn="Doaks" firstn="Joe"/>
"^/ "
<emp id="234" firstn="John" lastn="Doe" />
"^/ "
<emp id="345" firstn="Till" lastn="Eulenspiegel" />
"^/ "
<dept id="012" title="Software Development" />
"^/ "
<emp id="456" lastn="Zorro" />
"^/"
</sample>
"^/^/"

I trust that it's clear that there's still a lot of work to be done to find all the right data. Furthermore,
>> foo/<sample>
== "^/ "
>> foo/id
** Script Error: Invalid path value: id.
** Where: foo/id
>> foo/<emp>
** Script Error: Invalid path value: <emp>.
** Where: foo/<emp>

are fairly useless as building blocks for processing the content, especially as compared with what EMPLIST can do with the resulting block structure from PARSE-XML.

2) XML allows nested data structures to be represented nicely. By using PARSE-XML, we get a nice recursive representation of that nested structure with no further work required before processing it. For example, given:

    <sample2>
        <page title="Home Page" url="http://www.foo.com/">
            <page title="About foo.com" url="about.html" />
            <page title="Contact Us!" url="contact.html" />
            <page title="Locations" url="locations/">
                <page title="London" url="gb.html" />
                <page title="Prague" url="cz.html" />
                <page title="Darmstadt" url="de.html" />
            </page>
            <page title="Products" url="products/">
                <page title="Widgets" url="widgets.html" />
                <page title="Blivets" url="blivets.html" />
            </page>
        </page>
    </sample2>

one can easily write

    REBOL []

    pagetree: make object! [
        ; thirty spaces, so that PAD can fill any title out to 30 characters
        padding: "                              "
        pad: func [s [string!]] [
            copy/part join s padding 30
        ]
        print-tree: func [prefix [string!] x [block!]] [
            if x/1 = "page" [
                prefix: join prefix any [select x/2 "url" ""]
                print [
                    pad any [select x/2 "title" ""]
                    prefix
                ]
            ]
            if found? x/3 [
                foreach item x/3 [
                    if block? item [print-tree prefix item]
                ]
            ]
        ]
        run: func [f [file!]] [
            print-tree "" first third parse-xml read f
        ]
    ]

which does
>> do %pagetree.r
>> pagetree/run %sample2.xml
Home Page                      http://www.foo.com/
About foo.com                  http://www.foo.com/about.html
Contact Us!                    http://www.foo.com/contact.html
Locations                      http://www.foo.com/locations/
London                         http://www.foo.com/locations/gb.html
Prague                         http://www.foo.com/locations/cz.html
Darmstadt                      http://www.foo.com/locations/de.html
Products                       http://www.foo.com/products/
Widgets                        http://www.foo.com/products/widgets.html
Blivets                        http://www.foo.com/products/blivets.html

Whereas LOAD/MARKUP only gives me a linear enumeration of tags and strings, which requires that I write more code to figure out which tags are to be nested inside which others, etc... I prefer to let PARSE-XML do the work for me.

OBTW, I've also written a collection of "helper" objects and functions that further simplify the common tasks of traversing the recursive block structure and applying the right process at each node, but that's a story for another day.

-jn-

 [13/16] from: joel:neely:fedex at: 25-Oct-2000 12:36


Hi, Patrick, Errrmmm... your sample isn't legal XML!! [rebol-bounce--rebol--com] wrote:
[snip]
> All I need is the possibility to simply read an external XML-File and > extract values from certain tags like this:
<<quoted lines omitted: 14>>
> <!-- Ende der Datei --> > -------------------------END XML-----------------------------------
An XML document must have exactly one top-level XML element. You have three (<test1 ...>, <test2 ...>, and <Prozesse ...>).

In addition, you have what appear (by intent) to be five content elements under <Prozesse ...> which are not valid. If elements, they would have to be written something like:

    <xxx value="1"/>
    <cre value="2"/>
    ...etc...

If they were intended to be attributes of the <Prozesse ...> element, then you need to lose the "<" and "/>" bracketing around them, and embed them INSIDE the <Prozesse ...> tag itself.

For the sake of furthering the discussion, I will guess what you meant as:

    <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
    <!-- Diese Datei enthält die Stati der xxx relevanten Prozesse -->
    <!-- 0=OK, 1=Fehler/fehlend, 2=Unbekannt -->
    <datei>
        <test1 name = "Patrick" />
        <test2 name = "Scheller" />
        <Prozesse xxx = "1" cre = "2" ora = "3" xxxrec = "4" db_ppb = "5" />
    </datei>
    <!-- Ende der Datei -->

(that is, assuming that you wanted attributes). With that assumption, we can use something like the following:

    REBOL []

    prozesse: make object! [
        select-element-attribute: function [
            "collect all values for given element/attribute names"
            b [block!] "xml document"
            e [string!] "element name"
            a [string!] "attribute name"
        ][
            buffer attval
        ][
            buffer: copy []
            if all [
                e = first b         ; only consider elements with the requested name
                found? second b
                found? attval: select second b a
            ][
                append buffer attval
            ]
            if found? third b [
                foreach sub third b [
                    if block? sub [
                        append buffer select-element-attribute sub e a
                    ]
                ]
            ]
            buffer
        ]
    ]

to do this:
>> foo: parse-xml read %sample3.xml
XML Version: 1.0
== [document none [["datei" none ["^/" ["test1" ["name" "Patrick"] ...
>> prozesse/select-element-attribute foo "Prozesse" "cre"
== ["2"]

Hope this helps!

-jn-
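
For the original request ("print the value of Prozesse/cre"), a lookup that stops at the first matching element may be closer to what was asked than collecting every value. The following is a minimal sketch, not code from the thread: FIND-ELEMENT is a hypothetical helper name, and the inline XML is a shortened version of the corrected sample above.

```rebol
REBOL []

; Hypothetical helper: depth-first search for the first element called NAME.
find-element: func [
    "Return the first element named NAME at or below ELEMENT, or NONE"
    element [block!] name [string!]
    /local hit
][
    if name = element/1 [return element]
    ; element/3 is NONE for empty elements, so test before iterating
    if block? element/3 [
        foreach item element/3 [
            if all [block? item  hit: find-element item name] [return hit]
        ]
    ]
    none
]

doc: parse-xml {<datei><test1 name="Patrick" /><Prozesse xxx="1" cre="2" /></datei>}

; start at the real top-level element rather than the 'document wrapper
proz: find-element first third doc "Prozesse"
if proz [print select proz/2 "cre"]
```

Note that `=` compares REBOL strings without regard to case, so this would also match an element written as <prozesse>; a case-sensitive comparison would be needed if that distinction matters.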

 [14/16] from: petr:krenzelok:trz:cz at: 25-Oct-2000 20:25


----- Original Message -----
From: Patrick Scheller <[pscheller--atos--ch]>
To: <[rebol-list--rebol--com]>
Sent: Wednesday, October 25, 2000 4:31 PM
Subject: [REBOL] Re: XML-Parsing?!?
> Petr Krenzelok wrote: > > Joel Neely wrote:
<<quoted lines omitted: 3>>
> > > give a couple of hints that may help.
> > I just don't understand one thing yet, - why don't you folks use load/markup?
> > invoice: load/markup %some-file-received.xml
> > and then e.g. print invoice/<invoice-number>, as path selection works with
> > tags. It's just pity it doesn't allow to use strings, it could be helpful
> > sometimes ...
> Well actually I tried it all. But I still dont get it. Either it doesn't
<<quoted lines omitted: 11>>
> <xxx = "1" />
> <cre = "2" />
Aaah, sure, I was not familiar with such tag syntax ... so we've got values inside of tags, right? Hmm ... There is no other possibility than a full-blown powerful parser .... ... or :-)

1) remove the damned bloody spaces which are left there even after performing trim/lines upon the string ...
2) try the following hack :-)

    REBOL []

    str: {<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
    <!-- Diese Datei enthält die Stati der xxx relevanten Prozesse -->
    <!-- 0=OK, 1=Fehler/fehlend, 2=Unbekannt -->
    <test1 name = "Patrick" />
    <test2 name = "Scheller" />
    <Prozesse>
    <xxx = "1" />
    <cre = "2" />
    <ora = "3" />
    <xxxrec = "4">
    <db_ppb = "5" />
    </Prozesse>
    <!-- Ende der Datei -->
    }

    blk1: load/markup trim/lines copy str
    blk2: load/markup replace/all trim/lines copy str "> <" "><"

    ; blk1 is not of much use because of the " " in between the tags ...
    print mold blk1
    print mold blk2

    result: copy []
    foreach tag blk2 [
        either not tag? tag [
            ; the item may be a plain string
            insert tail result tag
        ][
            either not found? find tag "=" [
                ; the item is a regular tag
                insert tail result tag
            ][
                tmp: parse tag "="
                insert tail result to-tag first tmp
                ; is the last item in the tag "/"?
                insert tail result first skip tail tmp either (last tmp) = "/" [-2][-1]
            ]
        ]
    ]
    ; it surely still has its flaws and is rather limited in usability :-)

    print mold result
    print "print result/<cre>"
    print result/<cre>

    ; hmm, I know it's flat, you wanted to print <prozesse>/<cre>, maybe a little func could help?
    print-tag: func [parent-tag what][print select find result parent-tag what]
    print-tag <Prozesse> <Cre>

    ; but still - our func will return the first <cre> after the first <prozesse> found,
    ; and that doesn't have to be our subtag .... :-)
    ; too lazy to think deeper ;-)

Cheers,
-pekr-

 [15/16] from: pscheller:atos:ch at: 26-Oct-2000 10:59


Hi Joel Joel Neely wrote:
> Hi, Patrick, > > Errrmmm... your sample isn't legal XML!!
It isn't?? Damn, I'm learning XML at the same time as Rebol. So much to learn, so little time...

I have to construct a web client for our server process, which can show the state of the running (or dead) processes and the transactions we've processed. And all this in a completely new environment with a deadline of next week!! I chose Rebol for this job because I thought its features were cool and it would save me time. Yet it doesn't save me time, au contraire :-(

I know Rebol is surely as good as I hoped it would be, but I still can't figure out its basic concepts. The "Official Rebol Guide" from Rebol-Press is no big help in getting a clean overview. In fact I found it pretty useless if one doesn't know Rebol yet.

[snip]
> > All I need is the possibility to simply read an external XML-File and > > extract values from certain tags like this:
<<quoted lines omitted: 21>>
> <Prozesse ...> which are not valid. If elements, they would have to > be written something like:
I see. Thanx.
> For the sake of furthering the discussion, I will guess what you meant as:
[snip] Yes indeed that's what I need.
> (that is, assuming that you wanted attributes). With that assumption, > we can use something like the following:
[snip]
> to do this: > >> foo: parse-xml read %sample3.xml
<<quoted lines omitted: 3>>
> == ["2"] > Hope this helps!
A bit, yes. You take me a step further with every email :-)

Sadly I must turn away from Rebol for this task and move to PHP. But I will try to stay in touch with Rebol, and with some time (who knows? :-) I will understand it.

Thanx Joel

Greets to all, pat le sad

 [16/16] from: eventi:nyic at: 26-Oct-2000 13:18


I'm by no means an expert, but here's something I've been playing with:

    REBOL []

    ;; utility stuff
    tablevel: 0
    inc: func [ 'var ] [ set var add 1 get var ]
    dec: func [ 'var ] [ set var subtract get var 1 ]
    indent: does [ repeat junk tablevel [ prin "^-" ] ]

    xml-parser: make object! [
        handled: make block! 10
        dispatch: func [ tagname attribute-list contents ] [
            ; print rejoin [ "Dispatching " tagname ]
            do get select handled tagname attribute-list contents
        ]
        start: stop: none
        parse: func [ xml ] [
            start
            do-block xml
            stop
        ]
        do-block: func [
            xml [block!]
            /local tagname attribute-list contents name value element
        ][
            foreach [tagname attribute-list contents] xml [
                either find handled tagname [
                    dispatch tagname attribute-list contents
                ][
                    ;; This part handles the unhandlable
                    ;; Remove the comments, and it'll print the XML back out
                    ; indent prin rejoin ["<" tagname]
                    ; inc tablevel
                    if attribute-list [
                        foreach [name value] attribute-list [
                            ; prin rejoin [" " name {="} value {"}]
                        ]
                    ]
                    either contents [
                        ; print ">"
                        foreach element contents [
                            either equal? type? element block! [
                                do-block element
                            ][
                                ; indent print element
                            ]
                        ]
                        ; dec tablevel
                        ; indent print rejoin ["</" tagname ">"]
                    ][
                        ; dec tablevel
                        ; indent print " />"
                    ]
                ]
            ]
        ]
    ]

    ;; Here's an example: parses a page from moreover.com, and makes it into link soup
    html: make string! ""
    emit: func [ what ] [ append html what ]

    article: make object! [ headline: time: url: none ]

    do-headline: func [attribute-list contents] [article/headline: copy contents]
    do-url: func [attribute-list contents] [article/url: copy contents]
    do-time: func [attribute-list contents] [article/time: copy contents]

    article-parser: make xml-parser [
        handled: [
            "headline_text" 'do-headline
            "url" 'do-url
            "harvest_time" 'do-time
        ]
    ]

    do-article: func [attribute-list contents] [
        foreach element contents [
            either equal? type? element block! [
                article-parser/parse element
            ][
                ; indent print element
            ]
        ]
        emit rejoin [
            {<a href="} article/url {">} article/headline </a>
            article/time <br>
        ]
    ]

    moreover-parser: make xml-parser [
        start: does [ emit [ <html> <body> ] ]
        stop: does [ emit [ </body> </html> ] ]
        handled: [ "article" 'do-article ]
    ]

    ;; You have to be a big fan of f---edcompany.com's webboards to appreciate this link.
    ;; "This is not a toy to be trifled with by children like you!"
    moreover-parser/parse parse-xml read http://p.moreover.com/cgi-local/page?index_crm+xml
    print html

Notes
  • Quoted lines have been omitted from some messages.