XML-Parsing?!?
[1/16] from: pscheller:atos:ch at: 24-Oct-2000 15:57
Hi ya all :-)
I have the need to parse XML-Documents to form a HTML page from it. Now
with all the functions related to that I still was unable to extract any
tags value from a XML-file. I know I could do all the parsing on my own,
but I suspect that somehow Rebol could do this for me in a more
convenient way. Or am I wrong?
Now if someone could explain me the concepts of these functions any
further. Or just tell me I'm completely wrong, I'm just stuck right
now.
parse-xml: returns a block which should contain the tags and values
load: it should parse the file, but it's no use to me because the tags
are still not separated from their values
xml-language: What is this object good for??
Greets to all, pat le cat
--
___________________________________________________________________
Atos (Schweiz) AG
Patrick Scheller
Industriestrasse 19
CH-8304 Wallisellen
[2/16] from: joel:neely:fedex at: 25-Oct-2000 7:50
Hi, Patrick,
I've been playing with parse-xml for quite a while (in fact, that's
what got me to using REBOL seriously in the first place), so let me
give a couple of hints that may help.
[rebol-bounce--rebol--com] wrote:
> I have the need to parse XML-Documents to form a HTML page from it. Now
> with all the functions related to that I still was unable to extract any
> tags value from a XML-file. I know I could do all the parsing on my own,
> but I suspect that somehow Rebol could do this for me in a more
> convenient way. Or am I wrong?
>
Absolutely right! I do it all the time.
> Now if someone could explain me the concepts of these functions any
> further. Or just tell me I'm completely wrong, I'm just stuck right
> now.
>
> parse-xml: returns a block which should contain the tags and values
>
PARSE-XML takes a string and gives you back a structure of nested
blocks that represents the XML structure in the string. A typical
example is:
>> foo: {<a>
{ <b>Hi, Patrick!</b>
{ <c type="demo" />
{ <d pos="last">
{ end
{ </d>
{ </a>}
== {<a>
<b>Hi, Patrick!</b>
<c type="demo" />
<d pos="last">
end
</d>
</a>}
>> fum: parse-xml foo
== [document none [["a" none ["^/" ["b" none ["Hi, Patrick!"]] "^/"
["c" ["type" "demo"] none] "^/" ["d" ["pos" "last"] ["^/ end^/"...
You might also say
fee: parse-xml read %fie.xml
etc...
PARSE-XML uses the following convention to represent an XML element:
<name a0="v0" a1="v1" ...> ...content... </name>
is parsed into a block with three members
[ "name" ["a0" "v0" "a1" "v1" ...] [...content...] ]
1) The first member is a string that is the name of the element;
2) The second member is:
2a) if the element had attributes, a block containing
name/value pairs for all attributes (each as a string); or
2b) if the element did not have attributes, then NONE;
3) The third member is:
3a) if the element had content, *even ignorable-whitespace*,
a block containing each piece of content as a member; or
3b) if the element was empty, then NONE.
Note that, in (3a) above, each contained element is a nested block,
and each occurrence of PCDATA is represented as a string. In
addition, any comment <!-- ... --> or PI <? ... ?> that occurs
in the XML document is simply ignored. I have a modified version
of PARSE-XML which retains them, but have almost never needed it
for serious applications.
The nice thing about having the attributes as a name/value block
is that you can say things like
attribute-value: select some-element/2 "attribute-name"
and not worry about what order they were in, etc.
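As a minimal sketch of that SELECT idiom (the sample string and the words DOC, TOP and C-ELEM are made up for illustration, and the string has no whitespace between tags so that positions in the content block are easy to count):

```rebol
doc: parse-xml {<a><b>Hi</b><c type="demo" /></a>}
top: first third doc        ; the top-level <a> element
c-elem: second third top    ; second item in <a>'s content: the <c> element
; the attribute block is NONE when an element has no attributes,
; so guard before calling SELECT
if c-elem/2 [print select c-elem/2 "type"]
```

With the structure described above this should print demo. The guard matters because SELECT expects a series and will complain if handed NONE.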
The current version of PARSE-XML is non-validating (which means
that no checking is performed on which elements/attributes may/must
occur at any point). It assumes that your arrangement of elements
and attributes is what you wanted. It also does minimal syntax
error handling and can be fooled into blowing up. For example, if
you hand it the content of a large HTML document, it will likely
have a stack overflow, as it thinks that tags such as <br> and <hr>,
or unclosed instances of <p>, <tr>, <td> etc..., will be closed
later on and nests everything following them.
You CAN use PARSE-XML on XHTML-conforming documents, however.
Just be sure to close all non-empty tags, put attribute values in
double-quotes, and write empty HTML tags as self-closing (as in
<br /> and <hr />).
The other convention you must know is that the entire XML structure
from the file is treated as the content of an imaginary element
with a name as the *WORD* 'document and with no attributes.
With all of that background, and using the results of the console
transcript above, we can see:
>> fum/1
== document
>> fum/2
== none
>> fum/3
== [["a" none ["^/" ["b" none ["Hi, Patrick!"]] "^/"
["c" ["type" "demo"] none] "^/" ["d" ["pos" "last"]
["^/ end^/"]] "^/"]]]
Since FUM was the result of PARSE-XML, its first member is the word
'document and its second member is NONE. Its third member is a block
containing only the top-level element of the original XML. (That's
why FUM/3 appears to be doubly-nested; the content block is FUM/3
and contains only one element FUM/3/1, but that element is itself
represented as a block!)
>> foreach el fum/3 [print mold el]
["a" none ["^/" ["b" none ["Hi, Patrick!"]] "^/"
["c" ["type" "demo"] none] "^/" ["d" ["pos" "last"]
["^/ end^/"]] "^/"]]
Remember from the console example that we had
>> foo: {<a>
{ <b>Hi, Patrick!</b>
{ <c type="demo" />
{ <d pos="last">
{ end
{ </d>
{ </a>}
so that the top-level element has a name of "a", no attributes, and
three subordinate elements, <b> <c ...> and <d ...>, in its content.
>> topelement: fum/3/1
== ["a" none ["^/" ["b" none ["Hi, Patrick!"]] "^/"
["c" ["type" "demo"] none] "^/" ["d" ["pos" "last"]
["^/ end^/"]] "^/"]]
>> topelement/1
== "a"
>> topelement/2
== none
>> topelement/3
== ["^/" ["b" none ["Hi, Patrick!"]] "^/"
["c" ["type" "demo"] none] "^/" ["d" ["pos" "last"]
["^/ end^/"]] "^/"]
>> foreach item topelement/3 [print mold item]
"^/"
["b" none ["Hi, Patrick!"]]
"^/"
["c" ["type" "demo"] none]
"^/"
["d" ["pos" "last"] ["^/ end^/"]]
"^/"
"Wait!" someone may think. "There are seven subordinate members
here, not three!" Remember that ignorable-whitespace is retained
by PARSE-XML, so the NEWLINE values between <a> and <b>, </b> and
<c ...>, <c ...> and <d ...>, and </d> and </a> are also in the
content block for the top level element (<a>).
To wrap up, notice that the <b> element had no attributes, so its
block representation has NONE as the second member. It contained
only a single string (with no whitespace) so the third member for
the block representing <b> is a block with only one string in it.
The <c> element had an attribute, but no content, so it is
represented by a block whose second member is a block of name/value
pair(s) and whose third member is NONE.
Finally, <d> had both attributes and content, so it is represented
by a block with non-NONE values in the second and third positions.
Note that the whitespace surrounding the string "end" is included
in the content string.
To get you started writing REBOL to handle XML-derived data, here
are a couple of utilities you may find useful:
_xdump: func [
    b [block!] {xml structure}
    p [string!]
    /local
    tag
    pp
    was-string
][
    tag: trim to-string first b
    prin join copy p [join copy "<" tag]
    if found? second b [
        foreach [n v] second b [
            prin join copy " " [trim n "=" mold v]
        ]
    ]
    either none? third b [
        print " />"
    ][
        print ">"
        pp: join copy p " "
        was-string: false
        foreach x third b [
            was-string: not any-block? x
            either was-string [
                if 0 < length? trim x [
                    print join copy pp x
                ]
            ][
                _xdump x pp
            ]
        ]
        print [join copy p [copy "</" trim tag ">"]]
    ]
]
xdump: func [
    b [block!] {the xml structure from parse-xml}
][
    _xdump first third b copy ""
    print ""
]
The Xdump function simply pretty-prints a block structure from
PARSE-XML to the console. It can serve as an example of the kind
of recursive code you may be writing if you traverse general
block structures.
>> xdump fum
<a>
 <b>
  Hi, Patrick!
 </b>
 <c type="demo" />
 <d pos="last">
  end

 </d>
</a>
Notice that it is not overly smart! The embedded ^/ in the content
string for <d> causes an extra blank line. Since most of my XML
applications really don't care about the ignorable-whitespace, I
also wrote the following, inspired by TRIM for STRING! data:
trim-xml: func [
    b [block!]
    /local
    content
    item
][
    content: third b
    if found? content [
        while [not tail? content] [
            item: first content
            either block? item [
                trim-xml item
                content: next content
            ][
                either 0 = length? trim item [
                    remove content
                ][
                    content: next content
                ]
            ]
        ]
        if 0 = length? head content [
            b/3: none
        ]
    ]
    b
]
Now we can say
>> trim-xml fum
== [document none [["a" none [["b" none ["Hi, Patrick!"]]
["c" ["type" "demo"] none] ["d" ["pos" "last"] ["end^/"]]]]]]
>> foreach item topelement/3 [print mold item]
["b" none ["Hi, Patrick!"]]
["c" ["type" "demo"] none]
["d" ["pos" "last"] ["end^/"]]
And the whitespace-only content strings are gone.
> load: it should parse the file, but it's no use to me because the tags
> are still not separated from their values
>
I've never had to use LOAD for XML processing.
> xml-language: What is this object good for??
>
XML-LANGUAGE is the object that contains the support for PARSE-XML.
In general it is A Good Thing to implement a complex function by
writing
complex-function-wrapper: make object! [
... support functions and data go here ...
top-entry: func [...top-level-arguments...] [...body...]
]
so that all the support stuff doesn't pollute the global namespace,
cause accidental name collisions, etc.
You can then call the function either by
complex-function-wrapper/top-entry ...arguments...
or by defining
complex-function: func [...arguments...] [
complex-function-wrapper/top-entry ...arguments...
]
just for pretty.
XML-LANGUAGE fulfills that role for PARSE-XML.
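A small concrete instance of that pattern might look like the following. The names SUM-WRAPPER, BUMP and SUM-ALL are hypothetical and say nothing about how XML-LANGUAGE itself is written; the point is only that TOTAL and BUMP live inside the object and never appear as global words.

```rebol
sum-wrapper: make object! [
    total: 0                                        ; support data
    bump: func [n [integer!]] [total: total + n]    ; support function
    top-entry: func [values [block!]] [
        total: 0
        foreach v values [bump v]
        total
    ]
]

; thin global wrapper, "just for pretty"
sum-all: func [values [block!]] [sum-wrapper/top-entry values]

print sum-all [1 2 3]    ; prints 6
```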
> Greets to all, pat le cat
>
Le cat says, "Purr", and thanks you!
-jn-
[3/16] from: sharriff:aina:med-iq at: 25-Oct-2000 14:19
Cool Stuff Joel!!! Could one invoke Xerces with /COMMAND?
Sharriff Aina
med.iq information & quality in healthcare AG
Gutenbergstr. 42
41564 Kaarst
tel.: 02131-3669-0
fax: 02131-3669-599
www.med-iq.de
[4/16] from: petr:krenzelok:trz:cz at: 25-Oct-2000 15:35
Joel Neely wrote:
> Hi, Patrick,
>
> I've been playing with parse-xml for quite a while (in fact, that's
> what got me to using REBOL seriously in the first place), so let me
> give a couple of hints that may help.
I just don't understand one thing yet, - why don't you folks use load/markup?
invoice: load/markup %some-file-received.xml
and then e.g. print invoice/<invoice-number>, as path selection works with
tags. It's just pity it doesn't allow to use strings, it could be helpful
sometimes ...
I know you can't check XML doc conformity, but ....
-pekr-
[5/16] from: brett:codeconscious at: 26-Oct-2000 1:30
Hi Petr,
I don't understand your example, could you expand on it please?
As far as I know load/markup gives back xml as a flat block of tags and
strings - correct?
Or is my version of Rebol out of date again? :)
Brett.
[6/16] from: pscheller:atos:ch at: 25-Oct-2000 16:31
Petr Krenzelok wrote:
> Joel Neely wrote:
Hi Petr
> > I've been playing with parse-xml for quite a while (in fact, that's
> > what got me to using REBOL seriously in the first place), so let me
<<quoted lines omitted: 4>>
> tags. It's just pity it doesn't allow to use strings, it could be helpful
> sometimes ...
Well actually I tried it all. But I still don't get it. Either it doesn't
work the same way for me or I still have a big lack of understanding.
All I need is the possibility to simply read an external XML-File and
extract values from certain tags like this:
Pseudo: print the value of Prozesse/cre
------------------------BEGIN XML----------------------------------
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!-- Diese Datei enthält die Stati der xxx relevanten Prozesse -->
<!-- 0=OK, 1=Fehler/fehlend, 2=Unbekannt -->
<test1 name = "Patrick" />
<test2 name = "Scheller" />
<Prozesse>
<xxx = "1" />
<cre = "2" />
<ora = "3" />
<xxxrec = "4" />
<db_ppb = "5" />
</Prozesse>
<!-- Ende der Datei -->
-------------------------END XML-----------------------------------
How would you do this Petr??
Thanx and greets from me :-)
___________________________________________________________________
Atos (Schweiz) AG
Patrick Scheller Phone: ++41- 1-877 69 69
Industriestrasse 19 Fax: ++41- 1-877 69 99
CH-8304 Wallisellen Internet: [pscheller--atos--ch]
[7/16] from: pscheller:atos:ch at: 25-Oct-2000 16:35
Hi Joel
First of all thank you very much for this thorough reply. I still need
some time to read it all through.
Still, as I wrote Petr I have the simple need for a possibility to read
an external XML-File and extract values from certain tags like this:
Pseudo: print the value of Prozesse/cre
------------------------BEGIN XML----------------------------------
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!-- Diese Datei enthält die Stati der xxx relevanten Prozesse -->
<!-- 0=OK, 1=Fehler/fehlend, 2=Unbekannt -->
<test1 name = "Patrick" />
<test2 name = "Scheller" />
<Prozesse>
<xxx = "1" />
<cre = "2" />
<ora = "3" />
<xxxrec = "4" />
<db_ppb = "5" />
</Prozesse>
<!-- Ende der Datei -->
-------------------------END XML-----------------------------------
Joel Neely wrote:
> Hi, Patrick,
> I've been playing with parse-xml for quite a while (in fact, that's
<<quoted lines omitted: 6>>
> > convenient way. Or am I wrong?
> Absolutely right! I do it all the time.
Glad to hear that :-)
> > Now if someone could explain me the concepts of these functions any
> > further. Or just tell me I'm completely wrong, I'm just stuck right
<<quoted lines omitted: 22>>
> == [document none [["a" none ["^/" ["b" none ["Hi, Patrick!"]] "^/"
> ["c" ["type" "demo"] none] "^/" ["d" ["pos" "last"] ["^/ end^/"...
....
Thanx again Joel "Live long and prosper" :-)
Greets to all, pat le cat
[8/16] from: joel:neely:fedex at: 25-Oct-2000 9:35
Hi, Sharriff,
[rebol-bounce--rebol--com] wrote:
> Cool Stuff Joel!!! Could one invoke Xerces with /COMMAND?
>
I'm not familiar with Xerces. Can you point me to a URL?
Thanks!
-jn-
--
; Joel Neely [joel--neely--fedex--com] 901-263-4460 38017/HKA/9677
REBOL [] do [ do func [s] [ foreach [a b] s [prin b] ] sort/skip
do function [s] [t] [ t: "" foreach [a b] s [repend t [b a]] t ] {
| e s m!zauafBpcvekexEohthjJakwLrngohOqrlryRnsctdtiub} 2 ]
[9/16] from: brett:codeconscious at: 26-Oct-2000 1:50
Just to add to Joel's comprehensive description.
You can also use the parse function to deal with the structure returned by
the parse-xml function.
Here is an example which Martin Johannesson described some time back.
Just copy and paste this into a Rebol console session.
xml-structure: ['document none! xml-element-contents]
xml-element-node: [into [xml-element-name xml-element-attributes
xml-element-contents]]
xml-element-name: [set elt-name string! (print elt-name)]
xml-element-attributes: [none! | block!]
xml-element-contents: [none! | into [some [xml-element-node |
string! ]]]
xml-target: http://p.moreover.com/cgi-local/page?index_devoper+xml
parse parse-xml read xml-target xml-structure
I put a bit of code into xml-element-name in order to print the element
names as they occur.
You could go on to adjust the parse rule for a specific DTD, which is what I
did for moreover.com.
It would be nice if someone wrote a DTD to parse-rule translator :)
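The same kind of rule set can be tried without a network connection. The sketch below is a variation on the rules above, not Martin's original: NAMES, ELEMENT, CONTENTS and DOC-RULE are names chosen here for illustration, and the element names are collected into a block instead of being printed as they occur.

```rebol
names: copy []

; an element is a block of: name string, attributes (NONE or block), contents
element: [
    into [
        set name string! (append names name)
        [none! | block!]
        contents
    ]
]
; contents are NONE, or a block of nested elements and strings
contents: [none! | into [any [element | string!]]]
doc-rule: ['document none! contents]

parse parse-xml {<a><b>Hi</b><c type="demo" /></a>} doc-rule
probe names
```

If the structure is as described earlier in the thread, PARSE should return TRUE and NAMES should end up holding the three element names in document order.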
Brett.
[10/16] from: sharriff:aina:med-iq at: 25-Oct-2000 15:45
It is used in several packages, even commercial ones. It is a main
component of the ENHYDRA SERVER (www.enhydra.org), a free Java application
server.
Xerces---> http://xml.apache.org/
Regards
Sharriff Aina
med.iq information & quality in healthcare AG
Gutenbergstr. 42
41564 Kaarst
[11/16] from: pscheller:atos:ch at: 25-Oct-2000 17:10
Hi Sharriff
[Sharriff--Aina--med-iq--de] wrote:
> It is used in several packages, even commercial ones. It is a main
> component of the ENHYDRA SERVER (www.enhydra.org), a free Java application
> server.
> Xerces---> http://xml.apache.org/
I know XML.Apache.org and we are about to look into its libraries (as
long as they're C++). Sounds very attractive indeed, but I think that
with Rebol certain applications could be implemented much easier than
with c++.
Greets pat le cat
[12/16] from: joel:neely:fedex at: 25-Oct-2000 10:35
Hi, Petr,
Petr Krenzelok wrote:
> I just don't understand one thing yet, - why don't you folks use load/markup?
>
> invoice: load/markup %some-file-received.xml
>
> and then e.g. print invoice/<invoice-number>, as path selection works with
> tags. It's just pity it doesn't allow to use strings, it could be helpful
> sometimes ...
>
Among other reasons, because XML does two nice things:
1) Abstracts semantic content away from syntactic details. Consider
the data file %sample.xml
<sample>
<emp id="123" lastn="Doaks" firstn="Joe"/>
<emp id="234"
firstn="John"
lastn="Doe"
/>
<emp id="345" firstn="Till" lastn="Eulenspiegel" />
<dept id="012" title="Software Development" />
<emp
id="456"
lastn="Zorro"
/>
</sample>
(where the inconsistent layout is deliberate!) one can easily write
code to process the block structure from PARSE-XML, as in:
REBOL []
emplist: make object! [
emps: []
build-emps: func [x [block!]] [
foreach element x [
if all [block? element element/1 = "emp"] [
append emps any [select element/2 "lastn" ""]
append emps any [select element/2 "firstn" ""]
]
]
]
print-emps: func [/local temps] [
foreach [ln fn] sort/skip temps: copy emps 2 [
print [fn ln]
]
]
run: func [] [
build-emps third first third parse-xml read %sample.xml
print-emps
]
]
which does:
>> do %emplist.r
>> emplist/run
Joe Doaks
John Doe
Till Eulenspiegel
Zorro
Whereas, if I had said:
>> foo: load/markup %sample.xml
== [<sample> "^/ " <emp id="123" lastn="Doaks" firstn="Joe"/>
"^/ " <emp id="234"
firstn="John"
lastn="Doe"
/> "^...
>> foreach item foo [print mold item]
<sample>
"^/ "
<emp id="123" lastn="Doaks" firstn="Joe"/>
"^/ "
<emp id="234"
firstn="John"
lastn="Doe"
/>
"^/ "
<emp id="345" firstn="Till" lastn="Eulenspiegel" />
"^/ "
<dept id="012" title="Software Development" />
"^/ "
<emp
id="456"
lastn="Zorro"
/>
"^/"
</sample>
"^/^/"
I trust that it's clear that there's still a lot of work to be done
to find all the right data. Furthermore,
>> foo/<sample>
== "^/ "
>> foo/id
** Script Error: Invalid path value: id.
** Where: foo/id
>> foo/<emp>
** Script Error: Invalid path value: <emp>.
** Where: foo/<emp>
are fairly useless as building blocks for processing the content,
especially as compared with what EMPLIST can do with the resulting
block structure from PARSE-XML.
2) XML allows nested data structures to be represented nicely. By using
PARSE-XML, we get a nice recursive representation of that nested
structure with no further work required before processing it.
For example:
<sample2>
<page title="Home Page" url="http://www.foo.com/">
<page title="About foo.com" url="about.html" />
<page title="Contact Us!" url="contact.html" />
<page title="Locations" url="locations/">
<page title="London" url="gb.html" />
<page title="Prague" url="cz.html" />
<page title="Darmstadt" url="de.html" />
</page>
<page title="Products" url="products/">
<page title="Widgets" url="widgets.html" />
<page title="Blivets" url="blivets.html" />
</page>
</page>
</sample2>
one can easily write
REBOL []
pagetree: make object! [
    ; a run of 30 spaces, used to pad titles out to a fixed-width column
    padding: head insert/dup copy "" " " 30
    pad: func [s [string!]] [ copy/part join s padding 30 ]
    print-tree: func [prefix [string!] x [block!]] [
        if x/1 = "page" [
            prefix: join prefix any [select x/2 "url" ""]
            print [
                pad any [select x/2 "title" ""]
                prefix
            ]
        ]
        if found? x/3 [
            foreach item x/3 [
                if block? item [print-tree prefix item]
            ]
        ]
    ]
    run: func [f [file!]] [
        print-tree "" first third parse-xml read f
    ]
]
which does
>> do %pagetree.r
>> pagetree/run %sample2.xml
Home Page                      http://www.foo.com/
About foo.com                  http://www.foo.com/about.html
Contact Us!                    http://www.foo.com/contact.html
Locations                      http://www.foo.com/locations/
London                         http://www.foo.com/locations/gb.html
Prague                         http://www.foo.com/locations/cz.html
Darmstadt                      http://www.foo.com/locations/de.html
Products                       http://www.foo.com/products/
Widgets                        http://www.foo.com/products/widgets.html
Blivets                        http://www.foo.com/products/blivets.html
Whereas LOAD/MARKUP only gives me a linear enumeration of tags and
strings which requires that I write more code to figure out which tags
are to be nested inside which others, etc...
I prefer to let PARSE-XML do the work for me.
OBTW, I've also written a collection of "helper" objects and functions
that further simplify the common tasks of traversing the recursive block
structure and applying the right process at each node, but that's a
story for another day.
-jn-
[13/16] from: joel:neely:fedex at: 25-Oct-2000 12:36
Hi, Patrick,
Errrmmm... your sample isn't legal XML!!
[rebol-bounce--rebol--com] wrote:
[snip]
> All I need is the possibility to simply read an external XML-File and
> extract values from certain tags like this:
<<quoted lines omitted: 14>>
> <!-- Ende der Datei -->
> -------------------------END XML-----------------------------------
An XML document must have exactly one top-level XML element. You
have three (<test1 ...>, <test2 ...>, and <Prozesse ...>). In addition,
you have what appear (by intent) to be five content elements under
<Prozesse ...> which are not valid. If elements, they would have to
be written something like:
<xxx value="1"/>
<cre value="2"/>
...etc...
If they were intended to be attributes of the <Prozesse ...> element,
then you need to lose the "<" and "/>" bracketing around them, and
embed them INSIDE the <Prozesse ...> tag itself.
For the sake of furthering the discussion, I will guess what you meant as:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!-- Diese Datei enthält die Stati der xxx relevanten Prozesse -->
<!-- 0=OK, 1=Fehler/fehlend, 2=Unbekannt -->
<datei>
<test1 name = "Patrick" />
<test2 name = "Scheller" />
<Prozesse
xxx = "1"
cre = "2"
ora = "3"
xxxrec = "4"
db_ppb = "5" />
</datei>
<!-- Ende der Datei -->
(that is, assuming that you wanted attributes). With that assumption,
we can use something like the following:
REBOL []
prozesse: make object! [
    select-element-attribute: function [
        "collect all values for given element/attribute names"
        b [block!] "xml document"
        e [string!] "element name"
        a [string!] "attribute name"
    ][
        buffer
        attval
    ][
        buffer: copy []
        ; only take the attribute from elements whose name matches E
        if all [
            e = first b
            found? second b
            found? attval: select second b a
        ][
            append buffer attval
        ]
        ; recurse into nested elements, skipping content strings
        if found? third b [
            foreach sub third b [
                if block? sub [
                    append buffer select-element-attribute sub e a
                ]
            ]
        ]
        buffer
    ]
]
to do this:
>> foo: parse-xml read %sample3.xml
XML Version: 1.0
== [document none [["datei" none ["^/" ["test1" ["name" "Patrick"] ...
>> prozesse/select-element-attribute foo "Prozesse" "cre"
== ["2"]
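For the "print the value of Prozesse/cre" kind of lookup, another option is a helper that returns the first element with a given name and leaves the attribute lookup to SELECT. This is only a sketch: FIND-ELEMENT is a hypothetical name, not part of REBOL, and the sample string is a cut-down guess at the data.

```rebol
find-element: func [
    "Return the first element block with the given name, or NONE"
    b [block!] "element block from parse-xml"
    name [string!] "element name to look for"
    /local hit
][
    ; the outermost block has the word 'document as its name,
    ; which never equals a string, so it is simply passed over
    if name = first b [return b]
    if third b [
        foreach sub third b [
            if all [block? sub hit: find-element sub name] [return hit]
        ]
    ]
    none
]

doc: parse-xml {<datei><test1 name="Patrick" /><Prozesse xxx="1" cre="2" /></datei>}
p: find-element doc "Prozesse"
if all [p p/2] [print select p/2 "cre"]    ; prints 2
```

Returning the element block rather than a list of values keeps the helper reusable: the caller can read any attribute, or walk the content in p/3, without another search.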
Hope this helps!
-jn-
[14/16] from: petr:krenzelok:trz:cz at: 25-Oct-2000 20:25
----- Original Message -----
From: Patrick Scheller <[pscheller--atos--ch]>
To: <[rebol-list--rebol--com]>
Sent: Wednesday, October 25, 2000 4:31 PM
Subject: [REBOL] Re: XML-Parsing?!?
> Petr Krenzelok wrote:
> > Joel Neely wrote:
<<quoted lines omitted: 3>>
> > > give a couple of hints that may help.
> > I just don't understand one thing yet, - why don't you folks use load/markup?
> > invoice: load/markup %some-file-received.xml
> > and then e.g. print invoice/<invoice-number>, as path selection works with
> > tags. It's just pity it doesn't allow to use strings, it could be helpful
> > sometimes ...
> Well actually I tried it all. But I still dont get it. Either it doesn't
<<quoted lines omitted: 11>>
> <xxx = "1" />
> <cre = "2" />
Aaah, sure, I was not familiar with such tag syntax ... so we've got values
inside of tags, right? Hmm ...
There is no other possibility than a full-blown powerful parser ....
... or :-)
1) remove the damned bloody spaces which are left there even after performing
trim/lines upon the string ...
2) try the following hack :-)
REBOL []
str: {<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!-- Diese Datei enthält die Stati der xxx relevanten Prozesse -->
<!-- 0=OK, 1=Fehler/fehlend, 2=Unbekannt -->
<test1 name = "Patrick" />
<test2 name = "Scheller" />
<Prozesse>
<xxx = "1" />
<cre = "2" />
<ora = "3" />
<xxxrec = "4">
<db_ppb = "5" />
</Prozesse>
<!-- Ende der Datei -->
}
blk1: load/markup trim/lines copy str
blk2: load/markup replace/all trim/lines copy str "> <" "><"
; so, blk1 is not of much use because of the " " in between the tags ...
print mold blk1
print mold blk2
result: copy []
foreach tag blk2 [
    ; for the possibility of the argument being a string
    either not tag? tag [insert tail result tag][
        ; for the possibility of the argument being a regular tag
        either not found? find tag "=" [insert tail result tag][
            tmp: parse tag "="
            insert tail result to-tag first tmp
            ; is the last item in the tag "/"?
            insert tail result first skip tail tmp either (last tmp) = "/" [-2][-1]
        ]
    ]
]
; it surely still has its flaws and is rather limited in usability :-)
print mold result
print "print result/<cre>"
print result/<cre>
; hmm, I know it's flat, you wanted to print <prozesse>/<cre>, maybe a
; little func could help?
print-tag: func [parent-tag what][print select find result parent-tag what]
print-tag <Prozesse> <Cre>
; but still - our func will return the first <cre> after the first <prozesse>
; found, but it doesn't have to be our subtag .... :-)
; too lazy to think deeper ;-)
Cheers,
-pekr-
[15/16] from: pscheller:atos:ch at: 26-Oct-2000 10:59
Hi Joel
Joel Neely wrote:
> Hi, Patrick,
>
> Errrmmm... your sample isn't legal XML!!
It isn't?? Damn, I'm learning XML at the same time as Rebol. So much to
learn, so little time... I have to construct a web client for our server
process, which can show the state of the running (or dead) processes and
the transactions we've processed. And all this in a completely new
environment with a deadline of next week!!
I chose Rebol for this job, because I thought its features are cool and
it would save me time. Yet it doesn't save me time au contraire :-( I
know Rebol is surely as good as I hoped it would be, but I still can't
figure out its basic concepts.
The "Official Rebol Guide" from Rebol-Press is no big help in getting a
clean overview. In fact I found it pretty useless if one doesn't know
Rebol yet.
[snip]
> > All I need is the possibility to simply read an external XML-File and
> > extract values from certain tags like this:
<<quoted lines omitted: 21>>
> <Prozesse ...> which are not valid. If elements, they would have to
> be written something like:
I see. Thanx.
> For the sake of furthering the discussion, I will guess what you meant as:
[snip]
Yes indeed that's what I need.
> (that is, assuming that you wanted attributes). With that assumption,
> we can use something like the following:
[snip]
> to do this:
> >> foo: parse-xml read %sample3.xml
<<quoted lines omitted: 3>>
> == ["2"]
> Hope this helps!
A bit yes. You take me a step further every eMail :-)
Sadly I must turn away from Rebol to PHP for this task. But I will try
to stay in touch with Rebol and with some time (who knows? :-) I will
understand it. Thanx Joel
Greets to all, pat le sad
[16/16] from: eventi:nyic at: 26-Oct-2000 13:18
I'm by no means an expert, but here's something I've been playing with:
REBOL[]
;; utility stuff
tablevel: 0
inc: func [ 'var ] [ set var add 1 get var ]
dec: func [ 'var ] [ set var subtract get var 1 ]
indent: does [ repeat junk tablevel [ prin "^-" ] ]
xml-parser: make object! [
handled: make block! 10
dispatch: func [ tagname attribute-list contents ] [
; print rejoin [ "Dispatching " tagname ]
do get select handled tagname attribute-list contents
]
start:
stop: none
parse: func [ xml ] [ start do-block xml stop ]
do-block: func [
xml [block!]
/local tagname attribute-list contents name value element
][
foreach [tagname attribute-list contents] xml [
either find handled tagname [
dispatch tagname attribute-list contents
][
;; This part handles the unhandlable
;; Remove the comments, and it'll print the XML back out
; indent prin rejoin ["<" tagname]
; inc tablevel
if attribute-list [
foreach [name value] attribute-list [
; prin rejoin [" " name {="} value {"}]
]
]
either contents [
; print ">"
foreach element contents [
either equal? type? element block! [
do-block element
][
; indent print element
]
]
; dec tablevel
; indent print rejoin ["</" tagname ">"]
] [
; dec tablevel
; indent print " />"
]
]
]
]
]
;; Here's an example: parses a page from moreover.com, and makes it into link soup
html: make string! ""
emit: func [ what ] [ append html what ]
article: make object! [
headline:
time:
url: none
]
do-headline: func [attribute-list contents] [article/headline: copy
contents]
do-url: func [attribute-list contents] [article/url: copy contents]
do-time: func [attribute-list contents] [article/time: copy contents]
article-parser: make xml-parser [
handled: [ "headline_text" 'do-headline "url" 'do-url "harvest_time"
'do-time ]
]
do-article: func [attribute-list contents] [
foreach element contents [
either equal? type? element block! [
article-parser/parse element
][
; indent print element
]
]
emit rejoin [ {<a href="} article/url {">} article/headline </a>
article/time <br> ]
]
moreover-parser: make xml-parser [
start: does [ emit [ <html> <body> ] ]
stop: does [ emit [ </body> </html> ] ]
handled: [ "article" 'do-article ]
]
;; You have to be a big fan of f---edcompany.com's webboards to appreciate this link.
;; "This is not a toy to be trifled with by children like you!"
moreover-parser/parse parse-xml read
http://p.moreover.com/cgi-local/page?index_crm+xml
print html
Notes
- Quoted lines have been omitted from some messages.
View the message alone to see the lines that have been omitted