Mailing List Archive: Re: XML-Parsing?!?

[REBOL] Re: XML-Parsing?!?

From: petr:krenzelok:trz:cz at: 25-Oct-2000 20:25


----- Original Message -----
From: Patrick Scheller <[pscheller--atos--ch]>
To: <[rebol-list--rebol--com]>
Sent: Wednesday, October 25, 2000 4:31 PM
Subject: [REBOL] Re: XML-Parsing?!?

> Petr Krenzelok wrote:
> > Joel Neely wrote:
>
> Hi Petr
>
> > > I've been playing with parse-xml for quite a while (in fact, that's
> > > what got me to using REBOL seriously in the first place), so let me
> > > give a couple of hints that may help.
>
> > I just don't understand one thing yet, - why don't you folks use
> > load/markup?
> > invoice: load/markup %some-file-received.xml
> > and then e.g. print invoice/<invoice-number>, as path selection works
> > with tags. It's just pity it doesn't allow to use strings, it could be
> > helpful sometimes ...
> Well actually I tried it all. But I still don't get it. Either it doesn't
> work the same way with me or I still have a big lack of understanding.
>
> All I need is the possibility to simply read an external XML-File and
> extract values from certain tags like this:
>
> Pseudo: print the value of Prozesse/cre
>
> ------------------------BEGIN XML----------------------------------
> <?xml version="1.0" encoding="UTF-8"  standalone="yes" ?>
> <!-- Diese Datei enthält die Stati der xxx relevanten Prozesse -->
> <!-- 0=OK, 1=Fehler/fehlend, 2=Unbekannt -->
>
> <test1 name = "Patrick" />
> <test2 name = "Scheller" />
> <Prozesse>
>         <xxx = "1" />
>         <cre = "2" />

Aaah, sure, I was not familiar with such tag syntax ... so we've got the
values inside the tags themselves, right? Hmm ...
There is no other possibility than a full-blown, powerful parser ....

... or :-)

1) remove the damned spaces which are left between the tags even after
performing trim/lines on the string ...
2) try the following hack :-)

REBOL []

str: {<?xml version="1.0" encoding="UTF-8"  standalone="yes" ?>
<!-- Diese Datei enthält die Stati der xxx relevanten Prozesse -->
<!-- 0=OK, 1=Fehler/fehlend, 2=Unbekannt -->

<test1 name = "Patrick" />
<test2 name = "Scheller" />
<Prozesse>
        <xxx = "1" />
        <cre = "2" />
        <ora = "3" />
        <xxxrec = "4">
        <db_ppb = "5" />
</Prozesse>

<!-- Ende der Datei -->
}

blk1: load/markup trim/lines copy str
blk2: load/markup replace/all trim/lines copy str "> <" "><"

print mold blk1  ; so, blk1 not of much use because of " " in between the tags ...
print mold blk2

result: copy []
foreach tag blk2 [
    either not tag? tag [
        ; for the possibility of the item being a string, e.g. text between tags
        insert tail result tag
    ][
        either not found? find tag "=" [
            ; for the possibility of the item being a regular tag without "="
            insert tail result tag
        ][
            ; split the tag content into its pieces; the first one is the tag name
            tmp: parse tag "="
            insert tail result to-tag first tmp
            ; is the last item in the tag "/"? then the value is the item before it
            insert tail result first skip tail tmp either (last tmp) = "/" [-2][-1]
        ]
    ]
]

; it surely still has its flaws and is rather limited in usability :-)

print mold result

print "print result/<cre>"
print result/<cre>

; hmm, I know it's flat, you wanted to print <prozesse>/<cre>, maybe a
; little func could help?

print-tag: func [parent-tag what][print select find result parent-tag what]

print-tag <Prozesse> <Cre>

; but still - our func will return the first <cre> after the first <prozesse>
; found, but it doesn't have to be our subtag .... :-)
; too lazy to think deeper ;-)
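
One possible way around that flaw, as a minimal sketch only: restrict the lookup to the items between the parent tag and its closing tag. The name find-subtag, its arguments and the sample block data are made up for illustration and are not part of the script above; the sketch also assumes the same flat tag/value layout that result has.

```rebol
REBOL []

; Hypothetical sample data in the same flat layout as 'result' above
data: [<Prozesse> <xxx> "1" <cre> "2" </Prozesse> <Other> <ora> "9" </Other>]

; find-subtag is an illustrative helper, not part of the original script.
; It returns the value that follows 'what', but only if 'what' sits between
; parent-tag and end-tag; otherwise it returns none.
find-subtag: func [blk parent-tag end-tag what /local start stop] [
    if none? start: find blk parent-tag [return none]
    ; stop at the closing tag, or at the end of the block if it is missing
    stop: any [find start end-tag  tail start]
    select copy/part start stop what
]

print find-subtag data <Prozesse> </Prozesse> <cre>
probe find-subtag data <Prozesse> </Prozesse> <ora>
```

The first lookup should find "2"; the second should yield none, because <ora> lies outside <Prozesse>. Nested parents with the same name are still not handled, so this only narrows the problem rather than solving it.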

Cheers,
-pekr-