XML-processor toy. Or: "RFC"

[1/3] from: christian::ensel::gmx::de at: 18-Nov-2000 1:01

Hello list, looking thru the hundreds of read and unread posts to this list, XML and REBOL's built-in XML 'support' is mentioned at least every few days. Someone - wasn't it Andrew? - wanted to convert XML's DTDs to REBOL's parse rules. I must have overlooked the :) which followed this idea ... Okay. Instead of preparing for an exam, I played with XML and read about it on w3.org. The grammar specified there inspired me to convert it to a parse dialect, which isn't as hard as I first thought. In its current state it's far from being complete - but it does some cute little things which give a hint of what it can do some day in the far future.

>> xml-data: {<?xml version="1.0" standalone="yes"?>
<!DOCTYPE test [
    <!ENTITY ME "Christian Ensel">
    <!ELEMENT che:money ANY>
    <!ATTLIST che:money
        che:currency CDATA "USD"
        che:amount   CDATA #REQUIRED
    >
]>
<space:name>
    This is some text typed by &ME;.
    <che:money
        xmlns:che    = "http://www.foo.bar"
        che:amount   = "0.02"
        che:currency = "USD"
    >
        My two cents someday?!?
        <element attribute="&lt;1&gt;" />
    </che:money>
</space:name>}

>> xml/process xml-data

This results in the following object tree, far from being complete, but IMHO some very cute things work already (e.g. declaring entities, see the marker ^^^^^^):

>> probe xml/the-Document

make object! [
    name: none
    attrs: []
    content: [
        make object! [
            name: "space:name"
            attrs: []
            content: [
                "^/ This is some text typed by " "Christian Ensel"
                                                 ^^^^^^^^^^^^^^^^^
                ".^/ "
                make object! [
                    name: "che:money"
                    attrs: [
                        make object! [
                            name: "xmlns:che"
                            value: "http://www.foo.bar"
                        ]
                        make object! [
                            name: "che:amount"
                            value: "0.02"
                        ]
                        make object! [
                            name: "che:currency"
                            value: "USD"
                        ]
                    ]
                    content: [
                        "^/ My two cents?!?^/ "
                        make object! [
                            name: "element"
                            attrs: [
                                make object! [
                                    name: "attribute"
                                    value: "<1>"
                                           ^^^^^
                                ]
                            ]
                            content: []
                        ]
                        "^/ "
                    ]
                ]
                "^/"
            ]
        ]
    ]
]

It's fun working with PARSE, even though I'm strongly missing some features which would help a lot, e.g. a NOT keyword or the possibility to parse a string TO ["<" | "&" | "]]>"]. Things like that ...

The processor recognizes tags which aren't nested correctly, but is very strict in this - it simply stops execution.

I'm very busy these days, so I'll make only little steps in the next days, but I will appreciate any comments on the idea to parse. Because I'm a little bit uncertain on some design decisions :) I'm not even sure if processing XML is a task REBOL is well suited for (thinking about things like UNICODE etc.). As I already said, comments, please ;)

Attached you find the most recent version. I guess in its current state it does some 2 or 3 % of what an XML processor should do, and the code looks (and is, I guess) very ugly :(

Hint: calling XML/PROCESS with the refinement /APPLY-RULE and the name of one of the rules in the XML object (simply the word, no lit-word, no path) allows for testing single rules. As in

>> xml/process/apply-rule {<?xml?>} Prolog

== true

But you'll probably end up with

== false

more often ...

As I already said, comments, please ;)

Regards
Christian
[Christian--Ensel--GMX--De]

-- Attached file included as plaintext by Listar --
-- File: xml-processor.r

;######################################################## REBOL XML-Processor ##

REBOL [
    title:   "XML-Processor"
    author:  "Christian 'CHE' Ensel"
    email:   [christian--ensel--gmx--de]
    date:    16-Nov-2000
    version: 0.0.4
]

;xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx XML xx

XML: make object! [

    ;=============================================================== SETTINGS ==

    comments?: no
    validate?: [ yes | no ]

    the-application-wants-comments: true
    the-application-wants-no-comments: false

    ;======================================================= HELPER-FUNCTIONS ==

    MAKE-TAG: does [
        make object! [
            name: none
            attrs: make block! 0
            content: make block! 0
        ]
    ]

    MAKE-NAMESPACE: function [
        the-NSPrefix [string!] the-NSTarget [string!]
    ] [] [
        repend the-Namespaces [ the-NSPrefix the-NSTarget make block! 0 ]
    ]

    QNAME-PREFIX: function [ a-QName [string!] ] [ the-Namespace ] [
        if equal? 2 length? the-Namespace: parse a-QName ":" [first the-Namespace]
    ]

    QNAME-LOCALPART: function [ a-QName [string!] ] [ the-Namespace ] [
        either equal? 2 length? the-Namespace: parse a-QName ":"
            [second the-Namespace]
            [first the-Namespace]
    ]

    SAME-NAME?: function [ a-QName b-QName ] [
        a-NSTarget b-NSTarget a-NSName b-NSName
    ] [
        a-NSTarget: select the-Namespaces qname-prefix a-QName
        b-NSTarget: select the-Namespaces qname-prefix b-QName
        a-NSName: qname-localpart a-QName
        b-NSName: qname-localpart b-QName
        (equal? a-NSName b-NSName) and (equal? a-NSTarget b-NSTarget)
    ]

    ;======================================================== DATA-CONTAINERS ==

    the-Document: none
    the-Tags: none
    the-Tag: none

    the-EntityRefs: [
          "&amp;"  ("&")
        | "&lt;"   ("<")
        | "&gt;"   (">")
        | "&quot;" ({"})
        | "&apos;" ("'")
    ]

    the-PEReferences: [ "%DEBUG;" ("DEBUG") ]

    the-Namespaces: [
        "xml"   "http://www.w3.org/XML/1998/namespace" []
        "che"   "http://www.che.de" ["book" "title" "isbn" "author" "price"]
        "ensel" "http://www.che.de" ["book" "title" "isbn" "author" "price"]
        "w3c"   "http://www.w3.org" ["book" "title" "isbn" "author" "price"]
    ]

    ;======================================================= PROCESS xml-data ==

    PROCESS: function [data [string!] /APPLY-RULE 'rule [word!] ] [] [
        the-Tag: the-Document: make-tag
        append the-Tags: make block! [] the-Tag
        either apply-rule [
            parse/all/case data get in self rule
        ][
            parse/all/case data Document
        ]
    ]

    ;---------------------------------------------------------------------------
    ;====================================================== GENERIC DTD RULES ==
    ; Rules like these will be generated automatically some day. Or some
    ; other approach will be chosen.

    the-Amount-AttRule: [ "amount" Eq AttValue ]

    the-Currency-AttRule: [
        "currency" Eq [
              "'" [ "DEM" | "OES" | "SFR" ] "'"
            | {"} [ "DEM" | "OES" | "SFR" ] {"}
        ]
    ]

    the-Money-ElemRule: [
        "<money" S the-Amount-AttRule S the-Currency-AttRule Opt S "/>"
    ]

    ;================================================================ GRAMMAR ==
    {
    [No] ---------- http://www.w3.org/???
     |
     |      Conventions:
     |
     |      - A number in format x.y denotes a rule added by me
     |        which acts as a helper to the rule x
     |
     |      - Rules which are "terminal" rules in some sense are
     |        specifying nothing but charsets. In contrast to the
     |        official XML grammar these rules' names end with
     |        an exclamation mark - so I can use them as if they
     |        were REBOL datatypes.
     |
     |      - Results of a rule named 'FooBar usually are to be
     |        kept in a word named 'the-FooBar .
     |
     |      [No] - http://www.w3.org/TR/1999/REC-xml-names-19990114
     |       |
     |       |
     |
    }

    [01]          Document: [ Prolog Element any Misc ]

    [03]          S: [ copy the-S some WhiteSpace! ]
    [03.1]        WhiteSpace!: charset [ " ^-^/^M" ]

    [04]   [05]   NCNameChar!: [ Letter! | Digit! | #"." | #"-" | #"_" | CombiningChar! | Extender! ]
    [04]          NameChar!: [ Letter! | Digit! | #"." | #"-" | #"_" | #":" | CombiningChar! | Extender! ]

    [05]   [04]   NCName: [ copy the-NCName [ [ Letter! | #"_" ] any NCNameChar! ] ]
    [05]   [06]   QName: [ copy the-QName [ opt [ Prefix ":" ] LocalPart ] ]
    [05]          Name: [ copy the-Name [ [ Letter! | #"_" | #":" ] any NameChar! ] ]

    [07]          Nmtoken: [ copy the-Nmtoken some NameChar! ]

    [09.1]        EntityValueChar!: complement charset {%&"}
    [09]          EntityValue: [
                      ( the-EntityValue: make string! 0 )
                      [
                          {"} any [
                                Reference   ( append the-EntityValue the-Reference )
                              | PEReference ( append the-EntityValue the-PEReference )
                              | copy the-EntityValueChar EntityValueChar!
                                            ( append the-EntityValue the-EntityValueChar )
                          ] {"}
                        | "'" any [
                                Reference   ( append the-EntityValue the-Reference )
                              | PEReference ( append the-EntityValue the-PEReference )
                              | copy the-EntityValueChar EntityValueChar!
                                            ( append the-EntityValue the-EntityValueChar )
                          ] "'"
                      ]
                  ]

    [10.1]        AttChar!: complement charset {<&"}
    [10]          AttValue: [
                      ( the-AttValue: make string! 0 )
                      [
                          {"} any [
                                Reference ( append the-AttValue the-Reference )
                              | copy the-AttChar AttChar! ( append the-AttValue the-AttChar )
                          ] {"}
                        | "'" any [
                                Reference ( append the-AttValue the-Reference )
                              | copy the-AttChar AttChar! ( append the-AttValue the-AttChar )
                          ] {"}
                      ]
                  ]

    [11]          SystemLiteral: [
                      copy the-SystemLiteral [
                            {"} any SystemChar! {"}
                          | "'" any SystemChar! "'"
                      ]
                  ]
    [11.1]        SystemChar!: complement charset {"}

    [12.1]        Pubid: [ "PUBLIC" S Public-ID-Lit ]
    [12]          PubidLiteral: [ copy the-PubidLiteral [ {"} any

[2/3] from: al:bri:xtra at: 18-Nov-2000 13:39

Christian wrote:

> Someone - wasn't it Andrew? - wanted to convert XML's DTDs to REBOL's
> parse rules. I must have overlooked the :) which followed this idea ...

Here's the smileyface: :-)

> > ...I haven't got time tonight/this morning to produce a full XML
> > solution that uses objects. Maybe next week.

> It's fun working with PARSE, even though I'm strongly missing some
> features which would help a lot, e.g. a NOT keyword or the possibility
> to parse a string TO ["<" | "&" | "]]>"]. Things like that ...

I agree that 'to in the 'parse dialect does seem unnecessarily restricted. I keep banging my head against 'to's limitations.

Now it would be nice to have a Rebol object! to XML converter.

Andrew Martin
ICQ: 26227169
http://members.nbci.com/AndrewMartin/
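For what it's worth, the missing NOT and the multi-target TO can probably be emulated with plain PARSE rules. What follows is only a rough sketch, assuming REBOL/Core 2.x parse semantics; the words stop-here, gate, not-stop, char-data and text are made up for illustration and are not part of xml-processor.r.

```rebol
REBOL [title: "PARSE: NOT and TO-alternatives workaround (sketch)"]

; Hypothetical names, not taken from xml-processor.r.
stop-here: [ "<" | "&" | "]]>" ]

; NOT idiom: if STOP-HERE matches, GATE becomes a rule that can never
; succeed ([end skip]), so the enclosing block fails and PARSE backtracks
; to where it started. Otherwise GATE is the empty rule, which succeeds
; without consuming any input.
gate: none
not-stop: [ [ stop-here (gate: [end skip]) | (gate: []) ] gate ]

; Emulates the wished-for  copy text to [ "<" | "&" | "]]>" ]
; by skipping one character at a time while NOT-STOP holds.
text: none
char-data: [ copy text any [ not-stop skip ] ]

parse/all {some text]]> and more} [ char-data to end ]
probe text
```

The price is one extra rule evaluation per character, so it is slower than a native TO would be, but it keeps the grammar readable.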

[3/3] from: brett:codeconscious at: 18-Nov-2000 13:17

I've been musing on this recently. Funnily enough I've been building my own xml-processor toy like Christian's, though I haven't got as far yet as actually having it do something.

I wanted to parse the xml-dtd to create a rebol object that would in turn offer two functions. The first to parse the actual xml document into a Rebol form. The second to convert the Rebol form back to an xml document.

Using a separately generated object gets around the problem of mimicking xml in Rebol (e.g. attributes / contents) and skirts around the issue of having one right Rebol form for holding xml data. So the Rebol form could be a wordy dialect, nested objects, a block structure suitable for use with paths, or a combination of these. Whatever is the most useful.

Unfortunately I haven't quite got the time to see this through at the moment.

Brett.
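The "back to xml" direction wished for in this thread could look roughly like the sketch below. It assumes the object layout shown in Christian's probe output (objects with name, attrs and content, where content holds strings and nested objects, and the root has name: none). The functions escape-xml and to-xml are hypothetical helpers, not part of xml-processor.r, and DTDs, comments and namespaces are ignored.

```rebol
REBOL [title: "Object tree to XML (sketch)"]

; Hypothetical helper: escape the characters that are special in XML.
; "&" is replaced first so the other replacements are not escaped twice.
escape-xml: func [text [string!] /local out] [
    out: copy text
    replace/all out "&" "&amp;"
    replace/all out "<" "&lt;"
    replace/all out ">" "&gt;"
    replace/all out {"} "&quot;"
    out
]

; Hypothetical converter: strings are escaped and returned, objects are
; written as a start tag with attributes, their content, and an end tag.
; A node whose name is none (the document root) only emits its content.
to-xml: func [node /local out] [
    if string? node [return escape-xml node]
    out: copy ""
    if node/name [
        append out join "<" node/name
        foreach attr node/attrs [
            append out rejoin [" " attr/name {="} escape-xml attr/value {"}]
        ]
        append out ">"
    ]
    foreach item node/content [append out to-xml item]
    if node/name [append out rejoin ["</" node/name ">"]]
    out
]
```

After a successful xml/process, something like print to-xml xml/the-Document should write the tree back out. Entity references come back expanded, since the tree no longer records where &ME; was used.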