
ANN: xmlparse.r "a more compliant XML parser"

 [1/2] from: gavin::mckenzie::sympatico::ca at: 13-Jul-2001 19:11


Folks,

Here's my first crack at trying to get my own XML parser up to a quality where I feel prepared to release it.

The benefits of this parser over REBOL's built-in parse-xml?

- automatically expands character entities like &amp; &lt; etc.
- parses out the information contained in the XML document prolog
- handles CDATA sections
- handles comments
- handles processing instructions

and...

- provides a parser-callback handler interface modeled on SAX
- includes a handler that converts the parsed XML into a series of nested blocks like REBOL's built-in parse-xml

The next major chunk of forthcoming functionality is XML Namespaces processing. The namespace handling is 80% there, and there is a switch for turning namespace processing on or off during parsing. As someone who builds commercial XML products (shhh...in C++), I know that namespaces are often vital to processing real-world XML documents such as XHTML, BizTalk, SOAP, ebXML, etc. I hope to have namespace functionality completely done by the end of the weekend.

I built this for my own needs...and it made for a trial-by-fire experience in learning to use REBOL's (wonderful) parse mechanism. I basically started with the BNF production rules in the XML 1.0 spec and the XML Namespaces spec.

I've got some more XML processing scripts that I'm working on polishing up. Any comments, criticisms, or suggestions are welcome.

The script has a lengthy Purpose: section in it -- no substitute for nice HTML documentation, but for now it's the best I can do.

You can get the script at http://www3.sympatico.ca/gavin.mckenzie/xml-parse.r

Gavin.
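
For readers who haven't met SAX, the callback model described above can be sketched with Python's standard xml.sax module. This is only an illustration of the general idea, not the handler interface of xml-parse.r: the class name NestedBlockHandler, the function parse_to_blocks, and the [name, attributes, children] list layout are all assumptions made for this example, loosely imitating the nested blocks that parse-xml returns.

```python
import xml.sax


class NestedBlockHandler(xml.sax.ContentHandler):
    """Hypothetical handler: builds [name, attrs-or-None, children-or-None]."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []  # elements that are currently open

    def startElement(self, name, attrs):
        node = [name, dict(attrs.items()) or None, None]
        if self._stack:
            parent = self._stack[-1]
            if parent[2] is None:
                parent[2] = []
            parent[2].append(node)
        else:
            self.root = node
        self._stack.append(node)

    def endElement(self, name):
        self._stack.pop()

    def characters(self, content):
        # By the time this fires, the parser has already expanded entities
        # such as &amp; and unwrapped any CDATA sections.
        if not self._stack:
            return
        node = self._stack[-1]
        if node[2] is None:
            node[2] = []
        # Text may arrive in several chunks, so merge adjacent strings.
        if node[2] and isinstance(node[2][-1], str):
            node[2][-1] += content
        else:
            node[2].append(content)


def parse_to_blocks(xml_text):
    """Parse xml_text and return the nested-list form of its root element."""
    handler = NestedBlockHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.root


doc = '<?xml version="1.0"?><a id="1">x &amp; y<!-- note --><b><![CDATA[<raw>]]></b></a>'
print(parse_to_blocks(doc))
```

The parser drives everything and the handler only reacts to events, which is why one parser can feed different handlers (a tree builder like this one, a printer, a filter) without changing the parsing rules.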

 [2/2] from: gavin:mckenzie:sympatico:ca at: 13-Jul-2001 19:41


*sigh* Ok...so *now* I've posted the most recent rev of xml-parse.r. The previous one that's been up there for about 20 minutes was an older (non-functioning) rev.

One more point...the parser doesn't really do any well-formedness checking yet. It assumes that you've got at least a well-formed XML document. But then again, that's not such a big deal; the built-in REBOL parse-xml doesn't do much well-formedness checking either. For instance:

>> parse-xml {<a>}
== [document none [["a" none none]]]

Hmmm...a lone <a> does not a well-formed document make.

Gavin.
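
For comparison, here is roughly what a parser that does enforce well-formedness does with the same input. This is a small sketch using Python's standard xml.sax module rather than REBOL; the helper name is_well_formed is made up for this example and has nothing to do with xml-parse.r or parse-xml.

```python
import xml.sax


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        # A bare ContentHandler ignores every event; only parse errors matter.
        xml.sax.parseString(xml_text.encode("utf-8"), xml.sax.ContentHandler())
        return True
    except xml.sax.SAXParseException:
        return False


print(is_well_formed("<a>"))      # the unclosed element from the example above
print(is_well_formed("<a></a>"))  # the same element, properly closed
```

The underlying parser reports the missing end tag as a parse error instead of returning a partial tree, which is the behaviour a well-formedness check would add.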