World: r3wp

[XML] xml related conversations

BrianH 8-Nov-2005 [326x2]	The important thing is to make sure that the events or data structures are a good map of the semantic model of XML. They have standards about that too.
CarstenK 8-Nov-2005 [328]	John: I've downloaded the scripts and will check them.
Christophe 8-Nov-2005 [329]	Did you have a look at the source of 'parse-xml? Is this what is meant by event-driven?
BrianH 8-Nov-2005 [330]	No, parse-xml generates a (broken, incomplete) DOM tree. Gavin McKenzie's xml-parse is more like a SAX parser.
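The distinction drawn here (a parser that builds a tree versus one that reports events) can be illustrated with a small sketch. It uses Python's standard library rather than parse-xml or xml-parse, and the `EventLogger` class and `SAMPLE` document are made up for the illustration.

```python
import xml.sax
from xml.dom import minidom

SAMPLE = "<doc><a id='1'>hi</a><b/></doc>"  # hypothetical sample document


class EventLogger(xml.sax.ContentHandler):
    """SAX style: record each event as it arrives; no tree is built."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # The parser may deliver text in several chunks; skip pure whitespace.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


# Event-driven: the handler is called back while the input is being read.
handler = EventLogger()
xml.sax.parseString(SAMPLE.encode("utf-8"), handler)

# Tree-building (DOM style): the whole document is in memory after parsing.
tree = minidom.parseString(SAMPLE)
```

After this runs, `handler.events` holds a flat list of start/text/end tuples, while `tree.documentElement` is the root node of a navigable tree.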
Christophe 8-Nov-2005 [331]	Hmm... I think I will dig a little more into the theory. I had learned another approach to that. Thanks anyway for showing the way!
CarstenK 9-Nov-2005 [332x3]	I've also had a look inside xml-parse, it seems to be really like SAX - ready to use. But nobody is maintaining it, I think. As far as I understand, somebody could create a Handler to get the desired block structure (for instance a Handler for RebXML or any other model). I have to learn about this in REBOL. A question: how can I measure memory for a block or an object tree in REBOL?
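The idea of a handler that turns SAX events into a block structure can be sketched as follows. This is not xml-parse or the RebXML format: it is a minimal Python analogue, and the `[name, attributes, children]` layout and the names `BlockBuilder` and `xml_to_blocks` are assumptions chosen for the example.

```python
import xml.sax


class BlockBuilder(xml.sax.ContentHandler):
    """Builds nested [name, attributes, children] lists from SAX events."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []  # elements that are currently open

    def startElement(self, name, attrs):
        node = [name, dict(attrs.items()), []]
        if self.stack:
            self.stack[-1][2].append(node)  # attach to the open parent
        else:
            self.root = node
        self.stack.append(node)

    def characters(self, content):
        # Keep non-blank text as a plain string child of the open element.
        if self.stack and content.strip():
            self.stack[-1][2].append(content)

    def endElement(self, name):
        self.stack.pop()


def xml_to_blocks(text):
    """Parse an XML string and return the nested-list form of its root."""
    handler = BlockBuilder()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.root
```

Swapping in a different handler class would produce a different target structure from the same parser, which is the benefit described above.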
	RebXML: I did some testing with rebxml, the documents I used can be found here: http://www.simplix.de/rebol/resources/xml/xmltests.zip There is also a simple script that reads the XML docs in and writes them back. Some problems I found:
	- empty attributes, I have fixed this in the zip
	- entities in content: all should be escaped, because they can be found there, otherwise a " gets &quot;
	- comments after last element missed
	- comments before first element - missing line feed
	- missing PIs in output
	Another question: encoding - it seems that all output files will be written in iso-8859-1?
	I have no idea about comparison of XML documents (input and output of rebxml, for instance) to ensure correctness, but it seems to be difficult.
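One common way to compare two XML documents for a round-trip test is to canonicalize both and compare the results. The sketch below assumes Python 3.8 or later, where `xml.etree.ElementTree.canonicalize` is available; the helper name `same_xml` is made up, and stripping whitespace is a choice that suits data-style documents, not mixed content.

```python
from xml.etree.ElementTree import canonicalize


def same_xml(a, b, with_comments=False):
    """Return True if two XML strings are equal after canonicalization.

    Canonical form sorts attributes and writes empty elements as
    start/end pairs; strip_text=True also ignores indentation.
    Comments are only compared when with_comments is True.
    """
    options = {"strip_text": True, "with_comments": with_comments}
    return canonicalize(a, **options) == canonicalize(b, **options)
```

Passing `with_comments=True` would make a dropped comment, one of the problems listed above, show up as a difference.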
Geomol 9-Nov-2005 [335x2]	About memory for a block or object: if you mean in bytes internally in REBOL, I don't know. But you could save the block or object to a file and see a size that way. You can of course see the length of a series with: length?
Geomol 9-Nov-2005 [335x2]	About encoding in RebXML, rebxml2xml lets you produce utf-8 by specifying the /utf-8 refinement: rebxml2xml/utf-8 <some rebxml data>
CarstenK 9-Nov-2005 [337]	With length? I need some recursion, otherwise I get only the first level of the block if it is nested? How to serialize an object tree in REBOL - is there some function available?
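The effect of choosing the output encoding can be shown with a short sketch. It uses Python's ElementTree rather than rebxml2xml, and the element name and text are invented for the example.

```python
import xml.etree.ElementTree as ET

root = ET.Element("city")          # hypothetical element
root.text = "\u00c6r\u00f8"        # two non-ASCII letters: capital AE, o with stroke

# Same tree, two serializations: only the bytes on disk differ.
utf8_bytes = ET.tostring(root, encoding="utf-8")
latin1_bytes = ET.tostring(root, encoding="iso-8859-1")
```

Each non-ASCII letter takes two bytes in the UTF-8 output and one byte in the ISO-8859-1 output; a parser reading either back should recover the same text as long as the encoding is declared or is the default.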
Geomol 9-Nov-2005 [338x2]	Carsten, a recursive function to count the length of blocks with nested blocks: total-length?: func [b [block!] /local n] [ n: 0 foreach e b [if block? e [n: n + total-length? e] n: n + 1] n ]
Geomol 9-Nov-2005 [338x2]	total-length? will count elements, and another block is also an element.
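For readers who do not know REBOL, the same counting rule can be sketched in Python, with lists standing in for blocks. The name `total_length` is made up for the example.

```python
def total_length(block):
    """Count every element at every depth; a nested list counts as one
    element itself, plus everything it contains."""
    count = 0
    for item in block:
        if isinstance(item, list):
            count += total_length(item)  # elements inside the nested list
        count += 1                       # the item itself
    return count
```

For `[1, [2, 3], []]` this gives 5: three top-level elements plus the two inside the first nested list.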
CarstenK 9-Nov-2005 [340]	John: Thank you, I'll play with it. I found this python tool - maybe some interesting ideas there: http://uche.ogbuji.net/uche.ogbuji.net/tech/4suite/amara/quickref He uses objects but I like the idea for accessing xml - replacing the dots with slashes, it looks to me like REBOL: doc/a/nodeName doc/a/b/1 ... doc/xml
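Slash-separated access of this kind can be approximated with the path support in Python's ElementTree. The sketch below is not the Amara API; the `select` helper and its rule that a numeric segment means "the n-th match of the previous segment" are assumptions made to mimic the `doc/a/b/1` notation.

```python
import xml.etree.ElementTree as ET


def select(root, path):
    """Resolve a path such as 'a/b/2' against an element.

    A numeric segment is turned into a 1-based positional predicate on
    the preceding segment, so 'a/b/2' becomes the query 'a/b[2]'.
    """
    parts = []
    for segment in path.split("/"):
        if segment.isdigit() and parts:
            parts[-1] += "[%s]" % segment
        else:
            parts.append(segment)
    return root.find("/".join(parts))


doc = ET.fromstring("<doc><a><b>first</b><b>second</b></a></doc>")
second_b = select(doc, "a/b/2")
```

Here `second_b.text` is `"second"`; a path that matches nothing returns `None`.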
Chris 10-Nov-2005 [341x3]	Catching up a little. Be interesting to summarise this thread as there are many different ideas expressed. rebxml looks interesting for loading, saving and likely extracting xml, but still perhaps difficult to manipulate.
	note: this group isn't showing on the web site, is this due to [web public] instead of [web-public] ?
	I've also noticed a tendency to kick the DOM (no doubt for good reason) -- though worth noting that it is a complete api to xml and it is a standard api, I wouldn't underestimate the value of the latter, particularly when it comes to Rebol advocacy...
Geomol 11-Nov-2005 [344]	RebXML is meant for conversion to/from the RebXML format and other formats (incl. XML). I use the RebXML format with NicomDoc, which makes it a lot easier to handle document formats. Let's say you've got an XML file and want to convert it to a format easily read by some application. You first use xml2rebxml to get the XML file into RebXML format. Then you make a converter from RebXML to the final format by renaming the rebxml2xml script and changing it to produce the output that is wanted. rebxml2xml holds the structure of the RebXML format, so it's easier to start with that script. Search for "output" in rebxml2xml. Maybe I should make a converter from RebXML to some format very easily manipulated directly within REBOL, like the Python tool Carsten found.
Chris 11-Nov-2005 [345x2]	But this is the issue here with Rebol and XML, there are solutions that suit one XML operation or another. Aiming for loosely implementing DOM gives us loading, extraction, modification, and saving without affecting the integrity of the data structure. Examples: changing the title of an HTML page, adding an entry to an RSS file, etc.
Chris 11-Nov-2005 [345x2]	Using DOM methods, you can do this albeit clumsily, but completely. All through a set of standard functions, with no need to manipulate the structure directly.
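The two examples mentioned (changing a title, adding an entry to an RSS file) look roughly like this when done purely through standard DOM methods. The sketch uses Python's `xml.dom.minidom` as a stand-in for a REBOL DOM, and the feed content is invented.

```python
from xml.dom import minidom

RSS = "<rss><channel><title>Old</title></channel></rss>"  # hypothetical feed
doc = minidom.parseString(RSS)

# Change the channel title through the DOM, not by editing the structure.
title = doc.getElementsByTagName("title")[0]
title.firstChild.data = "New title"

# Add an entry: create the nodes, then append them under <channel>.
item = doc.createElement("item")
item_title = doc.createElement("title")
item_title.appendChild(doc.createTextNode("First entry"))
item.appendChild(item_title)
doc.getElementsByTagName("channel")[0].appendChild(item)

result = doc.documentElement.toxml()
```

It takes several calls to add one small entry, which is the clumsiness referred to above, but every step goes through the standard interface and the document stays well-formed.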
Pekr 11-Nov-2005 [347]	hmm, couldn't we just somehow mix the approaches, so as to have some streamed DOM? :-) I don't like the idea of having to load a 10MB XML interchange file into memory ....
Chris 11-Nov-2005 [348]	Any less than you'd want a 10mb Rebol interchange file? What % of cases would this be an issue?
Volker 11-Nov-2005 [349]	xml is used to store word-files, rebol not? :)
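The "streamed DOM" idea raised above exists in some libraries as incremental parsing: small subtrees are built, handed to the caller, and then discarded. A minimal sketch with Python's `ElementTree.iterparse` follows; the `iter_records` name and the record shape are assumptions for the example.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag):
    """Yield (attributes, text) for each <tag> element while parsing.

    Each element is a normal tree node while it is being handled, and is
    cleared afterwards so its content does not stay in memory.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield dict(elem.attrib), (elem.text or "")
            elem.clear()  # drop text, attributes and children of this node


data = io.BytesIO(b"<log><row id='1'>a</row><row id='2'>b</row></log>")
rows = list(iter_records(data, "row"))
```

The cleared elements remain as empty shells under the root, so for very large inputs they would also need to be removed from their parent; that step is left out here.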
CarstenK 12-Nov-2005 [350x2]	At the moment I am playing a little with xml-parse.r. It has a lot of things done, some are still open (like <!ENTITY ...> parsing), and it is like SAX. I am trying to implement some handlers to learn REBOL, but it's still in progress. A benefit of xml-parse is that there would be only one parser and some kind of standard API, and the handler could then generate rebxml or some other desired format.
CarstenK 12-Nov-2005 [350x2]	DOM: in Java APIs there were always problems with DOM - it uses a large amount of memory and is not optimized for any one language, so there was a need for optimized tools like JDOM, XOM or DOM4J. They all prefer SAX for parsing and have their own internal model - of course the API is specific to each of these tools and not a standard like DOM.
Volker 12-Nov-2005 [352]	I guess in REBOL we have fewer problems than Java, as REBOL is dynamic and Java has to emulate that? So it can't map its own classes because the format is not known at compile-time? While we can. And then XML in memory should be on the order of REBOL blocks?
Maxim 13-Nov-2005 [353]	out of the blue, can anyone point me to the (or one) official XML spec ? (if there are many, it should be the one most used on windows and in things like PHP) thanks!
Chris 14-Nov-2005 [354]	http://www.w3.org/TR/REC-xml/
Maxim 14-Nov-2005 [355x2]	thanks Chris !
Maxim 14-Nov-2005 [355x2]	will be reading top to bottom ... not that this is any fun... ;-)
Christophe 27-Nov-2005 [357]	Has anybody already tried a SAX implementation?
Will 8-Jan-2006 [358]	http://tech.motion-twin.com/xmllight.html
Maxim 22-Mar-2006 [359x4]	xml is such bloat.. I am parsing xml these days and for two characters of data, I often have 100+ characters of nested stupidity.
	an empirical test (dependent on the xml structure and tag names, obviously, but this IS a real-world xml file)
	693 kb in xml form ==> 90 kb in nested rebol blocks
	I left the tabs at 2 spaces in the rebol output, so that the comparison is fair.
Anton 23-Mar-2006 [363]	no need to convince us :-)
[unknown: 9] 23-Mar-2006 [364]	Agreed. So, write a Rebol block ML that does everything as well as XML, and we will support it.
Maxim 12-Apr-2006 [366x2]	my god reading the w3c spec for XML is insane.
Maxim 12-Apr-2006 [366x2]	XML overcomplicates soooo many things. It's like the standard for people who can't make up their minds: you can do this, or that, or this too, but only when this and that or this occur outside and inside that other thing.
Sunanda 12-Apr-2006 [368]	XML was intended to be a simplification of SGML. But they forgot to ask first "why is SGML apparently so complicated?" So they ended up adding back in most of the complications in an ad hoc way.
Allen 12-Apr-2006 [369x2]	XML was a simple 2 page spec originally.
Allen 12-Apr-2006 [369x2]	I think that might be why the microformats are taking off. They use XML in its simplest, intended form.
Graham 12-Apr-2006 [371x2]	I'm on a list discussing, inter alia, CCR .. which stands for continuity of care record. It's XML, and so guys are saying it's taken them 50,000 lines to write the parsing code etc.
Graham 12-Apr-2006 [371x2]	Possibly an exaggeration on their part.
Pekr 12-Apr-2006 [373x2]	I think not, Graham .... we have such a problem ... big corporation, we try to define xml formats. The trouble is, big products do wrap it for you, but what about smaller companies?
Pekr 12-Apr-2006 [373x2]	not to mention browser incompatibilities, because in the case of XML, browser is your "preview" interface ...
Geomol 12-Apr-2006 [375]	If you need a simple XML spec, don't forget my RebXML: http://home.tiscali.dk/john.niclasen/rebxml/ (Only a couple of pages.) It's an easy way to work with XML inside REBOL, and on the same page you'll find scripts for converting between XML and RebXML.