World: r3wp

Join the discussions in the REBOL3 world...

[XML] xml related conversations

Geomol 6-Nov-2005 [239]	By "handle", I mean parse them, but comments ain't in the output. The script shouldn't stop for valid XML input.
CarstenK 6-Nov-2005 [240]	I played around with some shorter XML documents to figure out how it works. My REBOL experience dates from last week, so maybe I'm doing something wrong. The comments are parsed and the block also looks complete, but during writing it stops after an element that is followed by some comments. As far as I have seen, these comments are left out of the block, but there is a lot of whitespace between the last printed element and the next missing element.
Geomol 6-Nov-2005 [241]	Carsten, yes, I get the same problem here. I'll look into it.
CarstenK 6-Nov-2005 [242]	cool, thank you for your time!
Geomol 6-Nov-2005 [243x3]	Carsten, ok I found a bug related to multiple comments after each other. Get fixed script here: http://home.tiscali.dk/john.niclasen/rebxml/xml2rebxml.r
	Carsten, the script still strips comments. Do you need the comments to be passed through to the output? (I'm in two minds about how it should work.)
	I've uploaded the script to the library.
Pekr 7-Nov-2005 [246]	taken from ML - http://www.xom.nu
CarstenK 7-Nov-2005 [247]	I will try the new xml2rebxml.r. I think it would be nice to preserve the comments: if somebody writes XML in a text editor and makes some annotations, it is nice if he gets these comments back after processing the files with some other (REBOL) tool. But this feature has lower priority. I found one more thing in xml2rebxml.r: only the entities replace/all att-data "&gt;" #">" replace/all att-data "&lt;" #"<" replace/all att-data "&amp;" #"&" are replaced; the other two are missing, I think: replace/all att-data "&quot;" #"^"" replace/all att-data "&apos;" #"'"
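A minimal sketch of decoding all five predefined XML entities in an attribute value. It is not taken from xml2rebxml.r: `decode-entities` is a hypothetical helper name, and doing the `&amp;` replacement last is an assumption about ordering that avoids decoding text such as `&amp;lt;` twice.

```rebol
REBOL [Title: "Entity decoding sketch"]

decode-entities: func [
    "Return a copy of TEXT with the five predefined XML entities decoded (hypothetical helper)."
    text [string!]
][
    text: copy text                    ; leave the caller's string untouched
    replace/all text "&gt;" ">"
    replace/all text "&lt;" "<"
    replace/all text "&quot;" {"}
    replace/all text "&apos;" "'"
    replace/all text "&amp;" "&"       ; last, so "&amp;lt;" ends up as "&lt;" rather than "<"
    text
]

print decode-entities {a &lt; b &amp;&amp; name=&quot;x&quot;}
```

Working on a copy keeps the original attribute data available, which may help when the same source string is written back out later.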
Pekr 7-Nov-2005 [248x2]	at xom.nu, you can find various articles too ...
	What is wrong with XML APIs - http://www.artima.com/intv/xmlapis.html
Geomol 7-Nov-2005 [250]	Carsten, you're right about &quot; and &apos;. As I read the XML recommendation (http://www.w3.org/TR/2004/REC-xml-20040204/), those can only be found in attribute values (see [10] AttValue), not in character data (see [14] CharData). Is that correct?
Pekr 7-Nov-2005 [251x3]	http://www.artima.com/intv/dom.html - The Good, the Bad and the DOM - "a camel is a horse designed by committee" :-)
	I seem to like XOM, at least based on what the author says about it - of course, he might be biased towards his own work - http://www.artima.com/intv/xomdesign.html - if it is true that simplicity was his motivation, then we could look into XOM as a possible way to go ...
	hmm, not so easy and small anyway ... probably the best approach will be to decide what direction we go and then start building a REBOL-oriented solution, not trying to port something. Looking at some of this stuff, it sometimes seems to me that it is designed to fit the target language, e.g. Java ....
MichaelB 7-Nov-2005 [254]	For sure we shouldn't try to simply port something. But maybe it's better anyway to see what Christophe (Coussement) is doing (or his team). XOM as a base for ideas might not be bad, as it's well designed and based on some simple principles which I would at least subscribe to. But it's completely object oriented, so there might be a more REBOL-like way to go - don't know. What I would be interested to know is how Christophe is going to handle Unicode files. There are some scripts to help with converting UTF-8 and the like, but I can't judge right now how well this will work.
Pekr 7-Nov-2005 [255]	I liked the discussion Chris and Brian held here a week or so ago ... simply let's find a way to work with XML in REBOL - once we know what we want, we can start coding ...
MichaelB 7-Nov-2005 [256x2]	As Christophe said on the mailing list - we actually need both SAX and DOM, because if you have a large document and are only interested in a sequence of occurrences of elements, one at a time, you don't need DOM, but if you need information about the overall structure of a document you have to read in the whole document, and that's DOM. But if Christophe is doing DOM already - don't know to what extent - this would be very nice and might be OK for now.
	Would it make sense to have XML files be represented as a port like xml:// . This could make sense for DOM and for SAX. But please correct me if that's stupid. For SAX this would enable one to copy from the port and get events by copying, for some one could navigate with some dialect and position the cursor in the document. A copy would read the data at the current position - but then a block or something which represents an element could be returned. But I guess that's not well thought out. :-)
Geomol 7-Nov-2005 [258]	Carsten, I've added support for &quot; and &apos; in xml2rebxml. I've also added preservation of comments if xml2rebxml is called with the /preserve refinement (just call it like: xml2rebxml/preserve <xml code>). I've uploaded the scripts to my page: http://home.tiscali.dk/john.niclasen/rebxml/ I think they need some testing before they go to the library at www.rebol.org.
CarstenK 7-Nov-2005 [259x2]	John, I've downloaded it from your website - thank you! One more question from an inexperienced REBOL user: what is the most common way to extend a block I've got from xml2rebxml? The source is <?xml version="1.0" encoding="iso-8859-1"?> <chapter id="ch_testxml" name="Test XML"> <title>A chapter with some xml tests</title> <sect1 id="sct_about" name="About my Tests"> <title>What kind of tests I will do</title> <body> <para>Some simple paragraph.</para> </body> </sect1> </chapter> After reading in the file with my-doc: xml2rebxml read %test.xml I'd like to insert a second sect1 element in the block my-doc. What's the best way - just to avoid some stupid mistakes?
	To Michael: I'm not sure if we need DOM and SAX. The problem is that the committee tried to develop language-independent interfaces, so both APIs have problems in the targeted programming language. DOM is inefficient, and you should avoid it. The best way seems to be: 1. have a parser like SAX with events 2. build the model in the best way for your language 3. provide an API for your language. Basically XOM does this very well for Java: E.R.H. uses a SAX parser and converts to his own object model that is optimized for Java. For REBOL this should be something like a block, I think. (Blocks are the best way to store things in REBOL?) But that's the internal side of the tool and could be the rebxml block structure. As the API there should be a dialect, maybe one that uses a port (there I have less knowledge - I have to learn about this).
Geomol 7-Nov-2005 [261]	Carsten, to insert second sect1, do something like: append last my-doc [sect1 id "sct_about" name "Another about" [title "etc....."]]
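A small sketch of the `append last my-doc` suggestion, runnable without the rebxml scripts. The literal `my-doc` block is a hand-written stand-in whose shape is only assumed from the messages above (element word, attribute name/value pairs, then a content block); the real xml2rebxml output may differ, for example in whitespace strings. The id `sct_more` is made up for illustration.

```rebol
REBOL [Title: "Append a second sect1 (sketch)"]

; Assumed shape of the parsed document, not actual xml2rebxml output.
my-doc: [
    chapter id "ch_testxml" name "Test XML" [
        title "A chapter with some xml tests"
        sect1 id "sct_about" name "About my Tests" [
            title "What kind of tests I will do"
            body [para "Some simple paragraph."]
        ]
    ]
]

; LAST returns the content block of the root element. APPEND without
; /only splices the new element's word, attributes and content block
; onto the tail of that content block.
append last my-doc [
    sect1 id "sct_more" name "Another about" [title "etc."]
]

probe my-doc
```

Because `append` modifies the content block in place, `my-doc` itself now holds both sect1 elements; use `copy/deep` first if the original block has to stay unchanged.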
Pekr 7-Nov-2005 [262]	Thanks Carsten, that clarifies things for me .... I like the SAX approach more too .... IIRC Gavain's stuff was SAX-like too ... it just could not write back to XML ...
Christophe 7-Nov-2005 [263x5]	Well this is a great place to learn !
	Pekr: I do not know XOM, I will study it. Maybe it fits better than our idea of DOM.
	MichaelB: about Unicode handling. That's a point we didn't think about, because we're working in iso-8859-1 (western European) and not UTF-8 or UTF-16. So we have to see what the cost of it would be. If there are any suggestions about how to handle this, they are most welcome! (I handled a similar problem with a simple replace/all, but I don't know if it's the best approach.) About a port approach... What would the advantages be?
	Geomol: you've done a great job with your rebxml. But we really need some kind of dialect to easily access nested data. Like XPath... I need to be able to say get-data [//*/bbb/ccc[@id='geek']] and get the info. I think XPath has a great notation for that (and it is a standard). So we have to find the format which best fits this dialect...
	I was fighting today to find the best internal data format. Judging from the tests, object! seems the best performer for nested data structures, and hash! for non-nested ones. But the problem with object! is that we cannot have a recurring element in the structure, like: <aaa> <bbb>content</bbb> <bbb bbb_attrib="attrib1"></bbb> </aaa> because, of course, when evaluated the last definition of bbb overrides the others. So we are trying to work with hash! We got a small reduction of the overhead compared to XML, but the processing time seems to be 10 to 20% more than with block! I need some more tests on retrieving data from the structure to find the right combination. Any suggestion is welcome!
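A short sketch of the overriding problem described above, assuming REBOL 2 semantics. The data is a simplified stand-in for the two `<bbb>` elements (attributes left out), and the names `as-object`, `as-block` and `found` are made up for the example.

```rebol
REBOL [Title: "Repeated elements: object! versus block! (sketch)"]

; In an object spec a repeated set-word refers to the same field,
; so only the value assigned last survives.
as-object: make object! [
    bbb: "content"
    bbb: ""
]
probe as-object/bbb

; A flat name/value block keeps both entries in document order.
as-block: [bbb "content" bbb ""]

; Collect every string stored under the name bbb.
found: copy []
parse as-block [
    any ['bbb set value string! (append found value) | skip]
]
probe found
```

The same name/value layout can be turned into a hash! with `make hash! as-block` if faster lookup of the first match is wanted; repeated names still need a walk like the `parse` rule above.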
Volker 7-Nov-2005 [268]	A rough idea: Maybe like vid does it? /color /colors ? it puts the first color in color if there is only one. if there are more, they are put in /colors-block .
Christophe 7-Nov-2005 [269]	I do not get where you gain in performance? Or do i get it wrong ?
Volker 7-Nov-2005 [270x3]	because you can use an object as long as there is only one value. But not sure if that helps.
	but 10-20% is not much anyway.
	And with blocks there is a better chance to use rebcode?
BrianH 7-Nov-2005 [273]	Or for that matter, block parsing.
Christophe 7-Nov-2005 [274x2]	Volker: i got your point. I don't know yet. I will study it tomorrow.
	rebcode could be an issue. But still under development ..
Gregg 7-Nov-2005 [276]	Should this group be web public?
Pekr 7-Nov-2005 [277]	Gregg - I think no problem here to make it web-public ...
Gregg 7-Nov-2005 [278]	Done.
Christophe 7-Nov-2005 [279]	Gregg: as fast as lightning :-)
Geomol 7-Nov-2005 [280]	He's like a Marvel Super Hero! :-)
Volker 7-Nov-2005 [281]	Hat-man? :)
Graham 7-Nov-2005 [282]	lol
MichaelB 7-Nov-2005 [283]	carsten: I should have kept my mouth shut about XOM and asked you before :-) the port idea was just that, a thought - in any case, if one wants to use a dialect there has to be an entity to interpret the dialect; whether that's a function or something else doesn't matter, but a port seems to be a common REBOL entity to encapsulate things - that's why I thought it might even make sense to use a port as the abstraction .... opening a port to an XML file, and the port will parse it in whatever way - by sending (inserting) a dialected block into the port, the XML document could be worked on - at least from the user's point of view one wouldn't have to handle the xml-code-block/rebol code block separately - even though it might be nice to access it directly .... well, maybe I have too little clue about ports, so the idea might not make much sense if I have forgotten about some important drawbacks and the like
CarstenK 7-Nov-2005 [284x3]	to michael: maybe you can show some REBOL pseudocode for how to read all chapters from a book.xml file, so we have a nice use case to think about
	... using a XML port
	to John (or geomol), first I got the following error: >> my-cdoc: xml2rebxml/preserve read %short.xml Syntax Error: Invalid word -- --> Near: (line 9) --> So I replaced insert tail output load join "<!--" data with insert tail output join "<!--" data and it works fine with my files! You were right, the replacements in text nodes are only &amp;, &gt; and &lt;. In attributes we need to escape the other two entities, as already done by you.
MichaelB 7-Nov-2005 [287]	carsten: I have to think about it ... it's been quite some time since I even used a Java XML library
CarstenK 7-Nov-2005 [288]	Some more ideas: I think the idea behind rebxml is great - build a common format representing XML in REBOL blocks. Some wishes: - maybe rebxml could be changed to ignore ignorable whitespace, that is all whitespace between elements like line feeds and indentation (except in elements with xml:space="preserve"); the block would be much smaller, but then the rebxml2xml script maybe requires a refinement /prettyprint with automatic indentation - for easier parsing maybe some words would help that indicate the beginning of special nodes, like [elem "chapter" attribs [name "value" id "0815"] [ elem "sect" attribs [ id "5x12"] [ ....]] does it make sense?