World: r3wp
[XML] xml related conversations
Chris 30-Oct-2005 [104x4] | I've thought a little about how to implement this. I see four main considerations: parsing, internal representation, accessors and rendering. 1) can reuse RT's or Gavin's code; 2) objects? nested blocks? how should this look? 3) functions tied to the objects in (2), or a dialect? 4) appears to be the easy part... |
2), and to a lesser extent 3), are key to making progress. | |
I don't think it is a priority that 2) be moldable, as we can take advantage of Rebol's 'quirks'. | |
For 3): xml [doc: load %file.xml elmt: doc/get-element-by-id "foo" elmt/tag-name: "p" save %file.xml doc] is just one example of how it might work... | |
Pekr 30-Oct-2005 [108x2] | Chris, somehow I don't understand what you are talking about here. You always have to parse first, no? |
Gavin's code consists of two or so sections: first you parse into a block representation, then you convert that to an object representation... | |
BrianH 30-Oct-2005 [110] | Perhaps a hash for attributes and a block for contents. How would you represent namespaces? |
Chris 30-Oct-2005 [111] | Petr, it's best to know what format you're parsing to before you actually attempt to parse. I'm making the assumption that the results of parse-xml, parse-xml+ and xml-to-object are unsuitable for manipulation. |
Pekr 30-Oct-2005 [112] | Oh, now I understand what you meant. I thought you were trying to somehow "parse XML without actually parsing it", my bad :-) |
BrianH 30-Oct-2005 [113] | Chris, you assume wrong. They may be a little awkward, but standard path and block manipulations can be used on the structures generated by those parsers, as long as you stick to the overall structural conventions. |
Chris 30-Oct-2005 [114] | Hypothetically, if we stick to Gavin's block format, how much work will it be to implement, say, 'get-tags-by-name', 'get-element-by-id' and 'parent-node'? |
Pekr 30-Oct-2005 [115x2] | Maybe it would be good to look at Gabriele's Temple: he only did basic HTML parsing, but he provided such code. |
In fact, I like his templating system, and I don't want to allow any other kind of templating system that does not respect my requirements. Temple is good here. Even Jaime likes it, or so it seems :-) | |
Chris 30-Oct-2005 [117] | It is certainly a Rebolish way to look at the XML data; I see a linear structure as being more manageable... |
Pekr 30-Oct-2005 [118x2] | Maybe we could look at those, study the code and then start talking about which way to go... |
What do you mean by a linear structure? A block of blocks? | |
Chris 30-Oct-2005 [120x2] | Block of objects. |
Perhaps I don't understand Temple fully, but rather than manipulating an arbitrary XML file, doesn't it pick and choose parts of a larger XML-based template? | |
Pekr 30-Oct-2005 [122] | Hmm, I don't know how to explain it. It simply parses the XML and creates a block-of-blocks structure. Then you have functions like find-by-id, find-by-name, etc., which you can use to manipulate values... then, once done, you generate XML. What I did not like is that it builds the structure from scratch, so, e.g., with an HTML page you lose nice formatting, comments, etc. But others said you could keep pointers from such nodes into the original document and rebuild it properly... |
BrianH 30-Oct-2005 [123] | Objects aren't a good way to store XML values or even attributes. XML attribute names can contain characters that are difficult to use in REBOL words, such as the colon (:), and you can't add or remove fields from objects at runtime. Hashes are better for storing attributes, with strings as both keys and values. Blocks are best for storing element contents, perhaps with the none value to mark closed elements. |
Chris 30-Oct-2005 [124] | For what I had in mind, these fears are perhaps not appropriate. I'll try to compose a quick example... |
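A minimal sketch of the hash-of-strings idea, assuming REBOL 2 (where hash! exists): attribute names stay strings, so a name like "xml:lang" never has to become a word, and attributes can be added and removed at runtime. The helper names attr-pos, get-attr, set-attr and remove-attr are hypothetical, not from any library.

```rebol
REBOL [Title: "Attributes in a hash! (sketch, REBOL 2)"]

; Names stay strings, so "xml:lang" needs no conversion to a REBOL word.
attrs: make hash! ["xml:lang" "en" "id" "intro"]

; Hypothetical helper: position of NAME in a flat name/value series, or none.
; It steps two at a time so a value is never mistaken for a name, and uses
; == (strict-equal) because XML names are case-sensitive.
attr-pos: func [attrs [hash! block!] name [string!]] [
    while [not tail? attrs] [
        if attrs/1 == name [return attrs]
        attrs: skip attrs 2
    ]
    none
]

get-attr: func [attrs [hash! block!] name [string!] /local pos] [
    if pos: attr-pos attrs name [second pos]
]

; Change the value if the attribute exists, otherwise add a new pair.
set-attr: func [attrs [hash! block!] name [string!] value [string!] /local pos] [
    either pos: attr-pos attrs name [change next pos value] [
        append attrs reduce [name value]
    ]
    attrs
]

; Remove the name and its value together.
remove-attr: func [attrs [hash! block!] name [string!] /local pos] [
    if pos: attr-pos attrs name [remove/part pos 2]
    attrs
]

set-attr attrs "class" "note"
remove-attr attrs "id"
```

The helpers walk the pairs linearly for clarity, which gives up the hash lookup speed; stepping two at a time is what stops a value such as "intro" from ever being matched as a name.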
BrianH 30-Oct-2005 [125x3] | You might want to support namespaces like this: ["tag without namespace" "namespace" #[hash! ["attribute name" "attribute namespace" "attribute value" ...]] ["text" ["tag" ...] ...]] |
You might even be able to replace attribute value strings with REBOL values if you implement XML Schema typing. | |
You could then represent other XML data items using a word in the tag spot and then type-specific contents. For example: [comment "comment text"] | |
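Chris expects rendering to be the easy part. Here is a minimal sketch of it for BrianH's layout, including a word (comment) in the tag spot as suggested. Assumptions: the namespace slot holds a prefix such as "foo", attributes are flat name/namespace/value triplets, none marks a closed element, and nothing is escaped. The names qname and render are hypothetical.

```rebol
REBOL [Title: "Render the block layout back to XML (sketch, REBOL 2)"]

; Assumed layout:
;   element: [name namespace attributes contents]
;     attributes: none, or flat name/namespace/value triplets
;     contents:   none for a closed element, else a block of items
;   comment: [comment "text"]  (a word in the tag spot)
;   text:    a plain string
; Text and attribute values are NOT escaped (&lt; &amp; &quot; etc.).

; Join prefix and local name, e.g. "foo" + "bar" -> "foo:bar".
qname: func [name [string!] ns [string! none!]] [
    either all [ns not empty? ns] [rejoin [ns ":" name]] [name]
]

render: func [item /local out name] [
    out: copy ""
    if string? item [return append out item]
    if word? first item [
        ; only comments are handled; other item types render as nothing
        if 'comment = first item [repend out ["<!--" second item "-->"]]
        return out
    ]
    name: qname first item second item
    repend out ["<" name]
    if third item [
        foreach [a-name a-ns a-value] third item [
            repend out [" " qname a-name a-ns {="} a-value {"}]
        ]
    ]
    either fourth item [
        append out ">"
        foreach child fourth item [append out render child]
        repend out ["</" name ">"]
    ] [
        append out "/>"
    ]
    out
]

doc: ["foobar" #[none] #[none] [
    [comment " note "]
    ["bar" "foo" ["lang" "xml" "en"] ["Some Text"]]
    ["br" "" #[none] #[none]]
]]
print render doc
```

Because the first slot is checked with word? before anything else, new item types (processing instructions, say) can be added as further words without touching the element branch.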
Chris 30-Oct-2005 [128x7] | This is a bit convoluted, as I'm faking the end document object (which would be created by a parse rule): |
Consider the XML document: | |
<?xml version="1.0"?> <foobar><foo:bar>Some Text</foo:bar></foobar> | |
The document would look a little like this: | |
node-prototype: context [
    node-name: tag-name: ""
    node-value: ""
    node-type: 0
    child-nodes: []
]
foobar: make node-prototype [
    node-name: tag-name: "foobar"
    node-type: 1
]
bar: make node-prototype [
    node-name: tag-name: "foo:bar"
    prefix: "foo"
    local-name: "bar"
    node-type: 1
    parent-node: :foo
]
append foobar/child-nodes bar
text: make node-prototype [
    node-name: #text
    node-value: "Some Text"
    parent-node: :bar
]
append bar/child-nodes text
document: context [
    get-elements-by-tag-name: func [tag-name][
        remove-each element copy nodes [
            not equal? tag-name element/tag-name
        ]
    ]
    nodes: reduce [foo bar text]
] | |
Yes, it's big and bulky, but it is not intended for consumption by the user, any more than a View object is... | |
There are some typos there, but also a semblance of the document object working. | |
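A runnable version of Chris's object sketch, assuming REBOL 2. The fixes are assumptions, not Chris's own corrections: the undefined foo is read as foobar, each node gets its own child-nodes block instead of relying on how make copies the prototype's block, the text node gets the DOM TEXT_NODE type 3, and get-elements-by-tag-name returns its result explicitly and compares names case-sensitively.

```rebol
REBOL [Title: "DOM-style object sketch with assumed typo fixes (REBOL 2)"]

node-prototype: context [
    node-name: tag-name: ""
    node-value: ""
    node-type: 0
    child-nodes: []
]

foobar: make node-prototype [
    node-name: tag-name: "foobar"
    node-type: 1                   ; DOM ELEMENT_NODE
    child-nodes: copy []           ; own block, never shared with the prototype
]

bar: make node-prototype [
    node-name: tag-name: "foo:bar"
    prefix: "foo"
    local-name: "bar"
    node-type: 1
    child-nodes: copy []
    parent-node: foobar            ; the original :foo is never defined
]
append foobar/child-nodes bar

text: make node-prototype [
    node-name: #text
    node-value: "Some Text"
    node-type: 3                   ; DOM TEXT_NODE
    child-nodes: copy []
    parent-node: bar
]
append bar/child-nodes text

document: context [
    nodes: reduce [foobar bar text]    ; the original listed foo
    get-elements-by-tag-name: func [tag-name [string!] /local result] [
        result: copy nodes
        ; keep only exact (case-sensitive) tag-name matches
        remove-each element result [not strict-equal? tag-name element/tag-name]
        result
    ]
]
```

parent-node and child-nodes point at each other, so the tree is cyclic; avoid probe or mold on these objects.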
BrianH 30-Oct-2005 [135x4] | Using my structure, with empty values for data that isn't there: ["foobar" "" #[hash! []] [["bar" "foo" #[hash! []] ["Some Text"]]]] or with the none value for data that isn't there: ["foobar" none none [["bar" "foo" none ["Some Text"]]]] |
There are advantages to either method. | |
If you have premade accessor functions for your structure, using the none value is better, because it makes default values easy to implement with 'any. | |
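A minimal sketch of accessors that use 'any for defaults, assuming REBOL 2 and the [name namespace attributes contents] layout. The accessor names are hypothetical. It also shows a pitfall in the none-based literal: written as plain none inside a literal block, the slot holds the word none, which 'any treats as a true value. #[none] (construction syntax) or reduce gives the real none! value.

```rebol
REBOL [Title: "Accessors with defaults via ANY (sketch, REBOL 2)"]

; Hypothetical accessors; a none! slot means "not present", so ANY
; falls through to the default on the right.
node-name:     func [node [block!]] [first node]
node-ns:       func [node [block!]] [any [second node ""]]
node-attrs:    func [node [block!]] [any [third node make hash! []]]
node-contents: func [node [block!]] [any [fourth node copy []]]

; #[none] is the none! value itself, not the word NONE.
doc: ["foobar" #[none] #[none] [["bar" "foo" #[none] ["Some Text"]]]]

; Pitfall: here the second slot is the WORD none, so node-ns returns that word.
literal: ["foobar" none none []]
; REDUCE evaluates the words, so these slots hold real none! values.
reduced: reduce ["foobar" none none []]
```

With none in the empty slots, the defaults are only built when they are asked for, so a large document does not carry an empty hash! in every element.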
The strings would of course be unicode! when they finish implementing that data type. | |
Chris 30-Oct-2005 [139] | Or UTF-8 now... |
BrianH 30-Oct-2005 [140] | The contents of the string can be UTF-8 quite easily, although you will have to encode the higher characters yourself. |
Chris 30-Oct-2005 [141] | The imported characters would be fine (their integrity can be checked by the parse rule), but local Rebol higher characters would need to be vetted before inserting them... |
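In REBOL 2, strings hold 8-bit characters, so the higher characters have to be encoded to UTF-8 by hand. A minimal sketch for one code point follows. encode-utf8 is a hypothetical helper; it relies on REBOL 2's 8-bit strings (it will not produce the same bytes under REBOL 3) and does not reject surrogates or values above 1114111.

```rebol
REBOL [Title: "Encode one code point as UTF-8 (sketch, REBOL 2)"]

encode-utf8: func [
    "Return the UTF-8 bytes for one code point as an 8-bit string"
    cp [integer!]
    /local n out
][
    if cp < 128 [return to string! to char! cp]          ; ASCII: one byte
    n: either cp < 2048 [1] [either cp < 65536 [2] [3]]  ; continuation bytes
    out: copy ""
    loop n [
        insert out to char! 128 + (cp // 64)   ; 10xxxxxx: low six bits
        cp: to integer! cp / 64                ; drop those six bits
    ]
    ; lead byte: 110xxxxx, 1110xxxx or 11110xxx
    insert out to char! (pick [192 224 240] n) + cp
    out
]

probe to binary! encode-utf8 233
```

The same check, run in reverse over incoming bytes, is what vetting local text before insertion would amount to.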
BrianH 30-Oct-2005 [142x2] | Remember that objects in REBOL have a lot more overhead than blocks, and that XML documents can get quite large. Unless you are using an event-driven parser, every bit of memory you can save is a good thing. |
REBOL isn't an object-oriented language you know... | |
Chris 30-Oct-2005 [144x2] | Yes, that is why I think a dialect may be the way to go. |
For (3). | |
BrianH 30-Oct-2005 [146x3] | The data structure I am suggesting would be for internal use only. You should have a dialect for specifying common XML operations and have the dialect processor handle the structure. |
I'm trying to figure out the most efficient way to represent the XML semantic model in REBOL. | |
It would even be possible to implement an XPath compiler, in theory. | |
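A minimal sketch of the suggestion that a dialect, rather than the user, should handle the internal structure. It assumes REBOL 2 and the [name namespace attributes contents] layout. find-tag, xml-do and the select/set-text commands are hypothetical, and this is nowhere near XPath: select only finds the first element with a given local name.

```rebol
REBOL [Title: "A tiny command dialect over the block layout (sketch, REBOL 2)"]

find-tag: func [
    "Depth-first search for the first element with this local name"
    node name [string!] /local hit
][
    if any [string? node  word? first node] [return none]  ; text, comments
    if name == first node [return node]
    if block? fourth node [
        foreach child fourth node [
            if hit: find-tag child name [return hit]
        ]
    ]
    none
]

xml-do: func [
    "Apply a block of dialect commands to DOC; returns DOC"
    doc [block!] commands [block!] /local current name value
][
    current: doc
    if not parse commands [
        some [
            'select set name string! (current: find-tag doc name)
            | 'set-text set value string! (
                ; replace the contents with a single text node
                if current [change/only at current 4 reduce [value]]
            )
        ]
    ] [do make error! "xml-do: bad command"]
    doc
]

doc: ["foobar" "" #[none] [["bar" "foo" #[none] ["Some Text"]]]]
xml-do doc [select "bar" set-text "New Text"]
```

The parse rules are the whole dialect, so adding a command means adding one more alternative, and the positional layout can change without touching user code.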
Chris 30-Oct-2005 [149] | Don't forget that, in your structure, attributes can have namespaces as well. In the DOM, attributes are made with the same node prototype. |
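The triplet form suggested earlier, ["attribute name" "attribute namespace" "attribute value" ...], already gives attributes their own namespace slot. A minimal lookup sketch, assuming REBOL 2; get-attr-ns is hypothetical. The same local name can then appear once per namespace.

```rebol
REBOL [Title: "Namespaced attribute lookup (sketch, REBOL 2)"]

; Attributes are flat name/namespace/value triplets, or none when absent.
get-attr-ns: func [
    node [block!] name [string!] ns [string!] /local attrs
][
    if not attrs: third node [return none]
    foreach [a-name a-ns a-value] attrs [
        ; both parts must match exactly (XML names are case-sensitive)
        if all [a-name == name  a-ns == ns] [return a-value]
    ]
    none
]

; Two "lang" attributes, told apart only by namespace.
node: reduce ["bar" "foo" make hash! ["lang" "xml" "en"  "lang" "" "fr"] ["Some Text"]]
```

This keeps attributes as plain data rather than as separate node objects; DOM-style attribute nodes could still be produced on demand by accessor functions.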
BrianH 30-Oct-2005 [150] | I'm looking at the XML Infoset standard right now. |
Chris 30-Oct-2005 [151] | I understand the need for efficiency, but I am also mindful of completeness. The DOM is a complete standard for accessing XML (and I appreciate that the 'O' in DOM does not necessarily mean Rebol object! :o) |
BrianH 30-Oct-2005 [152] | Especially since REBOL objects have a different semantic model than the objects that class-based object-oriented languages use to implement the DOM. |
Chris 30-Oct-2005 [153] | My prototype could as well be: node-prototype: reduce [ 'type 0 'namespace none 'tag none 'children [] 'value none 'parent none ] |
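A minimal sketch of working with that block prototype, assuming REBOL 2. Because it is a block, not an object, make does not apply; copy/deep gives each node its own children block. The helpers set-field, make-node and add-child are hypothetical. set-field uses change/only so a block value such as the parent is stored as one item rather than spliced in.

```rebol
REBOL [Title: "Block-based node prototype (sketch, REBOL 2)"]

node-prototype: reduce [
    'type 0 'namespace none 'tag none 'children [] 'value none 'parent none
]

; Replace the value that follows KEY; CHANGE/ONLY keeps a block as one item.
set-field: func [node [block!] key [word!] value] [
    change/only next find node key :value
    node
]

; COPY/DEEP: each node gets its own CHILDREN block, not the prototype's.
make-node: func [type [integer!] tag [string! none!] /local node] [
    node: copy/deep node-prototype
    set-field node 'type type
    set-field node 'tag tag
]

; Link both ways; the back-reference makes the tree cyclic, so avoid MOLD/PROBE.
add-child: func [parent [block!] child [block!]] [
    append/only parent/children child
    set-field child 'parent parent
]

root: make-node 1 "foobar"
kid: add-child root make-node 1 "bar"
```

Reading stays ordinary path access (root/children, kid/tag), which is close to the object-style syntax while keeping block overhead.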