
World: r3wp

[XML] xml related conversations

Chris
30-Oct-2005
[117]
It is certainly a Rebolish way to look at the XML data; I see a linear 
structure as being more manageable...
Pekr
30-Oct-2005
[118x2]
Maybe we could look at those, study the code and then start to talk 
of which way to go ...
what do you mean by linear structure? Block of blocks?
Chris
30-Oct-2005
[120x2]
Block of objects.
Perhaps I don't understand Temple fully, but it doesn't so much manipulate 
an arbitrary XML file, rather pick and choose parts of a larger XML-based 
template?
Pekr
30-Oct-2005
[122]
hmm, dunno how to explain it. It simply parses XML and creates a block 
of blocks structure. Then you have functions like find-by-id, 
find-by-name, etc., which you can use to manipulate values ... then, 
once done, you generate XML. What I did not like is that it builds 
the structure from scratch, so e.g. with an html page, you lose 
nice formatting, comments etc. But others said you could have pointers 
from such nodes to the original doc and rebuild the doc properly ...
BrianH
30-Oct-2005
[123]
Objects aren't a good way to store XML values or even attributes. 
XML attribute names can be specified using characters that are difficult 
to use in REBOL words, like :, and you can't add and remove fields 
from objects at runtime. Hashes are better to store attributes, with 
keys and values of strings. Blocks are best to store element contents, 
with perhaps the none value to specify closed elements.
Chris
30-Oct-2005
[124]
For what I had in mind, these fears are perhaps not appropriate. 
I'll try and compose a quick example...
BrianH
30-Oct-2005
[125x3]
You might want to support namespaces like this:

["tag without namespace" "namespace" #[hash! ["attribute name" "attribute namespace" "attribute value" ...]] ["text" ["tag" ...] ...]]
You might even be able to replace attribute value strings with REBOL 
values if you implement XML Schema typing.
You could then represent other XML data items using a word in the 
tag spot and then type-specific contents. For example:
[comment "comment text"]
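A minimal sketch of the layout described above, assuming REBOL 2 (where hash! is available) and assuming attributes are stored as name / namespace / value triples. The helper names make-element and get-attribute are hypothetical and do not come from the discussion.

```rebol
REBOL [Title: "Namespace-aware element sketch (hypothetical helpers)"]

; Assumed layout: [tag namespace attributes contents]
; Attributes are kept as name / namespace / value triples in a hash!.
make-element: func [
    tag [string!] ns [string! none!] attrs [block!] contents [block! none!]
][
    reduce [tag ns make hash! attrs contents]
]

; Walk the triples and return the value of the first attribute whose
; name matches exactly (XML names are case-sensitive, hence strict-equal?).
get-attribute: func [elem [block!] name [string!]] [
    foreach [attr-name attr-ns attr-value] third elem [
        if strict-equal? attr-name name [return attr-value]
    ]
    none
]

link: make-element "a" "xhtml" [
    "href" ""    "http://www.rebol.com/"
    "lang" "xml" "en"
] ["REBOL"]

print get-attribute link "href"
print mold get-attribute link "title"   ; no such attribute
```

Because names are plain strings, a qualified name such as "xlink:href" can be stored without running into REBOL word syntax.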
Chris
30-Oct-2005
[128x7]
This is a bit convoluted, as I'm faking the end document object (which 
would be created by a parse rule):
Consider the XML document:
<?xml version="1.0"?>
<foobar><foo:bar>Some Text</foo:bar></foobar>
The document would look a little like:
node-prototype: context [
    node-name: tag-name: ""
    node-value: ""
    node-type: 0
    child-nodes: []
]

; root element <foobar>
foobar: make node-prototype [
    node-name: tag-name: "foobar"
    node-type: 1
]

; child element <foo:bar>
bar: make node-prototype [
    node-name: tag-name: "foo:bar"
    prefix: "foo" local-name: "bar"
    node-type: 1
    parent-node: :foobar
]

append foobar/child-nodes bar

; text node inside <foo:bar>
text: make node-prototype [
    node-name: #text
    node-value: "Some Text"
    parent-node: :bar
]

append bar/child-nodes text

document: context [
    ; return the nodes whose tag-name matches; the node list itself is untouched
    get-elements-by-tag-name: func [tag-name][
        remove-each element copy nodes [
            not equal? tag-name element/tag-name
        ]
    ]
    nodes: reduce [foobar bar text]
]
Yes, it's big and bulky, but it is not intended for consumption by 
the user, any more than a View object is...
There are some typos there, but also a semblance of the document 
object working.
BrianH
30-Oct-2005
[135x4]
Using my structure, with empties for data not there:

["foobar" "" #[hash! []] [["bar" "foo" #[hash! []] ["Some Text"]]]]
or with the none value for data not there:
["foobar" none none [["bar" "foo" none ["Some Text"]]]]
There are advantages to either method.
If you have accessor functions premade for your structure, using 
the none value is better because it makes it easier to implement 
default values with any.
The strings would of course be unicode! when they finish implementing 
that data type.
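The point above about accessor functions and any can be sketched as follows. This assumes REBOL 2 and the four-slot layout [tag namespace attributes contents]. The accessor names are hypothetical. Note that a literal block holds the word none rather than the none value, so the sketch builds its elements with reduce.

```rebol
REBOL [Title: "Accessor sketch with defaults (hypothetical names)"]

; reduce turns the word none into the none value; the nested block is kept as is.
element:       reduce ["bar" "foo" none ["Some Text"]]
empty-element: reduce ["foobar" none none none]

; any returns the first value that is neither none nor false,
; so an absent slot falls through to the default that follows it.
get-namespace:  func [elem [block!]] [any [second elem ""]]
get-attributes: func [elem [block!]] [any [third elem make hash! []]]
get-contents:   func [elem [block!]] [any [fourth elem copy []]]

print mold get-namespace element
print mold get-namespace empty-element
print length? get-contents empty-element
```

With empty strings and empty hashes as placeholders instead, the accessors would need an explicit empty? test in each case.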
Chris
30-Oct-2005
[139]
Or UTF-8 now...
BrianH
30-Oct-2005
[140]
The contents of the string can be UTF-8 quite easily, although you 
will have to encode the higher characters yourself.
Chris
30-Oct-2005
[141]
The imported characters would be fine (their integrity can be checked 
by the parse rule) but local Rebol higher characters would need to 
be vetted before inserting them...
BrianH
30-Oct-2005
[142x2]
Remember that objects in REBOL have a lot more overhead than blocks, 
and that XML documents can get quite large. Unless you are using 
an event-driven parser, every bit of memory you can save is a good 
thing.
REBOL isn't an object-oriented language you know...
Chris
30-Oct-2005
[144x2]
Yes, that is why I think a dialect may be the way to go.
For (3).
BrianH
30-Oct-2005
[146x3]
The data structure I am suggesting would be for internal use only. 
You should have a dialect for specifying common XML operations and 
have the dialect processor handle the structure.
I'm trying to figure out the most efficient way to represent the 
XML semantic model in REBOL.
It would even be possible to implement an XPath compiler, in theory.
Chris
30-Oct-2005
[149]
Don't forget in your structure that attributes can have namespaces 
as well. In the DOM, attributes are made with the same node prototype.
BrianH
30-Oct-2005
[150]
I'm looking at the XML Infoset standard right now.
Chris
30-Oct-2005
[151]
I understand the need for efficiency, I am also mindful of completeness. 
 The DOM is a complete standard for accessing XML (and I appreciate 
that the 'O' in DOM does not necessarily mean Rebol object! :o)
BrianH
30-Oct-2005
[152]
Especially since REBOL objects have a different semantic model than 
the objects that class-based object-oriented languages use to implement 
the DOM.
Chris
30-Oct-2005
[153x2]
My prototype could as well be:
node-prototype: reduce [
    'type      0
    'namespace none
    'tag       none
    'children  []
    'value     none
    'parent    none
]
Yep, that is most apparent...
Sunanda
30-Oct-2005
[155x2]
Of the two suggested data structures, I'm inclined to think that 
Chris's is more flexible.

With objects, it is easy to add extra fields (perhaps for debugging 
or to make it easy to traverse a structure).

A "pure block" like Brian's is most likely to be faster in execution, 
but harder to extend.
Oops Chris posted just as I did:
['name data] pairs is a flexible approach too.
BrianH
30-Oct-2005
[157]
Bad, bad, bad! Don't use words for element or attribute names, because 
common XML names contain characters that violate REBOL syntax for 
words.
Chris
30-Oct-2005
[158x2]
I'm not using words...
... to reference tag names.
BrianH
30-Oct-2005
[160]
That was directed at Sunanda, sorry.
Chris
30-Oct-2005
[161x2]
This is how a linear block structure might work:
node-prototype: reduce [
    'type      0
    'namespace none
    'tag       none
    'children  []
    'value     none
    'parent    none
]

; root element <foobar>
foobar: copy/deep node-prototype
foobar/type: 1
foobar/tag: "foobar"

; child element <foo:bar>
bar: copy/deep node-prototype
bar/type: 1
bar/namespace: "foo"
bar/tag: "bar"
bar/parent: :foobar

; /only adds the node as a single item instead of splicing its contents
append/only foobar/children bar

; text node inside <foo:bar>
text: copy/deep node-prototype
text/type: 3
text/value: "Some Text"
text/parent: :bar

append/only bar/children text

document: context [
    ; return the nodes whose tag matches; the node list itself is untouched
    get-elements-by-tag-name: func [tag-name][
        remove-each element copy nodes [
            not equal? tag-name element/tag
        ]
    ]
    nodes: reduce [foobar bar text]
]
BrianH
30-Oct-2005
[163]
Sunanda, I'm sorry if that was rude :(  As long as the data structure 
can handle the semantics in the XML standards, including extras like 
namespaces and such, then you won't have to extend them.
Sunanda
30-Oct-2005
[164]
No problem.....I didn't mean that either, Brian:

 ['item "*&&^&*"] is a ['name data] pair, as an alternative to the 
 more "object" design
 [item: "*&&^&*"] 
The first approach makes deletions much easier.
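A small sketch of why the ['name data] layout makes deletions easy, assuming REBOL 2. The field names debug and line are made up for illustration.

```rebol
REBOL [Title: "Name/data pairs sketch (illustrative field names)"]

; reduce turns the lit-words into plain words used as field names.
node: reduce ['type 1 'tag "foobar" 'debug "scratch data"]

; Adding a field is an append of one name/data pair.
append node reduce ['line 12]

; Removing a field drops the name and its data in one step.
; Caveat: a plain find could also match a word stored as data.
remove/part find node 'debug 2

probe node
print node/tag
```

A REBOL 2 object offers no comparable way to drop a field once it has been made.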
BrianH
30-Oct-2005
[165]
Chris, it would be just as efficient to use word values for your 
type field, and easier to understand.
Chris
30-Oct-2005
[166]
Probably -- I am just following convention (easier to get the concept 
straight first than the specifics...)
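Brian's suggestion of word values for the type field might look like this, assuming REBOL 2. The type names element, text and comment and the function describe are illustrative choices, not names from the discussion.

```rebol
REBOL [Title: "Word-valued type field sketch (illustrative names)"]

; The type is stored as a word instead of a DOM-style integer code.
node: reduce ['type 'text 'value "Some Text"]

describe: func [node [block!]] [
    ; switch compares the stored word against the case labels
    switch/default node/type [
        element ["element node"]
        text    ["text node"]
        comment ["comment node"]
    ] ["unknown node"]
]

print describe node
```

A word such as text reads more clearly in a dump of the structure than the integer 3.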