World: r3wp

[XML] xml related conversations

BrianH
30-Oct-2005
[157]
Bad, bad, bad! Don't use words for element or attribute names, because 
common XML names contain characters that violate REBOL syntax for 
words.
Chris
30-Oct-2005
[158x2]
I'm not using words...
... to reference tag names.
BrianH
30-Oct-2005
[160]
That was directed at Sunanda, sorry.
Chris
30-Oct-2005
[161x2]
This is how a linear block structure might work:
node-prototype: reduce [
    'type      0
    'namespace none
    'tag       none
    'children  []
    'value     none
    'parent    none
]

foobar: copy/deep node-prototype
foobar/type: 1
foobar/tag: "foobar"

bar: copy/deep node-prototype
bar/type: 1
bar/namespace: "foo"
bar/tag: "bar"
bar/parent: :foobar

append foobar/children bar

text: copy/deep node-prototype
text/type: 3
text/value: "Some Text"
text/parent: :bar

append bar/children text

document: context [
    get-elements-by-tag-name: func [tag-name][
        remove-each element copy nodes [
            not equal? tag-name element/tag
        ]
    ]
    nodes: reduce [foobar bar text]
]
BrianH
30-Oct-2005
[163]
Sunanda, I'm sorry if that was rude :(  As long as the data structure 
can handle the semantics in the XML standards, including extras like 
namespaces and such, then you won't have to extend them.
Sunanda
30-Oct-2005
[164]
No problem.....I didn't mean that either, Brian:

 ['item "*&&^&*"] is a ['name data] pair, as an alternative to the 
 more "object" design
 [item: "*&&^&*"] 
The first approach makes deletions much easier.
BrianH
30-Oct-2005
[165]
Chris, it would be just as efficient to use word values for your 
type field, and easier to understand.
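
A minimal sketch of that suggestion, assuming a cut-down version of the node prototype shown earlier. The type words element and text are illustrative names only, standing in for the DOM-style integers 1 and 3:

```rebol
; Hypothetical variant of the node block: the type field holds a word
; instead of an integer code.
node: reduce [
    'type  'element
    'tag   "foobar"
    'value none
]

; Dispatching on the word reads more clearly than comparing against 1 or 3.
switch node/type [
    element [print ["element:" node/tag]]
    text    [print ["text:" node/value]]
]
```

Comparing two words costs about the same as comparing two integers, so the gain here is readability, not speed.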
Chris
30-Oct-2005
[166]
Probably -- I am just following convention (easier to get the concept 
straight first than the specifics...)
BrianH
30-Oct-2005
[167]
Take advantage of the strengths of REBOL when you can :)
Sunanda
30-Oct-2005
[168]
One practical word of caution.
I built a full-text indexer entirely in REBOL.

It extensively uses deeply nested blocks with frequent insertions 
and deletions.

It took several days of tweaking to stop the code crashing REBOL's 
garbage collection.

*** Large, deeply nested and active: may be pushing some internal 
limits.
Chris
30-Oct-2005
[169]
With a linear structure, it is harder to add a child node -- you 
must append the parent node, set the child's parent node, and find 
the child's place in the document (the tricky part).
Sunanda
30-Oct-2005
[170]
Linear would not be a good idea.

I was just highlighting that deeply nested & highly active may need 
some RAMBO action before being robust.
BrianH
30-Oct-2005
[171]
With the block position format, you can just test the first member 
to get the type of the data item, and then do something like this 
to access it:

    set a: context [name: namespace: attributes: contents: none] elem
or perhaps this
    set [name namespace attributes contents] elem
Chris
30-Oct-2005
[172x3]
S: That reads counter-intuitively...
A linear structure would not be deeply nested.
Hmm, on second thoughts...
BrianH
30-Oct-2005
[175x3]
With a block/hash/string structure, you don't need a reference to 
a parent node - you can just push the parent on a stack during traversal.
A linear structure would be deeply nested - it's just that the nesting 
would be a dialect, and hard to change.
If you use a linear structure it would probably be best to use a 
list instead of a block to better facilitate insertions and deletions. 
This would be OK because you would have to access it in a linear 
way anyways. But if you are doing that, you might as well be using 
an event-based parser instead of a DOM.
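
The parent-stack idea can be sketched roughly as follows. This assumes the [name namespace attributes contents] element layout from the block position format above; the names doc and walk are made up for illustration:

```rebol
; Assumed element layout: [name namespace attributes contents],
; where contents is a block of child elements and text strings.
doc: ["foobar" "" [] [
    ["bar" "foo" [] ["Some Text"]]
]]

; walk visits every element. stack holds the chain of ancestors, so the
; parent of the current element is always the last item on it.
walk: func [elem [block!] stack [block!] /local parent] [
    parent: if not empty? stack [last stack]
    print [first elem "has parent:" either parent [first parent] ["(none)"]]
    append/only stack elem               ; push before descending
    foreach child fourth elem [
        if block? child [walk child stack]
    ]
    remove back tail stack               ; pop on the way back up
]

walk doc copy []
```

No node stores a reference to its parent, so the nested blocks stay free of cycles. The cost is that the parent is only known while a traversal is in progress.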
Chris
30-Oct-2005
[178]
Ok, on a nested structure -- you do get-elements-by-tag-name, this 
returns an any-block! of elements with that tag name.  How do you 
take any one of these elements and get the parent element?
BrianH
30-Oct-2005
[179x6]
First, the values returned by get-elements-by-tag-name don't have 
to be in the same format as the internal block structure. It can 
be a list of objects that contain references to the original nested 
structure, or objects that contain fields that correspond to the 
information items that you want, including properties that are constructed 
at runtime like parent.
Assuming that the nesting level of the original XML doesn't blow 
out REBOL's stack limits you can even use an internal recursive function 
with an accumulator parameter.
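; Something like this, semantically at least; it would need adjustment
; based on the actual block structure. It assumes each element is a block
; of [name namespace attributes contents] and that every item in contents
; is itself an element block.
get-element-by-name: func [x n /local l t worker] [
    ; worker has its own local t so recursion can't clobber the caller's loop
    worker: func [x p w /local t] [
        if n = first x [
            l: insert l context [elem: x parent: p where: w]
        ]
        t: fourth x
        forall t [worker first t x t]
    ]
    l: make list! 0
    t: fourth x
    forall t [worker first t x t]
    head l
]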
Obviously that would need quite a bit of adjustment. If you are blowing 
stack limits you can roll your own. If you want the whole parent 
stack you can do that too.
This would probably be easier to do using block parsing.
Especially if you roll your own parent stack in embedded parens.
Christophe
1-Nov-2005
[185x3]
About the choice of the right internal data-keeping structure: because 
we are manipulating big XML files (> 2MB), we had to find the most 
performant way to retrieve our data into a nested structure. The 
choice was block! / hash! / list! / or object! . after a few tests, 
it appears that block! is the most suitable in terms of retrieval 
time. Note that this is true only for nested structures. In case 
of one-level structures, the hash! is the most performant (see http://www.rebol.net/article/0020.html).
When I say most performant, I mean the retrieval time is two times 
shorter.
Anyone having similar results ?
Sunanda
1-Nov-2005
[188]
I agree.....

I tend to avoid hash! and use straight block! for values I'm reloading 
from external storage.
Block! makes the loading much faster. 

Hash! may be faster once loaded, but I don't do enough processing 
to offset the loading disadvantage.
Pekr
1-Nov-2005
[189]
so what about loading it as a block, then hashing it? :-)
Christophe
1-Nov-2005
[190x3]
What's the advantage compared to amke directly a hash! of it ? Did 
I miss the point ?
amke = make :-)
hash! is ok when staying at the level of an unary tree, not deeper. 
don't ask me why, it's just an observation. obviously it was wanted 
by the implementation... perhaps aiming to RIF ?
Sunanda
1-Nov-2005
[193]
Carl has talked several times about a binary format for saving REBOL 
structures (can't find any references off-hand).

That would probably solve this problem as what is saved is, in effect, 
the internal in-memory format: useless for non-REBOL data exchange 
and perhaps dangerous for cross-REBOL releases data exchange, but 
much much faster as it'd avoid most of the parse and load that REBOL 
does now.
Christophe
1-Nov-2005
[194]
Very interested in ! Could u find the ref back ? Or was it about 
the new REBcode ?
Sunanda
1-Nov-2005
[195]
It way predated rebcode -- may even be on the original REBOL Altme 
world....Which may be why i can't find it.
Henrik
1-Nov-2005
[196]
http://www.rebol.net/cgi-bin/blog.r?find=rebin <-- this?
Sunanda
1-Nov-2005
[197]
Nice find!
http://www.rebol.net/article/0044.html
Christophe
2-Nov-2005
[198]
Thx ! Now I recall the article... As I thought, Carl is aiming to 
RIF: "RIF (index file) records will have the option of storing in 
REBin format". But you were right, RIF could be a solution for the 
data storage. Let's hope it will exist by Nov 14 :-)
Pekr
2-Nov-2005
[199]
yes, let's hope so, as then ashley can restart his work on otherwise 
excellent rebdb! :-)
Christophe
2-Nov-2005
[200]
FYI, I have set 2 ppl working on an implementation of XPath into 
our XML function lib (temporarily called "EasyXML"). Basically, we'll 
have 5 functions encapsulated into a context: 'load-xml file!, 'save-xml 
file!, 'get-data path! or block!, 'set-attribute string!, 'set-content 
string!
Pekr
2-Nov-2005
[201]
so you use rebol officially at your work? That is nice. So far I used 
it only for few small utils here ...
Christophe
2-Nov-2005
[202]
Since 2000, exclusively REBOL work! But I do not know how long I 
will be able to stand the position, because, despite the great results, 
we got a lot of opposition (not a standard, too cheap, no future, 
and so on...) :(
Pekr
2-Nov-2005
[203x2]
well, try to keep up your good work. What is standard anyway? Or 
just make some calculation, how using different technology makes 
process more complicated/expensive (unless your opponents don't use 
other open-source technology, e.g. python) ... the bad thing is, 
e.g. here in our company, that the price is not always deciding factor. 
RT does very bad job here. Our managers want to read some success 
stories, want to see list of other customers who do use such technology 
.... some case studies etc. Maybe simply Europe uses different kind 
of logic than US.
it is simply about feeling safe, so that IT manager can be sure and 
tell himself - look, someone other uses it too ....
Christophe
2-Nov-2005
[205x2]
Yes, they want to be able to look forward. And REBOL stays "risky" 
in their critical eyes :-)
But we are Off topic here :-) Shouldn't we move to the "Chat" group 
?