[REBOL] Re: Navigation in output produced by parse-xml
From: joel:neely:fedex at: 25-May-2001 12:38
Hi, James!
Welcome!
James Carlyle wrote:
> I want to parse an XML document, find an element, and then
> find some ancestors of that element. Using parse-xml, I can
> get a structure of blocks. Using Ryan's 'deep-find'
> example in other messages, I can traverse the block structure
> looking for a particular block. But how can I identify the
> parent of a block, for example?
>
See below for one way.
> Do blocks have the concept of a parent when nested in other
> blocks, or should I store the ancestry of blocks as I recurse
> into a block structure using deep-find?
>
I wouldn't use DEEP-FIND, but rather a function that "knows"
about the structure of blocks coming from PARSE-XML.
Here's one way to do it, in a fairly generic fashion:
8<------------------------------------------------------------
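walkxml: func [
    xmlb [block!]          ; block returned by PARSE-XML
    sel [any-function!]    ; selector: called with the element stack
    doer [any-function!]   ; action: called when SEL returns true
    /local _walk parents
][
    parents: copy []       ; stack: current element first, then its ancestors
    _walk: func [xel [block!]] [
        insert/only parents xel        ; push as ONE item (plain INSERT would splice it)
        if sel parents [doer parents]
        if found? third xel [          ; the content block may be NONE
            foreach item third xel [
                if block? item [_walk item]   ; recurse into child elements only
            ]
        ]
        remove parents                 ; pop the current element
    ]
    _walk first third xmlb  ; start at the root element
]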
8<------------------------------------------------------------
The above function takes three arguments: a block created by
PARSE-XML, a selection function that looks at a stack of
element blocks (current, parent, grandparent, etc.), and a
doer function that takes some action based on that same
stack. WALKXML visits every element in document order and
calls the doer only for elements the selector accepts.
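Remember that the first item in an XML-block is the tag name,
the second is the attribute list (name/value pairs) and the
third is a block containing all of the element's content:
text strings and child element blocks. That third item can
be NONE, which is why WALKXML checks it with FOUND?.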
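To make that layout concrete, here is a small hand-built block in the same three-part shape. It is an illustration only, not captured PARSE-XML output (real output may contain extra items such as whitespace strings), and ROW is just a sample variable name:

```rebol
; Hand-built element block in the three-part shape described above.
; Illustrative only: real PARSE-XML output may differ in detail.
; REDUCE turns the word NONE into an actual NONE value.
row: reduce [
    "row" none reduce [
        reduce ["cell" none ["A"]]
        reduce ["cell" none ["B"]]
    ]
]
print first row              ; tag name: row
print none? second row       ; no attributes here: true
print length? third row      ; number of content items: 2
print first first third row  ; tag of the first child: cell
```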
Suppose I have an XML document that looks like this:
8<------------------------------------------------------------
<sample>
  <table>
    <row>
      <cell>A</cell>
      <cell>B</cell>
      <cell>C</cell>
    </row>
    <row>
      <cell>D</cell>
      <cell>E</cell>
      <cell>F</cell>
    </row>
  </table>
  <table>
    <column>
      <cell>A</cell>
      <cell>D</cell>
    </column>
    <column>
      <cell>B</cell>
      <cell>E</cell>
    </column>
    <column>
      <cell>C</cell>
      <cell>F</cell>
    </column>
  </table>
</sample>
8<------------------------------------------------------------
We can demo WALKXML by just showing the order in which it
visits the components of that structure, after saying
data: parse-xml read %sample.xml
A simple selector that accepts anything, and a doer that just
shows the tag name, look and work like this:
any?: func [estack [block!]] [true]
print-first: func [estack [block!]] [
    print first first estack
]
>> walkxml data :any? :print-first
sample
table
row
cell
cell
cell
row
cell
cell
cell
table
column
cell
cell
column
cell
cell
column
cell
cell
(If you think there's an extra FIRST in PRINT-FIRST, remember
that ESTACK holds the current element followed by all of its
ancestors: FIRST ESTACK is the current element block, and
FIRST FIRST ESTACK is its tag name.)
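This stack is also the answer to the original question about parents: the parent of the current element is simply SECOND ESTACK. A minimal sketch, where PARENT-NAME and PRINT-WITH-PARENT are hypothetical names and the stack is built by hand rather than by WALKXML:

```rebol
; PARENT-NAME (hypothetical helper): returns the parent's tag name,
; or NONE for the root.  ESTACK is assumed to hold the current
; element block first, then its parent, grandparent, and so on.
parent-name: func [estack [block!]] [
    if 2 <= length? estack [first second estack]
]

; A doer built on it: print each tag next to its parent's tag.
print-with-parent: func [estack [block!]] [
    print [first first estack "<-" any [parent-name estack "(root)"]]
]

; Hand-built stack for a CELL inside a ROW inside a TABLE:
cell-el: reduce ["cell" none ["A"]]
row-el: reduce ["row" none reduce [cell-el]]
table-el: reduce ["table" none reduce [row-el]]
print-with-parent reduce [cell-el row-el table-el]   ; prints: cell <- row
print-with-parent reduce [table-el]                  ; prints: table <- (root)
```

With the real data, `walkxml data :any? :print-with-parent` should list every tag next to its parent's tag.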
A doer that indents each tag name:
indent-name: func [estack [block!]] [
    loop length? estack [prin " "]
    print first first estack
]
>> walkxml data :any? :indent-name
 sample
  table
   row
    cell
    cell
    cell
   row
    cell
    cell
    cell
  table
   column
    cell
    cell
   column
    cell
    cell
   column
    cell
    cell
A doer that shows the full "path" for each element looks
like this:
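print-path: func [estack [block!] /local str] [
    str: copy ""
    ; ESTACK runs from the current element upward, so inserting
    ; each name at the head of STR builds the path root-first
    foreach item estack [
        insert str rejoin ["/" first item]
    ]
    print str
]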
>> walkxml data :any? :print-path
/sample
/sample/table
/sample/table/row
/sample/table/row/cell
/sample/table/row/cell
/sample/table/row/cell
/sample/table/row
/sample/table/row/cell
/sample/table/row/cell
/sample/table/row/cell
/sample/table
/sample/table/column
/sample/table/column/cell
/sample/table/column/cell
/sample/table/column
/sample/table/column/cell
/sample/table/column/cell
/sample/table/column
/sample/table/column/cell
/sample/table/column/cell
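The same path can drive selection as well as display. As a sketch (PATH-OF and PATH-IS are hypothetical names, not part of REBOL), a helper can return the path string instead of printing it, and a small function can build a selector that accepts only elements at one exact path:

```rebol
; PATH-OF (hypothetical helper): same idea as PRINT-PATH, but it
; returns the path string instead of printing it.
path-of: func [estack [block!] /local str] [
    str: copy ""
    foreach item estack [insert str rejoin ["/" first item]]
    str
]

; PATH-IS (hypothetical): builds a selector that accepts only elements
; whose full path equals TARGET.  COMPOSE embeds TARGET's value in the
; new function body, so each selector keeps its own target.
path-is: func [target [string!]] [
    func [estack [block!]] compose [(target) = path-of estack]
]
```

For example, `walkxml data path-is "/sample/table/row" :print-path` should print `/sample/table/row` twice, once for each ROW. Note that `=` compares strings case-insensitively in REBOL.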
Now let's tackle something that requires more specific selection
criteria. To show only the content of the CELL tags:
cell?: func [estack [block!]] [
    "cell" = first first estack
]
print-content: func [estack [block!]] [
    print third first estack
]
>> walkxml data :cell? :print-content
A
B
C
D
E
F
A
D
B
E
C
F
But now, suppose I only want to print the content of CELL
tags that are inside ROW tags:
cell-in-row?: func [estack [block!]] [
    all [
        2 <= length? estack
        "cell" = first first estack
        "row" = first second estack
    ]
]
>> walkxml data :cell-in-row? :print-content
A
B
C
D
E
F
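CELL-IN-ROW? inspects only the immediate parent (SECOND ESTACK). To test the whole ancestry instead, a selector can scan every element above the current one. A minimal sketch, where HAS-ANCESTOR? and CELL-UNDER-COLUMN? are hypothetical names:

```rebol
; HAS-ANCESTOR? (hypothetical helper): true if any element above the
; current one (everything in NEXT ESTACK) has tag name TAG.
has-ancestor?: func [estack [block!] tag [string!]] [
    foreach item next estack [
        if tag = first item [return true]
    ]
    false
]

; Selector: CELL elements anywhere below a COLUMN, however deep,
; rather than only those whose direct parent is a COLUMN.
cell-under-column?: func [estack [block!]] [
    all [
        "cell" = first first estack
        has-ancestor? estack "column"
    ]
]
```

On the sample, `walkxml data :cell-under-column? :print-content` should print the cells of the second TABLE (A, D, B, E, C, F). Unlike a parent-only test, it would also accept a CELL nested more deeply inside a COLUMN.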
With the generic navigation provided by WALKXML, you can
apply any test to the current block and its ancestry, and
you can perform any operation on the current block and its
ancestry.
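Printing is only one possible action. As a sketch (COLLECT-CONTENT and RESULTS are hypothetical names), a doer can gather content into a block for later use instead of printing it:

```rebol
; RESULTS and COLLECT-CONTENT are hypothetical: the doer appends the
; content of each selected element to RESULTS instead of printing it.
results: copy []
collect-content: func [estack [block!]] [
    ; APPEND splices the content block, so each text string becomes
    ; its own item in RESULTS (use APPEND/ONLY to keep one block
    ; per element instead)
    append results third first estack
]
```

After `walkxml data :cell-in-row? :collect-content`, RESULTS should hold the content of the six row cells, assuming each CELL's content block contains just its text.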
Hope this helps!
-jn-
------------------------------------------------------------
Programming languages: compact, powerful, simple ...
Pick any two!
joel'dot'neely'at'fedex'dot'com