Mailing List Archive: 49091 messages

Navigation in output produced by parse-xml

 [1/7] from: james::calaba::com at: 25-May-2001 17:00


Hi, I'm a Rebol newbie. I have scanned all the online material and archives but can't find an answer. Please be nice to me!

I want to parse an XML document, find an element, and then find some ancestors of that element. Using parse-xml, I can get a structure of blocks. Using Ryan's 'deep-find' example from other messages, I can traverse the block structure looking for a particular block. But how can I identify the parent of a block, for example? Do blocks have the concept of a parent when nested in other blocks, or should I store the ancestry of blocks as I recurse into a block structure using deep-find?

Many thanks if you can help.

James Carlyle

 [2/7] from: fsievert:uos at: 25-May-2001 18:14


On Fri, 25 May 2001, James Carlyle wrote:
> a parent when nested in other blocks, or should I store the ancestry of
> blocks as I recurse into a block structure using deep-find?

No, they don't have this feature, but it is easy to add. Try

    do http://proton.cl-ki.uni-osnabrueck.de/REBOL/parent-block.r

Now you can convert a (non-recursive!) block structure into a parent-block structure.
>> a: [1 2 [3 4 [5] 6] 7]
== [1 2 [3 4 [5] 6] 7]
>> a: parent-block a
== [1 2 [3 4 [5] 6] 7] ; Now it is converted
>> a/3
== [3 4 [5] 6] ; Everything looks like before
>> parent a/3
== [1 2 [3 4 [5] 6] 7] ; You can get the parent-block
>> head a
== [none 1 2 [3 4 [5] 6] 7] ; Here you can see, how it is done
>> parent-check head a
== [1 2 [3 4 [5] 6] 7] ; Use parent-check after a move in the
                       ; block to prevent you from seeing the
                       ; parent-pointer

CU, Frank
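
Frank's script itself is not reproduced in this thread. As a rough illustration of the idea his transcript suggests (a hidden first slot in each block that points at the enclosing block), here is a minimal sketch. The names LINK-PARENTS and PARENT-OF are hypothetical and are not the functions from parent-block.r; the real script may well work differently.

```rebol
REBOL [Title: "Parent links for nested blocks (hypothetical sketch)"]

; LINK-PARENTS (hypothetical name) inserts a hidden first slot into every
; block: NONE for the root, otherwise a reference to the enclosing block.
; The result is positioned just past that slot, so normal access is unchanged.
link-parents: func [blk [block!] /parent up [block!] /local pos] [
    insert/only blk either parent [up] [none]
    blk: next blk                       ; skip the hidden slot
    pos: blk
    while [not tail? pos] [
        if block? first pos [
            ; replace the nested block by its linked, repositioned version
            change/only pos link-parents/parent first pos blk
        ]
        pos: next pos
    ]
    blk
]

; PARENT-OF (hypothetical name) reads the hidden slot back.
parent-of: func [blk [block!]] [first head blk]

a: link-parents copy/deep [1 2 [3 4 [5] 6] 7]
print a/3/1                     ; ordinary access still sees 3
print same? a parent-of a/3     ; the nested block knows its parent
print none? parent-of a         ; the root has no parent
```

The design trade-off is that every block now carries an extra item at its head, so code that calls HEAD on such a block has to skip it again, which is presumably what Frank's parent-check is for.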

 [3/7] from: joel:neely:fedex at: 25-May-2001 12:38


Hi, James! Welcome!

James Carlyle wrote:
> I want to parse an XML document, find an element, and then
> find some ancestors of that element. Using parse-xml, I can
> get a structure of blocks. Using Ryan's 'deep-find'
> example in other messages, I can traverse the block structure
> looking for a particular block. But how can I identify the
> parent of a block, for example?
>
See below for one way.
> Do blocks have the concept of a parent when nested in other
> blocks, or should I store the ancestry of blocks as I recurse
> into a block structure using deep-find?
>
I wouldn't use DEEP-FIND, but rather a function that "knows" about the structure of blocks coming from PARSE-XML. Here's one way to do it, in a fairly generic fashion:

8<------------------------------------------------------------
walkxml: func [
    xmlb [block!] sel [any-function!] doer [any-function!]
    /local _walk parents
][
    parents: copy []
    _walk: func [xel [block!]] [
        insert parents xel
        if do [sel parents] [do [doer parents]]
        if found? third xel [
            foreach item third xel [
                if block? item [_walk item]
            ]
        ]
        remove parents
    ]
    _walk first third xmlb
]
8<------------------------------------------------------------

The above function takes three arguments: a block created by PARSE-XML, a selection function that looks at a stack of element blocks (current, parent, grandparent, etc.), and a doer function that takes some action based on the same sequence.

Remember that the first item in an XML-block is the tag name, the second is the attribute list (name/value pairs), and the third is a block containing all of the element's content.

Suppose I have an XML document that looks like this:

8<------------------------------------------------------------
<sample>
    <table>
        <row>
            <cell>A</cell>
            <cell>B</cell>
            <cell>C</cell>
        </row>
        <row>
            <cell>D</cell>
            <cell>E</cell>
            <cell>F</cell>
        </row>
    </table>
    <table>
        <column>
            <cell>A</cell>
            <cell>D</cell>
        </column>
        <column>
            <cell>B</cell>
            <cell>E</cell>
        </column>
        <column>
            <cell>C</cell>
            <cell>F</cell>
        </column>
    </table>
</sample>
8<------------------------------------------------------------

We can demo WALKXML by just showing the order in which it visits the components of that structure, after saying

    data: parse-xml read %sample.xml

A simple selector that accepts anything, and a doer that just shows the tag name, look and work like this:

    any?: func [estack [block!]] [true]

    print-first: func [estack [block!]] [
        print first first estack
    ]

>> walkxml data :any? :print-first
sample
table
row
cell
cell
cell
row
cell
cell
cell
table
column
cell
cell
column
cell
cell
column
cell
cell

(If you think there's an extra FIRST in PRINT-FIRST, remember that ESTACK contains all the parents from the current element upward.)

A doer that indents each tag name:

    indent-name: func [estack [block!]] [
        loop length? estack [prin " "]
        print first first estack
    ]

>> walkxml data :any? :indent-name
 sample
  table
   row
    cell
    cell
    cell
   row
    cell
    cell
    cell
  table
   column
    cell
    cell
   column
    cell
    cell
   column
    cell
    cell

A doer that shows the full "path" for each element looks like this:

    print-path: func [estack [block!] /local str] [
        str: copy ""
        foreach item estack [
            insert str rejoin ["/" first item]
        ]
        print str
    ]

>> walkxml data :any? :print-path
/sample
/sample/table
/sample/table/row
/sample/table/row/cell
/sample/table/row/cell
/sample/table/row/cell
/sample/table/row
/sample/table/row/cell
/sample/table/row/cell
/sample/table/row/cell
/sample/table
/sample/table/column
/sample/table/column/cell
/sample/table/column/cell
/sample/table/column
/sample/table/column/cell
/sample/table/column/cell
/sample/table/column
/sample/table/column/cell
/sample/table/column/cell

Now let's tackle something that requires more selection criteria. To show only the content of the CELL tags:

    cell?: func [estack [block!]] [
        "cell" = first first estack
    ]

    print-content: func [estack [block!]] [
        print third first estack
    ]

>> walkxml data :cell? :print-content
A
B
C
D
E
F
A
D
B
E
C
F

But now, suppose I only want to print the content of CELL tags that are inside ROW tags:

    cell-in-row?: func [estack [block!]] [
        all [
            2 <= length? estack
            "cell" = first first estack
            "row" = first second estack
        ]
    ]

>> walkxml data :cell-in-row? :print-content
A
B
C
D
E
F

With the generic navigation provided by WALKXML, you can apply any test to the current block and its ancestry, and you can perform any operation on the current block and its ancestry.

Hope this helps!

-jn-

------------------------------------------------------------
Programming languages: compact, powerful, simple ...
Pick any two!
joel'dot'neely'at'fedex'dot'com
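
The three-slot element layout described above (tag name, attributes, content) can be inspected directly. The following is a small sketch rather than anything from the original message: it uses the two-level document that appears later in this thread, and the word names DOC and ROOT are illustrative only.

```rebol
REBOL [Title: "Element layout in parse-xml output (sketch)"]

doc: parse-xml "<a><b>1</b><b>2</b></a>"

; PARSE-XML wraps the document in an outer block; the root element is the
; first item of that block's content (WALKXML starts from the same place).
root: first third doc

print first root            ; tag name of the root element
probe second root           ; attribute list, NONE when there are no attributes
foreach item third root [   ; content: child element blocks and text strings
    if block? item [print [first item "->" mold third item]]
]
```

Text content is stored as strings and child elements as blocks, which is why the walker only recurses into items that pass BLOCK?.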

 [4/7] from: joel:neely:fedex at: 25-May-2001 12:57


Hi, again, James,

The other reply was long enough, but I should point out that you can make things a little more readable by adding a spoonful of [syntactic] sugar to help the XML go down... ;-)

James Carlyle wrote:
> I want to parse an XML document, find an element, and then
> find some ancestors of that element...
>
With the following definitions in place:
>> element-name: :first
>> element-attributes: :second
<<quoted lines omitted: 3>>
>> parent: :second
>> grandparent: :third
We can rewrite our last example as

    print-content: func [estack [block!]] [
        print element-content current estack
    ]

    cell-in-row?: func [estack [block!]] [
        all [
            2 <= generations? estack
            "cell" = element-name current estack
            "row" = element-name parent estack
        ]
    ]

>> walkxml data :cell-in-row? :print-content
A
B
C
D
E
F

... in case that makes for more readability for someone who hasn't been playing with REBOL-style XML blocks for a year or two... ;-)

-jn-

------------------------------------------------------------
Programming languages: compact, powerful, simple ...
Pick any two!
joel'dot'neely'at'fedex'dot'com
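
The archive hides three of the definitions in the message above, while the rewritten functions use three words that are not defined in the visible lines: ELEMENT-CONTENT, CURRENT and GENERATIONS?. The sketch below fills them in by guessing from how they are used. Those three definitions are assumptions, not recovered text, and the hand-built ESTACK is only a stand-in for what WALKXML would pass.

```rebol
REBOL [Title: "Readable accessors for element stacks (sketch)"]

; Shown in the message:
element-name:       :first
element-attributes: :second
parent:             :second
grandparent:        :third

; ASSUMED from usage; the original definitions are not visible in the archive:
element-content:    :third
current:            :first
generations?:       :length?

; A hand-built element stack, current element first.
estack: [
    ["cell" none ["A"]]
    ["row"  none [["cell" none ["A"]]]]
]

print element-name current estack       ; name of the current element
print element-name parent estack        ; name of its parent
print generations? estack               ; depth of the stack
print element-content current estack    ; content of the current element
```

Because each accessor is just another name for FIRST, SECOND, THIRD or LENGTH?, the aliases add no behaviour of their own; they only document which slot of the stack or element is being read.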

 [5/7] from: james:calaba at: 1-Jun-2001 11:15


Joel,

First, thank you for the warm welcome to the group.

> print-path: func [estack [block!] /local str] [
>     str: copy ""
<<quoted lines omitted: 3>>
>     print str
> ]

I used your code samples but got an error with the print-path function. My XML looked like '<a><b>1</b><b>2</b></a>'.

I went through your code as thoroughly as my knowledge of Rebol would allow, and I think the problem is that the _walk function is passing a block instead of a block of blocks to the print-path function, so the 'rejoin ["/" first item]' section fails because the second item processed is 'none'.

The first time that print-path is called, the block passed is '["a" none [["b" none ["1"]] ["b" none ["2"]]]]'.

The only problem with asking for help is that you have to be clever enough to understand the answers :-)

James

 [6/7] from: gjones05:mail:orion at: 1-Jun-2001 6:00


From: "James Carlyle"
> The only problem with asking for help is that
> you have to be clever enough to
> understand the answers :-)

Well said!!! This is a quotable quote in my book.

--Scott Jones

 [7/7] from: joel:neely:fedex at: 1-Jun-2001 7:22


Hi, James,

My cut-and-paste mistake! My hat is off to you!!!

James Carlyle wrote:
> I used your code samples but got an error with the print-path
> function. My xml looked like '<a><b>1</b><b>2</b></a>'.

I hit the panic button when I saw your post, because I am fairly obsessive about posting code either from a console transcript or cut-and-pasted from an editor window (precisely because of typos, etc.). I was SURE that I had done so in this case, and had momentary images of impending Alzheimer's...

> I went through your code as thoroughly as my knowledge of
> Rebol would allow, and I think the problem is that the _walk
<<quoted lines omitted: 3>>
> The first time that print-path is called, the block passed
> is '["a" none [["b" none ["1"]] ["b" none ["2"]]]]'.

Your analysis of the bug is EXACTLY correct, but the bug is in the outdated copy of WALKXML that I had pasted at the top of the email! The corrected-while-writing version is

8<------------------------------------------------------------
walkxml: func [
    xmlb [block!] sel [any-function!] doer [any-function!]
    /local _walk parents
][
    parents: copy []
    _walk: func [xel [block!]] [
        insert/only parents xel
        if do [sel parents] [do [doer parents]]
        if found? third xel [
            foreach item third xel [
                if block? item [_walk item]
            ]
        ]
        remove parents
    ]
    _walk first third xmlb
    exit
]
8<------------------------------------------------------------

The correction consisted of changing the first line in the body of _WALK to use the /ONLY refinement, which preserves each ancestor as a single whole-block reference rather than inserting the components of the ancestor individually. I also had placed an EXIT at the end of WALKXML, since the value returned from _WALK wasn't meaningful.

Using the *corrected* version of WALKXML, along with your data, we get the following transcript:

>> do %walkxml.r
>> any?: func [estack [block!]] [true]
>> print-path: func [estack [block!] /local str] [
[    str: copy ""
[    foreach item estack [
[        insert str rejoin ["/" first item]
[        ]
[    print str
[    ]
>> walkxml parse-xml "<a><b>1</b><b>2</b></a>" :any? :print-path
/a
/a/b
/a/b
>>
I apologize for the confusion!!!
> The only problem with asking for help is that you have to be
> clever enough to understand the answers :-)
>

<grumble at="self">
The only problem with cut-and-paste is that you have to be clever enough to replace what you pasted earlier if you find a bug...
</grumble>

-jn-

------------------------------------------------------------
Programming languages: compact, powerful, simple ...
Pick any two!
joel'dot'neely'at'fedex'dot'com
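
To make the fix concrete, and to come back to the original question of finding an element's ancestors, here is a short sketch. The first part shows in isolation what /ONLY changes. The second part repeats the corrected WALKXML from the message above so the script is self-contained, then collects the ancestor names of one matching element. The words STACK, B-CELL?, COLLECT-ANCESTORS and NAMES are illustrative and do not come from the thread.

```rebol
REBOL [Title: "INSERT/ONLY and ancestor lookup (sketch)"]

; Part 1: INSERT splices a block's items, INSERT/ONLY keeps it whole.
stack: copy []
insert stack ["a" none []]
print length? stack         ; 3 (the element's three slots were spliced in)
stack: copy []
insert/only stack ["a" none []]
print length? stack         ; 1 (the element block itself)

; Part 2: corrected WALKXML, copied from the message above.
walkxml: func [
    xmlb [block!] sel [any-function!] doer [any-function!]
    /local _walk parents
][
    parents: copy []
    _walk: func [xel [block!]] [
        insert/only parents xel
        if do [sel parents] [do [doer parents]]
        if found? third xel [
            foreach item third xel [
                if block? item [_walk item]
            ]
        ]
        remove parents
    ]
    _walk first third xmlb
    exit
]

data: parse-xml {<sample><table><row><cell>A</cell><cell>B</cell></row></table></sample>}

; Selector: the CELL element whose text content is "B".
b-cell?: func [estack [block!]] [
    all [
        "cell" = first first estack
        block? third first estack
        "B" = first third first estack
    ]
]

; Doer: everything after the first stack entry is an ancestor, nearest first.
names: copy []
collect-ancestors: func [estack [block!]] [
    foreach ancestor next estack [append names first ancestor]
]

walkxml data :b-cell? :collect-ancestors
probe names
```

With this input, NAMES should end up holding row, table and sample in that order, since the stack runs from the current element up to the root.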

Notes
  • Quoted lines have been omitted from some messages.
    View the message alone to see the lines that have been omitted