View script | License | Download script | History | Other scripts by: neoreb |
30-Apr 18:23 UTC
[0.04] 18.08k
[0.04] 18.08k
Archive version of: xpath.r ... version: 4 ... neoreb 9-Mar-2007Amendment note: a small bug was introduced in version 0.3 || Publicly available? Yes REBOL [ Library: [ level: 'intermediate platform: 'all type: tool domain: [ai xml] tested-under: none support: none license: none see-also: none ] History: [ [0.1 04-mar-2007 "First version"] [0.2 05-mar-2007 "Minor modification" 0.3 06-mar-2007 "Improvement of the xpath_rules. Thanks to Marco." 0.4 09-mar-2007 "Modification to remove a small bug introduced in v0.3."] ] Title: "xpath.r" Date: 03-march-2007 File: %xpath.r Author: "Alban Gabillon" Version: 0.3 Purpose: { This script shows how to implement an XPath interpreter in Rebol/Prolog. This interpreter is not complete. It is only a kind of "Proof of Concept". It lacks many features. Currently it can parse a document containing elements and pcdata only. In particular it does not handle ATTRIBUTES. In this script I see an xml document as a tree of UNTYPED nodes. The consequence is that the syntax DOES NOT FOLLOW exactly the XPath syntax. (See the EXAMPLES below to understand how it works). Note 1: It could be perfectly possible to directly parse XML data instead of rebxml data but it would be more difficult to write the parse_doc function. A solution would be to adapt xml2rebxml so that it produces the db atomic facts. Note 2: For writing a complete XPath interpreter, it would be necessary to consider the type of the different nodes (element, attribute pcdata...) and consequently parse_doc should not only produces tree geometry facts but also facts belonging to predicates like ELEMENT, ATTRIBUTE, ... Other axis (child_or_self ...) should also be implemented in the xpath rules}] samplexml: { <movie> <title>Star Trek: Insurrection</title> <star>Patrick Stewart</star> <star>Brent Spiner</star> <theater> <theater-name>MonoPlex 2000</theater-name> <showtime>14:15</showtime> <showtime>16:30</showtime> <showtime>18:45</showtime> <showtime>21:00</showtime> <price> <adult>$8.50</adult> <child>$5.00</child> </price> </theater> <theater> <theater-name>Bigscreen 1</theater-name> <showtime>19:30</showtime> <price>$6.00</price> </theater> </movie> } examples: {EXAMPLES: For selecting the theaters ---------------------------------------------. XPath==> /movie/theater For selecting all the stars ---------------------------------------- XPath==> //star For selecting the first showtime of all theaters ------------------------------------------------------------------------ XPath==>//theater/showtime(1) For selecting all the showtimes of the second theater ------------------------------------------------------------------------------------ XPath==>//theater(2)/showtime For selecting all the showtimes of the Bigscreen1 theater ----------------------------------------------------------------------------------------- XPath==>//theater[./theater-name/Bigscreen 1]/showtime OR XPath==>//theater[.//Bigscreen 1]/showtime For selecting the theaters with a showtime at 21:00 -------------------------------------------------------------------------------- XPath==>//theater[./showtime/21:00] } do http://perso.orange.fr/alban.gabillon/rebsite/xpath/xml2rebxml.r do http://perso.orange.fr/alban.gabillon/rebsite/xpath/prolog.r parsepath: func [ "parse an xpath expression - output is a block [pathup axis nodetest predicate position]" string [string!] /local workstring pathup test predicate position axis result][ either string = "root" [result: copy ["root" "" "" "" ""]][ predicate: position: "" workstring: reverse copy string switch first workstring [ #"]" [; there is a predicate tied to the nodetest predicate: copy "" rec: 1 while [rec > 0][ workstring: next workstring if workstring/1 = #"]" [rec: rec + 1] if workstring/1 = #"[" [rec: rec - 1] either rec > 0 [append predicate workstring/1][workstring: next workstring]] reverse predicate] #")" [; there is a position tied to the nodetest position: copy "" workstring: next workstring parse workstring [copy position to "(" thru "(" mark:] position: to-integer reverse position workstring: mark]] ; nodetest parse workstring [copy test to "/" mark:] reverse test ; pathup and axis workstring: mark workstring: next workstring either workstring/1 = #"/" [axis: copy "//" pathup: copy next workstring][axis: copy "/" pathup: copy workstring] reverse pathup result: copy reduce [pathup axis test predicate position]]] parse_doc: func [ "parse the rebxml block and create db atomic facts" parent [block!] "parent block" data [block!] "child block" /local element elementlist search pos][ element: none elementlist: copy [] While [not tail? data] [ ; "switch" instead of "either". Indeed one day somebody might need to extend this function (e.g. for considering attributes) switch/default type?/word data/1 [ block! [parse_doc element data/1]] [either search: find elementlist data/1 [ pos: search/2: search/2 + 1][ pos: 1 append elementlist reduce [data/1 1]] append db_facts compose/deep [index [[(data)] (pos)]] element: data append db_facts compose/deep [child [[(element)] [(parent)]]]] data: next data]] prompt: has [expression r][ expression: copy "" expression: ask "XPath==> " either not empty? expression [ r: goal db probe [ xpath [expression X] (probe X)] print [r "solution(s)"] false][true]] db_facts: copy [] xmldata: copy samplexml space: charset " ^/^M^-" parse xmldata [any [ ">" s: some space e: (s: remove/part s e) :s | s: some space e: "<" (s: remove/part s e) :s | skip ]] ; create the document root (not to confuse with the element root) doc: copy [/] doc: append/only doc xml2rebxml xmldata ;create database atomic facts parse_doc doc doc/2 db: assert none [xml [doc]] assert db db_facts comment {COMMENT lines between ============ and UNCOMMENT lines between *************** if you want as much deduction as possible (but low performances)} ;======================================= tree_geometry_rules: assert none [ descendant [X Y][ db/child [X Y] ] descendant [X Y][ db/child [X Z] descendant[Z Y] ] ] for-which tree_geometry_rules [X Y] [ descendant [X Y] ][ assert db compose/deep [descendant [[(X)] [(Y)]]] ] ;======================================= ;**************************************************************** ;tree_geometry_rules: [ ; descendant [X Y][ ; child [X Y]] ; descendant [X Y][ ; child [X Z] ; descendant[Z Y]]] ; assert db tree_geometry_rules ;**************************************************************** ;create xpath rules xpath_rules: [ xpath ["/" X][ xp [["root" "" "" "" ""] X]] xp [["root" "" "" "" ""] doc][ xml [X]] ; for not having path starting with / (would be more difficult to handle in the parsepath function) ; and for getting the element and its subtree only (without the following siblings if any) xpath [P Y][ xp [(parsepath join "root" P) X] equal? [Y (either X/2 = none [X/1] reduce [reduce [X/1 X/2]]) ]] ; child axis xp [[Pathup "/" Test "" ""] X][ xp [(parsepath Pathup) Y] child [X Y] equal? [(to-string X/1) Test]] ; descendant axis xp [[Pathup "//" Test "" ""] X][ xp [(parsepath Pathup) Y] descendant [X Y] equal? [(to-string X/1) Test]] ; nodetest tied with a position xp [[P1 P2 P3 _ Pos] X][ not-equal? ["" Pos] xp [(parsepath join P1 [P2 P3]) X] index [X Pos]] ; nodetest tied with a predicate (i.e. a path relative to the context node . (dot)) xp [[P1 P2 P3 P4 _] X][ not-equal? ["" P4] xp [(parsepath join P1 [P2 P3]) X] xp [(parsepath join P1 [P2 P3 next P4]) Y] descendant[Y X]] ] assert db xpath_rules print samplexml print examples print "press ENTER to leave the interpreter" until [prompt] |