Mailing List Archive: 49091 messages

[REBOL] Re: Parse limitation ?

From: g:santilli:tiscalinet:it at: 8-Oct-2003 21:47

Hi Maxim,

On Wednesday, October 8, 2003, 6:29:03 PM, you wrote:

MOA> can you give a short example of a grammar that would extract the text from
MOA> <tag! tag content <subtag! its content?> <p> paragraph info</p>content end>
MOA> and returns a block such as:

[...]

Well, nested tags are not valid HTML so this does not handle them, but maybe it could be of some inspiration. (Sorry for Joel-style indentation. ;-)

    tag-rule: [
        "<" m1: [
            "/" word thru ">" (end-tag to word! word-res)
          |
            "!--" thru "-->" m2: (add-contents to tag! copy/part m1 back m2)
          |
            "!DOCTYPE" thru ">" m2: (add-contents to tag! copy/part m1 back m2)
          |
            "?xml" thru "?>" m2: (add-contents to tag! copy/part m1 back m2)
          |
            word any space (clear attributes) any attribute
            ["/" (content: no) | none (content: yes)] ">"
            (open-tag to word! word-res attributes content)
        ]
    ]
    chars: complement charset {<>"'= ^/^-/}
    value-chars: union chars charset "/"
    word: [copy word-res some chars]
    space: charset { ^/^-}
    attributes: [ ]
    attribute: [
        (wrs: word-res)
        word any space [
            "=" any space [
                {"} copy value any dquoted-chars {"}
              |
                {'} copy value any squoted-chars {'}
              |
                copy value any value-chars
            ] any space
          |
            (value: yes)
        ]
        (insert insert tail attributes to word! word-res any [value copy ""] word-res: wrs)
    ]
    dquoted-chars: complement charset {"}
    squoted-chars: complement charset {'}
    document-rule: [
        some [
            copy contents to "<" (add-contents contents) tag-rule
          |
            copy contents to end (add-contents contents) break
        ]
    ]
    stack: [ ]
    parsed: none
    no-content-tags: [
        basefont br area link img param hr input col frame base meta
    ]
    open-tag: func [tagname attributes content? /local tag] [
        if find no-content-tags tagname [content?: no]
        either content? [
            tag: compose/deep [[(tagname) (attributes)]]
            insert/only tail last stack tag
            insert/only tail stack tag
        ] [
            tag: compose [(tagname) (attributes)]
            insert/only tail last stack tag
        ]
    ]
    end-tag: func [tagname] [
        stack: back tail stack
        if head? stack [exit] ; unmatched close tag
        while [tagname <> tagname-of stack/1] [
            stack: back stack
            if head? stack [exit] ; unmatched close tag
        ]
        stack: head clear stack
    ]
    add-contents: func [contents] [
        if contents [
            insert tail last stack contents
        ]
    ]
    parse-document: func [document] [
        stack: clear head stack
        insert/only stack parsed: make block! 10
        parse/all document document-rule
        parsed
    ]

This is extracted from other code so it is possible that something is missing.

Example:

>> parse-document "<html><head><title>Title</title></head><body>This is a<br>test</body></html>"
== [[[html] [[head] [[title] "Title"]] [[body] "This is a" [br] "test"]]]
>> parse-document read http://www.rebol.com
== [[[HTML] "^/" [[HEAD] "^/" [META HTTP-EQUIV "Content-Type" CONTENT "text/html;CHARSET=iso-8859-1"] "^/" [META NAME "KEYWORDS" CO...

Regards,
   Gabriele.
-- 
Gabriele Santilli <[g--santilli--tiscalinet--it]>  --  REBOL Programmer
Amiga Group Italia sez. L'Aquila  ---  SOON: http://www.rebol.it/
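
A note on the listing: end-tag calls tagname-of, which is not defined anywhere in the message, so it is presumably one of the pieces that went missing in the extraction. The following is only a guess at what it might look like, not the author's code. It assumes that every entry open-tag pushes on the stack has the shape [[tagname attr value ...] content ...], which is what the compose/deep call in open-tag and the example output suggest.

```rebol
; Hypothetical helper, NOT part of the original message.
; Assumption: a stack entry looks like [[tagname attr value ...] content ...],
; so the tag name is the first item of the entry's first block.
; end-tag never passes the root block (the head of the stack) to this
; function, because it exits as soon as it reaches the head.
tagname-of: func [tag [block!]] [
    first first tag
]
```

With this definition loaded before calling parse-document, tagname-of [[body] "This is a" [br] "test"] evaluates to the word body, and since REBOL compares words case-insensitively, </BODY> would still match an entry opened as <body>.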