Mailing List Archive: 49091 messages

Parse Question: html-to-text conversion help.

 [1/2] from: reboler::programmer::net at: 1-May-2002 8:19


I have a working html-to-text converter, but would like to add the links to the text as well. The following parse rule works well to extract only the links...

    link: [some [thru "<a href=" copy lnk to ">" (append text lnk)]]

... but is there any way to add this to the converter below? I'm having trouble since the html-rules already contain ["<" thru ">"].

*** html-to-text converter ***

The following code is modified from the Core/Parse docs and the %texthtml.r text-to-html converter...

    html-text-extractor: context [
        text: make string! 256
        html-rules: [
            to "<" some [["<" thru ">"] | copy txt to "<" (append text txt)]
        ]
        symbols: [
            "&amp;" "&"
            "&lt;" "<"
            "&gt;" ">"
            "&quot;" {"}
        ]
        extract-text: func [
            {Extracts text from an HTML web page.
            Usage
                extract-text read http://www.rebol.com/index.html
                extract-text read %license.html
            }
            page [string!]
        ][
            clear text
            parse/all page [html-rules]
            foreach [symbol char] symbols [
                replace/all text :symbol :char
            ]
        ]
    ]

 [2/2] from: brett:codeconscious at: 2-May-2002 11:17


Hi Alan,
> link: [some [thru "<a href=" copy lnk to ">" (append text lnk)]]
>
> ... but is there any way to add this to the converter below?
Some modifications. I changed link to remove the some, and I embedded link inside html-rules, so it gets the first go at each tag. If it is a link tag, parsing continues on; if it is not, the previous html-rules logic comes into play.

    link: ["<a href=" copy lnk to ">" (append text lnk)]
    html-rules: [
        to "<" some [
            link
            | ["<" thru ">"]
            | copy txt to "<" (append text txt)
        ]
    ]

Your next problem might be making the link rule handle the case where the tag has more attributes than just the HREF. If you need to do this, then have a look at:

    http://www.codeconscious.com/rebsite/rebol-library/tag-tool.r

In particular, see the NEW-TAG rule and its supporting rules. A demonstration of what this script does is:
    >> import-tag <a href="http://www.codeconscious.com">
    == [a href "http://www.codeconscious.com"]

Regards,
Brett.
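
For readers who want to try the combined rules, here is a minimal sketch of how the reply's link rule might slot back into the original context under REBOL 2. It is an illustration, not code posted by either author. Three details are assumptions beyond the thread: `txt` and `lnk` are declared inside the context so that parse does not set global words; the link rule adds a `skip` after `to ">"`, because `to` stops in front of the `>` and that character would otherwise seem to fall through to the text rule; and `&amp;` is decoded last so that a literal `&amp;lt;` is not decoded twice.

```rebol
REBOL [Title: "html-to-text with links (sketch)"]

html-text-extractor: context [
    text: make string! 256
    txt: lnk: none    ; assumption: kept local to the context

    ; Matches only tags that begin exactly with <a href=
    ; `skip` consumes the closing ">" that `to` stops in front of.
    link: ["<a href=" copy lnk to ">" skip (append text lnk)]

    html-rules: [
        to "<"
        some [
            link                                  ; link tags get the first try
            | ["<" thru ">"]                      ; any other tag is dropped
            | copy txt to "<" (append text txt)   ; text between tags is kept
        ]
    ]

    symbols: [
        "&lt;" "<"
        "&gt;" ">"
        "&quot;" {"}
        "&amp;" "&"    ; assumption: decoded last to avoid double decoding
    ]

    extract-text: func [
        {Extracts text and link targets from an HTML string.}
        page [string!]
    ][
        clear text
        parse/all page [html-rules]
        foreach [symbol char] symbols [replace/all text symbol char]
        text
    ]
]

print html-text-extractor/extract-text {<p>See <a href="http://www.rebol.com">REBOL</a> now.</p>}
```

The href value is appended exactly as captured, quotes included and with no separator before the link text, so a real converter would probably want to trim the quotes and add a space or brackets. As in the original rules, any text after the last tag is not collected.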