html to text and parsing 2 strings
[1/4] from: eean:mlug:missouri at: 7-May-2001 22:22
There are lots of text-to-HTML tools, but what about the other way
around? I'm still quite a beginner, but I was thinking about how to do it,
and it involved parsing out two things, so that it could get rid of both
the < and the >. How would I do that?
Thanks,
Ian
[2/4] from: brett:codeconscious at: 8-May-2001 14:23
Hi Ian,
There are different ways to attack the problem; it depends on what your
aim is.
Here is one idea, though it does not use the parse function:
foreach element load/markup http://www.rebol.com [
    if string? element [print element]
]
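A minimal sketch of the same idea packaged as a reusable function, assuming REBOL 2 and that load/markup is given a string rather than a URL. The name strip-tags and the sample markup are hypothetical, chosen only for illustration:

```rebol
REBOL [Title: "strip-tags sketch"]

; Hypothetical helper, not from the thread: keep only the strings that
; load/markup finds between tags and join them into one string.
strip-tags: func [html [string!] /local out] [
    out: make string! length? html
    foreach element load/markup html [
        ; tags come back as tag! values, so string? is false for them
        if string? element [append out element]
    ]
    out
]

; Sample markup is made up for illustration.
print strip-tags {<p>Hello <b>world</b></p>}
```

Working from a string keeps the example usable without network access; passing the result of read on a URL should behave the same way.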
If you are after a specific part of a web page, you can use the parse function:
parse/all read http://www.rebol.com [
    thru "<title>" copy text to </title>
    (print text)
]
If you are planning a general tool, then you have more complexity to deal
with. A web page is a structured document: cells are part of tables, for
example. But once you have read the web page into a string, that structure
no longer exists; the page is just a sequence of characters. So a truly
general tool is difficult to write, because you end up having to program
something that understands the structure of web pages. On top of this, not
all web pages follow the rules...
Brett.
[3/4] from: allenk:powerup:au at: 8-May-2001 14:51
Here's a starting point from the script library
Cheers,
Allen K
REBOL [
    Title: "Web HTML Tag Extractor"
    File: %websplit.r
    Date: 20-May-1999
    Purpose: "Separate the HTML tags from the body text of a document."
    Category: [web net text 3]
]

tags: make block! 100    ; collects each tag as a string
text: make string! 8000  ; collects the text found between tags

; One step of the scan: either a complete tag, or the text up to the next tag.
html-code: [
    copy tag ["<" thru ">"] (append tags tag) |
    copy txt to "<" (append text txt)
]

page: read http://www.rebol.com
parse page [to "<" some html-code]  ; skip to the first tag, then repeat html-code

foreach tag tags [print tag]
print text
[4/4] from: gchiu:compkarori at: 8-May-2001 16:56
On Tue, 8 May 2001 14:23:38 +1000
"Brett Handley" <[brett--codeconscious--com]> wrote:
> parse/all read http://www.rebol.com [
> thru "<title>" copy text to </title>
> (print text)
> ]
You don't need the quotes around tags, as REBOL recognises
them.
parse read http://www.rebol.com [
    thru <title> copy text to </title>
    (print text)
]
--
Graham Chiu
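Graham's point about unquoted tags can be tried without a network connection. The following is a small sketch, assuming REBOL 2; the sample page string is made up for illustration, and /all is used so that whitespace is handled explicitly:

```rebol
REBOL [Title: "tag literal sketch"]

; Sample page is made up for illustration; no network access needed.
page: {<html><head><title>Demo page</title></head><body>Hi</body></html>}

; <title> and </title> are tag! values written directly in the rule,
; with no quotes around them.
parse/all page [thru <title> copy text to </title>]
print text
```

If the rule matches, text should hold whatever sits between the two title tags.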