webtitle.r is an introduction to the ease and simplicity of accessing the internet and parsing HTML.
HTML | Hypertext Markup Language (also written Hyper Text Markup Language)
No setup is required, just do it.
>> do %webtitle.r
>> do http://www.rebol.org/cgi-bin/cgiwrap/rebol/download-a-script.r?script-name=webtitle.r
connecting to: www.rebol.org
Script: "Web Page Title Extractor" (20-May-1999)
connecting to: www.rebol.com
REBOL Technologies
Requires REBOL/Core or REBOL/View console mode.
Run it from the library with:
>> do http://www.rebol.org/cgi-bin/cgiwrap/rebol/download-a-script.r?script-name=webtitle.r
or locally with:
>> do %webtitle.r
Other rebol.org scripts also use read http and parse. There is more explanation of parse with the weblinks.r and websplit.r scripts:
%weblinks.r
websplit usage doc
REBOL has fantastically simple built-in functions for accessing the internet.
read http://www.rebol.com accesses and returns the HTML as text. How cool is that?
http://www.rebol.com is actually a value with a special datatype. In REBOL this is a url!. Very powerful. No quotes needed. REBOL just knows.
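A quick way to see this for yourself is to ask REBOL what datatype a value has. The sketch below assumes a REBOL 2 console (REBOL/Core or REBOL/View); the word site is a hypothetical name used only for illustration, and nothing here touches the network.

```rebol
REBOL [Title: "url! datatype sketch"]

; An unquoted URL is loaded as a url! value, not as a string.
site: http://www.rebol.com          ; "site" is a hypothetical name

probe type? site                      ; shows the datatype of the value
probe url? site                       ; true: it is a url!
probe url? "http://www.rebol.com"     ; false: in quotes it is a string!
probe string? "http://www.rebol.com"  ; true
```

Because read accepts a url! directly, the same unquoted value can be handed straight to it.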
Web servers return a default file when a URL does not name one. A request for http://www.rebol.com is actually answered with http://www.rebol.com/index.html. This is not always the case: some sites return default.htm, index.php, or index.cgi. There is no need to worry, because the REBOL read function and the web server work that all out for you. If REBOL Technologies ever changes its web server setup, a different file may be returned and this script will still work.
REBOL comes with a very powerful parse function. It can parse strings or blocks. In this example it parses a web page as a string and uses another powerful feature of REBOL, the tag! datatype.
One of the many REBOL built-in datatypes is the tag! datatype. A tag is anything enclosed between the less-than "<" and greater-than ">" symbols. HTML is based on tags, as are XML and other markup languages such as SGML.
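As with url!, the datatype can be checked at the console. This is a minimal sketch, assuming REBOL 2; the words open-tag and close-tag are hypothetical names chosen for the example.

```rebol
REBOL [Title: "tag! datatype sketch"]

; Anything written between < and > is loaded as a tag! value.
open-tag: <title>       ; hypothetical names, for illustration only
close-tag: </title>

probe tag? open-tag     ; true
probe tag? close-tag    ; true
probe tag? "<title>"    ; false: in quotes it is a string!
```

This is what lets a parse rule name an HTML tag directly, with no quoting or escaping.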
webtitle uses parse to scan the web page for the <title> tag and move just past it. That is what the thru rule does: it scans through the matched string. Next comes a copy rule, which tells parse to copy whatever the following rule matches into a variable, in this case title. The following rule is to, which is very similar to thru except that it scans up to the string, not through it. So the parser skips past the opening <title> tag and copies everything from there right up to the closing </title> tag.
Then it prints the title variable.
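The steps just described can be tried without a network connection. The following is a sketch under stated assumptions, not the script's own source: it assumes REBOL 2, and the sample page text is made up for the example so that the rule can be run offline. With a live connection, the page line would instead be page: read http://www.rebol.com.

```rebol
REBOL [Title: "Title extraction sketch"]

; Hypothetical sample page, standing in for: page: read http://www.rebol.com
page: {<html><head><title>Sample Page</title></head><body>Hello</body></html>}

title: none   ; stays none if the page has no <title> tag

; thru <title>  skips past the opening tag
; copy title    copies what the next rule matches into title
; to </title>   matches everything up to, but not including, the closing tag
parse page [thru <title> copy title to </title>]

print title
```

Run against the sample text, title ends up holding Sample Page. The rule stops at </title> and does not consume the rest of the page, which is all that is needed when only the title is wanted.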
Changing the website or page that the title is extracted from is as easy as changing the text after the http:// part following the read command.
Both REBOL/Core and REBOL/View are available free of charge from www.rebol.com.
You will need access to the internet, and rebol.com will have to be up and running for this script to work. Don't worry: http://www.rebol.com is almost always up and running.
%webtitle.r | Author: Unknown
REBOL/Core | Carl Sassenrath, REBOL Technologies