Documentation for: webtitle.r


Usage document for %webtitle.r

1. Introduction to %webtitle.r

webtitle.r is an introduction to the ease and simplicity of accessing the internet and parsing HTML.

HTML stands for HyperText Markup Language.

2. webtitle At a Glance

No setup is required; just do it.

 >> do %webtitle.r
 >> do http://www.rebol.org/cgi-bin/cgiwrap/rebol/download-a-script.r?script-name=webtitle.r
 connecting to: www.rebol.org
 Script: "Web Page Title Extractor" (20-May-1999)
 connecting to: www.rebol.com
 REBOL Technologies
 

3. Using %webtitle.r

Requires REBOL/Core or REBOL/View console mode.

3.1. Running %webtitle.r

From the library with:

 >> do http://www.rebol.org/cgi-bin/cgiwrap/rebol/download-a-script.r?script-name=webtitle.r
 
or locally with:
 >> do %webtitle.r
 

4. See also

Other rebol.org scripts also combine read on an http URL with parse. The usage documents for weblinks.r and websplit.r explain parse in more detail.
%weblinks.r 
websplit usage doc 

5. What you can learn

5.1. Powerful builtin Internet Access

REBOL has fantastically simple builtin procedures for accessing the internet.
read http://www.rebol.com accesses and returns the HTML as text. How cool is that?
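
As a rough illustration of the read call described above, the following sketch fetches a page and shows a little of what comes back. It assumes REBOL 2 (Core or View) and that www.rebol.com is reachable from your machine; the word page is just a name chosen for this example.

```rebol
REBOL [Title: "read sketch"]

; Assumption: the network is up and www.rebol.com answers.
page: read http://www.rebol.com

print length? page           ; number of characters returned
print copy/part page 200     ; the first 200 characters of the HTML
```

Under REBOL 2 the value returned by read on an http URL is a string, so the usual series functions such as length? and copy apply to it directly.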

5.2. URLs

http://www.rebol.com is actually a value with a special datatype. In REBOL this is a url!. Very powerful. No quotes needed. REBOL just knows.
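
A short sketch of the difference between a url! value and a quoted string follows. The words site and text are names invented for this example; nothing here touches the network.

```rebol
REBOL [Title: "url! sketch"]

site: http://www.rebol.com       ; no quotes: REBOL loads this as a url!
text: "http://www.rebol.com"     ; with quotes: an ordinary string!

probe type? site                 ; url!
probe type? text                 ; string!
probe url? site                  ; true
```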

5.3. Web Server defaults

Web servers return a default file when a URL does not name a specific page. A request for http://www.rebol.com is typically answered with the site's index page, for example http://www.rebol.com/index.html. This is not always the case. Some sites return default.htm, or index.php, or index.cgi. No need to worry: the REBOL read function and the web server work that out between them. If REBOL Technologies ever changes its web server setup, a different file may be returned and this script will still work.

5.4. parse

REBOL comes with a very powerful parse function. It can parse strings or blocks. In this example it is used to parse a web page as a string, and it relies on another powerful feature of REBOL, the tag! datatype.

5.4.1. tag!

One of the many REBOL builtin datatypes is the tag! datatype. A tag is anything enclosed between the "<" (less-than) and ">" (greater-than) symbols. HTML is based on tags, as is XML, and a few other markup languages such as SGML.
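
To see that tags really are a datatype of their own, a few checks like the following can be typed at the console. The words open-tag and close-tag are names made up for this sketch.

```rebol
REBOL [Title: "tag! sketch"]

open-tag:  <title>
close-tag: </title>

probe type? open-tag         ; tag!
probe tag? close-tag         ; true
probe tag? "<title>"         ; false: in quotes it is a string!, not a tag!
```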

webtitle uses parse to scan the web page for the <title> tag and move just past it. That is what the thru rule does: it scans through the matched string. Next comes a copy rule, which tells parse to copy whatever the following rule matches into a variable, in this case title. The rule that follows is to. It is very similar to thru, but it scans up to the string, not through it. So the parser scans past the HTML <title> tag and copies everything after it, right up to the closing </title> tag.

Then it prints the title variable.
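
The thru, copy and to rules described above can be tried without a network connection. The sketch below applies them to a small string held in memory; the sample page text is invented for the example, and the rule is an illustration of the technique rather than a quotation of the script's source.

```rebol
REBOL [Title: "parse sketch"]

; A tiny stand-in for a downloaded page, so no network is needed.
page: {<html><head><title>Example Page</title></head><body>Hi</body></html>}

; thru <title>  : scan to just past the opening tag
; copy title    : capture whatever the next rule matches
; to </title>   : scan up to, but not through, the closing tag
parse page [thru <title> copy title to </title>]

print title
```

The rule stops at </title> rather than consuming the rest of the page, so the value parse itself returns is not useful here; what matters is that title has been set along the way.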

5.5. Changing the page that is scanned

Changing the website or page that the title is extracted from is as easy as changing the text after the http:// part following the read command.
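
If you expect to try several pages, one option is to wrap the steps in a function that takes the URL as an argument. This is only a sketch: page-title is a hypothetical name, not something defined by %webtitle.r, and the example assumes the sites named are reachable.

```rebol
REBOL [Title: "page-title sketch"]

; Hypothetical helper: returns the title text, or none if no <title> is found.
page-title: func [
    "Return the text between <title> and </title> of a web page."
    url [url!] "Page to fetch"
    /local page title
][
    page: read url
    title: none
    parse page [thru <title> copy title to </title>]
    title
]

print page-title http://www.rebol.com
print page-title http://www.rebol.org
```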

5.6. Getting REBOL

Both REBOL/Core and REBOL/View are available free of charge from www.rebol.com

6. What can break

You will need access to the internet, and rebol.com will have to be up and running for this script to work. The site is usually available, but if the read fails the script stops with an error.
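
One way to cope with a missing connection or an unreachable site is to wrap the read in try and test the result with error?. The following is a minimal sketch along those lines, not part of the original script; the messages are placeholders.

```rebol
REBOL [Title: "guarded read sketch"]

either error? try [page: read http://www.rebol.com] [
    ; The read failed: no network, bad host name, server down, and so on.
    print "Could not read the page. Is the network up?"
][
    title: none
    parse page [thru <title> copy title to </title>]
    print either title [title] ["No <title> tag found"]
]
```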

7. Credits

%webtitle.r Author: Unknown
REBOL/Core Carl Sassenrath, REBOL Technologies
  • The rebol.org Library Team
  • Usage document by Brian Tiffin, Library Team Apprentice, Last updated: 21-Jun-2007