Documentation for: webtitle.r

Usage document for %webtitle.r

1. Introduction to %webtitle.r

webtitle.r is an introduction to the ease and simplicity of accessing the internet and parsing HTML.

2. webtitle At a Glance

No setup is required, just do it.

    >> do %webtitle.r
    >> do http://www.rebol.org/cgi-bin/cgiwrap/rebol/download-a-script.r?script-name=webtitle.r
    connecting to: www.rebol.org
    Script: "Web Page Title Extractor" (20-May-1999)
    connecting to: www.rebol.com
    REBOL Technologies

3. Using %webtitle.r

Requires REBOL/Core or REBOL/View console mode.

3.1. Running %webtitle.r

From the library with:

    >> do http://www.rebol.org/cgi-bin/cgiwrap/rebol/download-a-script.r?script-name=webtitle.r

or locally with:

    >> do %webtitle.r

4. See also

There are other rebol.org scripts that use read http and parse. There is more explanation of parse in the documentation for weblinks.r and websplit.r.

5. What you can learn

5.1. Powerful builtin Internet Access

REBOL has fantastically simple builtin procedures for accessing the internet.

5.2. URLs

http://www.rebol.com is actually a value with a special datatype. In REBOL this is a url!. Very powerful. No quotes needed. REBOL just knows.

5.3. Web Server defaults

Web servers have default files that are returned when no file is named. http://www.rebol.com is actually returned as http://www.rebol.com/index.html. This is not always the case. Some sites return default.htm, or index.php, or index.cgi. No need to worry, the REBOL read function and the web server will work that all out for you. If REBOL Technologies ever changes its web server setup, a different file may be returned and this script will still work.

5.4. parse

REBOL comes with a very powerful parse command. It can parse strings or blocks. In this example it is used to parse a web page as a string, and it uses another powerful feature of REBOL, the tag! datatype.

5.4.1. tag!

One of the many REBOL builtin datatypes is the tag! datatype. A tag is anything enclosed between the "<" (less-than) and ">" (greater-than) symbols. HTML is based on tags, as is XML, and a few other markup languages such as SGML. webtitle uses parse to scan through the web page looking for the <title> tag, and scanning just past it.
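
The title scan can be tried offline with a minimal sketch like the one below. It assumes a short, hypothetical sample page held in a string instead of a live download, and it is an illustration of the thru, copy and to rules rather than the exact source of webtitle.r.

```rebol
REBOL [Title: "Title extraction sketch (offline)"]

; Hypothetical sample page held in a string, so no network access is needed.
page: {<html><head><title>REBOL Technologies</title></head><body></body></html>}

; thru <title>  advances just past the opening tag
; copy title    captures the text matched by the next rule into the word title
; to </title>   stops immediately before the closing tag
parse page [thru <title> copy title to </title>]

print title
```

To scan a live page instead, the sample string can be replaced with the result of read on a url! value, for example read http://www.rebol.com, which needs internet access.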
Scanning just past a match is a feature of the thru rule of parsing: it scans through the string. Then there is a copy rule, which tells parse to copy all the data matched by the following rule into a variable, in this case title. Then the parse rules use to. This rule is very similar to thru, but it scans up to the string, not through it. So the parser scans past the HTML <title> tag and copies everything from there right up to the closing </title> tag. Then the script prints the title variable.

5.5. Changing the page that is scanned

Changing the website or page that the title is extracted from is as easy as changing the text after the http:// part following the read command.

5.6. Getting REBOL

Both REBOL/Core and REBOL/View are available

6. What can break

You will need access to the internet, and rebol.com will have to be up and running for this script to work. Don't worry, http://www.rebol.com is always up and running.

7. Credits